# Co-Clustering under the Maximum Norm

## 1. Introduction

- Co-Clustering$_{\mathcal{L}}$
  **Input:** A matrix $\mathcal{A}\in {\mathbb{R}}^{m\times n}$ and two positive integers $k,\ell \in \mathbb{N}$.
  **Task:** Find a partition of $\mathcal{A}$'s rows into $k$ subsets and a partition of $\mathcal{A}$'s columns into $\ell$ subsets, such that a given cost function (defined with respect to some norm $\mathcal{L}$) is minimized for the corresponding clustering.

We focus on a special case of Co-Clustering$_{\mathcal{L}}$, namely the case of $\mathcal{L}$ being the maximum norm ${L}_{\infty}$, where the problem comes down to minimizing the maximum distance between entries of a cluster. This cost function might be a reasonable choice in practice due to its outlier sensitivity. In network security, for example, there often exists a vast amount of “normal” data points, whereas there are only very few “malicious” data points, which are outliers with respect to certain attributes. The maximum norm does not allow one to put entries with large differences into the same cluster, which is crucial for detecting possible attacks. The maximum norm can also be applied in a discretized setting, where input values are grouped (for example, replaced by integer values) according to their level of deviation from the mean of the respective attribute. It is then not allowed to put values of different ranges of the standard deviation into the same cluster. Last but not least, we study an even more restricted clustering version, where the partitions of the rows and columns have to contain consecutive subsets. This version subsumes the problem of feature discretization, which is used as a preprocessing technique in data mining applications [5,6,7]. See Section 3.3 for this version.

Anagnostopoulos et al. [2] studied the approximability of Co-Clustering$_{\mathcal{L}}$ (with respect to ${L}_{p}$-norms), presenting several constant-factor approximation algorithms. While their algorithms are almost straightforward, relying on one-dimensionally clustering first the rows and then the columns, their main contribution lies in the sophisticated mathematical analysis of the corresponding approximation factors. Note that Jegelka et al. [8] further generalized this approach to higher dimensions, then called tensor clustering. In this work, we study (efficient) exact instead of approximate solvability. To this end, by focusing on Co-Clustering$_{\infty}$, we investigate a scenario that is combinatorially easier to grasp. In particular, our exact and combinatorial polynomial-time algorithms exploit structural properties of the input matrix and do not solely depend on one-dimensional approaches.

#### 1.1. Related Work

The survey of Madeira and Oliveira [1] overviews many variants of Co-Clustering$_{\mathcal{L}}$, there called bi-clustering, and discusses many applications in bioinformatics and beyond. In particular, they also discuss Hartigan’s [9] special case where the goal is to partition into uniform clusters (that is, each cluster has only one entry value). Our studies indeed generalize this very puristic scenario by not demanding completely uniform clusters (which would correspond to clusters with maximum entry difference zero), but allowing some variation between maximum and minimum cluster entries. Califano et al. [10] aim at clusterings where, in each submatrix, the distance between entries within each row and within each column is upper-bounded. Recent work by Wulff et al. [11] considers a so-called “monochromatic” bi-clustering where the cost for each submatrix is defined as the number of minority entries. For binary data, this clustering task coincides with the ${L}_{1}$-norm version of co-clustering, as defined by Anagnostopoulos et al. [2]. Wulff et al. [11] show the NP-hardness of monochromatic bi-clustering for binary data with an additional third value denoting missing entries (which are not considered in their cost function) and give a randomized polynomial-time approximation scheme (PTAS). Except for the work of Anagnostopoulos et al. [2] and Wulff et al. [11], all other investigations mentioned above are empirical in nature.

#### 1.2. Our Contributions

Altogether, our results chart the boundary between tractability and intractability for Co-Clustering$_{\mathcal{L}}$, thus potentially stimulating a promising field of research.

## 2. Formal Definitions and Preliminaries

#### 2.1. Problem Definition

The decision variant of Co-Clustering$_{\mathcal{L}}$ with maximum norm is as follows.

- Co-Clustering$_{\infty}$
  **Input:** A matrix $\mathcal{A}\in {\mathbb{R}}^{m\times n}$, integers $k,\ell \in \mathbb{N}$ and a cost $c\ge 0$.
  **Question:** Is there a $(k,\ell )$-co-clustering $(\mathcal{I},\mathcal{J})$ of $\mathcal{A}$ with ${\text{cost}}_{\infty}(\mathcal{I},\mathcal{J})\le c$?
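For concreteness, the cost function ${\text{cost}}_{\infty}$ can be computed directly from its definition: the largest difference between two entries sharing a cluster (a row block crossed with a column block). The following sketch is ours, not the authors'; it assumes the matrix is given as a list of rows and the partitions as lists of index lists.

```python
def cost_inf(A, row_blocks, col_blocks):
    """Maximum-norm cost of a co-clustering: the largest difference between
    any two entries that lie in the same cluster (row block x column block)."""
    return max(
        max(A[i][j] for i in I for j in J) - min(A[i][j] for i in I for j in J)
        for I in row_blocks for J in col_blocks)
```

For instance, for the matrix $\begin{pmatrix}0&1&2\\1&1&2\end{pmatrix}$, merging both rows and the first two columns yields cost one, while a cluster containing the whole first row yields cost two.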

We write $(k,\ell )$-Co-Clustering$_{\infty}$ to refer to Co-Clustering$_{\infty}$ with constants $k,\ell \in \mathbb{N}$, and by $(k,\ast )$-Co-Clustering$_{\infty}$, we refer to the case where only $k$ is constant and $\ell$ is part of the input. Clearly, Co-Clustering$_{\infty}$ is symmetric with respect to $k$ and $\ell$ in the sense that any $(k,\ell )$-co-clustering of a matrix $\mathcal{A}$ is equivalent to an $(\ell ,k)$-co-clustering of the transposed matrix ${\mathcal{A}}^{T}$. Hence, we always assume that $k\le \ell $.

**Observation 1.** Co-Clustering$_{\infty}$ is solvable in $O(mn)$ time for cost zero and also for any size-two alphabet.

**Proof.** Let $(\mathcal{A},k,\ell ,c)$ be a Co-Clustering$_{\infty}$ input instance. For a $(k,\ell )$-co-clustering with cost zero, it holds that all entries of a cluster are equal. This is only possible if there are at most $k$ different rows and at most $\ell$ different columns in $\mathcal{A}$, since otherwise, there will be a cluster containing two different entries. Thus, the case $c=0$ can be solved by lexicographically sorting the rows and columns of $\mathcal{A}$ in $O(mn)$ time (e.g., using radix sort). For a size-two alphabet $\{a,b\}$, every co-clustering has cost either $0$ or $|b-a|$, so the instance is trivially solvable whenever $c\ge |b-a|$ and otherwise reduces to the case $c=0$. ☐
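The cost-zero criterion of Observation 1 (at most $k$ distinct rows and at most $\ell$ distinct columns) is easy to check. The following sketch is ours and uses hashing instead of the lexicographic radix sort from the proof; both are linear-time in practice.

```python
def zero_cost_coclusterable(A, k, l):
    """A (k, l)-co-clustering of cost 0 exists iff identical rows and identical
    columns can be grouped: at most k distinct rows and l distinct columns."""
    distinct_rows = {tuple(row) for row in A}
    distinct_cols = {tuple(col) for col in zip(*A)}
    return len(distinct_rows) <= k and len(distinct_cols) <= l
```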

**Observation 2.** Given a Co-Clustering$_{\infty}$-instance with arbitrary alphabet $\Sigma \subset \mathbb{R}$, one can find in $O(|\Sigma |^{2})$ time an equivalent instance with alphabet ${\Sigma}^{\prime}\subset \mathbb{Z}$ and cost value ${c}^{\prime}\in \mathbb{N}$.


#### 2.2. Parameterized Algorithmics

## 3. Intractability Results

As seen above, Co-Clustering$_{\infty}$ is easy to solve for binary input matrices (Observation 1). In contrast to this, we show in this section that its computational complexity significantly changes as soon as the input matrix contains at least three different entries. In fact, even for very restricted special cases, we can show NP-hardness. These special cases comprise co-clusterings with a constant number of clusters (Section 3.1) or input matrices with only two rows (Section 3.2). We also show the NP-hardness of finding co-clusterings where the row and column partitions are only allowed to contain consecutive blocks (Section 3.3).

#### 3.1. Constant Number of Clusters

We start by showing that Co-Clustering$_{\infty}$ is NP-hard even if the co-clustering consists of only nine clusters.

**Theorem 1.** $(3,3)$-Co-Clustering$_{\infty}$ is NP-hard for $\Sigma = \{0, 1, 2\}$.

**Proof.** We reduce from the NP-hard 3-Coloring problem: given an undirected graph $G=(V,E)$, decide whether the vertices can be colored with three colors such that no two adjacent vertices receive the same color. We construct a Co-Clustering$_{\infty}$ instance $(\mathcal{A}\in {\{0,1,2\}}^{m\times n},k:=3,\ell :=3,c:=1)$ as follows. The columns of $\mathcal{A}$ correspond to the vertices $V$, and the rows correspond to the edges $E$. For an edge ${e}_{i}=\{{v}_{j},{v}_{{j}^{\prime}}\}\in E$ with $j<{j}^{\prime}$, we set ${a}_{ij}:=0$ and ${a}_{i{j}^{\prime}}:=2$. All other matrix entries are set to 1. Hence, each row corresponding to an edge $\{{v}_{j},{v}_{{j}^{\prime}}\}$ consists of 1-entries except for the columns $j$ and ${j}^{\prime}$, which contain 0 and 2 (see Figure 2). Thus, every co-clustering of $\mathcal{A}$ with a cost at most $c=1$ puts column $j$ and column ${j}^{\prime}$ into different column blocks. We next prove that there is a $(3,3)$-co-clustering of $\mathcal{A}$ with a cost at most $c=1$ if and only if $G$ admits a 3-coloring.
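The construction of the reduction matrix is simple enough to state as code. This sketch is ours (vertices are assumed to be indexed $0,\dots ,n-1$ and edges given as pairs of vertex indices); it only builds the matrix, not the full equivalence proof.

```python
def coloring_matrix(n_vertices, edges):
    """Reduction from 3-Coloring: one row per edge {v_j, v_j'} with j < j',
    one column per vertex; entry 0 in column j, 2 in column j', 1 elsewhere."""
    A = [[1] * n_vertices for _ in edges]
    for i, (a, b) in enumerate(edges):
        j, jp = min(a, b), max(a, b)
        A[i][j] = 0   # smaller endpoint gets a 0
        A[i][jp] = 2  # larger endpoint gets a 2
    return A
```

Since $|0-2|>c=1$, the two endpoint columns of every edge are forced into different column blocks, mirroring the coloring constraint.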

**Corollary 1.** Co-Clustering$_{\infty}$ with $\Sigma =\{0,1,2\}$ is NP-hard for any $k\ge 3$, even when $\ell \ge 3$ is fixed, and the column blocks are forced to have equal sizes $|{J}_{1}|=\dots =|{J}_{\ell}|$.


#### 3.2. Constant Number of Rows

**Theorem 2.** Co-Clustering$_{\infty}$ is NP-hard for $k=m=2$ and unbounded alphabet size $|\Sigma |$.

**Proof.** We reduce from the NP-hard Box Cover problem (see Figure 3 for an illustration). In a co-clustering of cost at most two, the points corresponding to the columns of a common column block have a pairwise distance at most two in both coordinates. Thus, there exists a square of side length two covering all of them. ☐

#### 3.3. Clustering into Consecutive Clusters

One might suspect that the hardness of Co-Clustering$_{\infty}$ is rooted in the fact that we are allowed to choose arbitrary subsets for the corresponding row and column partitions, since the problem remains hard even for a constant number of clusters and also with equal cluster sizes. Hence, in this section, we consider a restricted version of Co-Clustering$_{\infty}$, where the row and the column partition have to consist of consecutive blocks. Formally, for row indices $R=\{{r}_{1},\dots ,{r}_{k-1}\}$ with $1<{r}_{1}<\dots <{r}_{k-1}\le m$ and column indices $C=\{{c}_{1},\dots ,{c}_{\ell -1}\}$ with $1<{c}_{1}<\dots <{c}_{\ell -1}\le n$, the corresponding consecutive $(k,\ell )$-co-clustering $({\mathcal{I}}_{R},{\mathcal{J}}_{C})$ is defined as:

$${\mathcal{I}}_{R}=\{\{1,\dots ,{r}_{1}-1\},\{{r}_{1},\dots ,{r}_{2}-1\},\dots ,\{{r}_{k-1},\dots ,m\}\},\qquad {\mathcal{J}}_{C}=\{\{1,\dots ,{c}_{1}-1\},\{{c}_{1},\dots ,{c}_{2}-1\},\dots ,\{{c}_{\ell -1},\dots ,n\}\}.$$

The Consecutive Co-Clustering$_{\infty}$ problem now is to find a consecutive $(k,\ell )$-co-clustering of a given input matrix with a given cost. Again, this restriction is not sufficient to overcome the inherent intractability of co-clustering; that is, we prove it to be NP-hard. Similarly to Section 3.2, we encounter a close relation of consecutive co-clustering to a geometric problem, namely finding an optimal discretization of the plane, a preprocessing problem with applications in data mining [5,6,7]. The NP-hard Optimal Discretization problem [6] is the following: Given a set $S=B\cup W$ of points in the plane, where each point is either colored black ($B$) or white ($W$), and integers $k,\ell \in \mathbb{N}$, decide whether there is a consistent set of $k$ horizontal and $\ell$ vertical (axis-parallel) lines. That is, the vertical and horizontal lines partition the plane into rectangular regions, such that no region contains two points of different colors (see Figure 4 for an example). Here, a vertical (horizontal) line is a simple number denoting its x- (y-) coordinate.

**Theorem 3.** Consecutive Co-Clustering$_{\infty}$ is NP-hard for $\Sigma = \{0, 1, 2\}$.

**Proof.** We reduce from Optimal Discretization. Let $(S=B\cup W,k,\ell )$ be an instance, and let ${x}_{1}^{\ast}<\dots <{x}_{n}^{\ast}$ (${y}_{1}^{\ast}<\dots <{y}_{m}^{\ast}$) denote the distinct x- (y-) coordinates of the points in $S$, writing $X$ and $Y$ for these coordinate sets. We construct a Consecutive Co-Clustering$_{\infty}$ instance $(\mathcal{A},k+1,\ell +1,c)$ as follows: The matrix $\mathcal{A}\in {\{0,1,2\}}^{m\times n}$ has columns labeled with ${x}_{1}^{\ast},\dots ,{x}_{n}^{\ast}$ and rows labeled with ${y}_{1}^{\ast},\dots ,{y}_{m}^{\ast}$. For $(x,y)\in X\times Y$, the entry ${a}_{xy}$ is defined as zero if $(x,y)\in W$, two if $(x,y)\in B$ and otherwise one. The cost is set to $c:=1$. Clearly, this instance can be constructed in polynomial time.
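The matrix construction of this reduction can be sketched as follows. The helper name is ours, points are given as coordinate pairs, and we assume at most one point per coordinate pair, as in the problem definition.

```python
def discretization_matrix(white, black):
    """Optimal Discretization -> Consecutive Co-Clustering reduction sketch:
    rows/columns are indexed by the distinct y-/x-coordinates of the points;
    a white point yields entry 0, a black point entry 2, all else is 1."""
    pts = white | black
    xs = sorted({x for x, y in pts})
    ys = sorted({y for x, y in pts})
    col = {x: j for j, x in enumerate(xs)}
    row = {y: i for i, y in enumerate(ys)}
    A = [[1] * len(xs) for _ in ys]
    for (x, y) in white:
        A[row[y]][col[x]] = 0
    for (x, y) in black:
        A[row[y]][col[x]] = 2
    return A
```

With cost $c=1$, a cluster may never contain both a 0 (white) and a 2 (black), so consecutive column/row cuts play the role of the vertical/horizontal lines.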

Although Consecutive Co-Clustering$_{\infty}$ is NP-hard, there still is some difference in its computational complexity compared to the general version. In contrast to Co-Clustering$_{\infty}$, the consecutive version is polynomial-time solvable for constants $k$ and $\ell$ by simply trying out all $O({m}^{k}{n}^{\ell})$ consecutive partitions of the rows and columns.
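The brute-force procedure just mentioned can be sketched directly: choose $k-1$ row cuts and $\ell -1$ column cuts, then check the cost of every cluster. This sketch is ours and returns one feasible pair of partitions, or `None`.

```python
from itertools import combinations

def consecutive_coclustering(A, k, l, c):
    """Try all O(m^k n^l) consecutive partitions (cut positions) and check
    whether some pair of partitions has maximum-norm cost at most c."""
    m, n = len(A), len(A[0])

    def blocks(size, parts):
        # all ways to split [0, size) into `parts` consecutive non-empty blocks
        for cuts in combinations(range(1, size), parts - 1):
            bounds = (0,) + cuts + (size,)
            yield [range(bounds[t], bounds[t + 1]) for t in range(parts)]

    for rows in blocks(m, k):
        for cols in blocks(n, l):
            if all(max(A[i][j] for i in I for j in J) -
                   min(A[i][j] for i in I for j in J) <= c
                   for I in rows for J in cols):
                return rows, cols
    return None
```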

## 4. Tractability Results

In Section 3, we showed that Co-Clustering$_{\infty}$ is NP-hard for $k=\ell =3$ and also for $k=2$ in the case of unbounded $\ell$ and $|\Sigma |$. In contrast to these hardness results, we now investigate which parameter combinations yield tractable cases. It turns out (Section 4.2) that the problem is polynomial-time solvable for $k=\ell =2$ and for $k=1$. We can even solve the case $k=2$ and $\ell \ge 3$ for $|\Sigma |=3$ in polynomial time by showing that this case is in fact equivalent to the case $k=\ell =2$. Note that these tractability results nicely complement the hardness results from Section 3. We further show fixed-parameter tractability for the parameters alphabet size $|\Sigma |$ and number of column blocks $\ell$ (Section 4.3).

We start (Section 4.1) with a reduction of Co-Clustering$_{\infty}$ to CNF-SAT (the satisfiability problem for Boolean formulas in conjunctive normal form). Later on, it will be used in some special cases (see Theorems 5 and 7), because there, the corresponding formula, or an equivalent formula, only consists of clauses containing two literals, thus being a polynomial-time solvable 2-SAT instance.

#### 4.1. Reduction to CNF-SAT Solving

We describe two approaches for solving Co-Clustering$_{\infty}$ via CNF-SAT. The first approach is based on a straightforward reduction of a Co-Clustering$_{\infty}$ instance to a single CNF-SAT instance with clauses of size up to $\text{max}\{k,\ell ,4\}$. Note that this does not yield any theoretical improvements in general. Hence, we develop a second approach, which requires solving $O(|\Sigma |^{k\ell})$ many CNF-SAT instances with clauses of size at most $\text{max}\{k,\ell ,2\}$. The theoretical advantage of this approach is that if $k$ and $\ell$ are constants, then there are only polynomially many CNF-SAT instances to solve. Moreover, the formulas contain smaller clauses (for $k\le \ell \le 2$, we even obtain polynomial-time solvable 2-SAT instances). While the second approach leads to (theoretically) tractable special cases, it is not clear that it also performs better in practice. This is why we conducted some experiments for empirical comparison of the two approaches (in fact, it turns out that the straightforward approach allows one to solve larger instances). In the following, we describe the reductions in detail and briefly discuss the experimental results.

We first describe the straightforward reduction from Co-Clustering$_{\infty}$ to CNF-SAT. We simply introduce a variable ${x}_{i,r}$ (${y}_{j,s}$) for each pair of row index $i\in \left[m\right]$ and row block index $r\in \left[k\right]$ (respectively, column index $j\in \left[n\right]$ and column block index $s\in \left[\ell \right]$) denoting whether the respective row (column) may be put into the respective row (column) block. For each row $i$, we enforce that it is put into at least one row block with the clause $({x}_{i,1}\vee \dots \vee {x}_{i,k})$ (analogously for the columns). We encode the cost constraints by introducing $k\ell $ clauses $(\neg {x}_{i,r}\vee \neg {x}_{{i}^{\prime},r}\vee \neg {y}_{j,s}\vee \neg {y}_{{j}^{\prime},s})$, $(r,s)\in \left[k\right]\times \left[\ell \right]$, for each pair of entries ${a}_{ij},{a}_{{i}^{\prime}{j}^{\prime}}\in \mathcal{A}$ with $|{a}_{ij}-{a}_{{i}^{\prime}{j}^{\prime}}|>c$. These clauses simply ensure that ${a}_{ij}$ and ${a}_{{i}^{\prime}{j}^{\prime}}$ are not put into the same cluster. Note that this reduction yields a CNF-SAT instance with $km+\ell n$ variables and $O({(mn)}^{2}k\ell )$ clauses of size up to $\text{max}\{k,\ell ,4\}$.
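The clause generation can be sketched as follows. The variable numbering scheme and function name are ours; clauses are emitted as lists of signed integers (a negative integer denotes a negated variable), the common input convention of SAT solvers.

```python
def coclustering_clauses(A, k, l, c):
    """Straightforward CNF encoding: x(i, r) means 'row i may go into row
    block r', y(j, s) the analogue for columns. Returns clauses as lists of
    signed variable numbers."""
    m, n = len(A), len(A[0])
    x = lambda i, r: 1 + i * k + r          # variables 1 .. k*m
    y = lambda j, s: 1 + k * m + j * l + s  # variables k*m+1 .. k*m+l*n
    # every row and every column must be placed into at least one block
    clauses = [[x(i, r) for r in range(k)] for i in range(m)]
    clauses += [[y(j, s) for s in range(l)] for j in range(n)]
    # entries that differ by more than c must not share a cluster
    for i in range(m):
        for j in range(n):
            for ip in range(m):
                for jp in range(n):
                    if abs(A[i][j] - A[ip][jp]) > c:
                        for r in range(k):
                            for s in range(l):
                                clauses.append(
                                    [-x(i, r), -x(ip, r), -y(j, s), -y(jp, s)])
    return clauses
```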

To obtain bounds on the optimal cost of a $(k,\ell )$-Co-Clustering$_{\infty}$ instance on $\mathcal{A}$, solve $(k,n)$-Co-Clustering$_{\infty}$ and $(m,\ell )$-Co-Clustering$_{\infty}$ separately for input matrix $\mathcal{A}$. Let $({\mathcal{I}}_{1},{\mathcal{J}}_{1})$ and $({\mathcal{I}}_{2},{\mathcal{J}}_{2})$ denote the $(k,n)$- and $(m,\ell )$-co-clustering, respectively, and let their costs be ${c}_{1}:=\text{cost}({\mathcal{I}}_{1},{\mathcal{J}}_{1})$ and ${c}_{2}:=\text{cost}({\mathcal{I}}_{2},{\mathcal{J}}_{2})$. We take $\text{max}\{{c}_{1},{c}_{2}\}$ as a lower bound and ${c}_{1}+{c}_{2}$ as an upper bound on the optimal cost value for an optimal $(k,\ell )$-co-clustering of $\mathcal{A}$. It is straightforward to argue the correctness of the lower bound, and we next show that ${c}_{1}+{c}_{2}$ is an upper bound. Consider any pair $(i,j),({i}^{\prime},{j}^{\prime})\in \left[m\right]\times \left[n\right]$, such that $i$ and ${i}^{\prime}$ are in the same row block of ${\mathcal{I}}_{1}$, and $j$ and ${j}^{\prime}$ are in the same column block of ${\mathcal{J}}_{2}$ (that is, ${a}_{ij}$ and ${a}_{{i}^{\prime}{j}^{\prime}}$ are in the same cluster). Then, it holds $|{a}_{ij}-{a}_{{i}^{\prime}{j}^{\prime}}|\le |{a}_{ij}-{a}_{{i}^{\prime}j}|+|{a}_{{i}^{\prime}j}-{a}_{{i}^{\prime}{j}^{\prime}}|\le {c}_{1}+{c}_{2}$. Hence, just taking the row partition from $({\mathcal{I}}_{1},{\mathcal{J}}_{1})$ and the column partition from $({\mathcal{I}}_{2},{\mathcal{J}}_{2})$ gives a combined $(k,\ell )$-co-clustering of cost at most ${c}_{1}+{c}_{2}$.
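The triangle-inequality argument for the upper bound can be checked mechanically. The following sketch (function names ours) computes $c_1$, $c_2$ and the cost of the combined co-clustering, asserting the bound.

```python
def cluster_cost(A, row_blocks, col_blocks):
    # maximum-norm cost of the co-clustering given by the two partitions
    return max(
        max(A[i][j] for i in I for j in J) - min(A[i][j] for i in I for j in J)
        for I in row_blocks for J in col_blocks)

def combined_cost_bound(A, row_partition, col_partition):
    """c1: cost of the (k, n)-co-clustering (every column its own block);
    c2: cost of the (m, l)-co-clustering (every row its own block).
    The combined (k, l)-co-clustering costs at most c1 + c2."""
    m, n = len(A), len(A[0])
    c1 = cluster_cost(A, row_partition, [[j] for j in range(n)])
    c2 = cluster_cost(A, [[i] for i in range(m)], col_partition)
    combined = cluster_cost(A, row_partition, col_partition)
    assert combined <= c1 + c2
    return c1, c2, combined
```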

The straightforward reduction of Co-Clustering$_{\infty}$ to CNF-SAT does not yield any improvement in terms of polynomial-time solvability. Therefore, we now describe a different approach, which leads to some polynomial-time solvable special cases. To this end, we introduce the concept of cluster boundaries, which are basically lower and upper bounds for the values in a cluster of a co-clustering. Formally, given two integers $k,\ell $, an alphabet $\Sigma$ and a cost $c$, we define a cluster boundary to be a matrix $\mathcal{U}=\left({u}_{rs}\right)\in {\Sigma}^{k\times \ell}$. We say that a $(k,\ell )$-co-clustering of $\mathcal{A}$ satisfies a cluster boundary $\mathcal{U}$ if ${\mathcal{A}}_{rs}\subseteq [{u}_{rs},{u}_{rs}+c]$ for all $(r,s)\in \left[k\right]\times \left[\ell \right]$. It can easily be seen that a given $(k,\ell )$-co-clustering has cost at most $c$ if and only if it satisfies at least one cluster boundary $\left({u}_{rs}\right)$, namely the one with ${u}_{rs}=\text{min}{\mathcal{A}}_{rs}$.
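The two directions of this equivalence are easy to state as code: computing the canonical boundary ${u}_{rs}=\min {\mathcal{A}}_{rs}$ and checking satisfaction. Function names are ours.

```python
def minimal_cluster_boundary(A, row_blocks, col_blocks):
    """The canonical cluster boundary of a given co-clustering: u_rs is the
    minimum entry of cluster A_rs. A cost-c clustering satisfies exactly
    this boundary (among possibly others)."""
    return [[min(A[i][j] for i in I for j in J) for J in col_blocks]
            for I in row_blocks]

def satisfies(A, row_blocks, col_blocks, U, c):
    # check A_rs is contained in [u_rs, u_rs + c] for every cluster
    return all(U[r][s] <= A[i][j] <= U[r][s] + c
               for r, I in enumerate(row_blocks)
               for s, J in enumerate(col_blocks)
               for i in I for j in J)
```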

The task of finding a co-clustering that satisfies a given cluster boundary can be reduced to a certain CNF-SAT instance: Given a cluster boundary $\mathcal{U}$ and a Co-Clustering$_{\infty}$ instance $I$, find a co-clustering for $I$ that satisfies $\mathcal{U}$. The polynomial-time reduction provided by the following lemma can be used to obtain exact Co-Clustering$_{\infty}$ solutions with the help of SAT solvers, and we use it in our subsequent algorithms.

**Lemma 1.** Given a Co-Clustering$_{\infty}$-instance $(\mathcal{A},k,\ell ,c)$ and a cluster boundary $\mathcal{U}$, one can construct in polynomial time a CNF-SAT instance $\varphi$ with at most $\text{max}\{k,\ell ,2\}$ variables per clause, such that $\varphi$ is satisfiable if and only if there is a $(k,\ell )$-co-clustering of $\mathcal{A}$ which satisfies $\mathcal{U}$.

**Proof.** Given an instance $(\mathcal{A},k,\ell ,c)$ of Co-Clustering$_{\infty}$ and a cluster boundary $\mathcal{U}=\left({u}_{rs}\right)\in {\Sigma}^{k\times \ell}$, we define the following Boolean variables: For each $(i,r)\in \left[m\right]\times \left[k\right]$, the variable ${x}_{i,r}$ represents the expression “row $i$ could be put into row block ${I}_{r}$”. Similarly, for each $(j,s)\in \left[n\right]\times \left[\ell \right]$, the variable ${y}_{j,s}$ represents that “column $j$ could be put into column block ${J}_{s}$”.

By Lemma 1, one can solve Co-Clustering$_{\infty}$ by solving $O(|\Sigma |^{k\ell})$ many CNF-SAT instances (one for each possible cluster boundary) with $km+\ell n$ variables and $O\left(mnk\ell \right)$ clauses of size at most $\text{max}\{k,\ell ,2\}$. We also implemented this approach for comparison with the straightforward reduction to CNF-SAT above. The bottleneck of this approach, however, is the number of possible cluster boundaries, which grows extremely quickly. While a single CNF-SAT instance can be solved quickly, generating all possible cluster boundaries together with the corresponding CNF formulas becomes quite expensive, such that we could only solve instances with very small values of $|\Sigma |\le 4$ and $k\le \ell \le 5$.
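The outer loop over cluster boundaries is a plain enumeration of ${\Sigma}^{k\times \ell}$; its size $|\Sigma |^{k\ell}$ explains the bottleneck noted above. A minimal sketch (helper name ours):

```python
from itertools import product

def cluster_boundaries(sigma, k, l):
    """Enumerate all |Sigma|^(k*l) cluster boundaries U in Sigma^{k x l},
    each as a k x l list of lists."""
    for flat in product(sorted(sigma), repeat=k * l):
        yield [list(flat[r * l:(r + 1) * l]) for r in range(k)]
```

Already for $|\Sigma |=4$ and $k=\ell =5$ this yields $4^{25}\approx 10^{15}$ boundaries, matching the reported practical limits.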

#### 4.2. Polynomial-Time Solvability

We start with $(1,\ast )$-Co-Clustering$_{\infty}$, that is, the variant where all rows belong to one row block.

**Theorem 4.** $(1,\ast )$-Co-Clustering$_{\infty}$ is solvable in $O(n(m+\text{log}\,n))$ time.

**Proof.** We show that Algorithm 1 solves $(1,\ast )$-Co-Clustering$_{\infty}$. In fact, it even computes the minimum ${\ell}^{\prime}$, such that $\mathcal{A}$ has a $(1,{\ell}^{\prime})$-co-clustering of cost $c$. The overall idea is that, with only one row block, all entries of a column $j$ are contained in a single cluster in any solution, and thus, it suffices to consider only the minimum value ${\alpha}_{j}$ and the maximum value ${\beta}_{j}$ in column $j$. More precisely, for a column block $J\subseteq \left[n\right]$ of a solution, it follows that $\text{max}\{{\beta}_{j}\mid j\in J\}-\text{min}\{{\alpha}_{j}\mid j\in J\}\le c$. The algorithm starts with the column ${j}_{1}$ that contains the overall minimum value ${\alpha}_{{j}_{1}}$ of the input matrix, that is ${\alpha}_{{j}_{1}}=\text{min}\{{\alpha}_{j}\mid j\in \left[n\right]\}$. Clearly, ${j}_{1}$ has to be contained in some column block, say ${J}_{1}$. The algorithm then adds all other columns $j$ to ${J}_{1}$ where ${\beta}_{j}\le {\alpha}_{{j}_{1}}+c$, removes the columns ${J}_{1}$ from the matrix and recursively proceeds with the column containing the minimum value of the remaining matrix. We continue with the correctness of the described procedure.

Algorithm 1: Algorithm for $(1,\ast )$-Co-Clustering$_{\infty}$.

Input: $\mathcal{A}\in {\mathbb{R}}^{m\times n}$, $\ell \ge 1$, $c\ge 0$.

Output: A partition of $[n]$ into at most $\ell$ blocks yielding a cost of at most $c$, or `no` if no such partition exists.

First, observe that any partition $\{{J}_{1},\dots ,{J}_{{\ell}^{\prime}}\}$ returned by the algorithm is a valid solution, since by construction ${\text{cost}}_{\infty}(\left\{\left[m\right]\right\},\{{J}_{1},\dots ,{J}_{{\ell}^{\prime}}\})\le c$.

If the algorithm returns `no` in Line 5, then it is clearly a no-instance, since the difference between the maximum and the minimum value in a column is larger than $c$. If `no` is returned in Line 13, then the algorithm has computed column indices ${j}_{s}$ and column blocks ${J}_{s}$ for each $s\in \left[\ell \right]$, and there still exists at least one index ${j}_{\ell +1}$ in $\mathcal{N}$ when the algorithm terminates. We claim that the columns ${j}_{1},\dots ,{j}_{\ell +1}$ all have to be in different blocks in any solution. To see this, consider any $s,{s}^{\prime}\in [\ell +1]$ with $s<{s}^{\prime}$. By construction, ${j}_{{s}^{\prime}}\notin {J}_{s}$. Therefore, ${\beta}_{{j}_{{s}^{\prime}}}>{\alpha}_{{j}_{s}}+c$ holds, and columns ${j}_{s}$ and ${j}_{{s}^{\prime}}$ contain elements with distance more than $c$. Thus, in any co-clustering with cost at most $c$, columns ${j}_{1},\dots ,{j}_{\ell +1}$ must be in different blocks, which is impossible with only $\ell$ blocks. Hence, we indeed have a no-instance.
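The greedy procedure described in the proof can be sketched compactly. This is our own rendering, not the paper's numbered pseudocode (whose Lines 5 and 13 are referenced above); it returns the column blocks, or `None` in the two failure cases.

```python
def one_row_block_coclustering(A, l, c):
    """Greedy algorithm for (1, *)-Co-Clustering_inf: with a single row block
    only each column's minimum (alpha_j) and maximum (beta_j) matter.
    Repeatedly take the remaining column with smallest alpha and absorb every
    column j with beta_j <= alpha + c; fail if more than l blocks are needed."""
    n = len(A[0])
    alpha = [min(row[j] for row in A) for j in range(n)]
    beta = [max(row[j] for row in A) for j in range(n)]
    if any(beta[j] - alpha[j] > c for j in range(n)):
        return None                 # some single column already exceeds cost c
    remaining = sorted(range(n), key=lambda j: alpha[j])
    blocks = []
    while remaining:
        if len(blocks) == l:
            return None             # more than l blocks would be required
        lo = alpha[remaining[0]]    # overall minimum of the remaining matrix
        blocks.append([j for j in remaining if beta[j] <= lo + c])
        remaining = [j for j in remaining if beta[j] > lo + c]
    return blocks
```

Sorting the columns by ${\alpha}_{j}$ once up front is what gives the $O(n(m+\log n))$ flavor of the running time: computing all ${\alpha}_{j},{\beta}_{j}$ costs $O(nm)$ and the sort costs $O(n\log n)$.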

**Theorem 5.** $(2,2)$-Co-Clustering$_{\infty}$ is solvable in $O(|\Sigma |^{2}mn)$ time.


**Theorem 6.** $(2,\ell )$-Co-Clustering$_{\infty}$ is $O\left(mn\right)$-time solvable for $|\Sigma |=3$.

**Proof.** Let $(\mathcal{A},2,\ell ,c)$ with alphabet $\Sigma =\{\alpha ,\beta ,\gamma \}$ be a $(2,\ell )$-Co-Clustering$_{\infty}$ instance. We assume without loss of generality that $\alpha <\beta <\gamma $. The case $\ell \le 2$ is solvable in $O\left(mn\right)$ time by Theorem 5. Hence, it remains to consider the case $\ell \ge 3$. As $|\Sigma |=3$, there are four potential values for a minimum-cost $(2,\ell )$-co-clustering: cost zero (all cluster entries are equal), cost $\beta -\alpha $, cost $\gamma -\beta $ and cost $\gamma -\alpha $. Since any $(2,\ell )$-co-clustering is of cost at most $\gamma -\alpha $ and because it can be checked in $O\left(mn\right)$ time whether there is a $(2,\ell )$-co-clustering of cost zero (Observation 1), it remains to check whether there is a $(2,\ell )$-co-clustering between these two extreme cases, that is, for $c\in \{\beta -\alpha ,\gamma -\beta \}$.

**Case 1.**

**Case 2.**

#### 4.3. Fixed-Parameter Tractability

We now present a fixed-parameter algorithm for Co-Clustering$_{\infty}$ for $c=1$ based on our reduction to CNF-SAT (see Lemma 1). The main idea is, given matrix $\mathcal{A}$ and cluster boundary $\mathcal{U}$, to simplify the Boolean formula ${\varphi}_{\mathcal{A},\mathcal{U}}$ into a 2-SAT formula, which can be solved efficiently. This is made possible by the constraint on the cost, which imposes a very specific structure on the cluster boundary. This approach requires enumerating all (exponentially many) possible cluster boundaries, but yields fixed-parameter tractability for the combined parameter $(\ell ,|\Sigma |)$.

**Theorem 7.** $(2,\ell )$-Co-Clustering$_{\infty}$ is $O(|\Sigma |^{3\ell}{n}^{2}{m}^{2})$-time solvable for $c=1$.

**Lemma 2.** One can solve Co-Clustering$_{\infty}$ in $O(|\Sigma |^{k\ell}mn\ell )$ time. Moreover, Co-Clustering$_{\infty}$ is fixed-parameter tractable with respect to the combined parameter $(m,k,\ell ,c)$.


**Lemma 3.** Let $(\mathcal{A},2,\ell ,1)$ be a Co-Clustering$_{\infty}$ instance, let ${h}_{1}$ be an integer with $0<{h}_{1}<m$, and let $\mathcal{U}=\left({u}_{rs}\right)$ be a cluster boundary with pairwise different columns, such that $|{u}_{1s}-{u}_{2s}|=1$ for all $s\in \left[\ell \right]$.

**Proof.**

- ${\sharp}_{j}^{x}=0$ for any $x\notin \{\alpha ,\beta ,\gamma \}$,
- ${\sharp}_{j}^{\alpha}\le {h}_{1}$ and
- ${\sharp}_{j}^{\gamma}\le {h}_{2}$.

**Proof.** Let $(\mathcal{A},2,\ell ,1)$ be a Co-Clustering$_{\infty}$ instance. The proof is by induction on $\ell$. For $\ell =1$, the problem is solvable in $O(n(m+\text{log}\,n))$ time (Theorem 4). We now consider general values of $\ell$. Note that if $\ell$ is large compared to $m$ (that is, ${2}^{m}<{|\Sigma |}^{\ell}$), then one can directly guess the row partition and run the algorithm of Lemma 2. Thus, for the running time bound, we now assume that $\ell <m$. By Observation 2, we can assume that $\Sigma \subset \mathbb{Z}$.

- with equal bounds if ${U}_{1s}={U}_{2s}$,
- with non-overlapping bounds if ${U}_{1s}\cap {U}_{2s}=\varnothing $,
- with properly overlapping bounds otherwise.

**Claim 1.**

**Proof.** Testing a value $u$ for a column block with equal bounds reduces the instance to a smaller instance of $(2,\ell -1)$-Co-Clustering$_{\infty}$. By induction, each of these cases can be tested in $O(|\Sigma |^{2(\ell -1)}{n}^{2}m(\ell -1))$ time. Since we test all values of $u$, this procedure finds a solution with a column block having equal bounds in $O(|\Sigma |\cdot |\Sigma |^{2(\ell -1)}{n}^{2}m(\ell -1))=O({|\Sigma |}^{2\ell}{n}^{2}{m}^{2})$ time. ☐

**Claim 2.**


**Corollary 2.** Co-Clustering$_{\infty}$ with $c=1$ is fixed-parameter tractable with respect to the parameter $|\Sigma |$ and with respect to the parameter $\ell$.

**Proof.** For Co-Clustering$_{\infty}$ with $c=1$, both parameters can be polynomially upper-bounded within each other. Indeed, $\ell <{|\Sigma |}^{2}$ (otherwise, there are two column blocks with identical cluster boundaries, which could be merged) and $|\Sigma |<2(c+1)\ell =4\ell $ (each column block may contain two intervals, each covering at most $c+1$ elements). ☐

## 5. Conclusions

We studied the Co-Clustering$_{\infty}$ problem, contributing a detailed view of its computational complexity landscape. Refer to Table 1 for an overview of most of our results.

One open question is whether $(2,\ell )$-Co-Clustering$_{\infty}$ remains polynomial-time solvable for alphabets of size larger than three; so far, we only know that it is polynomial-time solvable for ternary matrices (Theorem 6). Another open question is the computational complexity of higher-dimensional co-clustering versions, e.g., on three-dimensional tensors as input (the most basic case here corresponds to $(2,2,2)$-Co-Clustering$_{\infty}$, that is, partitioning each dimension into two subsets). Indeed, other than the techniques for deriving approximation algorithms [2,8], our exact methods do not seem to generalize to higher dimensions. Last but not least, we do not know whether Consecutive Co-Clustering$_{\infty}$ is fixed-parameter tractable or W[1]-hard with respect to the combined parameter $(k,\ell )$.


## References

1. Madeira, S.C.; Oliveira, A.L. Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Trans. Comput. Biol. Bioinf. **2004**, 1, 24–45.
2. Anagnostopoulos, A.; Dasgupta, A.; Kumar, R. A Constant-Factor Approximation Algorithm for Co-clustering. Theory Comput. **2012**, 8, 597–622.
3. Banerjee, A.; Dhillon, I.S.; Ghosh, J.; Merugu, S.; Modha, D.S. A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation. J. Mach. Learn. Res. **2007**, 8, 1919–1986.
4. Tanay, A.; Sharan, R.; Shamir, R. Biclustering Algorithms: A Survey. In Handbook of Computational Molecular Biology; Chapman & Hall/CRC: Boca Raton, FL, USA, 2005.
5. Nguyen, S.H.; Skowron, A. Quantization of Real Value Attributes-Rough Set and Boolean Reasoning Approach. In Proceedings of the Second Joint Annual Conference on Information Sciences, Wrightsville Beach, NC, USA, 28 September–1 October 1995; pp. 34–37.
6. Chlebus, B.S.; Nguyen, S.H. On Finding Optimal Discretizations for Two Attributes. In Proceedings of the First International Conference on Rough Sets and Current Trends in Computing (RSCTC'98), Warsaw, Poland, 22–26 June 1998; pp. 537–544.
7. Nguyen, H.S. Approximate Boolean Reasoning: Foundations and Applications in Data Mining. In Transactions on Rough Sets V; Springer: Berlin/Heidelberg, Germany, 2006; pp. 334–506.
8. Jegelka, S.; Sra, S.; Banerjee, A. Approximation Algorithms for Tensor Clustering. In Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT'09), Porto, Portugal, 3–5 October 2009; pp. 368–383.
9. Hartigan, J.A. Direct Clustering of a Data Matrix. J. Am. Stat. Assoc. **1972**, 67, 123–129.
10. Califano, A.; Stolovitzky, G.; Tu, Y. Analysis of Gene Expression Microarrays for Phenotype Classification. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB'00), San Diego, CA, USA, 16–23 August 2000; pp. 75–85.
11. Wulff, S.; Urner, R.; Ben-David, S. Monochromatic Bi-Clustering. In Proceedings of the 30th International Conference on Machine Learning (ICML'13), Atlanta, GA, USA, 16–21 June 2013; pp. 145–153.
12. Cygan, M.; Fomin, F.V.; Kowalik, Ł.; Lokshtanov, D.; Marx, D.; Pilipczuk, M.; Pilipczuk, M.; Saurabh, S. Parameterized Algorithms; Springer International Publishing: Cham, Switzerland, 2015.
13. Downey, R.G.; Fellows, M.R. Fundamentals of Parameterized Complexity; Springer: London, UK, 2013.
14. Niedermeier, R. Invitation to Fixed-Parameter Algorithms; Oxford University Press: Oxford, UK, 2006.
15. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman and Company: New York, NY, USA, 1979.
16. Fowler, R.J.; Paterson, M.S.; Tanimoto, S.L. Optimal Packing and Covering in the Plane are NP-Complete. Inf. Process. Lett. **1981**, 12, 133–137.
17. Biere, A. PicoSAT Essentials. J. Satisf. Boolean Model. Comput. **2008**, 4, 75–97.
18. Lampert, C.H.; Nickisch, H.; Harmeling, S. Attribute-Based Classification for Zero-Shot Visual Object Categorization. IEEE Trans. Pattern Anal. Mach. Intell. **2013**, 36, 453–465.
19. Aspvall, B.; Plass, M.F.; Tarjan, R.E. A Linear-Time Algorithm for Testing the Truth of Certain Quantified Boolean Formulas. Inf. Process. Lett. **1979**, 8, 121–123.

**Figure 1.**The example shows two (2, 2)-co-clusterings (middle and right) of the same matrix $\mathcal{A}$ (left-hand side). It demonstrates that by sorting rows and columns according to the co-clustering, the clusters can be illustrated as submatrices of this (permuted) input matrix. The cost of the (2, 2)-co-clustering in the middle is three (because of the two left clusters), and that of the (2, 2)-co-clustering on the right-hand side is one.

**Figure 2.**An illustration of the reduction from 3-Coloring.

**Left**: An undirected graph with a proper 3-coloring of the vertices, such that no two neighboring vertices have the same color.

**Right**: The corresponding matrix where the columns are labeled by vertices and the rows by edges with a (3, 3)-co-clustering of cost one. The coloring of the vertices determines the column partition into three columns blocks, whereas the row blocks are generated by the following simple scheme: edges where the vertex with a smaller index is red/blue (dark)/yellow (light) are in the first/second/third row block (e.g., the red-yellow edge {2, 5} is in the first block; the blue-red edge {1, 6} is in the second block; and the yellow-blue edge {3, 4} is in the third block).

**Figure 3.** Example of a Box Cover instance with seven points (**left**) and the corresponding Co-Clustering$_{\infty}$ matrix containing the coordinates of the points as columns (**right**). Indicated is a (2, 3)-co-clustering of cost two where the column blocks are colored according to the three squares (of side length two) that cover all points.

**Figure 4.** Example instance of Optimal Discretization (**left**) and the corresponding instance of Consecutive Co-Clustering$_{\infty}$ (**right**). The point set consists of white (circles) and black (diamonds) points. A solution for the corresponding Consecutive Co-Clustering$_{\infty}$ instance (shaded clusters) naturally translates into a consistent set of lines.

**Table 1.** Overview of results for $(k,\ell )$-Co-Clustering$_{\infty}$ with respect to various parameter constellations ($m$: number of rows; $|\Sigma |$: alphabet size; $k$/$\ell $: size of row/column partition; $c$: cost). A ⊛ indicates that the corresponding value is considered as a parameter, where FPT (fixed-parameter tractable) means that there is an algorithm solving the problem whose super-polynomial running time part depends solely on the parameter. Multiple ⊛'s indicate a combined parameterization. Other non-constant values may be unbounded.

| m | \|Σ\| | k | ℓ | c | Complexity |
|---|---|---|---|---|---|
| - | - | - | - | 0 | P [Observation 1] |
| - | 2 | - | - | - | P [Observation 1] |
| - | - | 1 | - | - | P [Theorem 4] |
| - | - | 2 | 2 | - | P [Theorem 5] |
| - | 3 | 2 | - | - | P [Theorem 6] |
| - | - | 2 | ⊛ | 1 | FPT [Corollary 2] |
| - | ⊛ | 2 | - | 1 | FPT [Corollary 2] |
| ⊛ | - | ⊛ | ⊛ | ⊛ | FPT [Lemma 2] |
| - | 3 | 3 | 3 | 1 | NP-hard [Theorem 1] |
| 2 | - | 2 | - | 2 | NP-hard [Theorem 2] |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bulteau, L.; Froese, V.; Hartung, S.; Niedermeier, R. Co-Clustering under the Maximum Norm. *Algorithms* **2016**, *9*, 17.
https://doi.org/10.3390/a9010017
