# A New Modified Histogram Matching Normalization for Time Series Microarray Analysis

## Abstract

## 1. Introduction

## 2. Methods

#### Algorithm for Modified Histogram Matching Normalization

- (1) Sort all values in the whole data matrix by magnitude, from low to high;
- (2) Partition this sorted dataset into bins $B(i)$ ($i=1,\dots,M$), each bin containing exactly $N$ numbers;
- (3) Sort each column of the original unsorted data matrix by magnitude, from low to high. This results in an $M\times N$ matrix $S$ with elements $s_{ij}$, where each column contains the same elements as the corresponding column of the original data matrix, but ordered so that the smallest values are at the top and the largest at the bottom;
- (4) For $i=1,\dots,M$ and $j=1,\dots,N$, scale all elements in the $i$th row of $S$, denoted $S(i)$, using the following scaling function $f$: $$f(s_{ij})=\left(\max(B(i))-\min(B(i))\right)\frac{s_{ij}-\min(S(i))}{\max(S(i))-\min(S(i))}+\min(B(i))$$
- (5) Return each scaled element to its original unsorted position within its column.
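The five steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `column_order` and `mhmn_normalize` are hypothetical, the matrix is a plain list of rows to avoid dependencies, ties are broken by a stable sort, and the handling of a row of $S$ whose values are all equal (where the scaling function is undefined) is an assumption. The input is the $3\times 3$ matrix implied by the index matrix $I'$ in the Appendix.

```python
def column_order(data, j):
    """Row indices of column j, ordered from smallest to largest value (stable for ties)."""
    return sorted(range(len(data)), key=lambda r: data[r][j])


def mhmn_normalize(data):
    """Sketch of modified histogram matching normalization for an M x N list of rows."""
    m, n = len(data), len(data[0])

    # Steps (1)-(2): sort all values of the matrix, then cut them into M bins of N values.
    pooled = sorted(v for row in data for v in row)
    bins = [pooled[i * n:(i + 1) * n] for i in range(m)]

    # Step (3): per-column sort orders; order[j][i] is the original row of s_ij.
    order = [column_order(data, j) for j in range(n)]

    result = [[0.0] * n for _ in range(m)]
    for i in range(m):
        s_row = [data[order[j][i]][j] for j in range(n)]  # row i of S
        s_min, s_max = min(s_row), max(s_row)
        b_min, b_max = bins[i][0], bins[i][-1]            # each bin is already sorted
        for j in range(n):
            if s_max == s_min:
                # Assumption: f is undefined for a constant row, so use the bin minimum.
                scaled = float(b_min)
            else:
                # Step (4): rescale the row of S onto the range of bin B(i).
                scaled = (b_max - b_min) * (s_row[j] - s_min) / (s_max - s_min) + b_min
            # Step (5): write the scaled value back to its original position.
            result[order[j][i]][j] = scaled
    return result


M = [[5, 5, 5], [3, 4, 4], [2, 4, 6]]
print(mhmn_normalize(M))  # [[5.0, 5.0, 5.0], [4.0, 4.0, 4.0], [2.0, 4.5, 6.0]]
```

The printed result agrees with $MHMN(M)$ in the Appendix.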

## 3. Results

#### 3.1. Effects on Correlation

**Figure 1.** The top row shows heat maps of the original data (**A**) and of the same data after applying MHMN (**B**) and QN (**C**), respectively. The second row (**D**, **E**, **F**) shows heat maps of the correlations of the data in the first row. Thus, the color of the block in the $i$th row and $j$th column of a matrix in the second row indicates how strongly rows $i$ and $j$ of the matrix above it are correlated.
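A row-by-row correlation matrix of this kind could be computed as in the sketch below. It assumes Pearson correlation, which the caption does not state explicitly; the function names and the toy data are illustrative only, and a constant row would cause a division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def row_correlations(data):
    """Matrix whose (i, j) entry is the correlation between rows i and j of data."""
    return [[pearson(row_i, row_j) for row_j in data] for row_i in data]


# Hypothetical toy data: row 1 is a multiple of row 0, row 2 runs in the opposite direction.
toy = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
corr = row_correlations(toy)
```

Here `corr[0][1]` is close to 1 and `corr[0][2]` is close to -1, which would appear as the two extremes of the color scale in such a heat map.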

**Figure 2.** Distributions of mean log errors in correlations after QN and MHMN normalization. These distributions were obtained from 500 simulations.

#### 3.2. Effects on Correlation on Real Data

**Figure 3.** Panel (**A**) shows the smoothed histograms of average errors when there is no plate-specific noise. Panel (**B**) shows the same histograms when time-point specific additive noise is applied to the data.

#### 3.3. Effects on Reverse-Engineering via ODEs

**Figure 4.** (**A**) Example of a network structure with 4 nodes and 4 interactions. (**B**) Solutions of the ODE for $x=(P_1,P_2,P_3,P_4)$.

**Figure 5.** In panel (**A**) we use noiseless data and compare the parameters inferred after QN and MHMN normalization in each experiment to the original parameters used to generate the data. In panel (**B**) the data contain time-point specific multiplicative noise. In both cases QN gives significantly larger errors than MHMN.

## 4. Conclusions

## Acknowledgements

## Author Contributions

## Conflicts of Interest

## Appendix

## Quantile normalization

- (1) First, each column is sorted so that the smallest value is at the top: $$M'=\left(\begin{array}{ccc}2& 4& 4\\ 3& 4& 5\\ 5& 5& 6\end{array}\right)\qquad I'=\left(\begin{array}{ccc}(3,1)& (2,2)& (2,3)\\ (2,1)& (3,2)& (1,3)\\ (1,1)& (1,2)& (3,3)\end{array}\right)$$
- (2) Then each value is replaced by its row mean. For example, the row mean for the first row is $\frac{1}{3}(2+4+4)=3.33\dots$. $$M''=\left(\begin{array}{ccc}3.33& 3.33& 3.33\\ 4& 4& 4\\ 5.33& 5.33& 5.33\end{array}\right)\qquad I'=\left(\begin{array}{ccc}(3,1)& (2,2)& (2,3)\\ (2,1)& (3,2)& (1,3)\\ (1,1)& (1,2)& (3,3)\end{array}\right)$$
- (3) Finally, each element is returned to its original position: $$QN(M)=\left(\begin{array}{ccc}5.33& 5.33& 4\\ 4& 3.33& 3.33\\ 3.33& 4& 5.33\end{array}\right)\qquad I=\left(\begin{array}{ccc}(1,1)& (1,2)& (1,3)\\ (2,1)& (2,2)& (2,3)\\ (3,1)& (3,2)& (3,3)\end{array}\right)$$
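The three steps above can be reproduced with the short sketch below. It is an illustration under stated assumptions, not a reference implementation: the function name `quantile_normalize` is hypothetical, the matrix is a plain list of rows, and tied values within a column are ordered by a stable sort, so tied elements may receive different row means. The input is the matrix $M$ implied by $M'$ and $I'$.

```python
def quantile_normalize(data):
    """Sketch of quantile normalization for an M x N matrix given as a list of rows."""
    m, n = len(data), len(data[0])

    # Step (1): for every column, the row indices ordered from smallest to largest value.
    order = [sorted(range(m), key=lambda r, j=j: data[r][j]) for j in range(n)]

    result = [[0.0] * n for _ in range(m)]
    for i in range(m):
        # Step (2): mean of the i-th smallest values taken across all columns.
        row_mean = sum(data[order[j][i]][j] for j in range(n)) / n
        for j in range(n):
            # Step (3): put the mean back at the original position of that element.
            result[order[j][i]][j] = row_mean
    return result


M = [[5, 5, 5], [3, 4, 4], [2, 4, 6]]
for row in quantile_normalize(M):
    print([round(v, 2) for v in row])
```

Rounded to two decimals, the rows printed are `[5.33, 5.33, 4.0]`, `[4.0, 3.33, 3.33]` and `[3.33, 4.0, 5.33]`, matching $QN(M)$.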

## Modified histogram matching normalization

- (2) The data are ordered and divided into bins: $B(1)=(2,3,4)$, $B(2)=(4,4,5)$ and $B(3)=(5,5,6)$.
- (3) Instead of taking the row means, the scaling function $f$ defined in Section 2 is applied to each element. For example, for the element $M'(2,1)=3$, $$\begin{aligned}f(3)&=\left(\max(B(2))-\min(B(2))\right)\frac{3-\min(S(2))}{\max(S(2))-\min(S(2))}+\min(B(2))\\ &=(5-4)\frac{3-3}{5-3}+4=4\end{aligned}$$ This scaling results in the following matrix: $$M''=\left(\begin{array}{ccc}2& 4& 4\\ 4& 4.5& 5\\ 5& 5& 6\end{array}\right)\qquad I'=\left(\begin{array}{ccc}(3,1)& (2,2)& (2,3)\\ (2,1)& (3,2)& (1,3)\\ (1,1)& (1,2)& (3,3)\end{array}\right)$$
- (4) Finally, the scaled elements are returned to their original positions: $$MHMN(M)=\left(\begin{array}{ccc}5& 5& 5\\ 4& 4& 4\\ 2& 4.5& 6\end{array}\right)\qquad I=\left(\begin{array}{ccc}(1,1)& (1,2)& (1,3)\\ (2,1)& (2,2)& (2,3)\\ (3,1)& (3,2)& (3,3)\end{array}\right)$$
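As a check on the scaled matrix, the remaining two elements of its second row follow from the same function, with $S(2)=(3,4,5)$ read off the second row of $M'$ and $B(2)=(4,4,5)$:

$$f(4)=(5-4)\frac{4-3}{5-3}+4=4.5,\qquad f(5)=(5-4)\frac{5-3}{5-3}+4=5$$

The first and third rows are left unchanged in this example because the minimum and maximum of $S(1)=(2,4,4)$ and $S(3)=(5,5,6)$ coincide with those of $B(1)$ and $B(3)$, so $f$ reduces to the identity on those rows.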

