# The Algorithm and Structure for Digital Normalized Cross-Correlation by Using First-Order Moment

## Abstract

## 1. Introduction

## 2. Normalized Cross-Correlation Based on First-Order Moment

#### 2.1. Cross-Correlation

We define subsets $S_k$ (k = 0, 1, 2, …, L) that divide the index set i ∈ {0, 1, …, N − 1} into L + 1 subsets, where L is the maximum value in the correlation kernel { g(i) }. Specifically,

$$S_k = \{\, i \mid g(i) = k,\ i \in \{0, 1, \ldots, N-1\} \,\} \tag{2}$$

that is, $S_k$ is the set of indices i for which g(i) = k. Then a new (L + 1)-point sequence { $a_k(n)$ } is defined by the subsets $S_k$ [28], which is

$$a_k(n) = \sum_{i \in S_k} f(n+i) \tag{3}$$

with $a_k(n) = 0$ when $S_k$ is empty. Thus $a_k(n)$ is the sum of the elements of { f(n + i) } whose index i satisfies g(i) = k. Computing { $a_k(n)$ } is in effect a counting procedure that records the total weight with which each value k is accumulated in the computation of c(n). Therefore, the relationship between { f(n + i) } and { $a_k(n)$ } can be described as:

$$\sum_{i=0}^{N-1} f(n+i) = \sum_{k=0}^{L} a_k(n) \tag{4a}$$

$$\sum_{i=0}^{N-1} f(n+i)\,g(i) = \sum_{k=1}^{L} a_k(n)\,k \tag{4b}$$

where $\sum_{k=0}^{L} a_k(n)$ in Equation (4a) is the zero-order moment of { $a_k(n)$ }, and $\sum_{k=1}^{L} a_k(n)k$ in Equation (4b) is the first-order moment of { $a_k(n)$ }. As a result, Equation (1) can be transformed into:

$$c(n) = \sum_{k=1}^{L} a_k(n)\,k \tag{5}$$
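The bucket construction described above can be sketched in Python. This is a minimal illustration, assuming that g(i) takes integer values in 0…L and that the cross-correlation is c(n) = Σ f(n + i) g(i). The function names are hypothetical and do not come from the paper.

```python
def bucket_sums(f, g, n, L):
    """Return [a_0(n), ..., a_L(n)], where a_k(n) sums f[n + i] over all i with g[i] == k."""
    a = [0] * (L + 1)
    for i, k in enumerate(g):
        a[k] += f[n + i]  # one addition per kernel tap, no multiplication
    return a


def cross_correlation_direct(f, g, n):
    """Reference value: c(n) as a plain sum of products."""
    return sum(f[n + i] * g[i] for i in range(len(g)))


def cross_correlation_moment(f, g, n, L):
    """c(n) as the first-order moment of the bucket sums."""
    a = bucket_sums(f, g, n, L)
    return sum(k * a[k] for k in range(1, L + 1))


f = [3, 1, 4, 1, 5, 9, 2, 6]   # example signal (made up)
g = [2, 1, 4, 2]               # example kernel (made up)
L = max(g)
print(bucket_sums(f, g, 0, L))
print(cross_correlation_direct(f, g, 0), cross_correlation_moment(f, g, 0, L))
```

The weighted sum in `cross_correlation_moment` is written with multiplications only to keep the check short; replacing it with additions is the purpose of the fast moment algorithm in Section 3.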

#### 2.2. Normalized Cross-Correlation

## 3. The Fast Algorithm and Systolic Array for First-Order Moment

#### 3.1. The Fast Algorithm for First-Order Moment

**Algorithm 1** Moment (a_{L}(n), a_{L−1}(n), …, a_{0}(n))

Define the array a with two elements

Initialize a $\leftarrow $ ( a_{L}(n), a_{L}(n) )

for each k $\in $ [2, L] do // Equation (12)

a[1] $\leftarrow $ a[1] + a[0] // 1-network F(a)

a[1] $\leftarrow $ a[1] + a_{L−k+1}(n)

a[0] $\leftarrow $ a[0] + a_{L−k+1}(n)

end for

a[0] $\leftarrow $ a[0] + a_{0}(n)

return a
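Algorithm 1 translates almost line for line into Python. The sketch below assumes the input is the list [a_0, …, a_L] for one fixed n. The function name `moment` and the guard for L = 0 are additions made for illustration, not part of the original listing.

```python
def moment(a):
    """Return (sum of a_k, sum of k * a_k) for a = [a_0, ..., a_L], using additions only."""
    L = len(a) - 1
    if L == 0:                 # guard added here; Algorithm 1 assumes L >= 1
        return a[0], 0
    zero, first = a[L], a[L]   # a[0] and a[1] of Algorithm 1
    for k in range(2, L + 1):
        first += zero          # 1-network step: fold the running sum into the moment
        first += a[L - k + 1]
        zero += a[L - k + 1]
    zero += a[0]               # a_0 contributes only to the zero-order moment
    return zero, first


print(moment([0, 1, 4, 0, 4]))
```

For L ≥ 1 the loop runs L − 1 times with three additions each, plus one final addition, so the whole computation uses 3L − 2 additions and no multiplications.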

#### 3.2. The Systolic Array for First-Order Moment

The sequence { a_{k}(n) } is input into this systolic array to obtain the pair $\left(\sum a_k(n),\ \sum a_k(n)k\right)$.

The a_{k}(n) (k = 2, …, L) should be input into the (L − 1) 1-networks separately rather than simultaneously. Generally, a single a_{k}(n) (k > 0) is input into the (L − k)-th 1-network with a latency of n + 2(L − 1 − k) clock cycles. Hence, in Figure 3, an extra latch array is used to delay each a_{k}(n) before it is input into the corresponding 1-network. The number of latches and the latency are shown in the note “[ ]”, so that different a_{k}(n) enter the different 1-networks at regular intervals. These staggered inputs determine the total execution time of the systolic array for computing the N-point $\sum a_k(n)k$ (n = 0, 1, …, N − 1).

#### 3.3. The Improvement of the Fast Algorithm and Systolic Array for First-Order Moment

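The first-order moment of { a_{k}(n) } can be split into two smaller moments. For even L, this even–odd relationship is illustrated as:

$$\sum_{k=1}^{L} a_k(n)\,k = 2\sum_{k=1}^{L/2} \left[a_{2k-1}(n) + a_{2k}(n)\right]k - \sum_{k=1}^{L/2} a_{2k-1}(n)$$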
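The even–odd split can be checked numerically. The sketch below follows the operations listed in Algorithm 2: it adds each adjacent pair a_{2k−1} + a_{2k}, takes the first-order moment of that half-length sequence, doubles it with a left shift, and subtracts the sum of the odd-indexed terms. It assumes L is even and the inputs are integers. The function names are hypothetical.

```python
def first_moment_direct(a):
    """Reference: sum of k * a_k for a = [a_0, ..., a_L]."""
    return sum(k * a[k] for k in range(1, len(a)))


def first_moment_even_odd(a):
    """First-order moment via the even-odd split (L must be even)."""
    L = len(a) - 1
    if L % 2 != 0:
        raise ValueError("this sketch assumes an even L")
    half = L // 2
    s = sum(a[2 * k - 1] for k in range(1, half + 1))              # odd-indexed terms
    pairs = [a[2 * k - 1] + a[2 * k] for k in range(1, half + 1)]  # adjacent pair sums
    half_moment = sum(k * p for k, p in enumerate(pairs, start=1))
    return (half_moment << 1) - s                                  # shift doubles the half moment


a = [7, 1, 4, 0, 4]
print(first_moment_direct(a), first_moment_even_odd(a))
```

The half-length moment is computed here with a plain weighted sum; in the paper's scheme it would come from Algorithm 1 applied to the L/2 pair sums.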

## 4. The Fast Algorithm for Normalized Cross-Correlation

#### 4.1. The Optimization Methods

#### 4.2. The Step of the Fast Algorithm for NCC

**Step 1** - Initialize all a_{k}(n) = 0 (k = 0, 1, …, L), where a_{0}(n) is indispensable for $\sum {a}_{k}(n)$.

**Step 2** - Implement Equation (3) to acquire the sequence { a_{k}(n) } using N additions.

**Step 3** - Compute $\sum {a}_{k}(n)$ and $\sum {a}_{k}(n)k$ by Equation (13) and Figure 4 with 5L/2 − 1 additions.

**Step 4** - Compute b(n) by Equation (14) with 1 multiplication, 2 additions and 1 subtraction.

**Step 5** - Input $\sum {a}_{k}(n)$, $\sum {a}_{k}(n)k$ and b(n) into Equation (9) to obtain an NCC ρ(n), which needs 2 subtractions, 4 multiplications, 1 division and 1 square root calculation.

**Algorithm 2** Computing NCC ( n, f, g, b(n−1) )

for each a_{k} in the sequence { a_{k} }: a_{k} $\leftarrow $ 0

for each i $\in $ [0, N−1] do // Equation (3)

k $\leftarrow $ g(i)

a_{k} $\leftarrow $ a_{k} + f(n + i)

end for

s $\leftarrow $ 0

for each k $\in $ [1, L/2] do // Equation (13a)

s $\leftarrow $ s + a_{2k−1}

a_{k} $\leftarrow $ a_{2k−1} + a_{2k}

end for

a $\leftarrow $ Moment ( a_{L/2}, a_{L/2−1}, …, a_{2}, a_{1}, a_{0} ) // Algorithm 1

a[1] $\leftarrow $ (a[1] << 1) − s // Equation (13b)

Compute b(n) from b(n−1), f(n + N − 1) and f(n − 1) // Equation (14)

Compute ρ(n) from a[0], a[1] and b(n) // Equation (9)

return ρ(n)
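The overall flow of Algorithm 2 can be sketched in Python as below. Equations (9) and (14) are not reproduced in this text, so the sketch makes two assumptions: that ρ(n) is the standard normalized cross-correlation coefficient, and that b(n) is the sum of squares of the current window. The moments are computed with plain sums for brevity rather than with Algorithm 1. All function names are hypothetical.

```python
import math


def ncc_moment(f, g, n):
    """NCC of the window f[n:n+N] with kernel g, built from bucket sums and two moments."""
    N, L = len(g), max(g)
    a = [0] * (L + 1)
    for i, k in enumerate(g):                        # bucket sums a_k(n), Equation (3)
        a[k] += f[n + i]
    zero = sum(a)                                    # zero-order moment, equals the window sum
    first = sum(k * a[k] for k in range(1, L + 1))   # first-order moment, equals c(n)
    b = sum(f[n + i] ** 2 for i in range(N))         # ASSUMED meaning of b(n): window energy
    g_mean = sum(g) / N                              # fixed kernel terms, can be precomputed
    g_var = sum((x - g_mean) ** 2 for x in g)
    # ASSUMED form of Equation (9): the standard NCC coefficient
    return (first - g_mean * zero) / math.sqrt((b - zero * zero / N) * g_var)


def ncc_direct(f, g, n):
    """Reference NCC computed from mean-removed products."""
    N = len(g)
    w = f[n:n + N]
    fm, gm = sum(w) / N, sum(g) / N
    num = sum((x - fm) * (y - gm) for x, y in zip(w, g))
    den = math.sqrt(sum((x - fm) ** 2 for x in w) * sum((y - gm) ** 2 for y in g))
    return num / den


f = [3, 1, 4, 1, 5, 9, 2, 6]
g = [2, 1, 4, 2]
print([round(ncc_moment(f, g, n), 4) for n in range(len(f) - len(g) + 1)])
```

Both functions divide by zero when the window or the kernel is constant; a practical implementation would need to handle that case explicitly.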

## 5. The Systolic Array for Normalized Cross-Correlation

The systolic array includes the module **A** to compute $\{{a}_{2k-1}(n)+{a}_{2k}(n)\}$, the module **M** to compute the first-order and zero-order moments of { a_{k}(n) }, and the module **S** to compute b(n). In each cycle, the N-point f(n + i) is input into this systolic array simultaneously and an NCC result ρ(n) is obtained. Since the direct computation of $\{{a}_{2k-1}(n)+{a}_{2k}(n)\}$ needs many adders, a simplified structure for the module **A** is discussed first, in Section 5.1.

#### 5.1. The Module **A**

The module **A** acquires an L/2-point sequence $\{{a}_{2k-1}(n)+{a}_{2k}(n)\}$ according to Equations (3) and (13) in every clock cycle. It includes L + 1 sub-modules A_{k} (k = 0, 1, 2, …, L) that first accumulate { f(n + i) } to generate the corresponding { a_{k}(n) }, and then sum each two adjacent a_{k}(n) to obtain $\{{a}_{2k-1}(n)+{a}_{2k}(n)\}$. We assume the execution time of the module **A** is T_{A} clock cycles. The N-point f(n + i) should be input into the sub-modules { A_{k} } gradually.

The sub-modules A_{k} can be simplified for fewer adders and less data transfer. For example, for N = 4, L = 4 and { g(i) } = { 1, 2, 3, 4 }, the module **A** can be simplified as shown in Figure 7, with 2 adders and T_{A} = 1. However, for N = 4, L = 4 and { g(i) } = { 2, 1, 4, 2 }, the module **A** is re-designed as shown in Figure 8, with 2 adders, 3 latches and T_{A} = log_{2}4 = 2. Therefore, the structure of the module **A** should not be fixed, but should change with the sequence { g(i) } to reduce its hardware complexity. Figure 9a shows the module **A** using the maximum number of adders when { g(i) } = { 4, 4, 4, 4 }, and Figure 9b shows the module **A** using 0 adders when { g(i) } = { 2, 4, 6, 8 }. From Figures 7–9, the number of adders in the module **A** ranges from 0 to N − 1, and the latency T_{A} ranges from 0 to log_{2}N.
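The adder counts and latencies quoted for the four example kernels can be reproduced with a simple cost model. The sketch below is an assumption, not the paper's design procedure: it groups the inputs by pair bucket {2k − 1, 2k}, gives each bucket with m inputs a balanced adder tree of m − 1 adders and depth ⌈log2 m⌉, and ignores taps with g(i) = 0. The function name is hypothetical.

```python
from collections import Counter


def module_a_cost(g):
    """Return (adders, latency) for the pair sums a_{2k-1} + a_{2k} under a balanced-tree model."""
    # Kernel values 2k-1 and 2k both map to pair bucket k; taps with g(i) = 0 are ignored here.
    counts = Counter((k + 1) // 2 for k in g if k > 0)
    adders = sum(m - 1 for m in counts.values())
    # (m - 1).bit_length() equals ceil(log2(m)) for every integer m >= 1.
    latency = max(((m - 1).bit_length() for m in counts.values()), default=0)
    return adders, latency


for kernel in ([1, 2, 3, 4], [2, 1, 4, 2], [4, 4, 4, 4], [2, 4, 6, 8]):
    print(kernel, module_a_cost(kernel))
```

Under this model the four kernels give (2, 1), (2, 2), (3, 2) and (0, 0), which agrees with the adder numbers and T_{A} values stated in the text. Latch counts are not modelled.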

#### 5.2. The Module **P**

The module **P** implements Equation (9) with 4 multipliers, 1 divider and 1 square root extractor. It receives a $\sum {a}_{k}(n)k$ and a b(n), and outputs a corresponding ρ(n) in each cycle. Some fast methods can be applied for the square root operation. In addition, the fixed $\overline{g}$ and $\sum {[g(i)-\overline{g}]}^{2}$ are computed and stored in advance to avoid repeated computation.

#### 5.3. The Systolic Array

The module **S** implements Equation (14) and generates b(n) with 1 multiplier, 1 accumulator and 1 subtractor. Finally, the module **P** generates the NCC ρ(n). The total number of adders in the systolic array ranges from 2L − 2 to 2L + N − 3, and its number of multipliers is 5.

The initial value in the module **S** is set as b(0). In the n-th clock cycle, f(n + N − 1) and f(n − 1) are input into the module **S** to get b(n) in three clock cycles. Then b(n) is output from the module **S** to the module **P** with a latency of T_{A} + L − 1, so that b(n), $\sum {a}_{k}(n)$ and $\sum {a}_{k}(n)k$ arrive at the module **P** at the same time.
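A recursive update of b(n) from b(n − 1), f(n + N − 1) and f(n − 1) can be sketched as follows. Equation (14) is not reproduced in this text, so the sketch assumes that b(n) is the sum of squares of the window f(n), …, f(n + N − 1). Writing the difference of two squares as a product gives one multiplication, two additions and one subtraction per update, which matches the operation count stated for Step 4. The function name is hypothetical.

```python
def sliding_energy(f, N):
    """Return [b(0), b(1), ...], where b(n) is ASSUMED to be the sum of f[n+i]**2 for i in 0..N-1."""
    b = [sum(x * x for x in f[:N])]          # b(0) is computed directly once
    for n in range(1, len(f) - N + 1):
        new, old = f[n + N - 1], f[n - 1]    # sample entering and sample leaving the window
        # new**2 - old**2 == (new - old) * (new + old): a single multiplication per update
        b.append(b[-1] + (new - old) * (new + old))
    return b


f = [3, 1, 4, 1, 5, 9, 2, 6]
print(sliding_energy(f, 4))
```

With integer samples the recursion is exact, so no error accumulates over n.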

## 6. Comparisons

#### 6.1. Algorithm Comparison

… log_{2}N), the DA-based algorithm has the lowest addition complexity, and the fast NCC algorithm needs no multiplications. The proposed algorithm uses O(N^{2}) additions, more than the FFT-based and DA-based algorithms, and O(N) multiplications, more than the fast NCC algorithm. However, the FFT-based algorithm needs floating-point additions and multiplications, which are more complex than integer operations; the DA-based algorithm requires tedious address decoding and very large memories; and the fast NCC algorithm has the highest addition complexity and is not suitable for high-precision matching [15]. Figure 10 shows how the four algorithms’ multiplication and addition numbers increase with N. It is clear that the proposed algorithm’s multiplication number is lower than those of the FFT-based and DA-based algorithms, and its addition number is lower than that of the fast NCC algorithm when N > 320.

- (1) Fewer multiplications and less memory.
- (2) A simple computational structure, due to its simple implementation.
- (3) Precision and suitability for the discrete domain, as it uses integer operations [32].
- (4) No limitation on the length of the NCC.
- (5) Implementation by a simple systolic structure.

#### 6.2. Structure Comparison

The execution time of the module **P** is assumed to be three clock cycles.

… ^{N}) ROMs, which are hardware-expensive when N > 16. The structure [22] has the minimum latency, but its throughput is more than 1. The structure [33] needs O(P) adders and a latency that increases rapidly with N.

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Annaby, M.H.; Fouda, Y.M.; Rushdi, M.A. Improved Normalized Cross-Correlation for Defect Detection in Printed-Circuit Boards. IEEE Trans. Semicond. Manuf. **2019**, 32, 199–211.
2. Duong, D.H.; Chen, C.S.; Chen, L.C. Absolute Depth Measurement Using Multiphase Normalized Cross-Correlation for Precise Optical Profilometry. Sensors **2019**, 19, 1–20.
3. Banharnsakun, A. Feature point matching based on ABC-NCC algorithm. Evol. Syst. **2018**, 9, 71–80.
4. Kotenko, I.; Saenko, I.; Branitskiy, A. Applying Big Data Processing and Machine Learning Methods for Mobile Internet of Things Security Monitoring. J. Internet Serv. Inf. Secur. **2018**, 8, 54–63.
5. Sridevi, M.; Sankaranarayanan, N.; Jyothish, A.; Vats, A.; Lalwani, M. Automatic traffic sign recognition system using fast normalized cross correlation and parallel processing. In Proceedings of the 2017 International Conference on Intelligent Communication and Computational Techniques, Jaipur, India, 22–23 December 2017; pp. 200–204.
6. Bovik, A.C. Basic Tools for Image Fourier Analysis. In The Essential Guide to Image Processing; Academic: San Diego, CA, USA, 2009.
7. Wu, P.; Li, W.; Song, W.L. Fast, accurate normalized cross-correlation image matching. J. Intell. Fuzzy Syst. **2019**, 37, 4431–4436.
8. Liu, G.Q.; Kreinovich, V. Fast convolution and Fast Fourier Transform under interval and fuzzy. J. Comput. Syst. Sci. **2010**, 76, 63–76.
9. Kaso, A.; Li, Y. Computation of the normalized cross-correlation by fast Fourier transform. PLoS ONE **2018**, 13, e0203434.
10. Narasimha, M.J. Linear Convolution Using Skew-Cyclic Convolutions. IEEE Signal. Process. Lett. **2010**, 14, 173–176.
11. Cheng, L.Z.; Jiang, Z.R. An efficient algorithm for cyclic convolution based on fast-polynomial and fast-W transforms. Circuits Syst. Signal. Process. **2001**, 20, 77–88.
12. Li, H.; Lee, W.S.; Wang, K. Immature green citrus fruit detection and counting based on fast normalized cross correlation (FNCC) using natural outdoor colour images. Precis. Agric. **2016**, 17, 678–697.
13. Tsai, D.M.; Lin, C.T. Fast normalized cross correlation for defect detection. Pattern Recognit. Lett. **2003**, 24, 2625–2631.
14. Byard, K. Application of fast cross-correlation algorithms. Electron. Lett. **2015**, 51, 242–244.
15. Yoo, J.C.; Choi, B.D.; Choi, H.K. 1-D fast normalized cross-correlation using additions. Digit. Signal. Process. **2010**, 20, 1482–1493.
16. Ismail, L.; Guerchi, D. Performance Evaluation of Convolution on the Cell Broadband Engine Processor. IEEE Trans. Parallel Distrib. Syst. **2011**, 22, 337–351.
17. Chaudhari, R.E.; Dhok, S.B. An Optimized Approach to Pipelined Architecture for Fast 2D Normalized Cross-Correlation. J. Circuits Syst. Comput. **2019**, 28, 1950211.
18. Mehendale, M.; Sharma, M.; Peher, P.K. DA-Based Circuits for Inner-Product Computation. In Arithmetic Circuits for DSP Application; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2017; pp. 77–112.
19. Cao, L.; Liu, J.G.; Xiong, J.; Zhang, J. Novel structures for cyclic convolution using improved first-order moment algorithm. IEEE Trans. Circuits Syst. I Regul. Pap. **2014**, 61, 2370–2379.
20. Carranza, C.; Llamocca, D.; Pattichis, M. Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures. IEEE Trans. Image Process. **2017**, 26, 2230–2245.
21. Meher, P.K. Efficient Systolization of Cyclic Convolutions Using Low-Complexity Rectangular Transform Algorithms. In Proceedings of the 2007 International Symposium on Signals, Circuits and Systems ISSCS ’07, Iasi, Romania, 13–14 July 2007; pp. 1–4.
22. Chen, H.C.; Guo, J.I.; Chang, T.S.; Jen, C.W. A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform. IEEE Trans. Circuits Syst. Video Technol. **2005**, 15, 445–453.
23. Syed, N.A.A.; Meher, P.K.; Vinod, A.P. Efficient Cross-Correlation Algorithm and Architecture for Robust Synchronization in Frame-Based Communication Systems. Circuits Syst. Signal. Process. **2018**, 37, 2548–2573.
24. Vun, C.H.; Premkumar, A.B.; Zhang, W. A New RNS based DA Approach for Inner Product Computation. IEEE Trans. Circuits Syst. I Regul. Pap. **2013**, 60, 2139–2152.
25. Liu, J.G.; Pan, C.; Liu, Z.B. Novel Convolutions using First-order Moments. IEEE Trans. Comput. **2012**, 61, 1050–1056.
26. Hua, X.; Liu, J.G. A Novel Fast Algorithm for the Pseudo Winger-Ville Distribution. J. Commun. Technol. Electron. **2015**, 60, 1238–1247.
27. Liu, J.G.; Liu, Y.Z.; Wang, G.Y. Fast Discrete W Transforms via Computation of Moments. IEEE Trans. Signal. Process **2005**, 53, 654–659.
28. Yazdanpanah, H.; Diniz, P.S.R.; Lima, M.V.S. Low-Complexity Feature Stochastic Gradient Algorithm for Block-Lowpass Systems. IEEE Access **2019**, 7, 141587–141593.
29. Viticchie, A.; Basile, C.; Valenza, F.; Lioy, A. On the impossibility of effectively using likely-invariants for software attestation purposes. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. **2018**, 9, 1–25.
30. Adhikari, G.; Sahu, S.; Sahani, S.K.; Das, B.K. Fast normalized cross correlation with early elimination condition. In Proceedings of the 2012 International Conference on Recent Trends in Information Technology, Chennai, India, 19–21 April 2012; pp. 136–140.
31. Blahut, R.E. Fast Algorithms for Digital Signal Processing; Addison-Wesley: Reading, MA, USA, 1984.
32. Yuan, Y.; Qin, Z.; Xiong, C.Y. Digital image correlation based on a fast convolution strategy. Opt. Lasers Eng. **2017**, 97, 52–61.
33. Meher, P.K.; Park, S.Y. A novel DA-based architecture for efficient computation of inner-product of variable vectors. In Proceedings of the 2014 IEEE International Symposium on Circuits and Systems, Melbourne, Australia, 1–5 June 2014; pp. 369–372.
34. Mukherjee, D.; Mukhopadhyay, S. Fast Hardware Architecture for 2-D Separable Convolution Operation. IEEE Trans. Circuits Syst. II Exp. Briefs **2018**, 65, 2042–2046.
35. Yang, Y.J.; Zhang, Y.H.; Li, D.M.; Wang, Z.J. Parallel Correlation Filters for Real-Time Visual Tracking. Sensors **2019**, 19, 1–22.
36. Hartl, A.; Annessi, R.; Zseby, T. Subliminal Channels in High-Speed Signature. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. **2018**, 9, 30–53.

**Figure 9.** The module **A** using different adders: (**a**) { g(i) } = {4, 4, 4, 4}; (**b**) { g(i) } = {2, 4, 6, 8}.

**Figure 10.** The four algorithms’ multiplication and addition numbers: (**a**) Multiplication; (**b**) Addition.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pan, C.; Lv, Z.; Hua, X.; Li, H.
The Algorithm and Structure for Digital Normalized Cross-Correlation by Using First-Order Moment. *Sensors* **2020**, *20*, 1353.
https://doi.org/10.3390/s20051353
