# A Self-Organizing Multi-Layer Agent Computing System for Behavioral Clustering Recognition


## Abstract


## 1. Introduction

## 2. Related Work

**Clustering recognition.** Clustering is a fundamental pillar of unsupervised motion recognition in computer vision. Mainstream clustering algorithms fall into three categories: K-means [12], spectral clustering, and hierarchical clustering. The K-means algorithm relies on the value of k, which must be specified before any clustering analysis is conducted. For example, Wojciech and Pawel [13] proposed a hybrid parallelization of the K-means algorithm, Liang et al. [14] used it to handle shape recognition tasks, Li and Hong [15] applied it to image segmentation, and Aditya et al. [16] used fuzzy K-means for adaptive clustering. Although K-means is easy to implement, its disadvantage is that the cluster number k must be predetermined, which is a serious limitation when the number of classes is unknown. Unlike K-means, spectral clustering is named after the spectral matrix it constructs from a similarity matrix: it converts the clustering problem into a graph-partitioning problem and searches for the best sub-graphs. For example, Ahmed et al. [17] implemented efficient spectral clustering using Dijkstra’s algorithm, while Wang et al. [18] improved spectral clustering with message passing and density-sensitive similarities. Spectral clustering adapts well to different data distributions, but it requires a complete count of the individuals involved before the spectral matrix can be built. Hierarchical clustering, in turn, builds a similarity tree over sample pairs and is generally divided into two categories: bottom-up agglomerative clustering and top-down divisive clustering. For example, Xing et al. [19] combined GNNs with hierarchical clustering, Saquib et al. [20] proposed an efficient parameter-free clustering based on first-neighbor relations, and Lin et al. [21,22] used deep CNNs to design hierarchical clustering. The multi-layer agent clustering method proposed in this paper resembles top-down hierarchical clustering, but there are essential differences: hierarchical clustering builds a clustering tree over a large number of targets, whereas this paper uses a multi-layer structure to dynamically segment frame images and automatically allocate computing resources to areas of higher motion density for fine-grained clustering. Clustering within a single agent is closer to spectral clustering, in that the process searches for the best sub-classification.
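As a concrete illustration of the bottom-up (agglomerative) idea discussed above — not taken from any of the cited works — the following minimal Python sketch merges the closest clusters under a distance threshold instead of fixing a cluster number k in advance; the 1-D points and threshold are made-up example values.

```python
def agglomerative(points, threshold):
    """Bottom-up single-linkage clustering sketch on 1-D points.

    Repeatedly merges the two closest clusters until the closest
    remaining pair is farther apart than `threshold`. No cluster
    count k is required, in contrast to K-means.
    """
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > 1:
        # Find the closest pair of clusters (single linkage).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:  # no pair close enough: stop merging
            break
        clusters[i] += clusters.pop(j)
    return clusters

print(agglomerative([1.0, 1.2, 5.0, 5.3, 9.0], threshold=1.0))
# → [[1.0, 1.2], [5.0, 5.3], [9.0]]
```

Library implementations such as SciPy's `scipy.cluster.hierarchy.linkage` offer several linkage criteria and far better asymptotic behavior; this sketch only demonstrates why hierarchical methods need no preset k.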

**Optical flow processing.** In this work, attention is paid to the motion process rather than to the specific attributes of objects or to background interpretation. Besides optical flow [23], we are also inspired by MP3 [24], which provides a method for evaluating fixed-grid occupancy in continuous-frame videos. Unlike MP3, however, SMLACS does not compute the dynamic occupancy of the entire frame, which, like the popular optical flow methods, would incur a huge computational cost and high computational complexity. Compared to mainstream algorithmic models, SMLACS uses binary encoding to focus only on the motion process, and its algorithmic functions are realized through similarity comparison and minimal logical judgment. By reducing the computational complexity of individual agents via a hierarchical clustering approach with different levels of accuracy, SMLACS realizes clustering of motion behavior at low computational cost. This approach has additional benefits: the clustering recognition process of SMLACS requires neither scene interpretation nor object recognition, nor even scene-based clustering criteria. Furthermore, it is not selective about video quality and can handle even highly blurred video data.
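To make the binary-encoding idea concrete, here is a hypothetical Python sketch (not the authors' implementation): per-cell motion between two frames is encoded as a bit string, and two discrete temporal sequences are compared with a longest-common-subsequence (LCS) score. The grid layout, intensity threshold, and normalization are illustrative assumptions.

```python
def encode_motion(prev, curr, threshold=10):
    """One bit per grid cell: 1 if the cell's intensity changed
    by more than `threshold` between consecutive frames."""
    return ''.join(
        '1' if abs(a - b) > threshold else '0'
        for a, b in zip(prev, curr)
    )

def lcs_length(s, t):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i, a in enumerate(s):
        for j, b in enumerate(t):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a == b
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(s)][len(t)]

def similarity(seq_a, seq_b):
    """Normalized LCS similarity between two encoded frame sequences."""
    joined_a, joined_b = ''.join(seq_a), ''.join(seq_b)
    return lcs_length(joined_a, joined_b) / max(len(joined_a), len(joined_b))

# Two frames over a 2x2 grid, flattened to 4 cell intensities each.
code = encode_motion([100, 100, 50, 50], [130, 100, 45, 90])
print(code)  # '1001': cells 0 and 3 changed by more than the threshold
```

Note how comparing short bit strings with table lookups and equality tests needs no floating-point arithmetic, which is in the spirit of the "minimal logical judgment" design goal above.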

## 3. Method

#### 3.1. Binary Encoding (BE) and Discrete Temporal Sequence

#### 3.2. Similarity Comparison

#### 3.3. A Single Agent Processing

#### 3.4. Multi-Layer Self-Organizing Structure

#### 3.5. Summary

## 4. Prototype System

## 5. Experiment and Result

#### 5.1. Experiment

#### 5.2. Clustering Performance

#### 5.3. Hardware Evaluation

## 6. Discussion

#### 6.1. Application Scenario

#### 6.2. Limitation and Future Directions

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Full Term |
|---|---|
| BE | Binary Encoding |
| MLS | Multi-Layer Structure |
| LCS | Longest Common Sequence |

## References

1. Luan, Y.; Han, C.; Wang, B. An Unsupervised Video Stabilization Algorithm Based on Key Point Detection. *Entropy* **2022**, *24*, 1326.
2. Jing, L.; Tian, Y. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey. *IEEE Trans. Pattern Anal. Mach. Intell.* **2021**, *43*, 4037–4058.
3. Wilson, G.; Cook, D.J. A Survey of Unsupervised Deep Domain Adaptation. *ACM Trans. Intell. Syst. Technol.* **2020**, *11*, 51.
4. Hamdi, S.; Bouindour, S.; Snoussi, H.; Wang, T.; Abid, M. End-to-End Deep One-Class Learning for Anomaly Detection in UAV Video Stream. *J. Imaging* **2021**, *7*, 90.
5. Jaiswal, A.; Babu, A.R.; Zadeh, M.Z.; Banerjee, D.; Makedon, F. A Survey on Contrastive Self-Supervised Learning. *Technologies* **2021**, *9*, 2.
6. McLaughlin, N.; Martinez del Rincon, J.; Miller, P. Recurrent Convolutional Network for Video-Based Person Re-identification. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1325–1334.
7. Feichtenhofer, C.; Pinz, A.; Zisserman, A. Convolutional Two-Stream Network Fusion for Video Action Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1933–1941.
8. Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4489–4497.
9. Feichtenhofer, C. X3D: Expanding Architectures for Efficient Video Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
10. Johnson, B.; Thomas, S.; Rani, J.S. A High-Performance Dense Optical Flow Architecture Based on Red-Black SOR Solver. *J. Signal Process. Syst.* **2020**, *92*, 357–373.
11. Dosovitskiy, A.; Fischer, P.; Ilg, E.; Häusser, P.; Hazirbas, C.; Golkov, V.; Smagt, P.v.d.; Cremers, D.; Brox, T. FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2758–2766.
12. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. *Electronics* **2020**, *9*, 1295.
13. Kwedlo, W.; Czochanski, P.J. A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality. *IEEE Access* **2019**, *7*, 42280–42297.
14. Bai, L.; Liang, J.; Guo, Y. An Ensemble Clusterer of Multiple Fuzzy k-Means Clusterings to Recognize Arbitrarily Shaped Clusters. *IEEE Trans. Fuzzy Syst.* **2018**, *26*, 3524–3533.
15. He, L.; Zhang, H. Kernel K-Means Sampling for Nyström Approximation. *IEEE Trans. Image Process.* **2018**, *27*, 2108–2120.
16. Karlekar, A.; Seal, A.; Krejcar, O.; Gonzalo-Martin, C. Fuzzy K-Means Using Non-Linear S-Distance. *IEEE Access* **2019**, *7*, 55121–55131.
17. Taloba, A.I.; Riad, M.R.; Soliman, T.H.A. Developing an Efficient Spectral Clustering Algorithm on Large Scale Graphs in Spark. In Proceedings of the 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 5–7 December 2017; pp. 292–298.
18. Wang, L.; Ding, S.; Jia, H. An Improvement of Spectral Clustering via Message Passing and Density Sensitive Similarity. *IEEE Access* **2019**, *7*, 101054–101062.
19. Xing, Y.; He, T.; Xiao, T.; Wang, Y.; Xiong, Y.; Xia, W.; Wipf, D.; Zhang, Z.; Soatto, S. Learning Hierarchical Graph Neural Networks for Image Clustering. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3447–3457.
20. Sarfraz, S.; Sharma, V.; Stiefelhagen, R. Efficient Parameter-Free Clustering Using First Neighbor Relations. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8926–8935.
21. Lin, W.A.; Chen, J.C.; Chellappa, R. A Proximity-Aware Hierarchical Clustering of Faces. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 294–301.
22. Lin, W.A.; Chen, J.C.; Castillo, C.D.; Chellappa, R. Deep Density Clustering of Unconstrained Faces. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8128–8137.
23. Zhai, M.; Xiang, X.; Lv, N.; Kong, X. Optical Flow and Scene Flow Estimation: A Survey. *Pattern Recognit.* **2021**, *114*, 107861.
24. Casas, S.; Sadat, A.; Urtasun, R. MP3: A Unified Model to Map, Perceive, Predict and Plan. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14398–14407.
25. Blachut, K.; Kryjak, T. Real-Time Efficient FPGA Implementation of the Multi-Scale Lucas-Kanade and Horn-Schunck Optical Flow Algorithms for a 4K Video Stream. *Sensors* **2022**, *22*, 5017.
26. Lazcano, V.; Rivera, F. GPU Based Horn-Schunck Method to Estimate Optical Flow and Occlusion. In Theory and Applications of Models of Computation; Springer: Cham, Switzerland, 2019; pp. 424–437.
27. Seong, H.S.; Rhee, C.E.; Lee, H.J. A Novel Hardware Architecture of the Lucas–Kanade Optical Flow for Reduced Frame Memory Access. *IEEE Trans. Circuits Syst. Video Technol.* **2016**, *26*, 1187–1199.
28. Li, Y.; Gao, Y.; Su, Z.; Chen, S.; Liu, L. FPGA Accelerated Real-Time Recurrent All-Pairs Field Transforms for Optical Flow. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25–27 November 2022; pp. 4799–4804.

**Figure 6.** The case of dynamic segmentation layer by layer. The four-layer dynamic subdivision is represented by (**a**–**d**), where (**a**) represents the coarsest accuracy, covering the entire image, and (**b**–**d**) are successively refined subdivisions of the changing area based on the previous layer.

**Figure 7.** Prototype system. The green box in the picture is the FPGA board, the red box is the MIPI camera, and the blue box is the LCD screen.

**Figure 10.** Dynamic division scenarios in operation. (**a**–**h**) represent the real-time dynamic recognition situation in the upper left corner of the screen. Each gray box represents an area covered by an agent; green indicates correct clustering and red indicates clustering errors.

| Resource | Available | Utilization (Excluding Output) | Rate (Excluding Output) | Utilization (Whole System) | Rate (Whole System) |
|---|---|---|---|---|---|
| LUT | 274,080 | 87,612 | 31.97% | 161,327 | 58.86% |
| LUTRAM | 144,000 | − | − | 9396 | 6.53% |
| FF | 548,160 | 56,910 | 10.38% | 139,699 | 25.49% |
| BRAM | 912 | 85 | 9.32% | 161 | 17.65% |
| DSP | 2520 | − | − | 8 | 0.32% |
| IO | 328 | 68 | 20.73% | 42 | 12.80% |
| BUFG | 404 | 2 | 0.50% | 11 | 2.72% |

| Category | Component | Power | Rate | Total Power | Total Rate |
|---|---|---|---|---|---|
| Device Static | PS | 0.1 W | 13% | 0.762 W | 15% |
| | PL | 0.662 W | 87% | | |
| Dynamic Power | Clocks | 0.504 W | 12% | 4.297 W | 85% |
| | Signals | 0.425 W | 10% | | |
| | Logic | 0.411 W | 10% | | |
| | BRAM | 0.144 W | 3% | | |
| | DSP | 0.006 W | <1% | | |
| | PS | 2.594 W | 60% | | |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Qian, X.; Yuemaier, A.; Yang, W.; Chen, X.; Liang, L.; Li, S.; Dai, W.; Song, Z.
A Self-Organizing Multi-Layer Agent Computing System for Behavioral Clustering Recognition. *Sensors* **2023**, *23*, 5435.
https://doi.org/10.3390/s23125435
