# A New Principle toward Robust Matching in Human-like Stereovision


## Abstract


## 1. Introduction

## 2. Problem Statement

- What are the classes of perceived entities?
- What are the identities of perceived entities?
- Where are the locations of perceived entities inside related images?
- Where are the locations of perceived entities inside scenes?

- How to determine the presence of an entity in the left camera’s image plane?
- How to find the match in the right camera’s image plane if an entity has been detected in the left camera’s image plane?

## 3. Similar Works on Stereo Matching

- Methods that attempt to match points within a pair of stereo images [14].
- Methods that attempt to match edges or contours within a pair of stereo images [15].
- Methods that attempt to match line segments within a pair of stereo images [16].
- Methods that attempt to match curves within a pair of stereo images [17].
- Methods that attempt to match regions within a pair of stereo images [18].
- Methods that attempt to match objects within a pair of stereo images [19].

## 4. The Outline of Proposed Principle

- Image acquisition by both cameras.
- Image sampling of the video stream from the left camera.
- Hybrid feature extraction for each image sample.
- Cognition of image samples when they correspond to the training data of reference entities inside training images.
- Recognition of image samples when they correspond to possible occurrences of reference entities inside real-time images.
- Forward/inverse processes of template matching, which work together to find the occurrence of the matched candidate in the right image if a recognized entity is present in the left image.
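The steps above form a pipeline, which can be sketched as the following skeleton. Every callable here is a placeholder for a module described in the later sections; the function names and signatures are illustrative assumptions, not the paper's implementation.

```python
def stereo_pipeline(left_frame, right_frame, cognition_memory,
                    sample, extract_features, recognize,
                    forward_match, inverse_match):
    """Hypothetical skeleton of the proposed principle.

    All callables are placeholders: `sample` performs top-down image
    sampling, `extract_features` the hybrid feature extraction,
    `recognize` the possibility-based recognition against the learned
    cognition memory, and the two matchers implement the forward and
    inverse template-matching processes, respectively.
    """
    results = []
    for s in sample(left_frame):                  # top-down image sampling
        f = extract_features(s)                   # hybrid (time + frequency) features
        entity = recognize(f, cognition_memory)   # recognition of reference entities
        if entity is None:
            continue                              # no known entity in this sample
        # Forward matching first; fall back to inverse matching if it fails.
        match = forward_match(s, right_frame) or inverse_match(s, right_frame)
        if match is not None:
            results.append((entity, s, match))
    return results
```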

## 5. Top-Down Strategy of Doing Image Sampling

- It is difficult to determine, or to justify, the size of the sub-window used to scan an input image. If the size of the sub-window is allowed to change dynamically, the next question is how to perform such dynamic size adjustment.
- The number of obtained image samples is independent of the content of an input image. For example, an input image may contain a single entity; in this case, the scanning method will still produce many image samples, which become the input to the subsequent visual processes of classification, identification, grouping, etc. Obviously, irrelevant image samples may cause trouble for these recognition processes.

- ${S}_{k}$ with one sample: $k=1$ and ${d}_{v}\times {d}_{h}\in \{1\times 1\}$.
- ${S}_{k}$ with two samples: $k=2$ and ${d}_{v}\times {d}_{h}\in \{1\times 2,2\times 1\}$.
- ${S}_{k}$ with three samples: $k=3$ and ${d}_{v}\times {d}_{h}\in \{1\times 3,3\times 1\}$.
- ${S}_{k}$ with four samples: $k=4$ and ${d}_{v}\times {d}_{h}\in \{1\times 4,4\times 1,2\times 2\}$.
- and so on.
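The enumeration above can be sketched as a small helper that lists, for a given sample count $k$, every grid layout ${d}_{v}\times {d}_{h}$ whose cell count equals $k$. The function name `grid_layouts` is an illustrative assumption, not from the paper.

```python
def grid_layouts(k):
    """List all (d_v, d_h) grid layouts with d_v * d_h == k, mirroring the
    sets S_k above (e.g. k = 4 yields 1x4, 2x2, and 4x1)."""
    layouts = []
    for d_v in range(1, k + 1):
        if k % d_v == 0:           # d_v must divide k exactly
            layouts.append((d_v, k // d_v))
    return layouts
```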

## 6. Feature Extraction from Sample Image in Time-Domain

- The mean value ${I}_{a}$ of approximate electromagnetic energy:$${I}_{a}=\frac{1}{UV}\sum _{v=0}^{V-1}\sum _{u=0}^{U-1}I(u,v)$$
- The square-root of the variance ${\sigma}_{I}$ of approximate electromagnetic energy:$${\sigma}_{I}=\sqrt{\frac{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}{(I\left(u,v\right)-{I}_{a})}^{2}}{UV}}$$
- The horizontal distribution ${\sigma}_{u}$ of approximate electromagnetic energy about the centroid $({u}_{c},{v}_{c})$:$${\sigma}_{u}=\sqrt{\frac{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)\times {(u-{u}_{c})}^{2}}{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)}}$$$${u}_{c}=\frac{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)\times u}{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)}$$$${v}_{c}=\frac{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)\times v}{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)}$$
- The vertical distribution ${\sigma}_{v}$ of approximate electromagnetic energy:$${\sigma}_{v}=\sqrt{\frac{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)\times {\left(v-{v}_{c}\right)}^{2}}{{\sum}_{v=0}^{V-1}{\sum}_{u=0}^{U-1}I(u,v)}}$$
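The four time-domain features above can be computed directly from a gray-level sample. The following pure-Python sketch assumes the sample is given as a list of rows, with $I[v][u]$ the intensity at column $u$ and row $v$; the function name is an illustrative assumption.

```python
import math

def time_domain_features(I):
    """Compute the four time-domain features of a U x V gray-level sample:
    mean energy I_a, its spread sigma_I, and the horizontal/vertical
    energy distributions sigma_u, sigma_v about the centroid (u_c, v_c)."""
    V, U = len(I), len(I[0])
    total = sum(sum(row) for row in I)              # total energy
    I_a = total / (U * V)                           # mean energy
    sigma_I = math.sqrt(sum((I[v][u] - I_a) ** 2
                            for v in range(V) for u in range(U)) / (U * V))
    # Energy-weighted centroid of the sample.
    u_c = sum(I[v][u] * u for v in range(V) for u in range(U)) / total
    v_c = sum(I[v][u] * v for v in range(V) for u in range(U)) / total
    # Energy-weighted spread about the centroid, per axis.
    sigma_u = math.sqrt(sum(I[v][u] * (u - u_c) ** 2
                            for v in range(V) for u in range(U)) / total)
    sigma_v = math.sqrt(sum(I[v][u] * (v - v_c) ** 2
                            for v in range(V) for u in range(U)) / total)
    return I_a, sigma_I, sigma_u, sigma_v
```

On a uniform sample every pixel carries the same energy, so $\sigma_I$ is zero and $\sigma_u$, $\sigma_v$ reduce to the plain standard deviations of the column and row indices.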

## 7. Feature Extraction from Sample Image in Frequency-Domain

## 8. Cognition Process Using RCE Neural Network

## 9. Recognition Process Using Possibility Function

## 10. Forward/Inverse Processes of Template Matching

- Determine the equation of the epipolar line from the stereovision’s calibration parameters (such knowledge can be found in any computer vision textbook) and location $a$’s coordinates.
- Scan the epipolar line location by location.
- Take image sample $e$ at the currently scanned location.
- Compute the feature vector of image sample $e$.
- Compute the cosine distance between image sample $j$’s feature vector and image sample $e$’s feature vector.
- Repeat until the scanning is completed.
- Choose the image sample that minimizes the cosine distance as the candidate of matched sample $j'$.
- Use the cosine distance between recognized sample $j$ and the chosen candidate of matched sample $j'$ to compute the possibility value of match (i.e., using Equation (20)).
- Accept matched sample $j'$ if the possibility value of match is greater than a chosen threshold value (e.g., 0.5).
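The forward process above can be sketched as follows, assuming the candidate feature vectors along the epipolar line have already been computed. Equation (20) is not reproduced here, so a simple placeholder possibility function $1-d$ is used; the function names are illustrative assumptions.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two feature vectors (0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def forward_match(feat_j, candidate_feats,
                  possibility=lambda d: 1.0 - d, threshold=0.5):
    """Keep the candidate at minimum cosine distance to recognized sample j;
    accept it only if its possibility of match exceeds the threshold.

    `possibility` stands in for Equation (20), which maps a cosine
    distance to a possibility value of match."""
    best_i, best_d = min(
        enumerate(cosine_distance(feat_j, f) for f in candidate_feats),
        key=lambda t: t[1])
    return (best_i, best_d) if possibility(best_d) > threshold else None
```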

- Determine the equation of the epipolar line from the stereovision’s calibration parameters and location $a$’s coordinates.
- Scan the epipolar line location by location.
- Take image sample $e$ at the currently scanned location.
- Divide image sample $e$ into a matrix of sub-samples $\{{e}_{i},i=1,2,3,\dots \}$.
- Use each sub-sample in $\{{e}_{i},i=1,2,3,\dots \}$ as a template and perform forward template matching with recognized sample $j$.
- Compute the mean of all the possibility values measuring the match between the sub-samples in $\{{e}_{i},i=1,2,3,\dots \}$ and recognized sample $j$. This mean represents the possibility value for image sample $e$ in the right image to match recognized sample $j$ in the left image.
- Repeat until the scanning is completed.
- Choose the image sample that maximizes the possibility value of match (i.e., calculated by Equation (23)) as the candidate of matched sample $j'$.
- Accept the match if the possibility value of match is greater than a chosen threshold value (e.g., 0.5).
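Two pieces of the inverse process can be sketched directly: dividing a candidate sample into a matrix of sub-samples, and averaging their per-sub-sample possibility values into the overall score of Equation (23). Both helper names are illustrative assumptions, and the split assumes the sample dimensions are divisible by the grid size.

```python
def split_into_subsamples(sample, rows, cols):
    """Divide a 2D sample (list of equal-length rows) into a rows x cols
    matrix of sub-samples {e_i}, flattened into a single list."""
    V, U = len(sample), len(sample[0])
    h, w = V // rows, U // cols            # sub-sample height and width
    return [[r[c * w:(c + 1) * w] for r in sample[i * h:(i + 1) * h]]
            for i in range(rows) for c in range(cols)]

def inverse_possibility(sub_possibilities):
    """Mean of the per-sub-sample possibility values: the overall
    possibility that candidate e matches recognized sample j."""
    return sum(sub_possibilities) / len(sub_possibilities)
```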

## 11. Implementation and Results

#### 11.1. Results of Top-Down Sampling Strategy of Input Images

#### 11.2. Examples of Training Data for Cognition (i.e., Learning)

#### 11.3. Results of Feature Extraction in Time Domain

#### 11.4. Results of Feature Extraction in Frequency Domain

#### 11.5. Results of Cognition

#### 11.6. Results of Recognition

#### 11.7. Results of Stereo Matching

## 12. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Xie, M.; Hu, Z.C.; Chen, H. New Foundation of Artificial Intelligence; World Scientific: Singapore, 2021.
2. Cassinelli, A.; Reynolds, C.; Ishikawa, M. Augmenting Spatial Awareness with Haptic Radar. In Proceedings of the 2006 10th IEEE International Symposium on Wearable Computers, Montreux, Switzerland, 11–14 October 2006; pp. 61–64.
3. Li, Y.; Ibanez-Guzman, J. Lidar for Autonomous Driving: The Principles, Challenges, and Trends for Automotive Lidar and Perception Systems. IEEE Signal Process. Mag. **2020**, 37, 50–61.
4. Rashidi, A.; Fathi, H.; Brilakis, I. Innovative Stereo Vision-Based Approach to Generate Dense Depth Map of Transportation Infrastructure. Transp. Res. Rec. **2011**, 2215, 93–99.
5. Xie, M. Key Steps Toward Development of Humanoid Robots. In Proceedings of the 25th International Conference on Climbing and Walking Robots, Robotics in Natural Settings, Lecture Notes in Networks and Systems, Ponta Delgada, Portugal, 12–14 September 2022; Springer: Berlin/Heidelberg, Germany, 2022.
6. Xie, M.; Velamala, S. Maritime Autonomous Vessels: A Review of RobotX Challenge’s Works. J. Technol. Soc. Sci. **2018**, 2, 7–14.
7. Gordon, I.E. Theories of Visual Perception, 3rd ed.; Psychology Press: London, UK, 2004.
8. Bekey, G.A. Autonomous Robots: From Biological Inspiration to Implementation and Control; The MIT Press: Cambridge, MA, USA, 2005.
9. Roberts, D.A.; Yaida, S. The Principles of Deep Learning Theory; Cambridge University Press: Cambridge, UK, 2022.
10. Wu, X.W.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing **2020**, 396, 39–64.
11. Rogister, P.; Benosman, R.; Ieng, S.H.; Lichtsteiner, P.; Delbruck, T. Asynchronous Event-Based Binocular Stereo Matching. IEEE Trans. Neural Netw. Learn. Syst. **2012**, 23, 347–353.
12. Yang, G.S.; Manela, J.; Happold, M.; Ramanan, D. Hierarchical Deep Stereo Matching on High-Resolution Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5515–5524.
13. Bleyer, M.; Breiteneder, C. Stereo Matching: State-of-the-Art and Research Challenges. In Advances in Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2013; pp. 143–179.
14. Chang, J.R.; Chen, Y.S. Pyramid Stereo Matching Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418.
15. Xie, M. A Cooperative Strategy for the Matching of Multi-level Edge Primitives. Image Vis. Comput. **1995**, 13, 89–99.
16. Medioni, G.; Nevatia, R. Segment-based stereo matching. Comput. Vis. Graph. Image Process. **1985**, 31, 2–18.
17. Zhang, Y.N.; Gerbrands, J.J. Method for matching general stereo planar curves. Image Vis. Comput. **1995**, 13, 645–655.
18. Wang, Z.F.; Zheng, Z.G. A region based stereo matching algorithm using cooperative optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
19. Li, L.; Fang, M.; Yin, Y.; Lian, J.; Wang, Z. A Traffic Scene Object Detection Method Combining Deep Learning and Stereo Vision Algorithm. In Proceedings of the IEEE International Conference on Real-time Computing and Robotics (RCAR), Xining, China, 15–19 July 2021; pp. 1134–1138.
20. Yin, X.M.; Guo, D.; Xie, M. Hand image segmentation using color and RCE neural network. Robot. Auton. Syst. **2001**, 34, 235–250.
21. Cooper, P.W. The hypersphere in pattern recognition. Inf. Control **1962**, 5, 324–346.
22. Morgan, D.P.; Scofield, C.L. ANN Keyword Recognition. In Neural Networks and Speech Processing; The Springer International Series in Engineering and Computer Science; Springer: Berlin/Heidelberg, Germany, 1991; Volume 130.
23. Cooper, L.N. How We Remember: Toward an Understanding of Brain and Neural Systems; World Scientific: Singapore, 1995.

**Figure 1.**Research Framework Underlying the Development of Intelligent Humanoid Robots and Autonomous Surface Vehicles.

**Figure 2.**Illustration of Stereo Matching. “*” indicates the location of the entity in the 3D scene as well as its locations in both the left and the right image planes.

**Figure 3.**Illustration of (**a**) partial view caused by occlusion, in which the left camera sees a partial view of object 3 while the right camera sees a partial view of object 1, and (**b**) partial view caused by image sampling, in which the three nearest vehicles partially appear inside samples at row 2 and row 3.

**Figure 4.**Outline of Proposed New Principle Toward Achieving Robust Matching in Human-like Stereovision.

**Figure 5.**Examples of results from the computations of zero-order, first-order, second-order, third-order, and fourth-order derivatives.

**Figure 11.**Ten Sample Images for Training Cognition Module Dedicated to Each Entity Among Triangle, Cross and Circle.

**Figure 12.**The Values of Ten Feature Vectors Computed from Ten Sample Images of Triangle in Time Domain. The statistics computed from these 10 feature vectors are highlighted in red color.

**Figure 13.**The Values of Ten Feature Vectors Computed from Ten Sample Images of Triangle in Frequency Domain. The statistics computed from these 10 feature vectors are highlighted in red color.

**Figure 20.**Results of Testing Recognition with Seven Image Samples, after Doing Cognition with Ten Image Samples Which Have Certain Level of Intended Variations for the Purpose of Appreciating Robustness. The image sample recognized in left image is denoted “1a” while three matching candidates of image samples in right image are denoted “1c”, “1b”, and “1d”, respectively.

**Figure 21.**Results of Stereo Matching Among Three Possible Pairs of Matches Which Are: pair(1a, 1b), pair(1a, 1c), and pair(1a, 1d).


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Xie, M.; Lai, T.; Fang, Y.
A New Principle toward Robust Matching in Human-like Stereovision. *Biomimetics* **2023**, *8*, 285.
https://doi.org/10.3390/biomimetics8030285
