Mathematics
  • Article
  • Open Access

29 October 2022

Application of Graph Structures in Computer Vision Tasks

Department of Data Analysis and Machine Learning, Financial University under the Government of the Russian Federation, pr-kt Leningradsky, 49/2, 125167 Moscow, Russia
This article belongs to the Special Issue Advanced Graph Theory and Combinatorics

Abstract

On the one hand, the solution of computer vision tasks is associated with the development of mathematical models of images or random fields, i.e., algorithms that are referred to as traditional image processing. On the other hand, deep learning methods nowadays play an important role in image recognition tasks. Such methods are based on convolutional neural networks that perform many matrix multiplications with the model parameters, as well as local convolution and pooling operations. Moreover, modern artificial neural network architectures, such as transformers, have come to the field of machine vision from natural language processing. Image transformers operate with embeddings in the form of mosaic blocks of a picture and the links between them. However, the use of graph methods in the design of neural networks can also increase efficiency. In this case, the search for hyperparameters also includes the architectural solution, such as the number of hidden layers and the number of neurons in each layer. The article proposes the use of graph structures to develop simple recognition networks on different datasets, including a small unbalanced X-ray image dataset, the widely known CIFAR-10 dataset and the Kaggle competition Dogs vs Cats dataset. The graph methods are compared with various known architectures and with networks trained from scratch. In addition, an algorithm for representing an image in the form of graph lattice segments is implemented, for which an appropriate description based on graph data structures is created. This description provides fairly good recognition accuracy and performance. The effectiveness of this approach, based on the descriptors of the resulting segments, is shown, as well as that of the graph methods for the architecture search.

1. Introduction

Recently, computer vision applications have become more and more relevant. Among the key tasks of computer vision systems are image recognition itself, object detection and image segmentation. Image processing algorithms are required in various fields of science and technology: from access control systems to intelligent systems for detecting defects in reinforced concrete products. In recent years, various methods based on deep neural networks have become popular in this area. The implicit task of any computer vision neural network is to form the features that describe the objects in an image and to classify them correctly.
However, this approach has a number of disadvantages. The first weakness of neural networks is the required computing power. Obtaining neural networks that are sufficiently efficient in terms of recognition accuracy requires many processors and a long training time. Moreover, the vast majority of deep learning models need millions of well-labeled examples for training. In a number of applied tasks, it is usually impossible to provide sufficiently large training samples in advance. In this case, it is possible to use classical mathematical methods of image processing. Unfortunately, it has been shown that they are inferior to neural network models on large image datasets.
A reasonable alternative approach to improving the performance of mathematical methods is to obtain descriptive feature sets by analogy with neural network models. The structure of an image and the connections between its pixels provide useful information for feature extraction and can be represented as a graph with the respective connections. Such an approach can potentially improve the image processing quality compared to the classical mathematical methods. Furthermore, under conditions of insufficient data, graph methods for image recognition can be more efficient than deep neural networks, because the latter are underfitted. Next, let us consider both neural network approaches to solving computer vision tasks and graph methods for image processing.
The novelty of this article lies in the use of graph-based segmentation results to collect descriptors of the different objects presented in the image datasets. The suggested approach thus extends the graph segmentation method to recognition. Once the model training is completed, the descriptors can be used for recognition, which makes processing faster than with transformers and some convolutional networks. Furthermore, a new approach to searching for optimal architectures is suggested in the text. It provides gains in efficiency (in terms of accuracy), but requires a lot of computational power for training.

3. Materials

First, it is necessary to consider the datasets that will be used in the experiments. The first dataset is CIFAR-10. The dataset includes 60,000 RGB images of 32 × 32 pixels. All images are classified into 10 classes and the data is perfectly balanced. The training set consists of 50,000 images, and the remaining 10,000 images are used for the test. The presented classes are: frog, truck, deer, automobile, bird, horse, ship, cat, dog and airplane. Figure 2 shows some example images from CIFAR-10.
Figure 2. CIFAR-10 images.
From Figure 2, it can be seen that all images in this dataset have a standard format. Preprocessing is needed for the normalization of the data. Prior to the training, the normalization layer provides zero mean and unit standard deviation.
Our second dataset is the Kaggle competition dataset called Dogs vs Cats. It consists of images of different sizes. It contains 25,000 images of dogs and cats, balanced between the two classes, for training and testing. The sample split is performed with 25% of the images used for the test. Since the image sizes differ, it is necessary to resize the images to a standard size for the transfer learning and the projected architectures. A scaling method is also used: all brightness values are divided by 255, so the possible values range from 0 to 1.
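To make these preprocessing steps concrete, a minimal sketch is given below, assuming NumPy arrays of images: the per-channel standardization corresponds to the CIFAR-10 normalization layer described above, and the resizing with division by 255 corresponds to the Dogs vs Cats preparation. The target size of 224 × 224 and the function names are illustrative assumptions, not the exact experimental settings.

```python
import numpy as np
from skimage.transform import resize

def standardize(images):
    """Per-channel standardization: zero mean and unit standard deviation."""
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return (images - mean) / (std + 1e-7)

def rescale_and_resize(image, target_size=(224, 224)):
    """Bring a variable-sized image to a standard size with brightness in [0, 1]."""
    image = image.astype(np.float32) / 255.0  # divide all brightness values by 255
    return resize(image, target_size, anti_aliasing=True)
```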
Figure 3 shows some examples from this dataset.
Figure 3. Dogs vs Cats images.
Finally, the experiments also use an X-ray image dataset of baggage and cabin luggage. The data is unbalanced and consists of 200 images of prohibited items and 280 images of allowed items. The sizes are standard: 125 × 125 pixels. For preprocessing, grayscale conversion and normalization are used. Figure 4 shows some examples from the X-ray image dataset.
Figure 4. X-ray images.
It should be noted that the processing of such images differs from the processing of standard optical images, so the results for the graph method are also of interest here. Preprocessing for the graph method uses a Gaussian filter with σ = 0.8 to blur the edges of the objects in the images for better segmentation.

4. Methods

Let us transform a two-dimensional image into a graph representation. To do this, it is logical to use a lattice or grid structure. Let us represent each pixel of the image as a vertex in which the brightness value of the pixel is stored. Each pixel has connections with the pixels located to its left, right, top and bottom. The edges can be weighted, so that information about the difference in brightness is stored. Such edges allow the selection of optimal routes for efficient processing, since there is usually no sharp change in brightness inside the objects.
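As a concrete illustration of this lattice representation, the sketch below builds a 4-connected pixel graph for a grayscale image, storing the brightness in each vertex and the absolute brightness difference on each edge. The use of networkx and of the absolute difference as the weight are assumptions made for illustration.

```python
import numpy as np
import networkx as nx

def image_to_lattice_graph(gray):
    """Build a 4-connected lattice graph from a 2D grayscale image.

    Each pixel becomes a vertex holding its brightness; each edge to the
    right/bottom neighbour is weighted by the absolute brightness difference.
    """
    h, w = gray.shape
    g = nx.Graph()
    for i in range(h):
        for j in range(w):
            g.add_node((i, j), brightness=float(gray[i, j]))
            if j + 1 < w:  # edge to the right neighbour
                g.add_edge((i, j), (i, j + 1),
                           weight=abs(float(gray[i, j]) - float(gray[i, j + 1])))
            if i + 1 < h:  # edge to the bottom neighbour
                g.add_edge((i, j), (i + 1, j),
                           weight=abs(float(gray[i, j]) - float(gray[i + 1, j])))
    return g

# Example: a tiny 3 x 3 image gives 9 vertices and 12 edges.
lattice = image_to_lattice_graph(np.array([[10, 12, 200],
                                           [11, 13, 210],
                                           [12, 14, 220]], dtype=float))
print(lattice.number_of_nodes(), lattice.number_of_edges())
```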
In [39], an efficient graph-based segmentation solution based on the Kruskal algorithm was proposed. The solution can be improved by applying Gaussian filters [40] for a smoother localization of the different image segments.
Based on the Kruskal algorithm, a spanning tree is built. Let the pixels be vertices $x_i$; then the edges symbolizing the connections can be described as $r(x_i, x_j)$. Such an edge characterizes the connection between the $i$-th and $j$-th pixels. The edges are characterized by weights $w(r(x_i, x_j))$. The task is to find the minimum spanning tree in such a graph, for which the total weight of all edges is minimal and not a single pixel is discarded.
The next steps follow the principle of algorithmic clustering. For each vertex, a set is created, and these vertices are further combined into segments. At the same time, it is clear that the vertices in such a segment form a minimum spanning tree. A Gaussian filter with different variances can be applied for the image preprocessing, as it smooths out the input image by slightly blurring it.
To calculate the weight of the edges, it is possible to use numerical RGB values, i.e., the vector distance in a 3D space. To improve the quality, other color spaces more characteristic of the human visual system, can be used. Figure 5 shows the source image and an example of the segmentation results.
Figure 5. Segmentation based on graphs [39].
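For reference, the scikit-image implementation of this graph-based segmentation [39] can serve as a minimal sketch of the pipeline. The scale and min_size values below are illustrative assumptions, while sigma = 0.8 matches the Gaussian smoothing described in the Materials section.

```python
from skimage import data
from skimage.segmentation import felzenszwalb

# Example image from scikit-image; any RGB image can be used instead.
image = data.astronaut()

# Graph-based segmentation in the spirit of [39]; sigma applies the Gaussian
# pre-smoothing (here 0.8, as in the preprocessing described above).
segments = felzenszwalb(image, scale=100, sigma=0.8, min_size=50)
print("number of segments:", int(segments.max()) + 1)
```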
Following the division into segments, it is necessary to determine what can be written into the descriptors. Such descriptors should store information about each segment as accurately as possible. It is then necessary to search for similar segments in order to collect a personal knowledge base. Let us divide the descriptors into two classes: the shape description and the color description. When describing color, the gradient should also be taken into account. It is clear that the shape alone is insufficient for accurate classification, because objects are represented by projections, can be overlapped by other objects, etc. In the vast majority of tasks, it is also not enough to use the color information alone.
The zone shape descriptor includes elements of the so-called angular radial transform (ART). Let us write such a transformation, as an expression:
$$F_{nm} = \langle V_{nm}(\rho, \theta), f(\rho, \theta) \rangle = \int_{0}^{2\pi} \int_{0}^{1} V_{nm}^{*}(\rho, \theta)\, f(\rho, \theta)\, \rho \, d\rho \, d\theta \qquad (1)$$
where $F_{nm}$ is the ART coefficient of order $n$ and $m$, $f(\rho, \theta)$ is the image brightness distribution in the polar coordinate system, and $V_{nm}(\rho, \theta)$ is the basis function of the transformation.
In Expression (1), the basis function can be represented as the expansion of the angular and radial components:
$$V_{nm}(\rho, \theta) = A_{m}(\theta) R_{n}(\rho) \qquad (2)$$
Finally, the following formulas can be used to calculate the components in Expression (2):
$$A_{m}(\theta) = \frac{1}{2\pi} \exp(jm\theta), \qquad (3)$$
$$R_{n}(\rho) = \begin{cases} 1, & n = 0, \\ 2\cos(\pi n \rho), & n \neq 0. \end{cases} \qquad (4)$$
Thus, Expressions (1)–(4) describe a method for obtaining the zone type descriptors. Such descriptors represent the shape of the objects for future recognition.
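As a worked example of Expressions (1)–(4), the sketch below evaluates the basis functions on a polar grid and approximates the integral by a discrete sum. The grid resolution and the test segment are assumptions made for illustration.

```python
import numpy as np

def art_coefficient(f_polar, n, m, n_rho=64, n_theta=128):
    """Numerically approximate the ART coefficient F_nm from Expressions (1)-(4).

    f_polar(rho, theta) should return the brightness of the segment at the
    polar coordinates rho in [0, 1] and theta in [0, 2*pi).
    """
    rho = (np.arange(n_rho) + 0.5) / n_rho
    theta = 2 * np.pi * np.arange(n_theta) / n_theta
    R, T = np.meshgrid(rho, theta, indexing="ij")

    # Basis function V_nm = A_m(theta) * R_n(rho), Expressions (2)-(4).
    A_m = np.exp(1j * m * T) / (2 * np.pi)
    R_n = np.ones_like(R) if n == 0 else 2 * np.cos(np.pi * n * R)
    V = A_m * R_n

    # Expression (1): integral of conj(V_nm) * f * rho over the unit disk.
    d_rho, d_theta = 1.0 / n_rho, 2 * np.pi / n_theta
    return np.sum(np.conj(V) * f_polar(R, T) * R) * d_rho * d_theta

# Example: a simple radially symmetric segment (a filled disk of radius 0.5).
f = lambda rho, theta: (rho < 0.5).astype(float)
print(abs(art_coefficient(f, n=1, m=0)))
```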
Obtaining a segment descriptor (ArtDE) involves the following steps:
(1) Generation of the basis function. Since the basis function is separable, $V_{nm}(x, y)$ is calculated directly in Cartesian coordinates, so as not to translate from the polar coordinates $V_{nm}(\rho, \theta)$ to the Cartesian coordinates at a later stage. To do this, a set of ART basis functions is created in the form of two four-dimensional arrays, for the real and imaginary parts, respectively.
(2) Size normalization. The center of mass of the object is mapped to the center of the lookup table. If the sizes of the image and the lookup table are different, then linear interpolation is applied to map the image to the corresponding lookup table. Here, the object's size is defined as twice the maximum distance from the object's centroid.
(3) ART transformation. The real and imaginary parts of the ART coefficients, ART_R(12)(3) and ART_I(12)(3), respectively, are calculated by summing the result of the multiplication of a segment pixel by the corresponding position in the lookup table in raster scan order. Twelve angular and three radial basis functions are used here.
(4) Zone normalization. The value of each ART coefficient ART(m)(n) is calculated by taking the square root of the sum of the squares of its imaginary and real parts. The resulting values are further divided by the first coefficient ART(0)(0) for normalization.
(5) Quantization of the transformation to 16 levels. The values of the ART coefficients, with the exception of ART_M(0)(0), are quantized to 16 levels, according to Table 1.
Table 1. Correspondence of the ART_M ranges and ART_DE levels.
In such a case, the distance between two segments can be calculated as the sum of the absolute differences of the corresponding ART coefficients.
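A minimal sketch of the zone normalization, quantization and matching distance is given below. Uniform quantization is assumed purely for illustration, since the exact ranges are defined by Table 1, and the function names are hypothetical.

```python
import numpy as np

def art_de(coeffs, n_levels=16):
    """Zone normalization and quantization of the ART coefficients (steps 4-5).

    `coeffs` is a complex array of shape (12, 3): twelve angular and three
    radial basis functions. Uniform quantization is assumed here; the article
    uses the ranges defined in Table 1.
    """
    magnitudes = np.abs(coeffs).ravel()
    normalized = magnitudes / magnitudes[0]            # divide by ART(0)(0)
    levels = np.clip((normalized * n_levels).astype(int), 0, n_levels - 1)
    return levels

def art_distance(de_a, de_b):
    """Distance between two segments: sum of absolute differences of the levels."""
    return int(np.sum(np.abs(np.asarray(de_a, int) - np.asarray(de_b, int))))
```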
A similar lattice graph is constructed for the gradient, only here the output of Kruskal's algorithm is used. The graph has eight connections and ensures that the descriptor is invariant to rotations, as shown in Figure 6.
Figure 6. Acquiring a gradient descriptor. Here circles (1–6) are nodes of the image graph.
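Since the exact construction of the gradient descriptor is defined by the graph in Figure 6, only a heavily hedged sketch can be given here: the hypothetical function below collects the normalized brightness differences to the eight neighbours of a pixel and shifts them to a canonical starting direction, which is one simple way to obtain invariance to rotations.

```python
import numpy as np

# Offsets of the eight neighbouring pixels (the eight links of the lattice graph).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def gradient_descriptor(gray, i, j):
    """Hypothetical eight-direction gradient descriptor for the pixel (i, j).

    The brightness differences to the eight neighbours are normalized and
    cyclically shifted so that the strongest difference comes first. This is
    an illustrative assumption, not the exact construction behind Figure 6.
    """
    diffs = np.array([float(gray[i + di, j + dj]) - float(gray[i, j])
                      for di, dj in NEIGHBOURS])
    norm = np.linalg.norm(diffs)
    if norm > 0:
        diffs = diffs / norm
    return np.roll(diffs, -int(np.argmax(np.abs(diffs))))
```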
Similarly, a graph is constructed for the hue descriptor. A general implementation of the efficient graph-based segmentation, with code, is provided on GitHub [41]. Our algorithm modifies this approach in order to obtain the descriptors using the mathematical calculations presented above.
Next, let us consider an example where training is carried out with a conceptual approach: there is no predefined network architecture. Instead, either an increase in width can occur, i.e., adding new neurons, or an increase in depth, which means adding new layers. Figure 7 shows the main idea of this approach.
Figure 7. Formation of the neural network architecture in the form of a fully connected multilevel graph.
In this case, the need to add a new layer is assessed on the basis of the validation set. If the addition of neurons to the current layer does not give an improvement, a new layer with one neuron is added. The algorithm then extends this layer with neurons until there is again no further improvement in efficiency. Let us consider the algorithm (Algorithm 1) for the suggested solution in detail.
Algorithm 1 Optimal architecture search
START
  • Initialize the architecture with one neuron in the hidden layer.
  • Initialize Epsilon for the losses, the max depth and the max width.
  • Train the architecture using the standard backpropagation methods.
  • Evaluate the model on the validation set with the Loss(1, 1). Here (1, 1) means that the neural network has one neuron in the first layer.
  • Extend the neural network in width by adding one neuron to the current layer.
  • Train the new architecture.
  • Evaluate the model on the validation set with the Loss(1, 2). Here (1, 2) means that the neural network has two neurons in the first layer.
  • While {Loss(1, i − 1) − Loss(1, i) > Epsilon and i < max width}:
  • Repeat the extension in width, train and evaluate.
  • Extend the neural network in depth by adding a new layer with one neuron.
  • Repeat steps 3–9 for the Loss(j, i).
  • Stop the algorithm when j reaches the max depth and i reaches the max width.
END
Thus, for the computer vision tasks, it is possible to reconfigure the last layers of the recognizer in this way.
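A compact sketch of Algorithm 1 is shown below, using a scikit-learn multilayer perceptron as a stand-in for the trainable head and a small toy dataset purely for illustration; the epsilon, depth and width limits, as well as the function names, are assumed values rather than the exact experimental settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

def validation_loss(layers, x_tr, y_tr, x_val, y_val):
    """Train an MLP with the given hidden-layer sizes and return validation loss."""
    model = MLPClassifier(hidden_layer_sizes=layers, max_iter=300, random_state=0)
    model.fit(x_tr, y_tr)
    return log_loss(y_val, model.predict_proba(x_val)), model

def search_architecture(x_tr, y_tr, x_val, y_val,
                        epsilon=1e-3, max_depth=3, max_width=32):
    """Grow the network first in width, then in depth, as in Algorithm 1."""
    layers = [1]                                      # one neuron in one hidden layer
    best_loss, best_model = validation_loss(tuple(layers), x_tr, y_tr, x_val, y_val)
    for _ in range(max_depth):
        # Extend the current layer in width while the loss keeps improving.
        while layers[-1] < max_width:
            candidate = layers[:-1] + [layers[-1] + 1]
            loss, model = validation_loss(tuple(candidate), x_tr, y_tr, x_val, y_val)
            if best_loss - loss <= epsilon:
                break
            layers, best_loss, best_model = candidate, loss, model
        if len(layers) >= max_depth:
            break
        layers = layers + [1]                         # extend in depth with one neuron
        best_loss, best_model = validation_loss(tuple(layers), x_tr, y_tr, x_val, y_val)
    return tuple(layers), best_loss, best_model

# Hypothetical usage on a small toy dataset.
x, y = load_digits(return_X_y=True)
x_tr, x_val, y_tr, y_val = train_test_split(x, y, test_size=0.25, random_state=0)
print(search_architecture(x_tr, y_tr, x_val, y_val)[:2])
```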
Next, let us consider the results of this processing.

5. Results and Discussion

To test the effectiveness of the descriptor invariance, it is necessary to consider the descriptors for images rotated at different angles. Figure 8 shows such images for the example of the recognized palm tree crown from Figure 5.
Figure 8. Original segment image and rotated 45 degrees.
The ART transformation uses 12 angular basis functions and 3 radial functions. Thus, there are 36 descriptors for the object shape in total. Table 2, Table 3 and Table 4 present the values obtained from the calculation of the first 12 descriptors, the next 12 descriptors, and the last 12 descriptors.
Table 2. Shape descriptors No 1–12.
Table 3. Shape descriptors No 13–24.
Table 4. Shape descriptors No 25–36.
The analysis of Table 2, Table 3 and Table 4 shows that the deviation within one descriptor should not exceed 5 units, while the total deviation here is 62 units. It has been experimentally established that such a difference should not exceed 100 units in total.
Table 5 presents the results of the comparison of the gradient descriptors. All eight pixel links are examined.
Table 5. Gradient descriptor comparison.
It has been established that the total difference of the normalized descriptors should not exceed two units, and for one direction the allowable deviation is 0.3.
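Taken together, these experimentally established limits can be expressed as a simple matching rule; the sketch below is illustrative, with hypothetical function names.

```python
import numpy as np

def shapes_match(art_a, art_b, per_coeff_limit=5, total_limit=100):
    """Two shape descriptors match if no single ART coefficient deviates by
    more than 5 units and the total deviation stays within 100 units."""
    diff = np.abs(np.asarray(art_a, dtype=float) - np.asarray(art_b, dtype=float))
    return bool(diff.max() <= per_coeff_limit and diff.sum() <= total_limit)

def gradients_match(grad_a, grad_b, per_direction_limit=0.3, total_limit=2.0):
    """Two normalized gradient descriptors (eight directions) match if each
    direction deviates by at most 0.3 and the total deviation is at most 2."""
    diff = np.abs(np.asarray(grad_a, dtype=float) - np.asarray(grad_b, dtype=float))
    return bool(diff.max() <= per_direction_limit and diff.sum() <= total_limit)
```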
It should be noted that, for the selected pixel in terms of the hue gradient, the full correspondence of the RGB channels was observed.
It should also be noted that all of these limits were established using not just one image, but entire datasets, such as CIFAR-10. However, this example illustrates all of the descriptor transformations more clearly.
Table 6 shows the results for the small dataset of X-ray images. The task is to recognize the prohibited items. It should be noted that no augmentation was applied to the source sample; however, the test image set contains about 20% rotated images. Modern vision transformer methods [24] are also presented in the comparison, denoted as ViT. It is also interesting to evaluate the performance of the models. The processing is performed on an Intel Core i7-9700k CPU (Intel Corp., Santa Clara, CA, USA).
Table 6. Graph recognition performance, precision and recall for the X-ray images.
So, the suggested graph-based approach makes it possible to obtain better results in terms of recall, and it is faster than the VGG model. However, ResNet provides better performance characteristics. The problems for the convolutional network methods are the small training dataset and the unbalanced data (there are fewer prohibited samples than allowed samples). ViT is very slow. The graph solution is therefore seen as optimal here.
Table 7 shows the accuracy results for the Dogs vs Cats dataset. This metric is used because the data is balanced.
Table 7. Graph recognition performance, accuracy for the Dogs vs Cats images.
In a general comparison with the other methods, the graph method provides average results. Its accuracy is lower than that of the deep learning methods, but its performance is better than that of VGG and ViT.
It is obvious that graph structures make it possible to obtain sufficiently good descriptors of the segments and objects. Moreover, the adaptive design of the architecture of the last layers also makes it possible to optimize the quality of work on the test data.
Table 8 shows the results for the different networks with and without the selection of the output layers on the CIFAR-10 dataset. It is a balanced dataset; therefore, the proportion of correct recognitions is a sufficient metric to assess the quality, and the model comparison results are presented in terms of accuracy.
Table 8. Comparison of recognition accuracy and training time.
With such a method, the training time is a very interesting characteristic. Let us compare the results for the 10 training epochs, using an NVIDIA GeForce GTX 1060 GPU (NVIDIA Corporation, Santa Clara, CA, USA).
From Table 8, it can be seen that the suggested approach makes the training process considerably slower.
The analysis of the presented data shows that the quality gain is 2–3% for the different models. Since such an approach requires much more time and computational resources, it would be interesting in the future to consider increasing the performance of such learning systems, as is achieved, for example, with the OpenVINO Toolkit [42].
A limitation of the proposed approach is the size of the images, since the graphs are oriented towards the relationships between pixels. In the future, it is planned to study the effectiveness of the algorithm for different image dimensions. Moreover, a further challenge is the optimization of the search algorithm for the optimal neural network architecture. Potentially, this process can be parallelized, but this also requires further work and research.

6. Conclusions

Interesting results have been obtained on the use of graph structures to create descriptors of image objects that are invariant to affine transformations. The representations of the image brightness relationships built on a lattice graph, as well as the obtained shape and color descriptors, show a fairly high proximity between the calculated values of the descriptor parameters. The deviations for the shapes do not exceed 100 units out of a possible 540, and often lie at the level of 40–60 units, which corresponds to about 10% error. The results for image recognition using the graph-based descriptors are better than those of the traditional convolutional neural networks for the X-ray images in terms of recall. The suggested method showed an efficiency level comparable to the modern transformer architecture models, but its processing is faster than transformer processing. For the balanced and large datasets, the accuracy results are lower than those of the deep learning methods by about 1–2%, but the graph method works 1.5–5 times faster than the transformers and the VGG architecture. Only the ResNet performance is better after transfer learning. In addition, searching for an architecture by expanding and deepening the connections between the neurons and layers provides more efficient solutions than learning with a given structure of connections. The gain for such architectures as VGG, ResNet and CNN was about 3%. However, the training time grows significantly, by about 30–40 times, because of the optimal structure search.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Cessac, B. Retinal Processing: Insights from Mathematical Modelling. J. Imaging 2022, 8, 14. [Google Scholar] [CrossRef] [PubMed]
  2. Suryanarayana, G.; Varadarajan, V.; Pillutla, S.R.; Nagajyothi, G.; Kotapati, G. Multiple Degradation Skilled Network for Infrared and Visible Image Fusion Based on Multi-Resolution SVD Updation. Mathematics 2022, 10, 3389. [Google Scholar] [CrossRef]
  3. Schroder, M.; Seidel, K.; Datcu, M. Gibbs random field models for image content characterization. In Proceedings of the IGARSS’97. 1997 IEEE International Geoscience and Remote Sensing Symposium Proceedings. Remote Sensing—A Scientific Vision for Sustainable Development, Singapore, 3–8 August 1997; Volume 1, pp. 258–260. [Google Scholar] [CrossRef]
  4. Andriyanov, N.A.; Vasiliev, K.K. Optimal filtering of multidimensional random fields generated by autoregressions with multiple roots of characteristic equations. CEUR Workshop Proc. 2019, 2391, 72–78. [Google Scholar]
  5. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  6. Andriyanov, N.A.; Andriyanov, D.A. The using of data augmentation in machine learning in image processing tasks in the face of data scarcity. J. Phys. Conf. Ser. 2020, 1661, 012018. [Google Scholar] [CrossRef]
  7. Merino, I.; Azpiazu, J.; Remazeilles, A.; Sierra, B. Histogram-Based Descriptor Subset Selection for Visual Recognition of Industrial Parts. Appl. Sci. 2020, 10, 3701. [Google Scholar] [CrossRef]
  8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–448. [Google Scholar] [CrossRef]
  9. Jain, S.; Singhania, U.; Tripathy, B.; Nasr, E.A.; Aboudaif, M.K.; Kamrani, A.K. Deep Learning-Based Transfer Learning for Classification of Skin Cancer. Sensors 2021, 21, 8142. [Google Scholar] [CrossRef]
  10. Andriyanov, N.A.; Volkov, A.K.; Volkov, A.K.; Gladkikh, A.A. Research of recognition accuracy of dangerous and safe x-ray baggage images using neural network transfer learning. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Irkutsk, Russia, 21–26 September 2021; Volume 1061, p. 012002. [Google Scholar] [CrossRef]
  11. Abou Baker, N.; Zengeler, N.; Handmann, U. A Transfer Learning Evaluation of Deep Neural Networks for Image Classification. Mach. Learn. Knowl. Extr. 2022, 4, 22–41. [Google Scholar] [CrossRef]
  12. ImageNet Classification with Deep Convolutional Neural Networks. Available online: https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 21 September 2022).
  13. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR 2014, 15, 1929–1958. [Google Scholar]
  14. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842. [Google Scholar]
  15. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv 2014, arXiv:1311.2524. [Google Scholar]
  16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2016, arXiv:1512.03385. [Google Scholar]
  17. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  18. Andriyanov, N.A.; Dementiev, V.E.; Tashlinskii, A.G. Detection of objects in the images: From likelihood relationships towards scalable and efficient neural networks. Comput. Opt. 2022, 46, 139–159. [Google Scholar] [CrossRef]
  19. Xu, Z.; Lan, S.; Yang, Z.; Cao, J.; Wu, Z.; Cheng, Y. MSB R-CNN: A Multi-Stage Balanced Defect Detection Network. Electronics 2021, 10, 1924. [Google Scholar] [CrossRef]
  20. Girshick, R. Fast R-CNN. arXiv 2015, arXiv:1504.08083. [Google Scholar]
  21. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef]
  22. Andriyanov, N.; Khasanshin, I.; Utkin, D.; Gataullin, T.; Ignar, S.; Shumaev, V.; Soloviev, V. Intelligent System for Estimation of the Spatial Position of Apples Based on YOLOv3 and Real Sense Depth Camera D415. Symmetry 2022, 14, 148. [Google Scholar] [CrossRef]
  23. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2017, arXiv:1703.06870. [Google Scholar]
  24. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. Int Conf Learn. Represent. 2021, 1, 1–22. [Google Scholar]
  25. Andriyanov, N.; Papakostas, G. Optimization and Benchmarking of Convolutional Networks with Quantization and OpenVINO in Baggage Image Recognition. In Proceedings of the 2022 VIII International Conference on Information Technology and Nanotechnology (ITNT), Samara, Russia, 23–27 May 2022; pp. 1–4. [Google Scholar] [CrossRef]
  26. Bazi, Y.; Bashmal, L.; Rahhal, M.M.A.; Dayil, R.A.; Ajlan, N.A. Vision Transformers for Remote Sensing Image Classification. Remote Sens. 2021, 13, 516. [Google Scholar] [CrossRef]
  27. Barmpoutis, P.; Yuan, J.; Waddingham, W.; Ross, C.; Hamzeh, K.; Stathaki, T.; Alexander, D.C.; Jansen, M. Multi-scale Deformable Transformer for the Classification of Gastric Glands: The IMGL Dataset. In Cancer Prevention Through Early Detection; CaPTion 2022; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13581. [Google Scholar] [CrossRef]
  28. Zhong, Y.; Deng, W. Face Transformer for Recognition. arXiv 2021, arXiv:2103.14803. [Google Scholar]
  29. Đurović, P.; Vidović, I.; Cupec, R. Semantic Component Association within Object Classes Based on Convex Polyhedrons. Appl. Sci. 2020, 10, 2641. [Google Scholar] [CrossRef]
  30. Bae, J.-H.; Yu, G.-H.; Lee, J.-H.; Vu, D.T.; Anh, L.H.; Kim, H.-G.; Kim, J.-Y. Superpixel Image Classification with Graph Convolutional Neural Networks Based on Learnable Positional Embedding. Appl. Sci. 2022, 12, 9176. [Google Scholar] [CrossRef]
  31. Yuan, Z.; Huang, W.; Tang, C.; Yang, A.; Luo, X. Graph-Based Embedding Smoothing Network for Few-Shot Scene Classification of Remote Sensing Images. Remote Sens. 2022, 14, 1161. [Google Scholar] [CrossRef]
  32. Azulay, A.; Weiss, Y. Why do deep convolutional networks generalize so poorly to small image transformations? J. Mach. Learn. Res. 2019, 20, 1–25. [Google Scholar]
  33. Dementyiev, V.E.; Andriyanov, N.A.; Vasilyiev, K.K. Use of Images Augmentation and Implementation of Doubly Stochastic Models for Improving Accuracy of Recognition Algorithms Based on Convolutional Neural Networks. In Proceedings of the 2020 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), Svetlogorsk, Russia, 1–3 July 2020; pp. 1–4. [Google Scholar] [CrossRef]
  34. Andriyanov, N. Methods for Preventing Visual Attacks in Convolutional Neural Networks Based on Data Discard and Dimensionality Reduction. Appl. Sci. 2021, 11, 5235. [Google Scholar] [CrossRef]
  35. Pechyonkin, M. Understanding Hinton’s Capsule Networks. Part I: Intuition. Available online: https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b (accessed on 22 September 2022).
  36. Zuo, Z.; Shuai, B.; Wang, G.; Liu, X.; Wang, X.; Wang, B.; Chen, Y. Convolutional Recurrent Neural Networks: Learning Spatial Dependencies for Image Representation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 18–26. [Google Scholar]
  37. Dubrovin, E.; Popov, A. Graph representation methods for the Discrete mathematics Instructions Set computer. In Proceedings of the 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), Moscow, Russia, 27–30 January 2020; pp. 1925–1930. [Google Scholar]
  38. Li, C.; Makarychev, M.; Popov, A. Alternative approach to solving computer vision problems using graph structures. Proc. Math. Methods Eng. Technol. Int. Conf. 2019, 3, 30–37. [Google Scholar]
  39. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient Graph-Based Image Segmentation. Int. J. Comput. Vis. 2004, 59, 1–26. [Google Scholar] [CrossRef]
  40. Deng, G.; Cahill, L.W. An adaptive Gaussian filter for noise reduction and edge detection. Nucl. Sci. Symp. Med. Imaging Conf. 1993, 1, 1615–1620. [Google Scholar]
  41. Guo, R. Efficient Graph-based Image Segmentation (Code). Available online: https://github.com/RuoyuGuo/Efficient_Graph-based_Image_Segmentation (accessed on 14 October 2022).
  42. Andriyanov, N.A. Analysis of the acceleration of neural networks inference on Intel processors based on OpenVINO Toolkit. In Proceedings of the 2020 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), Svetlogorsk (Kaliningrad region), Russia, 1–3 July 2020; Volume 1, pp. 1–4. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
