Article

A Faster Algorithm for Reducing the Computational Complexity of Convolutional Neural Networks

Yulin Zhao, Donghui Wang, Leiou Wang and Peng Liu
1 Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2 Key Laboratory of Information Technology for Autonomous Underwater Vehicles, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Algorithms 2018, 11(10), 159; https://doi.org/10.3390/a11100159
Submission received: 10 September 2018 / Revised: 4 October 2018 / Accepted: 16 October 2018 / Published: 18 October 2018

Abstract
Convolutional neural networks have achieved remarkable improvements in image and video recognition but incur a heavy computational burden. To reduce the computational complexity of a convolutional neural network, this paper proposes an algorithm based on the Winograd minimal filtering algorithm and the Strassen algorithm. Theoretical assessment shows that the proposed algorithm can dramatically reduce the computational complexity. Furthermore, the Visual Geometry Group (VGG) network is employed to evaluate the algorithm in practice. The results show that the proposed algorithm combines the savings of the two component algorithms and achieves the best performance, saving 75% of the runtime compared with the conventional algorithm.

1. Introduction

Deep convolutional neural networks have achieved remarkable improvements in image and video processing [1,2,3]. However, the computational complexity of these networks has also increased significantly. Since the prediction process in real-time applications requires very low latency, the heavy computational burden is a major obstacle in such systems. For example, detecting faces in video imagery remains a challenging task [4,5], and the success of convolutional neural networks in these applications is limited by their computational cost.
There have been a number of studies on improving the efficiency of convolutional neural networks. Denil et al. [6] showed that there are significant redundancies in the parameterizations of neural networks. Han et al. [7] and Guo et al. [8] use training strategies to compress neural network models without significantly weakening their performance. Some researchers [9,10,11] have found that low-precision computation is sufficient for these networks. Binary/Ternary Nets [12,13] restrict the parameters to two or three values. Zhang et al. [14] used low-rank approximation to reconstruct the convolution matrix, which reduces the complexity of convolution. These algorithms are effective in accelerating the computation of the network, but they also degrade accuracy. The Fast Fourier Transform (FFT) can also reduce the computational complexity of convolutional neural networks without losing accuracy [15,16], but it is only effective for networks with large kernels. However, convolutional neural networks tend to use small kernels because they achieve better accuracy than networks with larger kernels [1]. For these reasons, there is a demand for an algorithm that can accelerate networks with small kernels.
In this paper, we present an algorithm based on Winograd's minimal filtering algorithm, which was proposed by Toom [17] and Cook [18] and generalized by Winograd [19]. The minimal filtering algorithm reduces the computational complexity of each convolution in the network without losing accuracy. However, the computational complexity remains too large for real-time requirements. To further reduce the computational complexity of these networks, we simultaneously use the Strassen algorithm to reduce the number of convolutions in the network. Moreover, we evaluate our algorithm on the Visual Geometry Group (VGG) network. Experimental results show that it saves 75% of the computation time when the batch size is 32.
The rest of this paper is organized as follows. Section 2 reviews related work on convolutional neural networks, the Winograd algorithm and the Strassen algorithm. The proposed algorithm is presented in Section 3. Several simulations are included in Section 4, and the work is concluded in Section 5.

2. Related Work

2.1. Convolutional Neural Networks

Machine learning has produced impressive results in many signal processing applications [20,21]. Convolutional neural networks extend the capabilities of neural networks by introducing convolutional layers. Convolutional neural networks are mainly used in image processing. Figure 1 shows the structure of a classical convolutional neural network, LeNet. It consists of two convolutional layers, two subsampling layers and three fully connected layers. Usually, the convolutional layers account for most of the computation in the network.
Convolutional layers extract features from the input feature maps via different kernels. Suppose there are Q input feature maps of size Mx × Nx and R output feature maps of size My × Ny, and that the convolutional kernels are of size Mw × Nw. The output of a single layer is given by Equation (1):

y_{r,x,y} = \sum_{q=1}^{Q} \sum_{u=1}^{M_w} \sum_{v=1}^{N_w} w_{r,q,u,v} \, x_{q,\,x+u,\,y+v},   (1)

where x denotes the input feature maps, y the output feature maps, and w the kernels. The subscripts x and y indicate the position of the pixel in the feature map, and the subscripts u and v indicate the position of the parameter in the kernel. Equation (1) can be rewritten more compactly as Equation (2), where \ast denotes the two-dimensional convolution of Equation (1):
y_r = \sum_{q=1}^{Q} w_{r,q} \ast x_q   (2)
Suppose P images are sent to the network together, i.e., the batch size is P. The output in Equation (2) can then be expressed as Equation (3):
y_{r,p} = \sum_{q=1}^{Q} w_{r,q} \ast x_{q,p}   (3)
If we regard y_{r,p}, w_{r,q} and x_{q,p} as the elements of the matrices Y, W and X, respectively, the output can be expressed as the convolution-matrix product in Equation (4), with the block matrices defined in Equation (5).
Y = W \ast X   (4)

Y = \begin{bmatrix} y_{1,1} & \cdots & y_{1,P} \\ \vdots & \ddots & \vdots \\ y_{R,1} & \cdots & y_{R,P} \end{bmatrix}, \quad W = \begin{bmatrix} w_{1,1} & \cdots & w_{1,Q} \\ \vdots & \ddots & \vdots \\ w_{R,1} & \cdots & w_{R,Q} \end{bmatrix}, \quad X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,P} \\ \vdots & \ddots & \vdots \\ x_{Q,1} & \cdots & x_{Q,P} \end{bmatrix}   (5)
Matrices Y and X are block matrices whose elements are feature maps, and W is a block matrix whose elements are kernels. This convolution-matrix form provides a new view of the computation of the output Y.
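To make the indexing in Equations (1)–(5) concrete, the following NumPy sketch computes one convolutional layer directly with explicit loops. All sizes and array names here are illustrative assumptions, not values from the paper (whose experiments were run in MATLAB).

```python
import numpy as np

# Illustrative sizes only (not from the paper): P images, Q input maps, R output maps.
P, Q, R = 2, 3, 4
Mx = Nx = 8                          # input feature-map size
Mw = Nw = 3                          # kernel size
My, Ny = Mx - Mw + 1, Nx - Nw + 1    # output feature-map size (no padding)

x = np.random.randn(P, Q, Mx, Nx)    # inputs  x[p, q]
w = np.random.randn(R, Q, Mw, Nw)    # kernels w[r, q]
y = np.zeros((P, R, My, Ny))         # outputs y[p, r]

# Equations (1) and (3): each output map is the sum, over the input maps,
# of a 2-D (cross-)correlation with the corresponding kernel.
for p in range(P):
    for r in range(R):
        for q in range(Q):
            for i in range(My):
                for j in range(Ny):
                    y[p, r, i, j] += np.sum(w[r, q] * x[p, q, i:i + Mw, j:j + Nw])

# In the block-matrix view of Equations (4)-(5), y[p, r] is entry (r, p) of Y = W * X.
```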

2.2. Winograd Algorithm

We denote an r-tap FIR filter with m outputs as F(m, r). The conventional algorithm for F(2, 3) is shown in Equation (6), where d_0, d_1, d_2 and d_3 are the inputs of the filter, and h_0, h_1 and h_2 are the parameters of the filter. As Equation (6) shows, it uses 6 multiplications and 4 additions to compute F(2, 3).

F(2,3) = \begin{bmatrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} d_0 h_0 + d_1 h_1 + d_2 h_2 \\ d_1 h_0 + d_2 h_1 + d_3 h_2 \end{bmatrix}   (6)
If we use the minimal filtering algorithm [19] to compute F(m, r), it requires (m + r – 1) multiplications. The process of the algorithm for computing F(2, 3) is shown in Equations (7)–(11).
m_1 = (d_0 - d_2)\, h_0   (7)
m_2 = (d_1 + d_2)\, \tfrac{1}{2}(h_0 + h_1 + h_2)   (8)
m_3 = (d_2 - d_1)\, \tfrac{1}{2}(h_0 - h_1 + h_2)   (9)
m_4 = (d_1 - d_3)\, h_2   (10)
F(2,3) = \begin{bmatrix} d_0 h_0 + d_1 h_1 + d_2 h_2 \\ d_1 h_0 + d_2 h_1 + d_3 h_2 \end{bmatrix} = \begin{bmatrix} m_1 + m_2 + m_3 \\ m_2 - m_3 - m_4 \end{bmatrix}   (11)
The computation can be written in matrix form as Equation (12).
F(2,3) = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix} \left( \left( \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ h_2 \end{bmatrix} \right) \odot \left( \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} \right) \right)   (12)
Denoting the three constant matrices in Equation (12) by A^T, G and B^T, Equation (12) can be rewritten as Equation (13).
Y = A^T \left( (G h) \odot (B^T d) \right),   (13)

A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & -1 \\ 0 & -1 \end{bmatrix}, \quad G = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 1 \\ -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}   (14)
In Equation (13), ⊙ indicates element-wise multiplication, and the superscript T indicates the transpose operator. A, G and B are defined in Equation (14).
We can see from Equations (7)–(11) that the whole process needs only 4 multiplications. However, it also needs 4 additions to transform the data, 3 additions and 2 multiplications by a constant to transform the filter, and 4 additions to transform the final result. (To simplify the comparison, we count a multiplication by a constant as an addition.)
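As a sanity check of Equations (6)–(14), the short NumPy sketch below computes F(2, 3) both conventionally and with the minimal filtering transform; the matrices A, G and B follow Equation (14), and the random inputs are illustrative.

```python
import numpy as np

# Constant matrices A, G, B from Equation (14) for F(2, 3).
A = np.array([[1, 0], [1, 1], [1, -1], [0, -1]], dtype=float)
G = np.array([[1, 0, 0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0, 0, 1]], dtype=float)
B = np.array([[1, 0, 0, 0],
              [0, 1, -1, 1],
              [-1, 1, 1, 0],
              [0, 0, 0, -1]], dtype=float)

d = np.random.randn(4)   # four filter inputs d0..d3
h = np.random.randn(3)   # three filter taps h0..h2

# Equation (13): Y = A^T ((G h) .* (B^T d)); only 4 element-wise multiplications.
y_winograd = A.T @ ((G @ h) * (B.T @ d))

# Conventional F(2, 3) from Equation (6) for comparison.
y_direct = np.array([d[0]*h[0] + d[1]*h[1] + d[2]*h[2],
                     d[1]*h[0] + d[2]*h[1] + d[3]*h[2]])
assert np.allclose(y_winograd, y_direct)
```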
The two-dimensional filter F(m × m, r × r) can be constructed by nesting the one-dimensional filter F(m, r), as shown in Equation (15) [22].
Y = A^T \left( (G h G^T) \odot (B^T d B) \right) A   (15)
F(2 × 2, 3 × 3) needs 4 × 4 = 16 multiplications, 32 additions to transform the data, 28 additions to transform the filter, and 24 additions to transform the final result. The conventional algorithm needs 36 multiplications, so this algorithm reduces the number of multiplications from 36 to 16.
F(2 × 2, 3 × 3) can be used to compute a convolutional layer with 3 × 3 kernels: each input feature map is divided into smaller tiles so that Equation (15) can be applied. If we substitute U_{r,q} = G w_{r,q} G^T and V_{q,p} = B^T x_{q,p} B, then Equation (3) can be rewritten as Equation (16):
y_{r,p} = \sum_{q=1}^{Q} w_{r,q} \ast x_{q,p} = \sum_{q=1}^{Q} A^T \left( (G w_{r,q} G^T) \odot (B^T x_{q,p} B) \right) A = \sum_{q=1}^{Q} A^T (U_{r,q} \odot V_{q,p}) A   (16)
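The following sketch checks the nested form of Equations (15) and (16) on a single 4 × 4 input tile, summing over Q input maps for one fixed output map; Q and the random data are illustrative assumptions.

```python
import numpy as np

# Constant matrices of Equation (14), nested as in Equation (15) for F(2x2, 3x3).
A = np.array([[1, 0], [1, 1], [1, -1], [0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
B = np.array([[1, 0, 0, 0], [0, 1, -1, 1], [-1, 1, 1, 0], [0, 0, 0, -1]], dtype=float)

Q = 8                          # number of input feature maps (illustrative)
w = np.random.randn(Q, 3, 3)   # kernels w[r, q] for one fixed output map r
d = np.random.randn(Q, 4, 4)   # one 4x4 input tile of x[q, p] per input map

# Equation (16): transform, multiply element-wise, transform back, sum over q.
y_tile = np.zeros((2, 2))
for q in range(Q):
    U = G @ w[q] @ G.T         # U[r, q]
    V = B.T @ d[q] @ B         # V[q, p]
    y_tile += A.T @ (U * V) @ A

# Reference: direct 3x3 correlation of each kernel with the same tile, as in Equation (1).
ref = np.zeros((2, 2))
for q in range(Q):
    for i in range(2):
        for j in range(2):
            ref[i, j] += np.sum(w[q] * d[q, i:i + 3, j:j + 3])
assert np.allclose(y_tile, ref)
```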

2.3. Strassen Algorithm

Suppose there are two matrices A and B whose numbers of rows and columns are even, and let matrix C be their product. We can partition A, B and C into equal-sized block matrices as follows:
A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}, \quad B = \begin{bmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{bmatrix}, \quad C = \begin{bmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{bmatrix}   (17)
According to the conventional matrix multiplication algorithm, we then have Equation (18).
C = A \times B = \begin{bmatrix} A_{1,1} B_{1,1} + A_{1,2} B_{2,1} & A_{1,1} B_{1,2} + A_{1,2} B_{2,2} \\ A_{2,1} B_{1,1} + A_{2,2} B_{2,1} & A_{2,1} B_{1,2} + A_{2,2} B_{2,2} \end{bmatrix}   (18)
As Equation (18) shows, we need 8 block multiplications and 4 block additions to compute matrix C. The Strassen algorithm can be used to reduce the number of multiplications [23]. The process of the Strassen algorithm is as follows:
I = (A_{1,1} + A_{2,2}) \times (B_{1,1} + B_{2,2}),   (19)
II = (A_{2,1} + A_{2,2}) \times B_{1,1},   (20)
III = A_{1,1} \times (B_{1,2} - B_{2,2}),   (21)
IV = A_{2,2} \times (B_{2,1} - B_{1,1}),   (22)
V = (A_{1,1} + A_{1,2}) \times B_{2,2},   (23)
VI = (A_{2,1} - A_{1,1}) \times (B_{1,1} + B_{1,2}),   (24)
VII = (A_{1,2} - A_{2,2}) \times (B_{2,1} + B_{2,2}),   (25)
C_{1,1} = I + IV - V + VII,   (26)
C_{1,2} = III + V,   (27)
C_{2,1} = II + IV,   (28)
C_{2,2} = I - II + III + VI,   (29)
where I, II, III, IV, V, VI and VII are temporary matrices. The whole process requires 7 multiplications and 18 additions, reducing the number of multiplications from 8 to 7 without changing the result. More multiplications can be saved by applying the Strassen algorithm recursively, as long as the numbers of rows and columns of the submatrices are even: with N recursions, a fraction 1 − (7/8)^N of the multiplications is saved. The Strassen algorithm is also applicable to the special convolution matrix in Equation (4) [24], so we can use it to handle a convolution matrix.
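For reference, a minimal recursive NumPy sketch of Equations (19)–(29) is given below. The function name, the restriction to square matrices, the fallback to the conventional product for small or odd-sized blocks, and the test sizes are our own choices.

```python
import numpy as np

def strassen_multiply(A, B):
    """One level of the Strassen recursion per Equations (19)-(29); falls back to
    the conventional product when a dimension is odd or small. Assumes square inputs."""
    n = A.shape[0]
    if n % 2 != 0 or n <= 2:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven block products (I-VII of Equations (19)-(25)) instead of eight.
    M1 = strassen_multiply(A11 + A22, B11 + B22)   # I
    M2 = strassen_multiply(A21 + A22, B11)         # II
    M3 = strassen_multiply(A11, B12 - B22)         # III
    M4 = strassen_multiply(A22, B21 - B11)         # IV
    M5 = strassen_multiply(A11 + A12, B22)         # V
    M6 = strassen_multiply(A21 - A11, B11 + B12)   # VI
    M7 = strassen_multiply(A12 - A22, B21 + B22)   # VII

    # Reassembly per Equations (26)-(29).
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

A = np.random.randn(8, 8)
B = np.random.randn(8, 8)
assert np.allclose(strassen_multiply(A, B), A @ B)
```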

3. Proposed Algorithm

As we can see from Section 2.2, the Winograd algorithm incurs more additions. To avoid repeatedly transforming W and X in Equation (16), we calculate the matrices U and V separately, which reduces the number of additions incurred by the algorithm. The practical implementation is listed in Algorithm 1. The calculation of the output M in Algorithm 1 accounts for most of the multiplications in the whole computation. To reduce this cost, we can use the Strassen algorithm; before doing so, we need to reformulate the expression for M as follows.
The output M in Algorithm 1 can be written as the equation
M_{r,p} = \sum_{q=1}^{Q} A^T (U_{r,q} \odot V_{q,p}) A,   (30)
where U_{r,q} and V_{q,p} are temporary matrices, and A is the constant parameter matrix. For clarity of presentation, we omit matrix A in what follows (it is not omitted in the actual implementation of the algorithm). The output M can then be written as Equation (31).
M_{r,p} = \sum_{q=1}^{Q} U_{r,q} \odot V_{q,p}   (31)
We define three block matrices M, U and V whose elements are M_{r,p}, U_{r,q} and V_{q,p}, respectively, as shown in Equation (33). The output M can then be written as the product of matrix U and matrix V in Equation (32).
M = U \times V   (32)

M = \begin{bmatrix} M_{1,1} & \cdots & M_{1,P} \\ \vdots & \ddots & \vdots \\ M_{R,1} & \cdots & M_{R,P} \end{bmatrix}, \quad U = \begin{bmatrix} U_{1,1} & \cdots & U_{1,Q} \\ \vdots & \ddots & \vdots \\ U_{R,1} & \cdots & U_{R,Q} \end{bmatrix}, \quad V = \begin{bmatrix} V_{1,1} & \cdots & V_{1,P} \\ \vdots & \ddots & \vdots \\ V_{Q,1} & \cdots & V_{Q,P} \end{bmatrix}   (33)
In this case, we can partition the matrices M, U and V into equal-sized block matrices and then use the Strassen algorithm to reduce the number of multiplications between U_{r,q} and V_{q,p}. The multiplication in the Strassen algorithm is redefined as the element-wise multiplication of the matrices U_{r,q} and V_{q,p}. We call this new combination the Strassen-Winograd algorithm.
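A minimal sketch of this combination is given below. It assumes the blocks of U and V in Equation (33) are stored as arrays of 4 × 4 transformed tiles and applies one level of Strassen recursion whenever the block dimensions are even; the block "multiplication" is the element-wise tile product of Equation (31), and the final transform by A is omitted here, as in Equation (31). All names and sizes are illustrative.

```python
import numpy as np

def tile_matmul(U, V):
    """Conventional Equation (31): M[r, p] = sum_q U[r, q] * V[q, p] (element-wise tiles)."""
    return np.einsum('rqij,qpij->rpij', U, V)

def strassen_winograd(U, V):
    """Strassen recursion over the block structure of Equation (33), where the
    block product is the element-wise tile multiplication."""
    R, Q, P = U.shape[0], U.shape[1], V.shape[1]
    if R % 2 or Q % 2 or P % 2:          # dimensions cannot be halved: fall back
        return tile_matmul(U, V)
    r, q, p = R // 2, Q // 2, P // 2
    U11, U12, U21, U22 = U[:r, :q], U[:r, q:], U[r:, :q], U[r:, q:]
    V11, V12, V21, V22 = V[:q, :p], V[:q, p:], V[q:, :p], V[q:, p:]

    M1 = strassen_winograd(U11 + U22, V11 + V22)   # I
    M2 = strassen_winograd(U21 + U22, V11)         # II
    M3 = strassen_winograd(U11, V12 - V22)         # III
    M4 = strassen_winograd(U22, V21 - V11)         # IV
    M5 = strassen_winograd(U11 + U12, V22)         # V
    M6 = strassen_winograd(U21 - U11, V11 + V12)   # VI
    M7 = strassen_winograd(U12 - U22, V21 + V22)   # VII

    M = np.empty((R, P) + U.shape[2:])
    M[:r, :p] = M1 + M4 - M5 + M7    # C11
    M[:r, p:] = M3 + M5              # C12
    M[r:, :p] = M2 + M4              # C21
    M[r:, p:] = M1 - M2 + M3 + M6    # C22
    return M

U = np.random.randn(4, 8, 4, 4)   # R = 4 output maps, Q = 8 input maps, 4x4 tiles
V = np.random.randn(8, 2, 4, 4)   # Q = 8 input maps, P = 2 images, 4x4 tiles
assert np.allclose(strassen_winograd(U, V), tile_matmul(U, V))
```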
Algorithm 1. Implementation of the Winograd algorithm.
for r = 1 to the number of output maps
    for q = 1 to the number of input maps
        U_{r,q} = G w_{r,q} G^T
    end
end
for p = 1 to batch size
    for q = 1 to the number of input maps
        for k = 1 to the number of image tiles
            V_{q,p,k} = B^T x_{q,p,k} B
        end
    end
end
for p = 1 to batch size
    for r = 1 to the number of output maps
        for k = 1 to the number of image tiles
            M_{r,p,k} = 0
            for q = 1 to the number of input maps
                M_{r,p,k} = M_{r,p,k} + A^T (U_{r,q} ⊙ V_{q,p,k}) A
            end
        end
    end
end
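For concreteness, the three stages of Algorithm 1 can be written compactly in NumPy as below. The array layout, function name and the use of valid (unpadded) 3 × 3 convolution are our own illustrative assumptions; the paper's experiments were implemented in MATLAB.

```python
import numpy as np

# Constant matrices for F(2x2, 3x3), as in Equation (14).
A = np.array([[1, 0], [1, 1], [1, -1], [0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
B = np.array([[1, 0, 0, 0], [0, 1, -1, 1], [-1, 1, 1, 0], [0, 0, 0, -1]], dtype=float)

def winograd_layer(x, w):
    """Three stages of Algorithm 1 for a valid 3x3 convolution.
    x: inputs of shape (P, Q, H, W) with H and W even; w: kernels of shape (R, Q, 3, 3).
    Returns outputs of shape (P, R, H - 2, W - 2)."""
    P, Q, H, W = x.shape
    R = w.shape[0]
    ty, tx = (H - 2) // 2, (W - 2) // 2                  # number of 2x2 output tiles

    # Stage 1: transform every kernel once, U[r, q] = G w G^T.
    U = np.einsum('ij,rqjk,lk->rqil', G, w, G)
    # Stage 2: transform every 4x4 input tile, V[p, q, tile] = B^T x B.
    V = np.empty((P, Q, ty, tx, 4, 4))
    for i in range(ty):
        for j in range(tx):
            d = x[:, :, 2 * i:2 * i + 4, 2 * j:2 * j + 4]
            V[:, :, i, j] = np.einsum('ji,pqjk,kl->pqil', B, d, B)
    # Stage 3: accumulate over input maps and transform back (Equation (16)).
    y = np.empty((P, R, 2 * ty, 2 * tx))
    for i in range(ty):
        for j in range(tx):
            M = np.einsum('rqab,pqab->prab', U, V[:, :, i, j])
            y[:, :, 2 * i:2 * i + 2, 2 * j:2 * j + 2] = np.einsum('ai,prab,bj->prij', A, M, A)
    return y

# Quick check against the direct convolution of Equation (3) for a small random layer.
x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
ref = np.zeros((2, 4, 6, 6))
for p in range(2):
    for r in range(4):
        for q in range(3):
            for i in range(6):
                for j in range(6):
                    ref[p, r, i, j] += np.sum(w[r, q] * x[p, q, i:i + 3, j:j + 3])
assert np.allclose(winograd_layer(x, w), ref)
```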
To compare the theoretical computational complexity of the conventional, Strassen, Winograd and Strassen-Winograd algorithms, we list their multiplication and addition counts in Table 1. The output feature map size is set to 64 × 64, and the kernel size is set to 3 × 3.
We can see from Table 1 that, although these algorithms incur more additions when the matrix size is small, the number of extra additions is smaller than the number of saved multiplications. Moreover, a multiplication usually costs more time than an addition. Hence all three algorithms are theoretically effective in reducing the computational complexity.
Figure 2 shows a comparison of the computational complexity ratios. The Strassen algorithm reduces multiplications less when the matrix size is small, but it also incurs fewer additions. The Winograd algorithm shows stable performance, and its number of additions decreases slightly as the matrix size increases. For small matrices, the Strassen-Winograd algorithm reduces the multiplication complexity much more than the Strassen algorithm; although it incurs more additions, the number of extra additions is much smaller than the number of saved multiplications. Compared with the Winograd algorithm, the Strassen-Winograd algorithm performs similarly: for small matrices the Winograd algorithm is slightly better, whereas the Strassen-Winograd and Strassen algorithms perform much better as the matrix size increases.
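The multiplication counts in Table 1 can be reproduced with the short script below. It assumes the stated setting of 64 × 64 output maps and 3 × 3 kernels, 16 multiplications per F(2 × 2, 3 × 3) tile, and full Strassen recursion down to 1 × 1 blocks; the addition counts are not reproduced, and the function name is our own.

```python
import math

def mul_counts(N, out_size=64, kernel=3):
    """Multiplication counts for an N x N convolution matrix (setting of Table 1)."""
    per_conv_direct = out_size * out_size * kernel * kernel        # 36,864 per convolution
    per_conv_winograd = (out_size // 2) * (out_size // 2) * 16     # 16,384 with F(2x2, 3x3)
    n_products_direct = N ** 3                                     # conventional block products
    n_products_strassen = 7 ** int(math.log2(N))                   # full Strassen recursion
    return {
        "Conventional": n_products_direct * per_conv_direct,
        "Strassen": n_products_strassen * per_conv_direct,
        "Winograd": n_products_direct * per_conv_winograd,
        "Strassen-Winograd": n_products_strassen * per_conv_winograd,
    }

for N in (2, 4, 8, 16, 32, 64, 128, 256, 512):
    print(N, mul_counts(N))   # e.g. N = 2 gives 294,912 / 258,048 / 131,072 / 114,688
```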

4. Simulation Results

Several simulations were conducted to evaluate our algorithm. We compared it with the Strassen algorithm and the Winograd algorithm, measuring performance by the runtime in MATLAB R2013b (CPU: Intel(R) Core(TM) i7-3370K). For a fair comparison, we implemented the conventional algorithm according to Equation (18) and used it as the benchmark. All input data x and kernels w were randomly generated. We measure the accuracy of our algorithm by the maximum absolute element error in the output feature maps. As the accuracy reference, we use the conventional algorithm with double-precision data, kernels, intermediate variables and outputs; the other algorithms in this comparison use double-precision data and kernels but single-precision intermediate variables and outputs.
The VGG network [1] was used in our simulations. There are nine different convolutional layer configurations in the VGG network; their parameters are shown in Table 2. The depth indicates the number of times a layer occurs in the network, Q the number of input feature maps, R the number of output feature maps, Mw and Nw the size of the kernel, and My and Ny the size of the output feature map. The kernel size in the VGG network is 3 × 3, so we apply F(2 × 2, 3 × 3) to the convolutions. To compute an output feature map of size My × Ny, the map is partitioned into (My/2) × (Ny/2) tiles, each obtained by one evaluation of F(2 × 2, 3 × 3).
As Table 2 shows, the numbers of rows and columns are not always even, and the matrices are not always square. To solve this problem, we pad the matrix with a dummy row or column whenever an odd number of rows or columns is encountered, after which the Strassen recursion can continue. We apply these nine convolutional layers in turn in our simulations. For each convolutional layer, we run the four algorithms with batch sizes from 1 to 32. The runtime consumption of the algorithms is listed in Table 3, and the numerical accuracy of the different algorithms in different layers is shown in Table 4.
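The padding step described above can be sketched as follows; the function name and the tile layout are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def pad_to_even(M):
    """Append a dummy (zero) row and/or column of blocks when the block-matrix
    dimensions are odd, so that the Strassen partitioning can proceed."""
    pad_rows = M.shape[0] % 2
    pad_cols = M.shape[1] % 2
    if pad_rows or pad_cols:
        M = np.pad(M, ((0, pad_rows), (0, pad_cols)) + ((0, 0),) * (M.ndim - 2))
    return M

# Example: layer 1 of the VGG network has Q = 3 input maps, which is odd.
U = np.random.randn(64, 3, 4, 4)        # R = 64 x Q = 3 transformed kernel tiles
print(pad_to_even(U).shape)             # (64, 4, 4, 4)
```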
Table 4 shows that the Winograd algorithm is slightly more accurate than the Strassen algorithm and the Strassen-Winograd algorithm. The maximum element error of these algorithms is 6.16 × 10^−4. Compared with the minimum value of 1.09 × 10^3 in the output feature maps, the accuracy loss incurred by these algorithms is negligible. As discussed in Section 2, none of these algorithms theoretically loses accuracy; in practice, the loss is mainly caused by the single-precision data. Because the conventional algorithm with low-precision data is sufficiently accurate for deep learning [10,11], we conclude that the accuracy of our algorithm is equally sufficient.
To compare runtimes easily, we use the conventional algorithm as the benchmark and calculate the runtime savings achieved by the other algorithms. The results are shown in Figure 3.
The Strassen-Winograd algorithm performs better than the benchmark in all layers except layer 1. This is because the number of input feature maps Q in layer 1 is three, which limits the performance of the algorithm, as a small matrix size incurs more additions. Moreover, odd numbers of rows or columns require dummy rows or columns for matrix partitioning, which adds runtime.
The performance of the Winograd algorithm is stable from layer 2 to layer 9: it saves 53% of the runtime on average, which is close to the 56% reduction in multiplications. The performance of the Strassen algorithm and the Strassen-Winograd algorithm improves as the batch size increases. For example, in layer 7 with a batch size of 1, the matrix cannot be partitioned for the Strassen algorithm, so it yields almost no runtime saving, while the Strassen-Winograd algorithm saves 52% of the runtime, similar to the Winograd algorithm. With a batch size of 2, the Strassen algorithm saves 13% of the runtime, matching its 13% reduction in multiplications, and the Strassen-Winograd algorithm saves 58% of the runtime, close to its 61% reduction in multiplications. As the batch size increases, the Strassen and Strassen-Winograd algorithms can use more recursions, which further reduces the number of multiplications and saves more runtime. When the batch size is 32, the Strassen-Winograd algorithm saves 75% of the runtime, while the Strassen and Winograd algorithms save 49% and 53%, respectively.
Although experiments with larger batch sizes were not carried out due to time and memory limitations, the trend as the batch size increases is clear and is consistent with the theoretical analysis in Section 3. We therefore conclude that the proposed algorithm provides the best performance by combining the savings of the two algorithms.

5. Conclusions and Future Work

The computational complexity of convolutional neural networks is an urgent problem for real-time applications. Both the Strassen algorithm and the Winograd algorithm are effective in reducing this complexity without losing accuracy. This paper proposed combining the two algorithms to reduce the heavy computational burden. The proposed strategy was evaluated on the VGG network, and both the theoretical assessment and the experimental results show that the Strassen-Winograd algorithm can dramatically reduce the computational complexity.
There remain limitations to be addressed in future research. Although the algorithm reduces the computational complexity of convolutional neural networks, it increases the difficulty of implementation, especially in real-time systems and embedded devices, and of parallelizing the network for hardware acceleration. In future work, we aim to apply this method to hardware accelerators in practical applications.

Author Contributions

Y.Z. performed the experiments and wrote the paper. D.W. provided suggestions about the algorithm. L.W. analyzed the complexity of the algorithms. P.L. checked the paper.

Funding

This research received no external funding.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61801469, and in part by the Young Talent Program of the Institute of Acoustics, Chinese Academy of Sciences, under Grant QNYC201622.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  2. Liu, N.; Wan, L.; Zhang, Y.; Zhou, T.; Huo, H.; Fang, T. Exploiting Convolutional Neural Networks with Deeply Local Description for Remote Sensing Image Classification. IEEE Access 2018, 6, 11215–11228.
  3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the International Conference on Neural Information Processing Systems, 3–6 December 2012; Volume 60, pp. 1097–1105.
  4. Le, N.M.; Granger, E.; Kiran, M. A comparison of CNN-based face and head detectors for real-time video surveillance applications. In Proceedings of the Seventh International Conference on Image Processing Theory, Tools and Applications, Montreal, QC, Canada, 28 November–1 December 2018.
  5. Ren, S.; He, K.; Girshick, R. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 91–99.
  6. Denil, M.; Shakibi, B.; Dinh, L. Predicting Parameters in Deep Learning. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2148–2156.
  7. Han, S.; Pool, J.; Tran, J. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems, Istanbul, Turkey, 9–12 November 2015; pp. 1135–1143.
  8. Guo, Y.; Yao, A.; Chen, Y. Dynamic Network Surgery for Efficient DNNs. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016; pp. 1379–1387.
  9. Qiu, J.; Wang, J.; Yao, S. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 21–23 February 2016; pp. 26–35.
  10. Courbariaux, M.; Bengio, Y.; David, J.P. Low Precision Arithmetic for Deep Learning. arXiv 2014, arXiv:1412.0724.
  11. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep Learning with Limited Numerical Precision. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015.
  12. Rastegari, M.; Ordonez, V.; Redmon, J. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 525–542.
  13. Zhu, C.; Han, S.; Mao, H. Trained Ternary Quantization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
  14. Zhang, X.; Zou, J.; Ming, X.; Sun, J. Efficient and accurate approximations of nonlinear convolutional networks. In Proceedings of the Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1984–1992.
  15. Mathieu, M.; Henaff, M.; LeCun, Y. Fast Training of Convolutional Networks through FFTs. arXiv 2013, arXiv:1312.5851.
  16. Vasilache, N.; Johnson, J.; Mathieu, M. Fast Convolutional Nets with fbfft: A GPU Performance Evaluation. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
  17. Toom, A.L. The complexity of a scheme of functional elements simulating the multiplication of integers. Dokl. Akad. Nauk SSSR 1963, 150, 496–498.
  18. Cook, S.A. On the Minimum Computation Time for Multiplication. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1966.
  19. Winograd, S. Arithmetic Complexity of Computations; SIAM: Philadelphia, PA, USA, 1980.
  20. Jiao, Y.; Zhang, Y.; Wang, Y.; Wang, B.; Jin, J.; Wang, X. A novel multilayer correlation maximization model for improving CCA-based frequency recognition in SSVEP brain-computer interface. Int. J. Neural Syst. 2018, 28, 1750039.
  21. Wang, R.; Zhang, Y.; Zhang, L. An adaptive neural network approach for operator functional state prediction using psychophysiological data. Integr. Comput. Aided Eng. 2015, 23, 81–97.
  22. Lavin, A.; Gray, S. Fast Algorithms for Convolutional Neural Networks. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4013–4021.
  23. Strassen, V. Gaussian elimination is not optimal. Numer. Math. 1969, 13, 354–356.
  24. Cong, J.; Xiao, B. Minimizing Computation in Convolutional Neural Networks. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2014, Hamburg, Germany, 15–19 September 2014.
Figure 1. The architecture of LeNet.
Figure 2. Comparisons of the complexity ratio with different matrix sizes.
Figure 3. Comparisons with different batch sizes.
Table 1. Computational complexity of different algorithms.

Matrix Size N | Conventional (Mul / Add) | Strassen (Mul / Add) | Winograd (Mul / Add) | Strassen-Winograd (Mul / Add)
2   | 294,912 / 278,528 | 258,048 / 303,104 | 131,072 / 344,176 | 114,688 / 401,520
4   | 2,359,296 / 2,293,760 | 1,806,336 / 2,416,640 | 1,048,576 / 2,294,208 | 802,816 / 2,908,608
8   | 1.89 × 10^7 / 1.86 × 10^7 | 1.26 × 10^7 / 1.81 × 10^7 | 8.39 × 10^6 / 1.65 × 10^7 | 5.62 × 10^6 / 2.15 × 10^7
16  | 1.51 × 10^8 / 1.50 × 10^8 | 8.85 × 10^7 / 1.31 × 10^8 | 6.71 × 10^7 / 1.25 × 10^8 | 3.93 × 10^7 / 1.62 × 10^8
32  | 1.21 × 10^9 / 1.20 × 10^9 | 6.20 × 10^8 / 9.39 × 10^8 | 5.37 × 10^8 / 9.69 × 10^8 | 2.75 × 10^8 / 1.23 × 10^9
64  | 9.66 × 10^9 / 9.65 × 10^9 | 4.34 × 10^9 / 6.65 × 10^9 | 4.29 × 10^9 / 7.63 × 10^9 | 1.93 × 10^9 / 9.37 × 10^9
128 | 7.73 × 10^10 / 7.72 × 10^10 | 3.04 × 10^10 / 4.68 × 10^10 | 3.44 × 10^10 / 6.06 × 10^10 | 1.35 × 10^10 / 7.19 × 10^10
256 | 6.18 × 10^11 / 6.18 × 10^11 | 2.13 × 10^11 / 3.29 × 10^11 | 2.75 × 10^11 / 4.83 × 10^11 | 9.45 × 10^10 / 5.55 × 10^11
512 | 4.95 × 10^12 / 4.95 × 10^12 | 1.49 × 10^12 / 2.31 × 10^12 | 2.20 × 10^12 / 3.86 × 10^12 | 6.61 × 10^11 / 4.29 × 10^12
Table 2. Parameters of the convolutional layers in the Visual Geometry Group (VGG) network.

Parameter | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Layer 5 | Layer 6 | Layer 7 | Layer 8 | Layer 9
Depth     | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 3 | 4
Q         | 3 | 64 | 64 | 128 | 128 | 256 | 256 | 512 | 512
R         | 64 | 64 | 128 | 128 | 256 | 256 | 512 | 512 | 512
Mw (Nw)   | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3
My (Ny)   | 224 | 224 | 112 | 112 | 56 | 56 | 28 | 28 | 14
Table 3. Runtime consumption of different algorithms (in seconds).

Layer   | Algorithm         | Batch 1 | Batch 2 | Batch 4 | Batch 8 | Batch 16 | Batch 32
Layer 1 | Conventional      | 24 | 48 | 94 | 187 | 375 | 752
        | Strassen          | 24 | 56 | 95 | 191 | 383 | 768
        | Winograd          | 14 | 29 | 57 | 115 | 230 | 462
        | Strassen-Winograd | 14 | 33 | 58 | 117 | 234 | 470
Layer 2 | Conventional      | 493 | 986 | 1971 | 3939 | 7888 | 15821
        | Strassen          | 492 | 861 | 1508 | 2636 | 4625 | 8438
        | Winograd          | 299 | 598 | 1196 | 2396 | 4787 | 9935
        | Strassen-Winograd | 299 | 543 | 992 | 1818 | 3348 | 6468
Layer 3 | Conventional      | 245 | 490 | 980 | 1962 | 3916 | 7858
        | Strassen          | 247 | 433 | 759 | 1328 | 2325 | 4076
        | Winograd          | 128 | 256 | 513 | 1025 | 2049 | 4102
        | Strassen-Winograd | 128 | 229 | 411 | 737 | 1335 | 2417
Layer 4 | Conventional      | 488 | 978 | 1954 | 3908 | 7819 | 15639
        | Strassen          | 494 | 864 | 1513 | 2648 | 4626 | 8140
        | Winograd          | 254 | 509 | 1017 | 2033 | 4075 | 8168
        | Strassen-Winograd | 254 | 455 | 814 | 1466 | 2645 | 4811
Layer 5 | Conventional      | 250 | 502 | 1007 | 2004 | 4012 | 8076
        | Strassen          | 248 | 436 | 761 | 1328 | 2317 | 4078
        | Winograd          | 118 | 236 | 471 | 942 | 1881 | 3776
        | Strassen-Winograd | 118 | 209 | 370 | 656 | 1167 | 2085
Layer 6 | Conventional      | 498 | 1001 | 1998 | 3995 | 7948 | 15892
        | Strassen          | 494 | 868 | 1507 | 2646 | 4643 | 8102
        | Winograd          | 231 | 462 | 923 | 1844 | 3693 | 7382
        | Strassen-Winograd | 231 | 410 | 725 | 1286 | 2296 | 4089
Layer 7 | Conventional      | 244 | 487 | 980 | 1940 | 3910 | 7820
        | Strassen          | 241 | 421 | 739 | 1283 | 2250 | 3961
        | Winograd          | 116 | 231 | 461 | 920 | 1839 | 3680
        | Strassen-Winograd | 116 | 204 | 358 | 630 | 1111 | 1961
Layer 8 | Conventional      | 479 | 955 | 1917 | 3833 | 7675 | 15319
        | Strassen          | 474 | 829 | 1453 | 2546 | 4447 | 7811
        | Winograd          | 222 | 443 | 884 | 1766 | 3524 | 7068
        | Strassen-Winograd | 223 | 391 | 686 | 1210 | 2129 | 3772
Layer 9 | Conventional      | 118 | 237 | 474 | 951 | 1900 | 3823
        | Strassen          | 117 | 206 | 362 | 631 | 1107 | 1937
        | Winograd          | 65 | 128 | 254 | 507 | 1009 | 2010
        | Strassen-Winograd | 65 | 113 | 197 | 345 | 606 | 1063
Table 4. Maximum element error of different algorithms in different layers.

Layer   | Conventional | Strassen | Winograd | Strassen-Winograd
Layer 1 | 1.25 × 10^−6 | 3.03 × 10^−6 | 2.68 × 10^−6 | 4.01 × 10^−6
Layer 2 | 2.46 × 10^−5 | 7.59 × 10^−5 | 4.62 × 10^−5 | 9.50 × 10^−5
Layer 3 | 2.65 × 10^−5 | 7.23 × 10^−5 | 4.83 × 10^−5 | 9.51 × 10^−5
Layer 4 | 4.94 × 10^−5 | 1.50 × 10^−4 | 9.40 × 10^−5 | 1.78 × 10^−4
Layer 5 | 5.14 × 10^−5 | 1.46 × 10^−4 | 1.00 × 10^−4 | 1.74 × 10^−4
Layer 6 | 9.80 × 10^−5 | 2.94 × 10^−4 | 1.88 × 10^−4 | 3.50 × 10^−4
Layer 7 | 9.92 × 10^−5 | 2.82 × 10^−4 | 1.79 × 10^−4 | 3.39 × 10^−4
Layer 8 | 2.09 × 10^−4 | 5.89 × 10^−4 | 3.51 × 10^−4 | 6.99 × 10^−4
Layer 9 | 1.84 × 10^−4 | 5.76 × 10^−4 | 3.50 × 10^−4 | 6.16 × 10^−4
