# NDARTS: A Differentiable Architecture Search Based on the Neumann Series


## Abstract


## 1. Introduction

- Our proposed method, named NDARTS, uses the Neumann series to expand the super-gradient, approximating the exact super-gradient given by the implicit function theorem. Our experimental results demonstrate that NDARTS approximates this gradient more accurately than the baseline algorithm, DARTS.
- We optimize the network weight parameters between architecture-search steps with an improved evolutionary strategy, which fits the weight-sharing hyper-network design better than gradient methods. The training design uses small random mini-batches to reduce the computational cost of each epoch.
- Our proposed algorithm achieves state-of-the-art performance on multiple datasets: CIFAR-10, CIFAR-100, and ImageNet.

## 2. Methods

#### 2.1. Overview: DARTS

- Create a mixed operation ${o}^{(i,j)}$ parameterized by ${\alpha}^{(i,j)}$ for each edge $(i,j)$; initialize the hyper-net weights $\omega $ and the architecture weights $\alpha $;
- Train on the training dataset ${D}_{train}$, update $\omega $ by $\omega =\omega -\gamma {\nabla}_{\omega}{L}_{1}(\omega ,\alpha )$;
- Train on the validation dataset ${D}_{val}$, update $\alpha $ by $\alpha =\alpha -{\gamma}_{\alpha}{\nabla}_{\alpha}{L}_{2}^{\prime}(\omega (\alpha ),\alpha )$;
- Stop when the termination condition is met, otherwise return to step 2.
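The alternating updates above correspond to the standard DARTS bilevel problem, which can be sketched as follows (with $L_1$ the training loss and $L_2$ the validation loss, as in the steps above):

```latex
\begin{aligned}
\min_{\alpha}\quad & L_2\bigl(\omega^{*}(\alpha), \alpha\bigr)\\
\text{s.t.}\quad   & \omega^{*}(\alpha) = \arg\min_{\omega} L_1(\omega, \alpha).
\end{aligned}
```

DARTS approximates $\omega^{*}(\alpha)$ with a single gradient step, $\omega^{*}(\alpha)\approx \omega -\gamma \nabla_{\omega}L_1(\omega ,\alpha )$, which yields the surrogate objective $L_2^{\prime}$ used in step 3.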

#### 2.2. DARTS Optimization Algorithm Based on Neumann Series Approximation (NDARTS)

#### 2.2.1. NDARTS

- Create a mixed operation ${o}^{(i,j)}$ parameterized by ${\alpha}^{(i,j)}$ for each edge $(i,j)$; initialize the hyper-net weights $\omega $ and the architecture weights $\alpha $;
- Train on the training dataset ${D}_{train}$, using $\omega =\omega -\gamma {\nabla}_{\omega}{L}_{1}(\omega ,\alpha )$ to update $\omega $;
- Train on the validation dataset ${D}_{val}$, using $\alpha =\alpha -{\gamma}_{\alpha}{\nabla}_{\alpha}{L}_{2}^{\prime}(\omega (\alpha ),\alpha )$ to update $\alpha $;
- Stop when the termination condition is met, otherwise return to step 2.
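The surrogate gradient $\nabla_{\alpha}L_2^{\prime}$ above can be sketched via the implicit function theorem together with a Neumann-series approximation of the inverse Hessian. This is a standard implicit-differentiation derivation, shown here as an illustrative sketch ($L_1$, $L_2$, $\gamma$ as in the steps above; the truncation order $K$ is introduced here for illustration):

```latex
% Hypergradient via the implicit function theorem:
\nabla_{\alpha} L_2\bigl(\omega^{*}(\alpha),\alpha\bigr)
  = \nabla_{\alpha} L_2
  - \nabla^{2}_{\alpha,\omega} L_1 \,
    \bigl[\nabla^{2}_{\omega,\omega} L_1\bigr]^{-1}
    \nabla_{\omega} L_2 .

% Neumann expansion of the inverse Hessian, valid when
% \|I - \gamma \nabla^{2}_{\omega,\omega} L_1\| < 1:
\bigl[\nabla^{2}_{\omega,\omega} L_1\bigr]^{-1}
  = \gamma \sum_{j=0}^{\infty}
    \bigl(I - \gamma \nabla^{2}_{\omega,\omega} L_1\bigr)^{j}
  \approx \gamma \sum_{j=0}^{K}
    \bigl(I - \gamma \nabla^{2}_{\omega,\omega} L_1\bigr)^{j}.
```

Truncating the series at $K$ terms avoids forming or inverting the Hessian explicitly; each term requires only Hessian–vector products.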

#### 2.2.2. Proof

**Theorem 1.**

**Lemma 1.**

**Lemma 2.**

**Proof of Lemma 1.**

**Proof of Lemma 2.**

**Proof of Theorem 1.**

#### 2.3. Evolutionary Strategy

- Select two individuals as parents for recombination from the current generation $PO{P}_{iter}$. The choice of parents is unbiased.
- Generate a new individual through the recombination of the selected parents.
- Perform mutation and evaluation on the new individuals. At the end of the iteration, select $\mu $ superior individuals from the set of $\lambda $ offspring and $\mu $ parents to form a new generation of $PO{P}_{iter+1}$.

- Randomly sample noise ${\epsilon}_{1}, {\epsilon}_{2}, \dots, {\epsilon}_{n} \sim \mathcal{N}(0, 1)$ from a Gaussian distribution;
- Calculate the loss value for each perturbed parameter, ${L}_{i}=L(\omega +{\epsilon}_{i})$. Add the individual ${\omega}_{0}=\omega +\alpha \frac{1}{n\sigma}\sum_{i=1}^{n}{L}_{i}{\epsilon}_{i}$ and its fitness value $L({\omega}_{0})$ to the population;
- Denote the smallest value among $L(\omega ),L({\omega}_{0}),L({\omega}_{1}),\dots ,L({\omega}_{n})$ by $L({\omega}^{\prime})$ and update the parameter $\omega ={\omega}^{\prime}$;
- Stop when the termination condition is met; otherwise return to step 1.
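The steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: the helper name `es_step` is hypothetical, perturbations are taken as $\sigma\epsilon_i$, and the gradient-style candidate descends the loss (the sign is flipped relative to the reward-ascent form of Salimans et al. [CITE 33], since here $L$ is minimized); the final selection step keeps the best individual, matching step 3.

```python
import numpy as np

def es_step(omega, loss_fn, n=8, sigma=0.1, alpha=0.05, rng=None):
    """One iteration of the evolutionary-strategy update sketched above
    (hypothetical helper; perturbation and sign conventions are assumptions)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n, omega.size))      # step 1: Gaussian noise
    cands = omega + sigma * eps                     # perturbed individuals ω_i
    losses = np.array([loss_fn(w) for w in cands])  # step 2: evaluate L_i
    grad_est = (losses @ eps) / (n * sigma)         # ES gradient estimate
    omega0 = omega - alpha * grad_est               # descent candidate ω_0
    # step 3: keep the best of {ω, ω_0, ω_1, ..., ω_n}
    pool = np.vstack([omega[None], omega0[None], cands])
    pool_losses = np.array([loss_fn(w) for w in pool])
    return pool[np.argmin(pool_losses)]

# toy usage: minimize a quadratic loss
w = np.array([2.0, -3.0])
rng = np.random.default_rng(0)
for _ in range(200):
    w = es_step(w, lambda v: float(np.sum(v ** 2)),
                n=16, sigma=0.2, alpha=0.5, rng=rng)
```

Because the selection in step 3 always retains the incumbent when no candidate improves on it, the loss sequence is non-increasing by construction.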

## 3. Experiment

#### 3.1. Simulation Experiment

#### 3.1.1. Dataset

#### 3.1.2. Algorithm Settings

- When neural network weights are optimized by gradient descent, shallow-layer parameters are updated through layer-by-layer back-propagation from the output layer, while deep-layer updates depend on the feature data produced by the shallower layers. Subnetworks of different structures trained in this way struggle to preserve this shallow–deep dependency, which easily leads to significant deviations. Under an evolutionary algorithm, by contrast, each weight parameter in the network occupies a unique and independent position and can, to a certain extent, be trained and optimized individually, so different subnetworks obtain better evaluation results when inheriting parameters from the super-network. Therefore, for the experiments in the DARTS search space, the network weight parameters are optimized with the improved evolutionary strategy of Section 2.3.
- Each epoch is trained on a random small batch of samples [37], which reduces the per-epoch computational cost while maintaining the training effect. At the same time, the algorithm is not limited by the size of the sample set and extends to large data volumes.
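The random small-batch scheme can be illustrated with a short sketch; the helper name `epoch_minibatch` and the batch size are hypothetical, not taken from the paper:

```python
import numpy as np

def epoch_minibatch(n_samples, batch_size, rng=None):
    """Draw one random mini-batch of sample indices for an epoch
    (hypothetical helper illustrating the small-batch scheme above)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(n_samples, size=batch_size, replace=False)

# e.g., evaluate one epoch's fitness on 128 of 50,000 training images
idx = epoch_minibatch(50_000, 128, rng=np.random.default_rng(0))
```

Each epoch touches only `batch_size` samples, so the per-epoch cost is independent of the full dataset size.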

#### 3.1.3. Search Space

#### 3.1.4. Ablation Experiment

#### 3.1.5. Performance Experiment

**Table 1.** Results of NDARTS (based on the DARTS search space) and other NAS algorithms on the CIFAR-10 dataset.

| Algorithm | Val Error (%) | Test Error (%) | Param (M) | Search Space | Search Method |
|---|---|---|---|---|---|
| RandomNAS [39] * | — | 2.65 | 4.3 | DARTS | Random search |
| AmoebaNet * | — | 3.34 | 5.1 | NASNet | EA |
| NASNet * | — | 2.65 | 5.3 | NASNet | RL |
| PNAS * | — | 3.41 | 5.1 | DARTS | Sequential |
| ENAS * | — | 2.89 | 4.6 | NASNet | RL |
| GDAS * [40] | — | 2.93 | 3.4 | DARTS | Gradient |
| SETN * [41] | — | 2.69 | 4.6 | DARTS | Gradient |
| FairDARTS [38] | 3.82 | 2.54 | 2.83 | DARTS | Gradient |
| PDARTS | 3.99 | 2.50 | 3.4 | DARTS | Gradient |
| PC-DARTS | 3.88 | 2.57 | 3.6 | DARTS | Gradient |
| DARTS | 4.38 | 2.76 | 3.4 | DARTS | Gradient |
| NDARTS | 3.98 | 2.37 | 3.8 | DARTS | Gradient |

**Table 2.** Results of NDARTS (based on the DARTS search space) and other NAS algorithms on the CIFAR-100 dataset.

| Algorithm | Val Error (%) | Test Error (%) | Param (M) | Search Space | Search Method |
|---|---|---|---|---|---|
| RandomNAS * | — | 17.63 | 4.3 | DARTS | Random |
| NASNet-A * | — | 17.81 | 3.3 | NASNet | RL |
| PNAS * | — | 17.63 | 3.2 | DARTS | Sequential |
| GDAS * | — | 18.38 | 3.4 | DARTS | Gradient |
| SETN * | — | 17.25 | 4.6 | DARTS | Gradient |
| PDARTS | 19.97 | 17.4 | 3.4 | DARTS | Gradient |
| PC-DARTS | 20.25 | 17.11 | 3.6 | DARTS | Gradient |
| DARTS | 21.42 | 17.54 | 3.4 | DARTS | Gradient |
| NDARTS | 16.17 | 16.02 | 3.8 | DARTS | Gradient |

**Table 3.** Results of NDARTS (based on the DARTS search space) and other NAS algorithms on the ImageNet dataset.

| Algorithm | Top-1 Test Error (%) | Top-5 Test Error (%) | Param (M) | Search Space | Search Method |
|---|---|---|---|---|---|
| RandomNAS * | 27.1 | — | 4.3 | DARTS | Random |
| AmoebaNet-A * | 25.5 | 8.0 | 5.1 | NASNet | EA |
| NASNet-A * | 26.0 | 8.4 | 3.3 | NASNet | RL |
| PNAS * | 25.8 | 8.1 | 3.2 | DARTS | Sequential |
| GDAS * | 26.0 | 8.5 | 3.4 | DARTS | Gradient |
| SETN * | 25.7 | 8.0 | 4.6 | DARTS | Gradient |
| SharpDARTS [42] | 25.1 | 7.8 | 4.9 | DARTS | Gradient |
| PDARTS * | 24.4 | 7.4 | 3.4 | DARTS | Gradient |
| PC-DARTS * | 25.1 | 7.8 | 3.6 | DARTS | Gradient |
| DARTS | 26.9 | 8.7 | 3.4 | DARTS | Gradient |
| NDARTS | 24.3 | 7.3 | 3.8 | DARTS | Gradient |

#### 3.2. Experiment Results

## 4. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

NAS | Neural Architecture Search |

RL | Reinforcement Learning |

EA | Evolutionary Algorithm |

MDPI | Multidisciplinary Digital Publishing Institute |

## Appendix A

## References

- Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv
**2016**, arXiv:1609.08144. [Google Scholar] - Chen, M.X.; Firat, O.; Bapna, A.; Johnson, M.; Macherey, W.; Foster, G.; Jones, L.; Parmar, N.; Schuster, M.; Chen, Z.; et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. arXiv
**2018**, arXiv:1804.09849. [Google Scholar] [CrossRef] - Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, Computational and Biological Learning Society, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv
**2017**, arXiv:1704.04861. [Google Scholar] - Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell.
**2017**, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] - Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation Tech Report (v5). arXiv
**2014**, arXiv:1311.2524. [Google Scholar] - He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Dong, X.; Huang, J.; Yang, Y.; Yan, S. More is Less: A More Complicated Network with Less Inference Complexity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res.
**2019**, 20, 1997–2017. [Google Scholar] - Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing Neural Network Architectures using Reinforcement Learning. arXiv
**2016**, arXiv:1611.02167. [Google Scholar] - Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Comput. Surv.
**2022**, 54, 1–34. [Google Scholar] [CrossRef] - Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. arXiv
**2017**, arXiv:1707.07012. [Google Scholar] - Zoph, B.; Le, Q.V. Neural Architecture Search with Reinforcement Learning. arXiv
**2016**, arXiv:1611.01578. [Google Scholar] - Cai, H.; Chen, T.; Zhang, W.; Yu, Y.; Wang, J. Efficient Architecture Search by Network Transformation. arXiv
**2017**, arXiv:1707.04873. [Google Scholar] [CrossRef] - Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized Evolution for Image Classifier Architecture Search. Proc. Aaai Conf. Artif. Intell.
**2018**, 33, 4780–4789. [Google Scholar] [CrossRef] - Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; Kavukcuoglu, K. Hierarchical Representations for Efficient Architecture Search. arXiv
**2018**, arXiv:1711.00436. [Google Scholar] - Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-Scale Evolution of Image Classifiers. arXiv
**2017**, arXiv:1703.01041. [Google Scholar] - Stanley, K.O.; Miikkulainen, R. Evolving neural networks through augmenting topologies. Evol. Comput.
**2002**, 10, 99–127. [Google Scholar] [CrossRef] [PubMed] - Xie, L.; Yuille, A. Genetic CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Wei, H.; Li, L.-J.; Li, F.-F.; Yuille, A.; Huang, J.; Murphy, K. Progressive Neural Architecture Search. arXiv
**2017**, arXiv:1712.00559. [Google Scholar] - Kandasamy, K.; Neiswanger, W.; Schneider, J.; Póczos, B.; Xing, E.P. Neural Architecture Search with Bayesian Optimisation and Optimal Transport. arXiv
**2018**, arXiv:1802.07191. [Google Scholar] - Negrinho, R.; Gordon, G. DeepArchitect: Automatically Designing and Training Deep Architectures. arXiv
**2017**, arXiv:1704.08792. [Google Scholar] - Liu, H.; Simonyan, K.; Yang, Y. Darts: Differentiable architecture search. arXiv
**2018**, arXiv:1806.09055. [Google Scholar] - Chen, X.; Xie, L.; Wu, J.; Tian, Q. Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation. arXiv
**2019**, arXiv:1904.12760. [Google Scholar] - Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.J.; Tian, Q.; Xiong, H. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. arXiv
**2019**, arXiv:1907.05737. [Google Scholar] - Cai, H.; Zhu, L.; Han, S. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. arXiv
**2018**, arXiv:1812.00332. [Google Scholar] - Zela, A.; Elsken, T.; Saikia, T.; Marrakchi, Y.; Brox, T.; Hutter, F. Understanding and Robustifying Differentiable Architecture Search. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Chen, X.; Hsieh, C.J. Stabilizing Differentiable Architecture Search via Perturbation-based Regularization. arXiv
**2020**, arXiv:2002.05283. [Google Scholar] - Xie, S.; Zheng, H.; Liu, C.; Lin, L. SNAS: Stochastic Neural Architecture Search. arXiv
**2018**, arXiv:1812.09926. [Google Scholar] - Liang, H.; Zhang, S.; Sun, J.; He, X.; Huang, W.; Zhuang, K.; Li, Z. DARTS+: Improved Differentiable Architecture Search with Early Stopping. arXiv
**2019**, arXiv:1909.06035. [Google Scholar] - Hou, P.; Jin, Y.; Chen, Y. Single-DARTS: Towards stable architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 373–382. [Google Scholar]
- Zhu, X.; Li, J.; Liu, Y.; Wang, W. Improving Differentiable Architecture Search via Self-Distillation. arXiv
**2023**, arXiv:2302.05629. [Google Scholar] [CrossRef] - Salimans, T.; Ho, J.; Chen, X.; Sidor, S.; Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv
**2017**, arXiv:1703.03864. [Google Scholar] - Dong, X.; Yang, Y. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search. arXiv
**2020**, arXiv:2001.00326. [Google Scholar] - Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 7 October 2023).
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv
**2016**, arXiv:1609.04747. [Google Scholar] - Chu, X.; Zhou, T.; Zhang, B.; Li, J. Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search. arXiv
**2019**, arXiv:1911.12126. [Google Scholar] - Li, L.; Talwalkar, A. Random Search and Reproducibility for Neural Architecture Search. arXiv
**2019**, arXiv:1902.07638. [Google Scholar] - Dong, X.; Yang, Y. Searching for A Robust Neural Architecture in Four GPU Hours. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Dong, X.; Yang, Y. One-Shot Neural Architecture Search via Self-Evaluated Template Network. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019 ICCV, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Hundt, A.; Jain, V.; Hager, G.D. SharpDARTS: Faster and More Accurate Differentiable Architecture Search. arXiv
**2019**, arXiv:1903.09900. [Google Scholar]

**Figure 16.** Performance of different algorithms on the CIFAR-10 dataset in the NAS-Bench-201 search space.

**Figure 17.** Performance of different algorithms on the CIFAR-100 dataset in the NAS-Bench-201 search space.

**Figure 18.** Performance of different algorithms on the ImageNet dataset in the NAS-Bench-201 search space.

**Table 4.** Results of NDARTS and other NAS algorithms on the CIFAR-10 dataset (based on the NAS-Bench-201 search space).

| Algorithm | Val Acc (%) | Test Acc (%) | Param (M) | Search Space | Search Method |
|---|---|---|---|---|---|
| RandomNAS | 72.42 | 76.10 | 0.4 | NAS-Bench-201 | Random |
| RL | 89.83 | 92.20 | 0.4 | NAS-Bench-201 | RL |
| ENAS | 39.77 | 54.30 | 0.1 | NAS-Bench-201 | RL |
| EA | 91.18 | 93.40 | 1.0 | NAS-Bench-201 | EA |
| GDAS | 89.78 | 93.49 | 0.8 | NAS-Bench-201 | Gradient |
| SETN | 84.24 | 87.67 | 0.3 | NAS-Bench-201 | Gradient |
| DARTS | 39.77 | 54.30 | 0.1 | NAS-Bench-201 | Gradient |
| NDARTS | 89.94 | 93.67 | 1.3 | NAS-Bench-201 | Gradient |

**Table 5.** Results of NDARTS and other NAS algorithms on the CIFAR-100 dataset (based on the NAS-Bench-201 search space).

| Algorithm | Val Acc (%) | Test Acc (%) | Param (M) | Search Space | Search Method |
|---|---|---|---|---|---|
| RandomNAS | 47.54 | 46.94 | 0.4 | NAS-Bench-201 | Random |
| RL | 68.72 | 67.80 | 0.4 | NAS-Bench-201 | RL |
| ENAS | 15.03 | 15.61 | 0.1 | NAS-Bench-201 | RL |
| EA | 70.88 | 71.00 | 1.0 | NAS-Bench-201 | EA |
| GDAS | 71.28 | 70.28 | 0.8 | NAS-Bench-201 | Gradient |
| SETN | 58.81 | 58.87 | 0.3 | NAS-Bench-201 | Gradient |
| DARTS | 15.03 | 15.61 | 0.1 | NAS-Bench-201 | Gradient |
| NDARTS | 71.37 | 70.91 | 1.3 | NAS-Bench-201 | Gradient |

**Table 6.** Results of NDARTS and other NAS algorithms on the ImageNet dataset (based on the NAS-Bench-201 search space).

| Algorithm | Val Acc (%) | Test Acc (%) | Param (M) | Search Space | Search Method |
|---|---|---|---|---|---|
| RandomNAS | 16.53 | 15.63 | 0.4 | NAS-Bench-201 | Random |
| RL | 41.90 | 42.23 | 0.4 | NAS-Bench-201 | RL |
| ENAS | 16.43 | 16.32 | 0.1 | NAS-Bench-201 | RL |
| EA | 44.03 | 44.23 | 1.0 | NAS-Bench-201 | EA |
| GDAS | 43.47 | 43.10 | 0.8 | NAS-Bench-201 | Gradient |
| SETN | 33.04 | 32.37 | 0.3 | NAS-Bench-201 | Gradient |
| DARTS | 16.43 | 16.32 | 0.1 | NAS-Bench-201 | Gradient |
| NDARTS | 40.66 | 41.02 | 1.3 | NAS-Bench-201 | Gradient |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Han, X.; Li, C.; Wang, Z.; Liu, G.
NDARTS: A Differentiable Architecture Search Based on the Neumann Series. *Algorithms* **2023**, *16*, 536.
https://doi.org/10.3390/a16120536
