30 May 2024
Interview with Dr. Haotong Qin—Winner of the Electronics 2023 Best Ph.D. Thesis Award

We are pleased to announce one of the winners of the Electronics 2023 Best Ph.D. Thesis Award. This award recognizes a Ph.D. student or recently qualified researcher who has produced an outstanding thesis of significant academic potential.
The award has been granted to “Hardware-friendly Low-bit Quantization for Neural Network Compression” by Dr. Haotong Qin, Beihang University, China.
The winner will receive CHF 800, a certificate, and a chance to publish a paper free of charge after peer review in Electronics (ISSN: 2079-9292) in 2024.
We congratulate Dr. Haotong Qin on his accomplishments. We would like to take this opportunity to thank all the applicants for submitting their exceptional theses and the Award Committee for voting for and supporting this award.
Dr. Haotong Qin is currently a postdoctoral researcher at the Center for Project-Based Learning (PBL), ETH Zürich, Switzerland, working with PD Dr. Michele Magno. Previously, he received his Ph.D. degree from the School of Computer Science and Engineering (SCSE) and Shen Yuan Honors College, Beihang University in 2024, supervised by Prof. Wei Li and Prof. Xianglong Liu. He was a visiting Ph.D. student at Computer Vision Lab, ETH Zürich. He obtained his B.Sc. degree from SCSE, Beihang University in 2019. He interned at ByteDance, Tencent, and Microsoft Research Asia. He was awarded the 2023 Baidu Scholarship, 2022 ByteDance Scholarship, 2023 KAUST Rising Stars in AI, and 2023 DAAD AInet Fellowship, etc.
His research interests broadly include efficient deep learning, with a particular focus on deep model compression (e.g., network binarization, quantization, and distillation), efficient generative models (e.g., efficient large language models and diffusion models), neuromorphic computing (e.g., spiking neural networks), hardware acceleration (e.g., hardware-aware architecture search), etc. He is/was an Area Chair for BMVC'2024; a Program Committee member for ICML'(2023-2024), NeurIPS'2023, ICLR'2024, CVPR'(2023-2024), ICCV'2023, ECCV'2024, etc.; and an Organizer or Challenge Chair for workshops at CVPR'(2022-2024), AAAI'(2022-2023), and IEEE CAI'2024.
The following is an interview with Dr. Haotong Qin:
1. Could you please give us a brief overview of your research topic and the main objectives of your Ph.D. thesis?
As the thesis title suggests, my research focuses on neural network compression, specifically on reducing the bit width used by neural networks. Networks typically store and compute with 32-bit parameters. Compressing parameters to a lower bit width such as 8-bit, 2-bit, or even 1-bit makes computation far more efficient, and the redundancy in the parameters allows accuracy to be retained at a satisfactory level. My Ph.D. thesis therefore works on making quantization technology effective in resource-limited scenarios. One aspect is constructing accurate binarized/quantized networks for limited inference resources. For example, when we want to deploy networks on FPGAs or mobile devices, the computation units are very limited compared to GPU servers, so the networks should first be compressed and then deployed on the hardware for inference. The other aspect concerns production, because the quantization process itself also costs data and computation resources. For example, we research post-training quantization, which means we do not need to retrain the whole network or use the whole dataset during quantization; the production of quantized models then becomes very efficient. In summary, my Ph.D. research focused on enabling efficient deep learning through low-bit quantization.
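To make the idea concrete, here is a minimal sketch of what low-bit quantization means in practice. This is a generic illustration of per-tensor symmetric uniform quantization and 1-bit binarization, not Dr. Qin's specific methods (e.g., IR-Net); the function names and the single per-tensor scale are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, num_bits: int = 8):
    """Map float32 weights to signed integers of the given bit width.

    A single per-tensor scale is an illustrative simplification; real
    quantizers often use per-channel scales and calibration.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g., 127 for 8-bit
    scale = float(np.max(np.abs(weights))) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

def binarize(weights: np.ndarray) -> np.ndarray:
    """1-bit case: each weight becomes +alpha or -alpha, where alpha is
    the mean absolute magnitude of the tensor."""
    alpha = np.float32(np.mean(np.abs(weights)))
    return np.where(weights >= 0, alpha, -alpha)

# Demo: quantize random weights and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=1000).astype(np.float32)
q, scale = quantize_symmetric(w, num_bits=8)
w_hat = dequantize(q, scale)
# Rounding error is bounded by half a quantization step.
print(float(np.max(np.abs(w - w_hat))) <= scale)
```

The 8-bit path shows why accuracy can survive compression: each weight is off by at most half a quantization step, and the network's parameter redundancy absorbs that error. The 1-bit path is the extreme case the interview refers to, where multiplications reduce to sign flips and a single scaling factor.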
2. What motivated you to pursue this research topic, and how did you formulate your research questions?
I started considering this topic when I noticed that advanced deep learning models were becoming more and more expensive as they rapidly developed. For example, the edge CPU is one of the most frequently used devices, but deploying deep neural networks on it is much more challenging because of its limited computation resources. If the computation resource requirements become much smaller, advanced AI can be shared fairly by everyone. So, I thought we should study how to reduce the computation resources required by advanced AI models, such as neural networks. My initial research was an extreme 1-bit compression technique named IR-Net. Through that study, I found that advanced neural networks can indeed be deployed and run on edge devices.
3. How did you manage your time and prioritize your tasks during your Ph.D. program, and what strategies did you use to stay focused and motivated?
For my Ph.D. study, my plan was to first conduct extensive surveys of existing work, then propose innovative quantization technologies, and finally try to make systematic contributions to the field of model quantization. Therefore, in 2018, around one year before I started my Ph.D. program, I began organizing a survey paper on neural network binarization, which gave me a general understanding of the model compression area. After that, I published my first paper on binarized neural networks, which first pointed out the importance of information retention. However, I found the approach too generic: because tasks and architectures differ, it struggled to work well across all of them. In my subsequent research, I began to build accurate quantization operator sets for different architectures and efficient quantization pipelines for different data and computing budgets. These pushed quantization to be accurate, efficient, and, finally, practical in real scenarios. In the last year of my Ph.D., we also built a benchmark for binarization to reflect on existing works that were not practical enough and to point out how to build an accurate and efficient 1-bit quantization method.
4. What were some of the biggest challenges you faced during your Ph.D. journey, and how did you overcome them?
The biggest challenge was publishing my first proposed method, and with it my first paper, at a top AI conference. This is probably the biggest challenge for most Ph.D. students at the beginning of their journey: going from 0 to 1. In writing the paper, the challenge was how to present my research and technologies in a structured way, because when I first completed the study, I did not know how to write a paper that would be acceptable at an AI conference. It took significant effort and practice for me to achieve this, but looking back now, it was worth it.
5. When and how did you discover Electronics? What prompted you to apply for this award, and what has your experience been like with Electronics?
In 2019, at the beginning of my Ph.D., I read a review paper about binarized neural networks published in Electronics, which was one of my enlightenment papers. This paper described the status and applications of binarization well at that time, especially in hardware, so I was very impressed. In 2023, when I completed my Ph.D. defense, my advisor sent me an email about the Best Ph.D. Thesis Award. Electronics is a well-known academic journal, and my research matches its scope well. I knew this was a great opportunity that I should not waste. I am grateful to Electronics for providing this opportunity, which is significant for us young researchers.
6. Finally, how do you plan to continue building on your research in the future, and what are your long-term career aspirations?
My Ph.D. research made me realize that efficient deep learning remains very important to the AI community. Especially in this era, we pursue powerful but computation-intensive generative models (such as large language models and diffusion models). The cost of these models makes it difficult not only for individuals but also for universities and other institutions to research, produce, or even use this technology. Therefore, the demand for efficient deep learning has become more urgent, with a wide range of application scenarios. In my future research, I will work on creating resource-saving deep learning models in the era of large models and continuously expand the theoretical and application boundaries of model compression.