1. Introduction
The fifth generation mobile communication technology, 5G, has higher speed, lower delay and denser connections than 4G. With the gradual development of commercial deployment of 5G networks, 5G small cells built to supplement the radiation blind area of the macrocell [
1,
2,
3] are becoming more popular. In the macrostation, it is a common practice to use an application-specific integrated circuit (ASIC) as the solution for the base station hardware module. Different from the 5G massive MIMO supported by outdoor macrocells, small indoor cells generally have a small number of antennas in order to meet low power consumption and low radiation requirements. Therefore, small cells are still suitable for small-dimensional MIMO. In recent years, there have been some studies on channel estimation of 5G small-dimensional MIMO using FPGA [
4]. Because of the high power consumption brought by the FPGA solution, the scheme of using ASICs in small cells is subsequently proposed. In the new 5G technology environment, receivers need to cope with changing channel configurations and functional requirements. Therefore, relatively flexible channel estimators and equalizers designed based on application-specific instruction set processors (ASIPs) are replacing traditional ASICs. In the 4G era, there were studies on channel estimation using ASIPs [
5]. ASIPs implement domain-specific work by setting a dedicated instruction set. Compared with FPGA and general-purpose processors, ASIPs further reduce the construction cost and power consumption of small cells, and also adapt to the iteration of 5G versions in software maintenance. This is due to the flexibility of ASIPs in specific areas and the silicon efficiency brought about by the reuse of ASIP hardware [
6].
In recent years, there have been many studies on channel estimation. For example, reference [
7] implements a multimode decision-oriented channel estimation ASIC for 2 × 2 MIMO-OFDM, which completes the work of using ASICs to realize small-dimensional MIMO in the early years. A software transceiver for a 5G new radio physical uplink shared channel (PUSCH) is proposed in reference [
8]. It completes the software implementation of 5G PUSCH of 2 × 2 MIMO, but lacks the discussion of hardware implementation. A 5G/6G channel estimation vector digital signal processor based on software configuration for dynamic switching between delay and throughput efficient algorithms is proposed in reference [
9]. In order to solve the problem that it is inconvenient to change the hardware algorithm and configuration on the vector processor, reference [
9] chooses the software implementation to realize the dynamic switch between the algorithm and configuration. In recent years, there have been many studies on 5G MIMO based on machine learning. For example, references [
10,
11,
12] use the theory of machine learning to study channel estimation in MIMO systems. The traditional algorithm can be optimized by machine learning, and the result is often better than the original algorithm. However, the convergence time of machine learning is long, which means that it is difficult to meet the requirements of real time and low power consumption of 5G small cells.
To improve small cell MIMO capability, this study used the method based on ASIPs to implement the channel estimation module of 4 × 4 MIMO by developing a dedicated instruction set on the vector processor (VP). This study designed a set of dedicated processor instructions, which could not only run the channel estimation function on the VP, but also use some of the instructions for channel equalization, signal detection and other necessary functional modules in 5G uplink. Using different instruction combinations supported by our dedicated instruction set, different algorithms and functions can be realized with a relatively small number of hardware units. Using the design method of ASIPs, we can obtain higher flexibility than ASICs [
13], leaving a design margin for the development of subsequent base station algorithms and extending the life cycle of the product. At the same time, compared with the CPU based on the general instruction set, the ASIP has lower power consumption and a smaller hardware area [
14], which can reduce the construction cost of small cells.
In fact, the utilization of assembly instructions also follows the Pareto rule. Twenty precent of commonly used instructions account for about 80% of the running time of small cell processors, so 20% of instructions must be accelerated [
15]. Therefore, the main innovation of this paper is to accelerate the instruction set used by ASIPs. Compared with the general vector processor instruction set, it can complete the channel estimation with fewer instructions. The main tasks involved in this study are: at the transmitting end, the demodulation reference signal (DMRS) is mapped to the OFDM resource block according to the 3GPP standard; at the receiving end, the channel estimation algorithm of the pilot point is completed by designing a simple LS estimation algorithm; a dedicated instruction set is designed to use the VP for channel estimation of pilot points and interpolation estimation of non-pilot points; the instructions in the subject loop of the channel estimation module are merged and accelerated, which speeds up the calculation of the processor; finally, the instruction overhead estimates of linear, second-order and Wiener filter interpolation in the time–frequency domain are summarized.
The rest of the paper is organized as follows.
Section 2 introduces the basic model of 5G small cells, including the process of signal sending and receiving and the standard of pilot design.
Section 3 introduces the main algorithms used and the methodology of designing a dedicated instruction set and acceleration for the algorithm.
Section 4 shows the pseudo code of the algorithm, the assembly code of the instruction and the optimization effect after acceleration.
Section 5 contains the conclusion and the significance of the research.
5. Conclusions
In this study, by designing a set of dedicated instructions on the vector processor, the uplink channel estimation module in 5G small cells is realized. By designing a simple LS estimation algorithm at the pilot point, it can be applied to the pilot structure proposed by 3GPP-38211, and it can complete the channel estimation function at the pilot point with fewer instructions. In this study, the channel estimation algorithm is realized with instruction-level acceleration, including reference signal estimation, Wiener, first-order and second-order frequency domain interpolation and time domain interpolation. After using the look-up table method to store the parameters needed for Wiener filter interpolation in several scenarios, the number of instructions and hardware complexity needed to implement Wiener filtering are reduced. Better results can also be obtained by using it. However, its BER performance is lower than that of the machine learning algorithm. In view of the long convergence time of machine learning, the choice in this study is equivalent to the appropriate tradeoff between low hardware delay and high algorithm performance. Actually, we wanted to find relevant reference works on linear, 2nd-order and Wiener interpolation acceleration on the instruction level. We went through papers but did not find useful references, so we can only learn from commercial instruction sets. We thus hope that our work can be the starter to trigger studies in the field. Instruction-level acceleration is implemented on our vector processors and is also applicable to other commercial and academic vector processors. Acceleration is suitable for both subcarrier parallel and data parallel implementations. Through instruction acceleration, compared with the existing general vector processing instruction set, the processor performance of the LS estimation module is improved by 50%, and the processor performance of the frequency domain Wiener filter interpolation is improved by 37.5%. At the same time, this paper gives the instruction costs of the combination of six interpolation methods. This work can provide a reference for other researchers to study the relationship between the complexity of the algorithm and the number of instructions symbolizing the use of hardware costs. The BER VS SNR measure of time–frequency Wiener filter interpolation achieves 4db compared with linear interpolation, which means that our instruction-level acceleration can be an optimal solution. This paper lacks a discussion on the instruction set for hardware architecture implementation, which will become the research direction in the future.