Electronics
  • Article
  • Open Access

14 April 2022

In-Memory Computing Architecture for a Convolutional Neural Network Based on Spin Orbit Torque MRAM

1 Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan
2 Department of Communications Engineering, Feng Chia University, Taichung 407, Taiwan
3 Quantum Information Center, Chung Yuan Christian University, Taoyuan 320, Taiwan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advances of Future IoE Wireless Network Technology

Abstract

Recently, numerous studies have investigated computing in-memory (CIM) architectures for neural networks to overcome memory bottlenecks. Because of its low delay, high energy efficiency, and nonvolatility, spin-orbit torque magnetic random access memory (SOT-MRAM) has received substantial attention. However, previous studies used additional calculation circuits to support complex computations, leading to substantial energy consumption. Therefore, our research proposes a new CIM architecture with small peripheral circuits; this architecture achieves higher performance than other CIM architectures when processing convolutional neural networks (CNNs). We incorporate a distributed arithmetic (DA) algorithm to improve the efficiency of the CIM calculation method by reducing the excessive read/write operations and execution steps of CIM-based CNN calculation circuits. Furthermore, our method uses SOT-MRAM to increase the calculation speed and reduce power consumption. Compared with the CIM-based CNN arithmetic circuits of previous studies, our method achieves shorter clock periods and reduces read/write operations by up to 43.3% without the need for additional circuits.

1. Introduction

Machine learning (ML) models, such as convolutional neural networks (CNNs) and deep neural networks (DNNs), are widely used in real-world applications. However, neural network structures have also grown in size, creating a bottleneck in von Neumann accelerator architectures. More specifically, in a von Neumann architecture the CPU must retrieve data from memory before processing them and then transfer the results back to memory at the end of the computation. This leads to additional energy consumption during data transfer, which reduces the energy efficiency of computing devices []. Furthermore, limited memory bandwidth, high memory access latency, and long memory access paths limit inference speed and cause substantial power consumption regardless of the performance of the logic circuit. In-memory computing can effectively overcome these bottlenecks: the CIM architecture achieves low memory access latency, parallel operation, and ultra-low power consumption, and the close proximity of storage to the arithmetic logic unit allows it to circumvent the von Neumann bottleneck [].
The error rate of visual recognition by CNNs declined from 28% in 2010 to 3% in 2016, surpassing the 5% error rate of human visual recognition []. CNNs have been integrated into embedded systems to solve image classification and pattern recognition problems. However, large CNNs may have millions of parameters and require tens of billions of operations to process a single image frame []. Therefore, accelerating the convolution operation yields the greatest improvement in performance. Iterative processing of the CNN layers is a common design feature of CNN accelerators. However, the intermediate data are too large to fit in the on-chip cache, and accelerator designs must, therefore, use off-chip memory to store intermediate data between layers. Due to the computational requirements of Internet of Things (IoT) and artificial intelligence (AI) applications, the cost of moving data between the central processing unit (CPU) and memory is a key limiter of performance.
CPU and GPU performance grows by approximately 60% per year, whereas memory performance improves by only about 7% per year []. The data transfer rate of memory cannot keep pace with the computational speed of the CPU; thus, the CPU is typically "data hungry". Although deep learning processor performance has grown exponentially, most power is consumed in reading and writing data; thus, improving the efficiency of the accelerator alone has little effect on overall performance.
In the field of hardware design, computing units, such as GPUs and CPUs, and isolated memory modules are interconnected with buses; this design entails multiple challenges, such as long memory access latency, limited memory bandwidth, substantial energy requirements for data communication, congestion during input and output (I/O), and substantial leakage power consumption when storing network parameters in volatile memory. Additionally, because the memory used in AI accelerators is volatile, data are lost when power is removed. Therefore, overcoming these challenges is imperative for AI and CNN applications.
To design a hardware CNN accelerator with improved performance and reduced energy consumption, CIM CNN accelerators [,,] constitute a viable method of overcoming the "CNN power and memory wall" problem; these accelerators have been researched extensively. The key concept of CIM is the embedding of logic units within memory to process data by leveraging inherent parallel computing mechanisms and exploiting the higher internal memory bandwidth. CIM can lead to remarkable reductions in off-chip data communication latency and energy consumption. In the field of algorithm design, several methods have been proposed to break the memory and power walls; these include compressing pretrained networks, quantizing parameters, binarization, and pruning. Additionally, Intel's Movidius Neural Compute Stick is a hardware neural network accelerator for increased computing performance. In contrast to such onboard computing architectures, our approach is based on an MRAM CIM architecture, which significantly reduces the cost of data exchange between storage and memory. Our architecture has several key advantages, including nonvolatility (no data loss in the absence of power), lower power consumption, and higher density. With the increasing demand for on-chip memory in AI chips, MRAM is emerging as an attractive alternative.
This paper follows the same assumptions as the existing works [,] and primarily focuses on methods of reducing hardware power consumption in edge computing without relying on software algorithms. We designed a CIM CNN accelerator that is compatible with all the aforementioned algorithms without modifying the hardware architecture. Notably, we do not address the influence of slower peripherals on CNN performance.
Our contributions can be summarized as follows:
  • Integrate a DA architecture with the CIM to achieve faster speeds, fewer reads and writes, and lower power consumption.
  • Optimize CNN operations and complete calculations in fewer steps.
  • Integrate the DA architecture with the CIM and magnetic random access memory (MRAM) techniques to replace the original circuit architecture without off-chip memory. All calculations are performed on the cell array; thus, low latency can be achieved.
  • Parallelize the CIM process using calculations in a sense amplifier to reduce power consumption and accelerate calculations.
The rest of this paper is organized as follows. Section 2 describes the background and related work. Section 3 describes the details of the proposed architecture. Section 4 provides the experimental process and results. Finally, Section 5 presents the conclusion.

3. Proposed Architecture

Distributed arithmetic (DA) was first introduced by Croisier et al. []. It is an effective memory-access-based computation method and is essentially a bit-serial operation. The execution time depends on the clock speed, the read/write speed of the memory, and the bit width of the operands. Figure 4 presents a DA circuit.
Figure 4. DA circuit.
Let us consider the convolution (inner product) of an N-point input vector $x$ with a fixed coefficient vector $h$, which is expressed as follows:
$$y = \sum_{i=0}^{N-1} x_i h_i,$$
where $h = [h_0, h_1, h_2, \ldots, h_{N-1}]$ and the input vector is $x = [x_0, x_1, x_2, \ldots, x_{N-1}]$. Let us assume that $x_i$ is expressed in B-bit two's complement representation as follows.
$$x_i = -x_{i,0} + \sum_{j=1}^{B-1} x_{i,j}\,2^{-j},$$
where $x_{i,j}$ denotes the $j$-th bit of $x_i$ and $x_{i,0}$ is the sign bit.
By substituting this two's complement representation into the inner product, the output $y$ can be expressed in expanded form as follows.
$$y = -\sum_{i=0}^{N-1} x_{i,0} h_i + \sum_{j=1}^{B-1} \left[ \sum_{i=0}^{N-1} x_{i,j} h_i \right] 2^{-j}$$
Because $h_i$ is constant, the term $\sum_{i=0}^{N-1} x_{i,j} h_i$ can take only $2^N$ possible values for each bit position $j$. These values can be calculated and stored in memory ahead of time, and a partial sum can then be obtained by using the bit slice $[x_{0,j}, x_{1,j}, \ldots, x_{N-1,j}]$ as the read address. Therefore, the inner product can be calculated through an accumulation loop of $B$ shift-add steps, each reading the value addressed by the corresponding bit slice. In our method, DA and the CIM structure are combined to overcome the challenges of the aforementioned model [].
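To make the DA procedure concrete, the following Python sketch (illustrative only, not the paper's circuit) evaluates the inner product with a precomputed lookup table: each iteration packs the j-th bits of all inputs into a table address, reads the stored partial sum, and shift-accumulates it, subtracting the sign-bit slice. Integer rather than fractional two's complement is assumed, so the accumulator shifts left instead of right; all names are ours.

```python
# Illustrative distributed-arithmetic (DA) evaluation of y = sum_i x_i * h_i
# using a precomputed lookup table; integer two's complement is assumed.

def build_da_table(h):
    """Precompute sum_i b_i * h_i for every N-bit address b."""
    n = len(h)
    return [sum(h[i] for i in range(n) if (addr >> i) & 1)
            for addr in range(1 << n)]

def da_inner_product(x, h, bits=8):
    """Bit-serial DA inner product of integer vectors x and h."""
    table = build_da_table(h)
    n = len(x)
    y = 0
    for j in range(bits):
        # Address = j-th bit of every x_i, packed into an N-bit word.
        addr = sum(((x[i] >> j) & 1) << i for i in range(n))
        partial = table[addr]
        if j == bits - 1:          # sign-bit slice of two's complement
            partial = -partial
        y += partial << j          # shift-accumulate (the shifter-adder loop)
    return y

if __name__ == "__main__":
    x, h = [3, -2, 5, 7], [1, 4, -3, 2]
    assert da_inner_product(x, h) == sum(a * b for a, b in zip(x, h))
```

The table has $2^N$ entries, matching the observation above, and the loop performs exactly B shift-add steps in place of N multiplications.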

3.1. Integral Architecture

Figure 5 presents a memory circuit comprising eight banks. Each bank comprises 16 mats, and each mat has four cell arrays of size 16 × 1024 bits, for a total capacity of 1 MB. The control circuit can control only eight banks simultaneously. Each operation can run across 16 mats in parallel, and each mat has four cell arrays. In addition, each cell array can perform four 3 × 3 convolutions simultaneously; thus, the memory architecture can execute 64 convolutions in parallel.
Figure 5. Circuit architecture of memory.
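As a quick sanity check of these figures, the short sketch below recomputes the stated capacity and parallelism from the organization quoted above; the per-mat convolution count is an assumption chosen to match the quoted total of 64.

```python
# Arithmetic check of the memory organization described above (no simulation).
banks, mats_per_bank, arrays_per_mat = 8, 16, 4
rows, cols = 16, 1024                                 # bits per cell array

total_bits = banks * mats_per_bank * arrays_per_mat * rows * cols
print(total_bits // (8 * 1024 * 1024), "MB")          # -> 1 MB

# 16 mats operate in parallel; four 3x3 convolutions are counted per mat here
# so that the product matches the 64-convolution figure quoted in the text.
parallel_mats, convs_per_mat = 16, 4
print(parallel_mats * convs_per_mat, "convolutions in parallel")  # -> 64
```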

3.2. Achievements Made by the New Architecture

Figure 6 presents our proposed CIM circuit architecture, which integrates the DA algorithm into memory without any dedicated digital circuits, such as a full adder or a shifter, to implement the DA calculation. In addition, the CIM architecture requires no additional weight data; correspondingly, placing the result data and the buffer register on the same cell array reduces both data access time and power consumption. The execution speed depends on the clock frequency, the read speed of the memory, and the bit width of the calculation unit. Therefore, the novel CIM architecture performs faster than the traditional DA architecture and has lower power consumption because the operations are performed in memory.
Figure 6. Our CIM circuit architecture.
Owing to the advantages of the DA algorithm, the precalculated partial sums can be stored in memory, and the shifter-adder can then accumulate each partial sum. Therefore, our approach uses only a shifter-adder and does not require a multiplier, which would have a long critical path and a large area. Our new CIM architecture thus avoids the lengthy execution steps and additional circuits required by previous methods.
The main components of the DA architecture are the read-only memory (ROM), the register (reg) buffer, the full adder, and the shifter. The following subsections describe the structure, operation, and implementation of these components to realize the DA architecture in memory.

3.2.1. Build ROM and Register (Reg) Buffer in the Memory

MRAM is a nonvolatile memory, and its read speed is similar to that of DRAM; thus, MRAM is suitable as the storage unit for a DA architecture. To increase the efficiency of in-memory calculation and to achieve lower latency and lower read/write power consumption, the weight memory and the buffer register used by the CIM are placed on the same cell array, as presented in Figure 7. In addition, these defined memory regions can be resized because the entire memory space, not only one specific part of the memory, can perform CIM operations.
Figure 7. Storage unit configuration.
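As a rough illustration of this co-location, the sketch below partitions one 16 × 1024-bit cell array into a weight lookup-table region, a register-buffer region, and a results region. The row split is hypothetical; the text states only that the regions share the same array and can be resized.

```python
# Hypothetical partition of a single 16 x 1024-bit cell array; the actual
# split in Figure 7 is not reproduced here, only the idea that the weight
# LUT, buffer register, and results share one array.
ARRAY_ROWS, ARRAY_COLS = 16, 1024

layout = {
    "weight_lut": range(0, 8),    # precomputed DA partial sums (ROM role)
    "reg_buffer": range(8, 12),   # shift/accumulate buffer (register role)
    "results":    range(12, 16),  # convolution outputs
}

for region, row_span in layout.items():
    bits = len(row_span) * ARRAY_COLS
    print(f"{region:<10} rows {row_span.start}-{row_span.stop - 1}: {bits} bits")
```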

3.2.2. Shifter

Because a shifter is unavailable in traditional memory, our method adds n-type and p-type metal-oxide-semiconductor (NMOS and PMOS) transistors to the sense amplifier (SA) circuit, as presented in Figure 8. This change enables the output of the SA to be written into different columns under shifter control without reading data out of the cell array and rewriting them, processes that would otherwise extend the read/write time.
Figure 8. Shift circuit.

3.2.3. Shifter Full-Adder

This unit completes a full-adder operation. First, it calculates MAJ(A, B, Cin) to obtain Cout; the sum is obtained in parallel in the following step by computing (A ⊕ Cin) ⊕ B, which is stored in sum-reg. Finally, the sum is shifted left for the next shift-add operation, as presented in Figure 9.
Figure 9. Steps of the shift adder.
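The bit operations in these steps can be modeled in software as shown below (a behavioral sketch only; the actual design performs the same logic inside the SOT-MRAM cell array). The carry comes from the majority function, the sum from (A ⊕ Cin) ⊕ B, and the accumulated result is shifted left for the next iteration, mirroring the sequence in Figure 9.

```python
# Behavioral model of the shift-adder steps (illustrative; fixed-width ints).

def maj(a, b, c):
    """Majority function: 1 when at least two inputs are 1 (the carry-out)."""
    return (a & b) | (b & c) | (a & c)

def full_add(a, b, cin):
    """One full-adder bit: carry from MAJ, sum from (A XOR Cin) XOR B."""
    cout = maj(a, b, cin)
    s = (a ^ cin) ^ b
    return s, cout

def shift_add(acc, partial, width=16):
    """Ripple the full adder over `width` bits, then shift the sum left by
    one position so it is aligned for the next DA iteration."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_add((acc >> i) & 1, (partial >> i) & 1, carry)
        result |= s << i
    return (result << 1) & ((1 << width) - 1)

# Example: accumulate 0b0101 with 0b0011, then shift left.
assert shift_add(0b0101, 0b0011, width=8) == ((0b0101 + 0b0011) << 1) & 0xFF
```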

4. Experimental Process and Result

4.1. Experimental Process

The simulation was divided into three stages, as presented in Figure 10. First, the Hspice tool was used to obtain circuit-level results. Second, these results were fed into the NVSim model to simulate the memory architecture level using the data obtained in stage 1. Subsequently, the read/write power consumption and the memory delay were passed to gem5 to execute system-level calculations. After these three stages of simulation, the LeNet-5 algorithm was executed in gem5 to obtain the read/write power consumption of the entire algorithm. In our simulation, the parameters were set as follows. The SPICE setup and process files follow references [,], respectively, with a read voltage of 6 mV, a read current of 1 µA, and a cell resistance of 6 kΩ. For the NVSim setup, we chose a 1 MB memory with 8 banks, in which each bank has 16 mats, each mat has 4 cell arrays, and each cell array is 16 × 1024 bits in size. Additionally, we considered accelerators using 16-bit gradients and selected MNIST as our benchmark dataset, along with the LeNet-5 NN architecture. For the CNN layers processing each 32 × 32 image, we developed a bitwise CNN with six convolutional layers, two average pooling layers, and two FC layers, the latter implemented equivalently as convolutional layers. After collecting this information, we conducted comparisons using the experimental data. The detailed process of each stage is described in the following subsections.
Figure 10. Experimental architecture.
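For reference, the parameters quoted above are collected in the plain Python dictionary below; the key names are descriptive labels of our own and are not NVSim or gem5 configuration syntax.

```python
# Simulation parameters quoted in the text, gathered for reference only.
sim_params = {
    "hspice": {                      # stage 1: circuit level (SOT-MRAM cell, SA)
        "read_voltage_mV": 6,
        "read_current_uA": 1,
        "cell_resistance_kohm": 6,
        "process": "NCSU FreePDK 45 nm",
    },
    "nvsim": {                       # stage 2: memory architecture level
        "capacity_MB": 1,
        "banks": 8,
        "mats_per_bank": 16,
        "arrays_per_mat": 4,
        "array_size_bits": (16, 1024),
    },
    "gem5": {                        # stage 3: system level
        "cpu": "single-core Arm A9 @ 2 GHz",
        "benchmark": "MNIST, LeNet-5",
        "precision_bits": 16,
    },
}
```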

4.1.1. SOT-MRAM Simulation

The SOT-MRAM model was built at the circuit level. The MRAM model used in our research is the same as that presented in []. That study's authors provided a SOT-MRAM Verilog-A model file that facilitates simulation and verification of the real-world performance of MRAM. Figure 11 presents the simulation results of the MRAM Verilog-A model in Cadence Virtuoso.
Figure 11. Simulated waveform of an MRAM cell.

4.1.2. Process and Sense Amplifier (SA)

We used NCSU FreePDK 45 nm [] to simulate the SA circuit and the digital circuit synthesis in Hspice. In addition, we chose the StrongARM latch [], which consumes zero static power and has low latency; thus, it is suitable for edge computing in the CIM architecture, as presented in Figure 12.
Figure 12. StrongARM Latch.

4.1.3. NVSim

NVSim [] is a circuit-level model used to estimate the performance, energy, and area of emerging nonvolatile memory (NVM). NVSim supports various NVM technologies, including STT-MRAM, PCRAM, ReRAM, and traditional NAND flash; we therefore used it and modified it to match the architecture chosen for our simulation.

4.1.4. gem5

gem5 [] is a modular, discrete-event-driven full-system simulator that combines the advantages of M5 and GEMS. M5 is a highly configurable simulation framework that supports a variety of ISAs and CPU models, and GEMS complements M5 by providing a detailed and flexible memory system, including multiple cache coherence protocols and interconnection models. In our experiment, we used a single-core Arm A9 CPU clocked at 2 GHz as the CIM CPU for simulation analysis. Figure 13 presents the entire simulation process in gem5. First, the C code was compiled into a binary file, and gem5 was then used to simulate the binary and produce a stats.txt file containing the simulated CPU cycles and the memory read/write counts. CPU power consumption was then obtained with the McPAT tool.
Figure 13. Simulation of an A9 processor running with LeNet-5.
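As a minimal sketch of the post-processing step, the snippet below sums memory read/write counters from a gem5 stats file. The default path `m5out/stats.txt` and the `num_reads`/`num_writes` substrings are assumptions, since stat names vary across gem5 versions.

```python
# Sum memory read/write counters from a gem5 stats file (names are assumed).
import re

def memory_access_counts(stats_path="m5out/stats.txt"):
    reads = writes = 0
    pattern = re.compile(r"^(\S+)\s+(\d+)")
    with open(stats_path) as f:
        for line in f:
            m = pattern.match(line)
            if not m:
                continue
            name, value = m.group(1), int(m.group(2))
            if "num_reads" in name:
                reads += value
            elif "num_writes" in name:
                writes += value
    return reads, writes

if __name__ == "__main__":
    r, w = memory_access_counts()
    print(f"reads={r}, writes={w}, total={r + w}")
```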

4.2. Experimental Result

4.2.1. IMCE vs. Our Method

We analyzed the numbers of read, write, and overall read/write accesses separately; the results were then compared with those of the previous IMCE study. Figure 14 presents the convolution algorithm used for this comparison; both the input image data and the weights were 8-bit. Figure 15 presents three comparisons of read and write counts. For reads only, as presented in Figure 15a, our method requires 49.9% fewer read operations than IMCE. As presented in Figure 15b, our method requires 22.7% fewer write operations than IMCE. Finally, with regard to overall read/write accesses, as indicated in Figure 15c, our method requires 43.3% fewer accesses than IMCE. This improvement is due to the DA algorithm, which replaces multiplication with a lookup table and thus substantially reduces the number of reads and the power consumption during CIM.
Figure 14. Convolution process.
Figure 15. (a) Reading times, (b) writing times, and (c) comparison.

4.2.2. Traditional Non-CIM Architecture and CIM Architecture

To determine whether the CIM architecture can effectively reduce computing power consumption, we ran the same NN, presented in Figure 16, on both a non-CIM architecture and the CIM architecture; this network was also used in our experimental analysis. The main focus of our research is the convolution (ConV) operation; therefore, the FC layers of the NN were handled by the CPU, and the CIM circuit architecture handled the ConV operations. Table 5 presents a comparison of the power consumption of the two architectures. Figure 17a compares CPU power consumption. The difference arises primarily because the convolution operation consumes the most power; in the CIM architecture, the convolution operation is moved into memory, so the CPU does not perform it, greatly reducing power consumption. Figure 17b compares memory power consumption. The CIM architecture minimizes the data transmission path and thereby reduces total memory power consumption; moreover, because the convolution has been moved into memory, the total read/write power consumption of the memory is also lower. Figure 17c presents a comparison of the total power consumption, revealing that the CIM circuit architecture has lower overall power consumption.
Figure 16. LeNet-5 NN architecture.
Table 5. Comparison of non-CIM architecture and CIM architecture.
Figure 17. (a) CPU, (b) memory, and (c) overall.

4.3. Discussion

Our CIM architecture can be used with CPUs, GPUs, FPGAs, and ASICs in different designs. In-memory computing has two advantages: it makes computing faster and scales to potentially support petabytes of in-memory data. It relies on two key technologies: random-access memory storage and parallelization. When the CPU/GPU processes data from main memory, frequently used data are stored in fast, energy-efficient caches to enhance performance and energy efficiency. However, in applications that process large amounts of data, most data are read from main memory because the working set is very large compared with the cache size. In this case, the bandwidth of the memory channel between the CPU/GPU and main memory becomes a performance bottleneck, and considerable energy is consumed transferring data between them. Alleviating this bottleneck requires extending the channel bandwidth, but because the number of pins on current CPUs/GPUs has reached its limit, further bandwidth improvement faces technical difficulties. In modern computer structures, where data storage and computation are separated, this memory wall problem inevitably arises. Our CIM architecture overcomes the aforementioned bottleneck by performing operations in memory without moving data to the CPU/GPU. Additionally, our CIM architecture can be implemented on an FPGA or as an ASIC design, provided that the MRAM can be successfully taped out.

5. Conclusions

We proposed a new SOT-MRAM-based CIM architecture for a CNN model that reduces both power consumption and the number of read/write operations in comparison with conventional CNN CIM architectures. In addition, our method does not require additional digital circuits, enabling the MRAM cells to retain the advantages of memory for data storage. In a series of experiments, compared with the IMCE method [], our proposed method reduces read operations by 49.9%, write operations by 22.7%, and overall read/write operations by 43.3%. Additionally, our evaluation showed that a CIM model running with an Arm A9 CPU can significantly reduce power consumption. We used the highly configurable SPICE, NVSim, and gem5 simulators to evaluate our proposed SOT-MRAM CIM-based architecture. In this paper, we did not address the changing magnetic field of the MRAM; quantifying the changing/switching magnetic field remains an open issue, and in the future, we will collaborate with industry to address it.

Author Contributions

Conceptualization, J.-Y.H. and Y.-T.T.; methodology, J.-Y.H. and Y.-T.T.; validation, J.-Y.H., J.-L.S., Y.-T.T., S.-Y.K. and C.-R.C.; formal analysis, J.-Y.H. and Y.-T.T.; investigation, J.-Y.H., J.-L.S., Y.-T.T. and C.-R.C.; resources, Y.-T.T., S.-Y.K. and C.-R.C.; data curation, J.-Y.H. and Y.-T.T.; writing—original draft preparation, J.-Y.H. and Y.-T.T.; writing—review and editing, Y.-T.T., S.-Y.K. and C.-R.C.; supervision, Y.-T.T., S.-Y.K. and C.-R.C.; project administration, Y.-T.T. and C.-R.C.; funding acquisition, Y.-T.T. and C.-R.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Science and Technology, Taiwan, under grants MOST 109-2923-E-035-001-MY3, MOST 110-2112-M-033-013, and MOST 110-2221-E-035-034-MY3.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ou, Q.-F.; Xiong, B.-S.; Yu, L.; Wen, J.; Wang, L.; Tong, Y. In-Memory Logic Operations and Neuromorphic Computing in Non-Volatile Random Access Memory. Materials 2020, 13, 3532. [Google Scholar] [CrossRef] [PubMed]
  2. Zou, X.; Xu, S.; Chen, X.; Yan, L.; Han, Y. Breaking the von Neumann Bottleneck: Architecture-Level Processing-in-Memory Technology. Sci. China Inf. Sci. 2021, 64, 1–10. [Google Scholar] [CrossRef]
  3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  4. Deng, Q.; Jiang, L.; Zhang, Y.; Zhang, M.; Yang, J. DRACC: A Dram based Accelerator for Accurate CNN Inference. In Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA, 24–29 June 2018; pp. 1–6. [Google Scholar]
  5. Angizi, S.; He, Z.; Parveen, F.; Fan, D. IMCE: Energy-Efficient Bitwise In-Memory Convolution Engine for Deep Neural Network. In Proceedings of the 23rd Asia and South Pacific Design Automation Conference, Jeju Island, Korea, 22 January 2018; pp. 111–116. [Google Scholar]
  6. Chi, P.; Li, S.; Xu, C.; Zhang, T.; Zhao, J.; Liu, Y.; Wang, Y.; Xie, Y. Prime: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-based Main Memory. ACM SIGARCH Comput. Archit. News 2016, 44, 27–39. [Google Scholar] [CrossRef]
  7. Li, S.; Niu, D.; Malladi, K.T.; Zheng, H.; Brennan, B.; Xie, Y. DRISA: A Dram-based Reconfigurable In-situ Accelerator. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, Boston, MA, USA, 14–17 October 2017; pp. 288–301. [Google Scholar]
  8. Kim, K.; Shin, H.; Sim, J.; Kang, M.; Kim, L.-S. An Energy-Efficient Processing-in-Memory Architecture for Long Short Term Memory in Spin Orbit Torque MRAM. In Proceedings of the International Conference on Computer-Aided Design, Westminster, CO, USA, 4–7 November 2019; pp. 1–8. [Google Scholar]
  9. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a Convolutional Neural Network. In Proceedings of the International Conference on Engineering and Technology, Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar]
  10. Zhang, Y.; Wang, G.; Zheng, Z.; Sirakoulis, G. Time-Domain Computing in Memory Using Spintronics for Energy-Efficient Convolutional Neural Network. IEEE Trans. Circuits Syst. 2021, 68, 1193–1205. [Google Scholar] [CrossRef]
  11. Xu, T.; Leppänen, V. Analysing Emerging Memory Technologies for Big Data and Signal Processing Applications. In Proceedings of the Fifth International Conference on Digital Information Processing and Communications, Sierre, Switzerland, 7–9 October 2015; pp. 104–109. [Google Scholar]
  12. Kazemi, M.; Rowlands, G.E.; Ipek, E.; Buhrman, R.A.; Friedman, E.G. Compact Model for Spin–Orbit Magnetic Tunnel Junctions. IEEE Trans. Electron Devices 2016, 63, 848–855. [Google Scholar] [CrossRef]
  13. White, S.A. Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review. IEEE Assp Mag. 1989, 6, 4–19. [Google Scholar] [CrossRef]
  14. Chen, J.; Zhao, W.; Ha, Y. Area-Efficient Distributed Arithmetic Optimization via Heuristic Decomposition and In-Memory Computing. In Proceedings of the 13th International Conference on ASIC, Chongqing, China, 29 October–1 November 2019; pp. 1–4. [Google Scholar]
  15. Kim, J.; Chen, A.; Behin-Aein, B.; Kumar, S.; Wang, J.P.; Kim, C.H. A Technology-Agnostic MTJ SPICE Model with User-Defined Dimensions for STT-MRAM Scalability Studies. In Proceedings of the 2015 IEEE Custom Integrated Circuits Conference (CICC), San Jose, CA, USA, 28–30 September 2015; pp. 1–17. [Google Scholar]
  16. NCSU EDA FreePDK45: Contents. 2011. Available online: http://www.eda.ncsu.edu/wiki/ (accessed on 21 December 2020).
  17. Alwani, M.; Chen, H.; Ferdman, M.; Milder, P. Fused-Layer CNN Accelerators. In Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, Taiwan, 15–19 October 2016; pp. 1–12. [Google Scholar]
  18. Razavi, B. The StrongARM Latch [A Circuit for All Seasons]. IEEE Solid-State Circuits Mag. 2015, 7, 12–17. [Google Scholar] [CrossRef]
  19. Dong, X.; Xu, C.; Xie, Y.; Jouppi, N.P. Nvsim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2012, 31, 994–1007. [Google Scholar] [CrossRef]
  20. Binkert, N.; Beckmann, B.; Black, G.; Reinhardt, S.K.; Saidi, A.; Basu, A.; Hestness, J.; Hower, D.R.; Krishna, T.; Sardashti, S.; et al. The Gem5 Simulator. ACM SIGARCH Comput. Archit. News 2011, 39, 1–7. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
