Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension
Abstract
:1. Introduction
2. Background and Related Work
2.1. Mathematics of RLNC
2.2. Computation of RLNC
2.3. Xtensa LX5 Processor and TIE Design Flow
3. Hardware Acceleration for RLNC through TIEs
3.1. Accelerating Matrix Multiplication with Multiply-Accumulate (MAC) Hardware
3.2. SIMD
3.3. Matrix Inversion
3.4. FLIX
4. Evaluation
4.1. Evaluation Setup
4.1.1. Evaluation Metrics
- Die size (Number of gates): The die size corresponds to the number of NAND-2 gates (specifically, 2-input NAND gates for 8-bit operands) that are required to map the register-transfer level (RTL) circuit to NAND-2 gates [100] in order to achieve the prescribed RLNC encoding and decoding functionality. The gate count of the TIE is measured as the number of NAND-2 equivalents as every logical equation can be expressed with only NAND gates [101]. We obtained the die size (in number of equivalent NAND-2 gates) directly from the Instruction Set Simulator (ISS) in the Xtensa Development Environment, Version 6.0.2 with the default profiler settings.
- Clock cycles: The number of clock cycles represents the number of computing cycles required to execute RLNC encoding or decoding of a generation of g symbols, each of size m bytes, with the LX5 processor. One clock cycle represents the time required between two successive pulses of the internal oscillator of the processor [102] which, in the current study, operated at a clock frequency of 300 MHz. Similar to the die size, we obtained the clock cycles with the ISS in the Xtensa Development Environment, Version 6.0.2, with the default profiler settings.
- Speedup: We defined the processor speedup as the ratio of the number of computing cycles required by the reference implementation to complete a prescribed task to the number of computing cycles required by the optimized (accelerated) implementation to complete the same prescribed task [103]. We considered the plain-C code executed on the LX5 processor, i.e., the basic LX5 processor without any application specific Tensilica Instruction-set Extension (TIE), to be the reference implementation.
- Throughput: We evaluated the GF and GF RLNC encoding and decoding throughputs as the size () of the generation in bytes divided by the required computing time, obtained as the number of computing cycles times the inverse of the clock frequency. We report the encoding and decoding throughputs in units of MiB/s, whereby 1 MiB = bytes.
- Energy consumption: We obtained the consumed power levels for each TIE acceleration strategy as follows. We conducted RTL synthesis of the LX5 with TIE processor with the Synopsys Design Compiler to obtain a so-called netlist. Then, we employed Mentor Questasim to conduct a netlist simulation of the processor running the algorithm to obtain the toggle activity data for every flip flop and wire. Finally, we employed Synopsys PrimeTime to conduct a power consumption analysis for the netlist under the obtained toggle activity. The obtained power levels represent the power consumed by all elements of an equivalent digital circuit that executes the same functionality as our RLNC encoding and decoding algorithm. Specifically, for the evaluation of the consumed energy, we considered the GF RLNC encoding of a generation followed immediately by the corresponding RLNC decoding to evaluate a complete encoding and decoding sequence. We obtained the average consumed power in Watts = Joules/s for the sequence of RLNC encoding and decoding. We evaluate the energy consumption as the ratio of consumed power (in Joules/s) to the throughput (in MiB/s), i.e., the energy consumption is in units of Joules/MiB of GF RLNC encoding and decoding.
4.1.2. Fixed Instruction-Set Processor Benchmark
4.2. Die Size and Gate Count
4.3. Clock Cycles and Speedup
4.4. RLNC Encoding and Decoding Throughput
4.5. Energy Consumption
5. Conclusions
Author Contributions
Acknowledgments
Conflicts of Interest
References
- Bassoli, R.; Marques, H.; Rodriguez, J.; Shum, K.W.; Tafazolli, R. Network coding theory: A survey. IEEE Commun. Surv. Tutor. 2013, 15, 1950–1978. [Google Scholar] [CrossRef]
- Farooqi, M.Z.; Tabassum, S.M.; Rehmani, M.H.; Saleem, Y. A survey on network coding: From traditional wireless networks to emerging cognitive radio networks. J. Netw. Comput. Appl. 2014, 46, 166–181. [Google Scholar] [CrossRef]
- Ho, T.; Médard, M.; Koetter, R.; Karger, D.R.; Effros, M.; Shi, J.; Leong, B. A random linear network coding approach to multicast. IEEE Trans. Inf. Theory 2006, 52, 4413–4430. [Google Scholar] [CrossRef]
- Amanowicz, M.; Krygier, J. On applicability of network coding technique for 6LoWPAN-based sensor networks. Sensors 2018, 18, 1–20. [Google Scholar] [CrossRef] [PubMed]
- Chau, P.; Bui, T.D.; Lee, Y.; Shin, J. Efficient data uploading based on network coding in LTE-Advanced heterogeneous networks. In Proceedings of the 2017 19th International Conference on Advanced Communication Technology (ICACT), Bongpyeong, Korea, 19–22 February 2017; pp. 252–257. [Google Scholar]
- Ji, X.; Wang, A.; Li, C.; Ma, C.; Peng, Y.; Wang, D.; Hua, Q.; Chen, F.; Fang, D. ANCR–An adaptive network coding routing scheme for WSNs with different-success-rate links. Appl. Sci. 2017, 7, 809. [Google Scholar] [CrossRef]
- Kartsakli, E.; Antonopoulos, A.; Alonso, L.; Verikoukis, C. A cloud-assisted random linear network coding medium access control protocol for healthcare applications. Sensors 2014, 14, 4806–4830. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kang, H.; Yoo, H.; Kim, D.; Chung, Y.S. CANCORE: Context-Aware Network COded REpetition for VANETs. IEEE Access 2017, 5, 3504–3512. [Google Scholar] [CrossRef]
- Li, X.; Chang, Q.; Xu, Y. Queueing characteristics of the best effort network coding strategy. IEEE Access 2016, 4, 5990–5997. [Google Scholar] [CrossRef]
- Liu, J.S.; Lin, C.H.R.; Tsai, J. Delay and energy tradeoff in energy harvesting multi-hop wireless networks with inter-session network coding and successive interference cancellation. IEEE Access 2017, 5, 544–564. [Google Scholar] [CrossRef]
- Khamfroush, H.; Lucani, D.E.; Pahlevani, P.; Barros, J. On optimal policies for network-coded cooperation: theory and implementation. IEEE J. Sel. Areas Commun. 2015, 33, 199–212. [Google Scholar] [CrossRef]
- Nessa, A.; Kadoch, M.; Rong, B. Fountain coded cooperative communications for LTE-A connected heterogeneous M2M network. IEEE Access 2016, 4, 5280–5292. [Google Scholar] [CrossRef]
- Pahlevani, P.; Hundebøll, M.; Pedersen, M.V.; Lucani, D.E.; Charaf, H.; Fitzek, F.H.; Bagheri, H.; Katz, M. Novel concepts for device-to-device communication using network coding. IEEE Commun. Mag. 2014, 52, 32–39. [Google Scholar] [CrossRef]
- Peng, Y.; Wang, X.; Guo, L.; Wang, Y.; Deng, Q. An efficient network coding-based fault-tolerant mechanism in WBAN for smart healthcare monitoring systems. Appl. Sci. 2017, 7, 817. [Google Scholar] [CrossRef]
- Vukobratovic, D.; Tassi, A.; Delic, S.; Khirallah, C. Random linear network coding for 5G mobile video delivery. Information 2018, 9, 72. [Google Scholar] [CrossRef]
- Wang, H.; Wang, S.; Bu, R.; Zhang, E. A novel cross-layer routing protocol based on network coding for underwater sensor networks. Sensors 2017, 17, 1821. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; Chen, W.; Li, O.; Hanzo, L. Joint rate and power adaptation for amplify-and-forward two-way relaying relying on analog network coding. IEEE Access 2016, 4, 2465–2478. [Google Scholar] [CrossRef]
- Yang, Y.; Chen, W.; Li, O.; Liu, Q.; Hanzo, L. Truncated-ARQ aided adaptive network coding for cooperative two-way relaying networks: Cross-layer design and analysis. IEEE Access 2016, 4, 9361–9376. [Google Scholar] [CrossRef]
- Zhang, G.; Cai, S.; Xiong, N. The application of social characteristic and L1 optimization in the error correction for network coding in wireless sensor networks. Sensors 2018, 18, 450. [Google Scholar] [CrossRef] [PubMed]
- Joshi, G.; Liu, Y.; Soljanin, E. On the delay-storage trade-off in content download from coded distributed storage systems. IEEE J. Sel. Areas Commun. 2014, 32, 989–997. [Google Scholar] [CrossRef]
- Kumar, N.; Zeadally, S.; Rodrigues, J.J.P.C. QoS-aware hierarchical web caching scheme for online video streaming applications in Internet-based vehicular ad hoc networks. IEEE Trans. Ind. Electron. 2015, 62, 7892–7900. [Google Scholar] [CrossRef]
- Sipos, M.; Gahm, J.; Venkat, N.; Oran, D. Erasure coded storage on a changing network: The untold story. In Proceedings of the 2016 IEEE Global Communications Conference (GLOBECOM), Washington, DC, USA, 4–8 December 2016; pp. 1–6. [Google Scholar]
- Wang, L.; Wu, H.; Han, Z. Wireless distributed storage in socially enabled D2D communications. IEEE Access 2016, 4, 1971–1984. [Google Scholar] [CrossRef]
- Chau, P.; Shin, J.; Jeong, J. Distributed systematic network coding for reliable content uploading in wireless multimedia sensor networks. Sensors 2018, 18, 1824. [Google Scholar] [CrossRef] [PubMed]
- Chen, B.W.; Ji, W.; Jiang, F.; Rho, S. QoE-enabled big video streaming for large-scale heterogeneous clients and networks in smart cities. IEEE Access 2016, 4, 97–107. [Google Scholar] [CrossRef]
- Gkantsidis, C.; Rodriguez, P.R. Network coding for large scale content distribution. In Proceedings of the 2005 24th Annual Joint Conference of the IEEE Computer and Communications Societies, Miami, FL, USA, 13–17 March 2005; pp. 2235–2245. [Google Scholar]
- Li, B.; Niu, D. Random network coding in peer-to-peer networks: from theory to practice. Proc. IEEE 2011, 99, 513–523. [Google Scholar]
- Maboudi, B.; Sehat, H.; Pahlevani, P.; Lucani, D.E. On the performance of the cache coding protocol. Information 2018, 9, 62. [Google Scholar] [CrossRef]
- Rekab-Eslami, M.; Esmaeili, M.; Gulliver, T.A. Multicast convolutional network codes via local encoding kernels. IEEE Access 2017, 5, 6464–6470. [Google Scholar] [CrossRef]
- Skevakis, E.; Lambadaris, I. Optimal control for network coding broadcast. In Proceedings of the 2016 IEEE Global Communications Conference (GLOBECOM), Washington, DC, USA, 4–8 December 2016; pp. 1–6. [Google Scholar]
- Haas, S.; Karnagel, T.; Arnold, O.; Laux, E.; Schlegel, B.; Fettweis, G.; Lehner, W. HW/SW-database-codesign for compressed bitmap index processing. In Proceedings of the 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), London, UK, 6–8 July 2016; pp. 50–57. [Google Scholar]
- Arnold, O.; Haas, S.; Fettweis, G.; Schlegel, B.; Kissinger, T.; Karnagel, T.; Lehner, W. HASHI: An application specific instruction set extension for hashing. In Proceedings of the International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures at International Conference on Very Large Data Bases (VLDB), Hangzhou, China, 1–5 September 2014; pp. 25–33. [Google Scholar]
- Arnold, O.; Neumaerker, F.; Fettweis, G. L2_ISA++: Instruction set architecture extensions for 4G and LTE-advanced MPSoCs. In Proceedings of the 2014 International Symposium on System-on-Chip (SoC), Tampere, Finland, 28–29 October 2014; pp. 1–8. [Google Scholar]
- Gonzalez, R.E. Xtensa: A configurable and extensible processor. IEEE Micro 2000, 20, 60–70. [Google Scholar] [CrossRef]
- Maymounkov, P.; Harvey, N.J.; Lun, D.S. Methods for efficient network coding. In Proceedings of the 44th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, 27–29 September 2006; pp. 482–491. [Google Scholar]
- Aboutorab, N.; Sadeghi, P.; Sorour, S. Enabling a tradeoff between completion time and decoding delay in instantly decodable network coded systems. IEEE Trans. Commun. 2014, 62, 1296–1309. [Google Scholar] [CrossRef]
- Douik, A.; Karim, M.S.; Sadeghi, P.; Sorour, S. Delivery time reduction for order-constrained applications using binary network codes. In Proceedings of the 2016 IEEE Wireless Communications and Networking Conference, Doha, Qatar, 3–6 April 2016; pp. 1–6. [Google Scholar]
- Douik, A.; Sorour, S.; Al-Naffouri, T.; Alouini, S. Instantly decodable network coding: From centralized to device-to-device communications. IEEE Commun. Surv. Tutor. 2017, 19, 1201–1224. [Google Scholar] [CrossRef]
- Li, X.; Wang, C.C.; Lin, X. Optimal immediately-decodable inter-session network coding (IDNC) schemes for two unicast sessions with hard deadline constraints. In Proceedings of the 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 28–30 September 2011; pp. 784–791. [Google Scholar]
- Qureshi, J.; Foh, C.H.; Cai, J. Online XOR packet coding: Efficient single-hop wireless multicasting with low decoding delay. Comput. Commun. 2014, 39, 65–77. [Google Scholar] [CrossRef]
- Sorour, S.; Valaee, S. Completion delay minimization for instantly decodable network codes. IEEE/ACM Trans. Netw. 2015, 23, 1553–1567. [Google Scholar] [CrossRef]
- Yu, M.; Aboutorab, N.; Sadeghi, P. From instantly decodable to random linear network coded broadcast. IEEE Trans. Commun. 2014, 62, 3943–3955. [Google Scholar] [CrossRef]
- Barros, J.; Costa, R.A.; Munaretto, D.; Widmer, J. Effective delay control in online network coding. In Proceedings of the IEEE INFOCOM, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 208–216. [Google Scholar]
- Costa, R.A.; Munaretto, D.; Widmer, J.; Barros, J. Informed network coding for minimum decoding delay. In Proceedings of the 2008 5th IEEE International Conference on Mobile Ad Hoc and Sensor Systems, Atlanta, GA, USA, 29 September–2 October 2008; pp. 80–91. [Google Scholar]
- Fu, A.; Sadeghi, P.; Médard, M. Dynamic rate adaptation for improved throughput and delay in wireless network coded broadcast. IEEE/ACM Trans. Netw. 2014, 22, 1715–1728. [Google Scholar] [CrossRef]
- Sadeghi, P.; Shams, R.; Traskov, D. An optimal adaptive network coding scheme for minimizing decoding delay in broadcast erasure channels. EURASIP J. Wirel. Commun. Netw. 2010, 2010, 1–14. [Google Scholar] [CrossRef]
- Sorour, S.; Valaee, S. An adaptive network coded retransmission scheme for single-hop wireless multicast broadcast services. IEEE/ACM Trans. Netw. 2011, 19, 869–878. [Google Scholar] [CrossRef]
- Yeow, W.L.; Hoang, A.T.; Tham, C.K. Minimizing delay for multicast-streaming in wireless networks with network coding. In Proceedings of the IEEE INFOCOM, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 190–198. [Google Scholar]
- Heide, J.; Pedersen, M.V.; Fitzek, F.H.; Larsen, T. Cautious view on network coding-From theory to practice. J. Commun. Netw. 2008, 10, 403–411. [Google Scholar] [CrossRef]
- Heide, J.; Pedersen, M.V.; Fitzek, F.H.; Médard, M. On code parameters and coding vector representation for practical RLNC. In Proceedings of the 2011 IEEE International Conference on Communications (ICC), Kyoto, Japan, 5–9 June 2011; pp. 1–5. [Google Scholar]
- Shah-Mansouri, V.; Srinivasan, S. Retransmission scheme for intra-session linear network coding in wireless networks. Int. J. Adv. Intell. Paradigms 2017, 9, 326–346. [Google Scholar] [CrossRef]
- Sundararajan, J.K.; Shah, D.; Medard, M.; Sadeghi, P. Feedback-Based Online Network Coding. IEEE Trans. Inf. Theory 2017, 63, 6628–6649. [Google Scholar] [CrossRef]
- Zeng, D.; Guo, S.; Jin, H.; Leung, V. Segmented network coding for stream-like applications in delay tolerant networks. In Proceedings of the 2011 IEEE Global Telecommunications Conference—GLOBECOM, Kathmandu, Nepal, 5–9 December 2011; pp. 1–5. [Google Scholar]
- Krigslund, J.; Fitzek, F.; Pedersen, M.V. On the combination of multi-layer source coding and network coding for wireless networks. In Proceedings of the 2013 IEEE 18th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Berlin, Germany, 25–27 September 2013; pp. 1–6. [Google Scholar]
- Matsuzono, K.; Asaeda, H.; Turletti, T. Low latency low loss streaming using in-network coding and caching. In Proceedings of the IEEE INFOCOM 2017—IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9. [Google Scholar]
- Swamy, V.N.; Rigge, P.; Ranade, G.; Sahai, A.; Nikolić, B. Network coding for high-reliability low-latency wireless control. In Proceedings of the 2016 IEEE Wireless Communications and Networking Conference, Doha, Qatar, 3–6 April 2016; pp. 1–7. [Google Scholar]
- Yu, K.; Yue, J.; Lin, Z.; Åkerberg, J.; Björkman, M. Achieving reliable and efficient transmission by using network coding solution in industrial wireless sensor networks. In Proceedings of the 2016 IEEE 25th International Symposium on Industrial Electronics (ISIE), Santa Clara, CA, USA, 8–10 June 2016; pp. 1162–1167. [Google Scholar]
- Júnior, N.d.S.R.; Tavares, R.C.; Vieira, M.A.; Vieira, L.F.; Gnawali, O. CodeDrip: Improving data dissemination for wireless sensor networks with network coding. Ad Hoc Netw. 2017, 54, 42–52. [Google Scholar] [CrossRef]
- Cloud, J.; Medard, M. Multi-Path Low Delay Network Codes. In Proceedings of the 2016 IEEE Global Communications Conference (GLOBECOM), Washington, DC, USA, 4–8 December 2016; pp. 1–7. [Google Scholar]
- Gabriel, F.; Wunderlich, S.; Pandi, S.; Fitzek, F.H.; Reisslein, M. Caterpillar RLNC With Feedback (CRLNC-FB): Reducing Delay in Selective Repeat ARQ Through Coding. IEEE Access 2018, 6, 44787–44802. [Google Scholar] [CrossRef]
- Garcia-Saavedra, A.; Karzand, M.; Leith, D.J. Low delay random linear coding and scheduling over multiple interfaces. IEEE Trans. Mob. Comput. 2017, 16, 3100–3114. [Google Scholar] [CrossRef]
- Karzand, M.; Leith, D.J.; Cloud, J.; Medard, M. Design of FEC for Low Delay in 5G. IEEE J. Sel. Areas Commun. 2017, 35, 1783–1793. [Google Scholar] [CrossRef]
- Pandi, S.; Gabriel, F.; Cabrera, J.; Wunderlich, S.; Reisslein, M.; Fitzek, F.H.P. PACE: Redundancy Engineering in RLNC for Low-Latency Communication. IEEE Access 2017, 5, 20477–20493. [Google Scholar] [CrossRef]
- Wunderlich, S.; Gabriel, F.; Pandi, S.; Fitzek, F.H.; Reisslein, M. Caterpillar RLNC (CRLNC): A practical finite sliding window RLNC approach. IEEE Access 2017, 5, 20183–20197. [Google Scholar] [CrossRef]
- Choi, S.M.; Lee, K.; Park, J. Massive parallelization for random linear network coding. Appl. Math. Inf. Sci. 2015, 9, 571–578. [Google Scholar]
- Gan, X.B.; Shen, L.; Wang, Z.Y.; Lai, X.; Zhu, Q. Parallelizing network coding using CUDA. Adv. Mater. Res. 2011, 186, 484–488. [Google Scholar] [CrossRef]
- Kim, M.; Park, K.; Ro, W.W. Benefits of using parallelized non-progressive network coding. J. Netw. Comput. Appl. 2013, 36, 293–305. [Google Scholar] [CrossRef]
- Lee, S.; Ro, W.W. Accelerated network coding with dynamic stream decomposition on graphics processing unit. Comput. J. 2012, 55, 21–34. [Google Scholar] [CrossRef]
- Park, J.S.; Baek, S.J.; Lee, K. A highly parallelized decoder for random network coding leveraging GPGPU. Comput. J. 2014, 57, 233–240. [Google Scholar] [CrossRef]
- Shojania, H.; Li, B. Pushing the envelope: Extreme network coding on the GPU. In Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), Montreal, QC, Canada, 22–26 June 2009; pp. 490–499. [Google Scholar]
- Shojania, H.; Li, B.; Wang, X. Nuclei: GPU-accelerated many-core network coding. In Proceedings of the IEEE Infocom, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 459–467. [Google Scholar]
- Shojania, H.; Li, B. Tenor: making coding practical from servers to smartphones. In Proceedings of the 18th ACM international conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 45–54. [Google Scholar]
- Rizvi, S.T.H.; Cabodi, G.; Patti, D.; Francini, G. GPGPU accelerated deep object classification on a heterogeneous mobile platform. Electronics 2016, 5, 88. [Google Scholar] [CrossRef]
- Chen, D.; Cong, J.; Gurumani, S.; Hwu, W.m.; Rupnow, K.; Zhang, Z. Platform choices and design demands for IoT platforms: Cost, power, and performance tradeoffs. IET Cyber Phys.Syst. Theory Appl. 2016, 1, 70–77. [Google Scholar] [CrossRef]
- Rizk, M.; Baghdadi, A.; Jézéquel, M.; Mohanna, Y.; Atat, Y. Design and prototyping flow of flexible and efficient NISC-Based architectures for MIMO Turbo equalization and demapping. Electronics 2016, 5, 1–27. [Google Scholar] [CrossRef]
- Shojania, H.; Li, B. Parallelized progressive network coding with hardware acceleration. In Proceedings of the Fifteenth IEEE International Workshop on Quality of Service, Evanston, IL, USA, 21–22 June 2007; pp. 47–55. [Google Scholar]
- Hernández Marcano, N.J.; Sørensen, C.W.; Cabrera G, J.A.; Wunderlich, S.; Lucani, D.E.; Fitzek, F.H. On goodput and energy measurements of network coding schemes in the Raspberry Pi. Electronics 2016, 5, 1–27. [Google Scholar] [CrossRef]
- Sørensen, C.W.; Hernández Marcano, N.J.; Cabrera Guerrero, J.A.; Wunderlich, S.; Lucani, D.E.; Fitzek, F.H. Easy as Pi: A network coding Raspberry Pi testbed. Electronics 2016, 5, 1–25. [Google Scholar] [CrossRef]
- Arrobo, G.E.; Gitlin, R.D. Minimizing energy consumption for cooperative network and diversity coded sensor networks. In Proceedings of the 2014 Wireless Telecommunications Symposium, Washington, DC, USA, 9–11 April 2014; pp. 1–7. [Google Scholar]
- Fiandrotti, A.; Bioglio, V.; Grangetto, M.; Gaeta, R.; Magli, E. Band codes for energy-efficient network coding with application to P2P mobile streaming. IEEE Trans. Multimedia 2014, 16, 521–532. [Google Scholar] [CrossRef]
- Angelopoulos, G.; Médard, M.; Chandrakasan, A. Energy-aware hardware implementation of network coding. In Proceedings of the International Conference on Research in Networking, Valencia, Spain, 13 May 2011; pp. 137–144. [Google Scholar]
- Tensilica Xtensa LX5 Customizable DPU; Cadence Design Systems: San Jose, CF, USA, 2013.
- Wolkerstorfer, J.; Oswald, E.; Lamberger, M. An ASIC implementation of the AES SBoxes. In Proceedings of the CT-RSA 2002: The Cryptographers’ Track at the RSA Conference 2002, San Jose, CA, USA, 18–22 February 2002; pp. 67–78. [Google Scholar]
- Satoh, A.; Morioka, S.; Takano, K.; Munetoh, S. A compact Rijndael hardware architecture with S-box optimization. In Proceedings of the ASIACRYPT 2001, 7th International Conference on the Theory and Application of Cryptology and Information Security, Gold Coast, Australia, 9–13 December 2001; pp. 239–254. [Google Scholar]
- Standaert, F.X.; Rouvroy, G.; Quisquater, J.J.; Legat, J.D. Efficient implementation of Rijndael encryption in reconfigurable hardware: Improvements and design tradeoffs. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Cologne, Germany, 8–10 September 2003; pp. 334–350. [Google Scholar]
- Wang, C.C.; Truong, T.K.; Shao, H.M.; Deutsch, L.J.; Omura, J.K.; Reed, I.S. VLSI architectures for computing multiplications and inverses in GF(2m). IEEE Trans. Comput. 1985, 34, 709–717. [Google Scholar] [CrossRef] [PubMed]
- Guajardo, J.; Paar, C. Efficient algorithms for elliptic curve cryptosystems. In Proceedings of the 17th Annual International Cryptology Conference, Barbara, California, USA, 17–21 August 1997; pp. 342–356. [Google Scholar]
- Paar, C. Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields; VDI-Verlag: Düsseldorf, Germany, 1994. [Google Scholar]
- Sghaier, A.; Zeghid, M.; Massoud, C.; Mahchout, M. Design and Implementation of Low Area/Power Elliptic Curve Digital Signature Hardware Core. Electronics 2017, 6, 46. [Google Scholar] [CrossRef]
- Mangard, S.; Aigner, M.; Dominikus, S. A highly regular and scalable AES hardware architecture. IEEE Trans. Comput. 2003, 52, 483–491. [Google Scholar] [CrossRef]
- Fraleigh, J.B.; Beauregard, R. Linear Algebra, 3rd ed.; Pearson: London, UK, 2013. [Google Scholar]
- Quinnell, E.; Swartzlander, E.; Lemonds, C. Floating-point fused multiply-add architectures. In Proceedings of the 2007 Conference Record of the Forty-First Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 4–7 November 2007; pp. 331–337. [Google Scholar]
- MCF5307 ColdFire Integrated Microprocessor User Manual; Technical report for Freescale Semiconductor, Inc.: Austin, TX, USA, 2016.
- Gimmestad, B.J. The Russian Peasant Multiplication Algorithm: A Generalization. Math. Gazette 1991, 75, 169–171. [Google Scholar] [CrossRef]
- Chen, H.; Liu, Z.; Liu, S.; Ma, S. An efficient vector memory unit for SIMD DSP. In Proceedings of the CCF National Conf. on Computer Engineering and Technology (NCCET), Guiyang, China, 29 July–1 August 2014; pp. 1–11. [Google Scholar]
- Jutla, C.S.; Kumar, V.; Rudra, A. On the Circuit Complexity of Isomorphic Galois Field Transformations; No. 93; Electronic Colloquium on Computational Complexity: Potsdam, Germany, 2012. [Google Scholar]
- Tensilica Instruction Extension. Available online: https://en.wikipedia.org/wiki/Tensilica_Instruction_Extension (accessed on 13 February 2017).
- Increasing Processor Computational Performance with More Instruction Parallelism. Available online: https://ip.cadence.com/uploads/927/TIP$_$WP$_$FLIX$_$FINAL-pdf (accessed on 25 July 2018).
- Nurmi, J. Processor Design: System-On-Chip Computing for ASICs and FPGAs; Springer: Dordrecht, The Netherlands, 2007. [Google Scholar]
- Oklobdzija, V. The Computer Architecture: Handbook; CRC Press: Boca Raton, FL, USA, 2002. [Google Scholar]
- Gregg, J. Ones and Zeros: Understanding Boolean Algebra, Digital Circuits, and the Logic of Sets; Wiley-IEEE Press: Hoboken, NJ, USA, 1998. [Google Scholar]
- Hennessy, J.; Patterson, D. Computer Architecture: A Quantitative Approach, 4th ed.; Morgan Kaufmann: Burlington, MA, USA, 2006. [Google Scholar]
- Gropp, W. Lecture 14: Discussing Speedup. Parallel @ Illinois. Available online: http://wgropp.cs.illinois.edu/courses/cs598s16/lectures/lecture14.pdf (accessed on 4 July 2018).
- Arnold, O.; Matus, E.; Noethen, B.; Winter, M.; Limberg, T.; Fettweis, G. Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs. ACM Trans. Embedded Comput. Syst. 2014, 13, 107. [Google Scholar] [CrossRef]
- Wunderlich, S.; Cabrera, J.; Fitzek, F.H.; Reisslein, M. Network coding in heterogeneous multicore IoT nodes with DAG scheduling of parallel matrix block operations. IEEE Internet Things J. 2017, 4, 917–933. [Google Scholar] [CrossRef]
- Halbutogullari, A.; Koc, C.K. Mastrovito multiplier for general irreducible polynomials. IEEE Trans. Comput. 2000, 49, 503–518. [Google Scholar] [CrossRef]
- Gsching, P. Energy Efficient Processors–ARM big.LITTLE Technology. Available online: http://www.ziti.uniheidelberg.de/ziti/uploads/ce_group/seminar/2015-Philipp_Gsching.pdf (accessed on January 2016).
- Nguyen, V.; Nguyen, G.T.; Gabriel, F.; Lucani, D.E.; Fitzek, F.H. Integrating sparsity into Fulcrum codes: Investigating throughput, complexity and overhead. In Proceedings of the 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018. [Google Scholar]
Acceleration Strategy | Number of TIE Gates | Increase as a Factor |
---|---|---|
(In Number NAND-2 Gates) | w.r.t. MAC | |
Original (basic Xtensa LX5 w/o TIE, | 0 | – |
used for benchmark C code) | ||
Multiply-Add (MAC) 8 bit | 968 | 0 |
Multiply-Add (MAC) 16 bit | 1183 | 0 |
SIMD MAC 8 bit | 10,542 | 10.9 |
SIMD MAC 16 bit | 11,245 | 9.5 |
SIMD MAC 8/16 bit | 33,768 | 28.5 |
Flexible length instruction extension (FLIX) 8/16 bit | 45,412 | 38.4 |
Inverse 8/16 bit | 12,692 | – |
Operation | Energy Consumption (J/MiB) |
---|---|
ODROID XU3 | 4.3 |
Xtensa LX5 Plain C code w/o TIE | 0.17 |
Xtensa LX5 FLIX TIE | 0.0014 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Acevedo, J.; Scheffel, R.; Wunderlich, S.; Hasler, M.; Pandi, S.; Cabrera, J.; Fitzek, F.H.P.; Fettweis, G.; Reisslein, M. Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension. Electronics 2018, 7, 180. https://doi.org/10.3390/electronics7090180
Acevedo J, Scheffel R, Wunderlich S, Hasler M, Pandi S, Cabrera J, Fitzek FHP, Fettweis G, Reisslein M. Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension. Electronics. 2018; 7(9):180. https://doi.org/10.3390/electronics7090180
Chicago/Turabian StyleAcevedo, Javier, Robert Scheffel, Simon Wunderlich, Mattis Hasler, Sreekrishna Pandi, Juan Cabrera, Frank H. P. Fitzek, Gerhard Fettweis, and Martin Reisslein. 2018. "Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension" Electronics 7, no. 9: 180. https://doi.org/10.3390/electronics7090180
APA StyleAcevedo, J., Scheffel, R., Wunderlich, S., Hasler, M., Pandi, S., Cabrera, J., Fitzek, F. H. P., Fettweis, G., & Reisslein, M. (2018). Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension. Electronics, 7(9), 180. https://doi.org/10.3390/electronics7090180