Data Alignment on Embedded CPUs for Programmable Control Devices

Hubacz, Marcin; Trybus, Bartosz

doi:10.3390/electronics11142174


by Marcin Hubacz and Bartosz Trybus *

Department of Informatics and Control, Rzeszow University of Technology, 35-959 Rzeszow, Poland

* Author to whom correspondence should be addressed.

Electronics 2022, 11(14), 2174; https://doi.org/10.3390/electronics11142174

Submission received: 27 May 2022 / Revised: 5 July 2022 / Accepted: 8 July 2022 / Published: 12 July 2022

(This article belongs to the Special Issue Real-Time Digital Control Technologies and Applications)


Abstract

This work aims to investigate the impact of memory access limitations in microcontrollers and microprocessors on the performance of software that deals with binary data. The research area covers control systems that process data from the IEC 61131-3 standard using a software-implemented virtual machine. Three methods of memory access are considered, namely byte access, memory copying, and direct pointer. Tests of these methods are performed on several CPUs with ARM architecture (with variants), MIPS, RISC-V, Quark, and others, often used as hardware platforms for control devices. The tests cover 1-, 2-, 4-, and 8-byte data sizes, which correspond to the integer types of the IEC 61131-3 standard. By analyzing the results covering both unaligned and aligned data, the goal of this paper is to indicate which of the memory access methods is the most efficient for a particular platform. The research is supplemented with an evaluation of power and memory requirements for a group of STM32 microcontrollers. Therefore, the contribution of this paper rests in indicating the most efficient memory access method for each of more than a dozen CPUs intended for control applications, with consideration of power and memory requirements.

Keywords:

embedded system; memory data alignment; PLC; virtual machine

1. Introduction

Microprocessor systems for embedded applications face the challenge of achieving high functionality and computing power while minimizing hardware resources and power consumption. To achieve this, certain simplifications and limitations are usually introduced, some of which relate to how the software should interact with the microprocessor hardware. This in turn requires the software to be prepared in a way that does not violate the limitations imposed by the hardware. One such restriction concerns random access by the CPU to data stored in memory. The restrictions are particularly acute in applications where large amounts of data are processed. Researchers commonly refer to memory access limitations as the “memory wall” problem [1,2].

Data alignment in memory is one of the limitations that exist in microcontrollers and microprocessors for embedded systems. Specifically, it means reading and writing portions of data at memory addresses that are multiples of the data size [3]. Normally, the addresses refer to single bytes stored in cells; however, processors can also operate on 2-, 4-, or even 8-byte data. For example, in the case of 2-byte data, the alignment requires the memory address to be even; an odd address is unaligned. Likewise, an aligned address of 4-byte data is a multiple of four. In systems with architecture restrictions, processor access to multibyte data located at addresses not aligned with the data size may be limited, slowed down, or impossible. Therefore, the aim of this research is to evaluate how, and to what extent, the microcontrollers and microprocessors intended for control applications deal with address alignment by using different memory access methods.
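The alignment rule above can be expressed as a one-line check. The following is a minimal sketch in C; the helper name is_aligned is hypothetical and is not taken from the CPDev sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when `address` is a multiple of `size`.
 * `size` must be a power of two (1, 2, 4, or 8 for the data considered here).
 * Hypothetical helper, shown only to illustrate the rule. */
static bool is_aligned(uintptr_t address, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* For a power-of-two size, the low bits of an aligned address are zero. */
    return (address & (uintptr_t)(size - 1)) == 0;
}
```

For instance, address 0x1002 is aligned for 2-byte data but not for 4-byte data, while any address is aligned for single bytes.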

In the example in Figure 1, 1-, 2-, and 4-byte data are considered. On the left, the data are located at addresses aligned to their size. If, however, for multibyte data, we assume unaligned addresses as on the right (indicated in red), it may be found that the processor’s access to such data is difficult or impossible [4,5]. As a result, software that uses these data cannot function properly. For these reasons, for processors that have such limitations, the machine code generated by the compiler must provide addresses that are appropriate for the given platform.

For compiled programming languages, a platform-specific native machine code generator can respect architecture constraints and align multibyte variables to addresses that conform to the processor constraints. In the case of the often used LLVM Compiler Infrastructure solution for various languages and platforms [6,7], this can be handled by a back-end prepared for a CPU involving such limitations [8]. In this way, the memory access restrictions can be largely transparent to the programmer.

In this paper, we focus on programmable control devices that process data of various sizes defined in the IEC 61131-3 standard [9] for programming control systems, including PLCs. The standard defines five programming languages, namely textual Structured Text (ST), Instruction List (IL), graphical Ladder Diagram (LD), Function Block Diagram (FBD), and mixed Sequential Function Chart (SFC). Elementary data types occupy 1, 2, 4, 8, or more bytes. Three methods of accessing such data are considered, namely

byte access;
memory copying;
direct access using a pointer.

Byte access acquires multibyte data by reading single bytes and assembling them into the required format. In the case of writing, the data are divided into individual bytes written to memory one by one. Therefore, the memory must be accessed several times. Memory copying applies a standard C function whose performance depends on the CPU architecture and compiler. Direct access by means of a pointer is potentially the most efficient method, although low-cost or earlier architectures require the address to be aligned. Modern CPUs can handle unaligned addresses, but using them slows down execution.
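The three methods can be sketched in C for a 4-byte read. This is an illustration under stated assumptions, not the CPDev implementation: the function names are hypothetical, little-endian byte order is assumed for the byte-access variant, and memcpy is assumed to be the standard C function meant by "memory copying".

```c
#include <stdint.h>
#include <string.h>

/* Byte access: four single-byte reads assembled into one value
 * (little-endian order assumed). Works at any address. */
static uint32_t read_u32_bytes(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Memory copying: the compiler and C library decide how to move the bytes.
 * Works at any address; the result uses the host byte order. */
static uint32_t read_u32_memcpy(const uint8_t *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Direct pointer: a single load. Only safe when `p` is 4-byte aligned and
 * refers to storage that may be read as uint32_t; otherwise the behavior is
 * undefined in C and may fault on CPUs without unaligned access support. */
static uint32_t read_u32_direct(const uint8_t *p)
{
    return *(const uint32_t *)(const void *)p;
}
```

The first two variants accept any address, which is why they remain usable on platforms where the direct variant faults on unaligned data.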

The three memory access methods are implemented in a cross-platform virtual machine [10,11] that interprets the portable intermediate code of the IEC 61131-3 control programs prepared in the CPDev development environment [12]. The environment consists of the editors of the five IEC 61131-3 languages, translators and a compiler to intermediate code, and a runtime virtual machine, i.e., a software-implemented processor that executes the compiled code. Each of the three memory access methods can be selected while configuring the environment. In particular, an experimental setup has been implemented in over a dozen microcontrollers and microprocessors with a variety of architectures, including ARM (several variants), MIPS, RISC-V, and Quark, in versions intended for embedded applications.

Results of the performance studies presented here show how each memory access method works on a given CPU. The tests are performed for a set of ST language programs that process data of various sizes. Unaligned and aligned data are considered. Hence, the goal of this research is to indicate which of the methods is the most suitable for a particular microcontroller or microprocessor. The results are supplemented by a comparison of power and memory usage for a few STM series processors running the virtual machine using the three access methods.

The practical need for this research arose during works on extending the capabilities of devices from one of the manufacturers that uses the CPDev environment. It was found that in some of the microcontrollers, the virtual machine works considerably slower than in the others. Thus, the contribution of this paper is an indication of the most appropriate memory access method for each of over a dozen CPUs that execute the IEC 61131-3 control programs. This may be of interest for some industrial control applications.

2. Background

2.1. IEC 61131-3 Data Types

As indicated before, the IEC 61131-3 standard defines five programming languages and the structure of control software [9]. Programming enables the use of simple (elementary), generic, and user-defined data types. Simple types can be divided into several groups: integers, real numbers, dates and times, strings, and others. Some elements of these groups may have various sizes that shorten or expand the range of values. For example, the INT type (integer) can have variants such as SINT (short integer), DINT (double integer), LINT (long integer), UINT (unsigned integer), USINT (unsigned short integer), and ULINT (unsigned long integer).

The IEC 61131-3 standard defines the range and size of the memory needed to represent values of the elementary types. Table 1 shows the number of bytes occupied by the elementary-type data in the CPDev virtual machine used here. As seen, typical data sizes are 1, 2, 4, and 8 bytes. In the case of character strings, the size of the data depends on the number of characters. The ADDRESS type in the 4-byte group is a special type used internally by the 32-bit virtual machine instructions to store indexes of data and code memory areas, so ADDRESS is an alias for DWORD (for a 16-bit machine, it would be an alias for WORD [11,12]).

The standard also defines generic types (ANY) and user types defined in the form of enumerated types and subranges—for example, (0.2 … 12.0). The values of the enumerated types and subranges are represented by the simple types. Arrays and structures are also user-defined types.

2.2. Data Alignment in Control Software

The data alignment problem appears in some CPUs, where binary data cannot be arranged arbitrarily in memory if the software is expected to treat these data according to a specific type. The constrained platforms require alignment of the starting address of these data to a multiple of the data type size [3,13]. The alignment problem is typical for distributed control systems, where the device must correctly interpret complex binary data coming from outside.

Suppose that the data concerning a battery pack are defined by a structure involving the following fields: identifier ID: BYTE, count CNT: UINT, voltage VOLT: INT, status STA: BYTE, state of charge SOC: REAL, installed DT: DATE_AND_TIME, temperature TEMP: INT.

Depending on the arrangement of the structure fields in memory, the data may or may not be interpreted correctly. This is shown in Figure 2, where the upper part refers to the case where the individual data of the structure are transferred to the aligned offsets determined by a compiler. The BYTE, UINT (2B), INT (2B), and REAL (4B) data values are grouped by size, whereas DATE_AND_TIME (8B) is accessed by two 4-byte transfers. If the data are left as in the structure definition, then the situation shown in the lower part will occur. Here, the data are placed in offsets not aligned with the size (indicated in red), which creates problems for some processors. Note that the alignment as in the upper part of Figure 2 may result in some parts of the memory not being used (Section 5).
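The unaligned case can be checked numerically. The sketch below assumes the fields are packed back to back in declaration order with no padding, and that each field should sit at a multiple of its own size (stricter than the two 4-byte transfers mentioned for DATE_AND_TIME). The array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { FIELD_COUNT = 7 };

/* Field sizes in bytes, in declaration order:
 * ID BYTE, CNT UINT, VOLT INT, STA BYTE, SOC REAL, DT DATE_AND_TIME, TEMP INT */
static const size_t field_size[FIELD_COUNT] = {1, 2, 2, 1, 4, 8, 2};

/* Computes packed offsets (no padding) and returns the number of fields
 * whose offset is not a multiple of the field size. */
static size_t count_unaligned_packed(const size_t *sizes, size_t n,
                                     size_t *offsets)
{
    size_t offset = 0;
    size_t unaligned = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] != 0);
        offsets[i] = offset;
        if (offset % sizes[i] != 0) {
            unaligned++;
        }
        offset += sizes[i];
    }
    return unaligned;
}
```

Under these assumptions, CNT, VOLT, SOC, and DT land at offsets 1, 3, 6, and 10, so four of the seven fields are unaligned, while TEMP at offset 18 happens to be aligned.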

The problem of data alignment is particularly troublesome in devices involving different processors that exchange data with each other. The method of encoding numbers with multibyte data is an additional issue that should also be taken into account, i.e., either with the least significant byte first (little endian) or last (big endian) [3]. In many cases, this forces the recoding of the values, which slows down access to the data.
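Recoding between byte orders is naturally done byte by byte. The following sketch shows 2-byte loads in both orders and a big-endian store; the function names are hypothetical and serve only as an illustration.

```c
#include <stdint.h>

/* Least significant byte first (little endian). */
static uint16_t load_u16_le(const uint8_t *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

/* Most significant byte first (big endian). */
static uint16_t load_u16_be(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | (uint16_t)p[1]);
}

/* Store a value in big-endian order regardless of the host byte order. */
static void store_u16_be(uint8_t *p, uint16_t value)
{
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xFFu);
}
```

Because these routines touch only single bytes, they work at any address and on any host byte order, at the cost of extra memory accesses.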

The limited memory access is viewed in a number of works as a performance issue. For example, an API infrastructure for optimization problem data stored in CPU and GPU memories is presented in [14] with results of performance studies for unaligned and aligned data. Other authors propose hardware solutions that can alleviate the “memory wall” problem. An integrated solution of in-memory computing (IMC) is proposed in [15]. The paper [2] deals with the Logic-in-Memory (LIM) approach, where selected operations are performed at the memory level, without the main processor. The authors of [1] point to the need to integrate IMC solutions with the main memory of the system and propose a programmable memory controller architecture.

2.3. PLC Software Development

PLC software conforming with the IEC 61131-3 standard is currently developed using three approaches. The first one, depicted on the left side of Figure 3, directly compiles the IEC programs (e.g., in ST language) into the CPU native code. The relatively uncomplicated runtime executes the deployed code quickly, so the approach is used by well-established manufacturers who produce devices in long series. However, it is a single-processor solution, since changing the CPU requires a new compiler.

The second approach (center in Figure 3) involves translating the IEC programs into C/C++ and then platform-specific compilers generate native binary code [16,17,18]. The first step ensures multiplatform applications; however, the two-step toolchain involving multiple compilers makes the deployment difficult. In such solutions, the limitations of the target platform are known to the compiler, so the problem of unaligned data can be solved during code generation. The compiler knows the specificity of a given platform and prepares the binary code and data location according to the requirements.

There is, however, a third approach, shown on the right in Figure 3, which aims to make the control program code independent of the target platform. The solution is based on the virtual machine concept [19,20]. The universal cross-platform code involved in this approach is interpreted on the controller side by the appropriate runtime (e.g., IsaGRAF [21], STRATON [22]). The runtime environment is often referred to as a virtual machine (VM) because it emulates the operation of a software-defined processor [23]. The solution enables easy software exchange during operation, makes the program independent of the device, and allows for transfer between controllers with different processors. It also means that the deployment of control software does not require an additional step that involves the native compilers.

Multiplatform applicability is the benefit of this approach, while lower time efficiency is its disadvantage. Despite this, the VM approach is of particular interest to small- and medium-scale enterprises (SMEs) that manufacture devices in short series and are generally more flexible than the established manufacturers.

The disadvantage of having reduced performance compared to the native compilation is caused by the fact that the code is interpreted by a virtual machine. The problem of data alignment is also cumbersome because binary data in the universal code do not necessarily have to be arranged as expected by the target processor. Therefore, it may happen, particularly in the case of direct access to memory using a pointer, that the processor will not be able to access unaligned data correctly, causing a hardware fault or at least a performance penalty. This happens, for instance, while reading 2B integer data from an odd address.

3. Experimental Setup and Methodology

3.1. Microcontrollers and Microprocessors

To investigate the problem of data alignment reasonably broadly, an experimental setup involving different integrated circuits has been assembled. The circuits have various internal structures, reflected in differences in memory access. Some of them do not accept unaligned data. Seemingly small differences may cause software malfunction (CPU fault) or other phenomena.

The experimental setup consists of the integrated circuits divided into microcontrollers and microprocessors according to the list in Table 2. It is assumed that a microcontroller has built-in FLASH and RAM memories, does not involve a Memory Management Unit (MMU), and often does not use an operating system. The second group includes general-purpose processors that do not have built-in FLASH or RAM, yet contain MMU, so they can use extensive operating systems, e.g., Linux.

The ARM1 (Advanced RISC Machine) core was introduced in 1985 with the ARMv1 architecture. Over the years, successive cores have been developed on various architecture models [24]. This range includes the ARM1 to ARM11 families; Cortex, in which the M, A, and R series can be distinguished; and the latest Neoverse server. Currently, the ARM architecture is also the most widely used, in more than 70% of embedded systems [25]. The reduced number of processor instructions means less complexity and increased power efficiency. The relatively low unit price and convenient architecture enable mass implementation of the ARM processors in various integrated circuits.

Despite the vast dominance of the ARM architecture, many other processor platforms are available for embedded applications. The MIPS architecture, first introduced in 1985, has several generations, from MIPS I to MIPS V [26]. Initially, it was developed as a 32-bit version; then, a 64-bit version was presented, and, finally, microMIPS for small applications. In 2021, the development of this architecture was abandoned in favor of RISC-V [27]. In 2013, Intel introduced the 32-bit Quark series, characterized by low energy consumption and small size [28]. The architecture supported the same instruction set as the Pentium P54C and i586. In addition to the circuits above, this research covers two other processor platforms used in embedded systems, namely Xtensa LX and Diamond 106Micro. They are RISC-type solutions based on the Xtensa Instruction Set Architecture [29]. ESP8266 and ESP32 are popular IoT platforms that use Xtensa processors.

As a result, Table 2 includes eleven CPUs with different variants of ARM. In addition to the ARMs, CPUs with the MIPS, Quark, Xtensa, 106Micro, and newer RISC-V architecture are included in the microcontroller section.

3.2. Methodology of Data Alignment Testing

The tests are based on a set of ST language programs that read, write, and modify BYTE, WORD, DWORD, and LWORD data in a loop using add and modulo operations. The CPDev virtual machine that executes the test programs is deployed on each of the platforms. In all cases, the GNU toolchain is applied to prepare the appropriate version of the machine. Three versions of the machine are prepared for each of the platforms, differing in the memory access method. In the case of the microprocessors, the machine runs under a Linux kernel-based operating system, whereas, in the microcontrollers, it is executed directly by the CPU. No hardware virtualization or platform-specific acceleration mechanisms are used.

During the operation, the execution time is measured for each of the test programs. For this purpose, impulses are generated at the controller output (using the GPIO pins) before starting the PLC cycle and after its completion. A measuring device is connected to the pins. The measurements are supplemented with the system clock readings. The PLC cycle time is determined as an average of 3000 measurements.
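The averaging step can be sketched as follows. The sketch assumes the measured cycle times are available as integer microsecond samples; the function name and the unit are assumptions made for illustration, not details taken from the measurement setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean of `count` cycle-time samples given in microseconds (assumed unit).
 * A 64-bit accumulator avoids overflow when summing thousands of samples. */
static double mean_cycle_us(const uint32_t *samples_us, size_t count)
{
    assert(count > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += samples_us[i];
    }
    return (double)sum / (double)count;
}
```

In the setup described above, count would be 3000.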

To evaluate the results consistently, the following assumptions are imposed:

the same type of memory for program and data;
max/min clock frequencies for each of the two groups;
normalization of execution time with respect to the machine control cycle.

The technology used for the code and data memories in CPDev can be selected by the designer. Non-volatile FLASH memory is used for permanent storage of the program. In practice, however, the program in FLASH is always accessed more slowly than in RAM. Due to the wide variation in the speed of FLASH memories, the virtual machine test programs are stored in fast RAM. Thus, the potential impact of the FLASH speed is reduced.

The clock frequency is the basic parameter for ranking the processors. To bring the performance of the tested platforms closer to each other, the microcontrollers run at their maximum operating frequencies, while the microprocessors run at their lowest. To normalize the results, a new measure is defined as the execution time of one calculation cycle per 1 MHz of the processor clock frequency.

3.3. Virtual Machine

All the integrated circuits have been driven by the CPDev virtual machine [10,11]. The machine is portable, which means that it can be adapted to a variety of hardware and CPU platforms. Operation of the virtual machine is explained in Appendix A. Here, it suffices to note that the machine, like any processor with a Harvard architecture, uses two memory areas, namely code memory and data memory. The first one stores the interpreted code in binary form, including constant data of different types and offset values. The data memory contains the current values of the variables. Therefore, the data alignment problem applies to both memory areas. The virtual machine can use different memory access methods, so it is possible to check how each of them copes with the problem of unaligned data.

Byte access is the basic method of memory access by the virtual machine. This method is fully supported by any architecture. In the context of data alignment, access to single bytes is always aligned, so it is not a reason for any limitation. Byte access is also the only solution when changing from little endian to big endian or vice versa.

Memory copying using standard C language functions is the second method. The compiler has a significant impact on the performance of the binary code. The memory copying method is universal and flexible in terms of transferability to other platforms.

Direct memory reference using pointers is the third access method. Processor instructions that support unaligned access are usually available only on more complex integrated circuits [24]. Even where it is supported, unaligned memory access may incur additional time penalties.
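For the write direction, the three methods might look as follows for 2-byte data. This is a sketch under assumptions: the enum and function names are hypothetical, the byte variant assumes little-endian order, and the method is chosen here with a runtime switch for brevity, whereas the environment described above selects it at configuration time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { ACCESS_BYTE, ACCESS_MEMCPY, ACCESS_DIRECT } access_method;

/* Writes a 2-byte value to `p` using the selected method. */
static void write_u16(uint8_t *p, uint16_t value, access_method method)
{
    switch (method) {
    case ACCESS_BYTE:
        /* Two single-byte stores, little-endian order assumed. */
        p[0] = (uint8_t)(value & 0xFFu);
        p[1] = (uint8_t)(value >> 8);
        break;
    case ACCESS_MEMCPY:
        /* Standard C copy; valid at any address, host byte order. */
        memcpy(p, &value, sizeof value);
        break;
    case ACCESS_DIRECT:
        /* Single store; the address must be 2-byte aligned. */
        assert(((uintptr_t)p % sizeof value) == 0);
        *(uint16_t *)(void *)p = value;
        break;
    }
}
```

The assertion in the direct branch turns a silent fault on a constrained CPU into an explicit failure during development.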

4. Results of Memory Access Tests

4.1. Execution Times for 4-Byte Variables

Figure 4 shows the results for the group of 32-bit microcontrollers running test programs processing 4-byte variables (DWORD data type). An incorrectly working program may block the processor and require either a manual or automatic restart, e.g., by a watchdog. This happens for the direct access method and unaligned code (NO ALIGN/DIRECT), where missing data bars in the graph mean incorrect operation of the software. Therefore, the STM32F072, PIC32MX7, ESP8266, SAMD21G18, and SR2NW chips cannot execute the code with an unaligned data arrangement. It is also seen that for more efficient systems, such as STM32F446, NRF52832, or ESP32, memory copying is beneficial compared to byte access. For all tested systems, the reliable byte access is more than twice as time-consuming as the two other methods. The systems supporting access to unaligned data, such as the STM32 chips other than the F0 series, NRF52832, ESP32, and ESP32-C3, also perform more slowly with the byte access method. SR2NW is the platform with almost negligible differences between the results. It is not a flawless system, however, as it does not support access to unaligned data.

The second series of results, shown in Figure 5, concerns the microprocessors. For all combinations of architectures and memory access methods, no data access errors were encountered. In all the tests, byte access was, on average, twice as slow as any other method. Direct access is somewhat faster than memory copying. Of the five microprocessors tested, TI AM5729, BCM27111, and RK3308 provide comparable performance.

4.2. Universal Memory Access

To select the universal memory access method, i.e., one that could be recommended for any platform, the sums of normalized program cycles have been calculated for the microcontrollers and microprocessors. The results are shown in Figure 6 and Figure 7. Direct access with unaligned data cannot qualify as universal and reliable; therefore, it is not included. Nevertheless, direct access remains the most efficient, as long as aligned addresses are used (last column in Figure 6). Memory copying gives somewhat poorer results, similar for aligned and unaligned data in all microcontrollers. In the case of byte access, the larger program code that results from alignment increases the execution time compared to the unaligned mode.

The microprocessors, due to the larger set of instructions, clearly exhibit the difference between the byte access and the two other methods (Figure 7). Thus, the memory copy and direct access are twice as efficient as composing data from single bytes. The use of alignment along with direct memory access increases the efficiency somewhat (last column). Again, it may be recommended as the universal method for microprocessors.

4.3. Average Execution Times

Figure 8 shows the average of the tests for 1-, 2-, 4-, and 8-byte variables across the entire set of integrated circuits. The similar behavior of byte access for aligned and unaligned data is evident, although, despite its reliability, it remains the least efficient method. Direct access always improves the efficiency. Its disadvantage is that, without data alignment, it is not compatible with all architectures. The NO ALIGN/DIRECT plot (red) is interrupted for STM32F072 and four other microcontrollers, since they cannot execute the code with unaligned data (compare Figure 4).

5. Power and Memory Usage

5.1. Power Efficiency

To compare the power efficiency following the guidelines from [30], the STM32 series of microcontrollers from STMicroelectronics has been chosen for testing. Each chip can be characterized by several basic properties, such as processor architecture, maximum clock frequency, average current consumption per MHz, and computing performance.

Manufacturers declare the performance by using the popular CoreMark benchmark [31], which suits embedded systems better than the generic Dhrystone test. However, due to the specific method of software operation, a simplified benchmark has been used here to determine the performance from the perspective of the virtual machine. Thus, to compare power consumption, the average current in the RUN mode of the machine is measured at the maximum frequency of the chip. External peripherals are turned off. The code is loaded from FLASH memory, and the external clock signal is applied to the processor (HSE and PLL turned on).

The cycle time of the program with four sizes of the integer variables has been evaluated. The results of the execution time of a single PLC cycle are the average of the results for the three memory access methods.

Table 3 indicates a large variation between the microcontrollers. The STM32F446 chip has the best power efficiency, almost twice that of the STM32F746. The early STM32F072 is the least efficient. Thus, the STM32F446 seems a reasonable choice among the four. In addition to being a fairly efficient computing unit, it is equipped with a large number of additional peripherals and memory.

The results presented in Table 3 have been obtained for the control program executed by the virtual machine. The machine interprets instructions in the portable binary code one by one and executes them as blocks of machine code for the target CPU. Consequently, each VM instruction requires several CPU instructions, as indicated in Section 4. In addition, the VM adds some supervision to the execution, such as guarding array bounds. This protects the controller from serious failures when a programming error occurs and allows the device to remain operational. Therefore, the portable code interpretation means much slower execution than that of the native code, from a few to over a dozen or more times (depending on the complexity of the instruction). The increased computational complexity is compensated for by the portability and compatibility of the code between different platforms.

5.2. Unused Memory

Memory usage for code and data is another important aspect in embedded systems due to limited resources. In the case of systems with alignment, some memory is left unused while storing a mixed-size data set. For a structure with three BYTE (1B) fields, one INT (2B), and one REAL (4B), the memory usage may resemble that in Figure 9. As seen, three bytes are not used here due to the need to allocate the data at aligned addresses.
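The three unused bytes can be reproduced with an ordinary C structure. The field order below is an assumption chosen for illustration (the layout of Figure 9 may differ), and the sizes hold on common ABIs where a 2-byte integer aligns to 2 and a 4-byte float aligns to 4. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Three BYTE fields, one INT, and one REAL (assumed field order). */
typedef struct {
    uint8_t b0;   /* BYTE, offset 0 */
    uint8_t b1;   /* BYTE, offset 1 */
    uint8_t b2;   /* BYTE, offset 2; one padding byte follows */
    int16_t i;    /* INT,  offset 4; two padding bytes follow */
    float   r;    /* REAL, offset 8 */
} mixed_record;

/* Bytes actually carrying data: 3 x 1 + 2 + 4 = 9. */
enum { MIXED_PAYLOAD = 3 * sizeof(uint8_t) + sizeof(int16_t) + sizeof(float) };

/* Bytes left unused because of alignment padding. */
static size_t mixed_unused_bytes(void)
{
    return sizeof(mixed_record) - MIXED_PAYLOAD;
}
```

On such an ABI the structure occupies 12 bytes for 9 bytes of data, so a quarter of the storage is padding.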

Table 4 summarizes the overall memory usage in the STM32 microcontrollers by the program that processes integer data of the four sizes. The given values include the size of the portable binary code and the amount of data memory needed for the variables during program execution. The alignment of the data increased the memory usage by up to 26% compared to the unaligned mode.

6. Conclusions

Research results indicate that the type and architecture of a microcontroller or microprocessor are important in systems where the binary data structure does not take into account the limitations of the platform. Programmable control devices belong to systems in which the program or data can be hardware-independent. The IEC 61131-3 standard defines data types of different sizes that can be used in the programs. Such data may or may not be aligned in memory. Among the more than a dozen CPUs with different architectures tested here, there are some where the data alignment constraints are strict, so they cannot deal with unaligned data. The byte access method always works, but it places a heavy burden on the execution time. Thus, it applies especially where the other two methods, i.e., memory copying and direct access by pointer, are not applicable. Changing a value from little-endian representation to big-endian or vice versa is one such case. The direct access method, which seems to be the most efficient, cannot be used on several microcontrollers if the data are not aligned. Alignment, in turn, introduces additional memory requirements. Memory copying is a reasonable compromise for applications where cross-platform portability is important. The CPDev virtual machine applied here for the evaluations can operate using each of the three methods.

The contribution of the paper rests in the indication of which of the memory access methods is most efficient for a particular CPU. This may help the designer to choose the one that best suits the system being developed.

Author Contributions

Conceptualization, B.T.; methodology, M.H.; software, M.H. and B.T.; writing—original draft preparation, B.T.; visualization, M.H.; investigation, formal analysis M.H. and B.T.; writing—review and editing, M.H. and B.T.; data curation, M.H.; supervision, validation, funding acquisition, B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This project is financed by the Minister of Education and Science of the Republic of Poland within the “Regional Initiative of Excellence” program for years 2019–2022. Project number 027/RID/2018/19, amount granted 11 999 900 PLN.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in https://github.com/CPDev-ControlProgramDeveloper (accessed on 13 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Operation of the Virtual Machine

Working in the PLC mode, the CPDev virtual machine performs three operation cycles, as shown in Figure 10, namely

pre cycle: reading values of input variables from physical inputs or external sources;
VM task: actual code and data interpretation;
post cycle: setting outputs on the basis of calculated values of variables, communication with external devices, diagnostics, and testing.
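The three phases can be sketched as a single scan function. This is a toy illustration, not the CPDev runtime: the state structure, the function names, and the placeholder computation standing in for code interpretation are all hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t input;     /* value latched during the pre cycle */
    uint16_t variable;  /* stand-in for the VM data memory */
    uint16_t output;    /* value published during the post cycle */
} plc_state;

/* Pre cycle: read the physical input into the state. */
static void pre_cycle(plc_state *s, uint16_t physical_input)
{
    s->input = physical_input;
}

/* VM task: placeholder for interpreting the control program. */
static void vm_task(plc_state *s)
{
    s->variable = (uint16_t)(s->input + 1u);
}

/* Post cycle: set the output from the calculated variable. */
static void post_cycle(plc_state *s)
{
    s->output = s->variable;
}

/* One full PLC cycle; returns the value driven to the output. */
static uint16_t plc_scan(plc_state *s, uint16_t physical_input)
{
    pre_cycle(s, physical_input);
    vm_task(s);
    post_cycle(s);
    return s->output;
}
```

Keeping input latching and output publication outside the VM task means the interpreted program sees a consistent snapshot of its inputs for the whole cycle.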

The instruction processing module is a crucial component of the architecture. The VM language instruction set consists of direct counterparts of the IEC 61131-3 standard functions and assembler-like jumps, memory copying, subprogram calls, etc. [11,12]. The VM operates as a software interpreter of the universal code, so there is no need to use any hardware-based virtualization features (e.g., hypervisors).

Figure 10 assumes that the controller executes a single control task. Tasks are composed of programs and other Program Organization Units, as defined in [9]. To run more tasks simultaneously, additional virtual machine instances can be created, each with its own data memory area. For a multicore CPU, this may be achieved by assigning each VM instance to a particular core. In the case of a multitasking operating system, the VM tasks are executed as threads or processes [32].

Figure 10. Operation of a programmable logic controller using the virtual machine.

References

1. Mambu, K.; Charles, H.-P.; Kooli, M.; Dumas, J. Towards Integration of a Dedicated Memory Controller and Its Instruction Set to Improve Performance of Systems Containing Computational SRAM. J. Low Power Electron. Appl. 2022, 12, 18.
2. Ottati, F.; Turvani, G.; Masera, G.; Vacca, M. Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories. Electronics 2021, 10, 2291.
3. Bai, Y. Practical Microcontroller Engineering with ARM Technologies; Wiley-IEEE Press: Hoboken, NJ, USA, 2015; ISBN 978-1-119-05237-1.
4. Tanskanen, J.K.; Sihvo, T.; Niittylahti, J. Byte and modulo addressable parallel memory architecture for video coding. IEEE Trans. Circuits Syst. Video Technol. 2004, 14, 1270–1276.
5. Geng, T.; Diken, E.; Wang, T.; Jozwiak, L.; Herbordt, M. An access-pattern-aware on-chip vector memory system with automatic loading for SIMD architectures. In Proceedings of the 2018 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 25–27 September 2018; pp. 1–7.
6. Lattner, C.; Adve, V. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization, Antibes, Juan-les-Pins, France, 3 October 2004; pp. 75–86.
7. Catalão, T.; Sousa, M. IEC 61131-3 front-end for the LLVM compiler family. In Proceedings of the 25th IEEE International Conference on Emerging Technologies and Factory Automation, Vienna, Austria, 8–11 September 2020; Volume 1, pp. 1191–1194.
8. Chai, Y.; Chen, M.; Li, J.; Han, L. Implementation and Optimization of Data Prefetching Algorithm Based on LLVM Compilation System. J. Phys. Conf. Ser. 2021, 1827, 012136.
9. IEC 61131-3:2013; Programmable Controllers - Part 3: Programming Languages. European Committee for Electrotechnical Standardization: Brussels, Belgium, 2013.
10. Trybus, B. Development and Implementation of IEC 61131-3 Virtual Machine. Theor. Appl. Inform. 2011, 23, 21–35.
11. Sadolewski, J.; Trybus, B. Compiler and virtual machine of a multiplatform control environment. Bull. Pol. Acad. Sci. Tech. Sci. 2022, 70, e140554.
12. Rzońca, D.; Sadolewski, J.; Stec, A.; Świder, Z.; Trybus, B.; Trybus, L. Developing a multiplatform control environment. J. Autom. Mob. Robot. Intell. Syst. 2019, 13, 73–84.
13. Alvarez, M.; Salami, E.; Ramirez, A.; Valero, M. Performance impact of unaligned memory operations in SIMD extensions for video codec applications. In Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems & Software, San Jose, CA, USA, 25–27 April 2007; pp. 62–71.
14. Pinheiro, R.L.; Landa-Silva, D.; Qu, R.; Constantino, A.A.; Yanaga, E. An application programming interface with increased performance for optimisation problems data. J. Manag. Anal. 2016, 3, 305–332.
15. Kim, M.; Kim, S.-H.; Lee, H.-J.; Rhee, C.-E. Case Study on Integrated Architecture for In-Memory and In-Storage Computing. Electronics 2021, 10, 1750.
16. Beremiz Integrated Development Environment. Available online: www.beremiz.org (accessed on 13 June 2022).
17. Tisserant, E.; Bessard, L.; Sousa, M. An open-source IEC 61131-3 integrated development environment. In IEEE International Conference on Industrial Informatics; IEEE: Piscataway, NJ, USA, 2007; pp. 183–187.
18. GEB Automation. GEB Automation IDE Guide. Available online: www.gebautomation.org (accessed on 13 June 2022).
19. Cavalieri, S.; Puglisi, G.; Scroppo, M.S.; Galvagno, L. Moving IEC 61131-3 applications to a computing framework based on CLR virtual machine. In Proceedings of the IEEE 21st International Conference on Emerging Technologies and Factory Automation, Berlin, Germany, 6–9 September 2016; pp. 1–8.
20. Lee, Y.; Jeong, J.; Son, Y. Design and implementation of the secure compiler and virtual machine for developing secure IoT services. Future Gener. Comput. Syst. 2017, 76, 350–357.
21. Rockwell Automation. ISaGRAF Workbench. Available online: www.isagraf.com (accessed on 13 June 2022).
22. COPA-DATA France. STRATON. Available online: www.straton-plc.com (accessed on 13 June 2022).
23. Zhang, M.; Lu, Y.; Xia, T. The design and implementation of virtual machine system in embedded SoftPLC system. In Proceedings of the International Conference on Computer Science and Applications, Trento, Italy, 6–8 June 2013; pp. 775–778.
24. Asghar, M.N. A Review of ARM Processor Architecture History, Progress and Applications. J. Appl. Emerg. Sci. 2020, 10, 171–179.
25. ARM Microcontrollers Market Size and Forecast. Available online: https://www.verifiedmarketresearch.com/product/arm-microcontrollers-market/ (accessed on 13 June 2022).
26. Patterson, D.A.; Hennessy, J.L. Computer Organization and Design, Fifth Edition: The Hardware/Software Interface, 5th ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2013; ISBN 978-0-12-407726-3.
27. Patterson, D.; Waterman, A. The RISC-V Reader: An Open Architecture Atlas; Strawberry Canyon: Black Canyon City, AZ, USA, 2017; ISBN 099-9-24911-8.
28. Sun, J.; Jones, M.; Reinauer, S.; Zimmer, V. Embedded Firmware Solutions; Apress: Berkeley, CA, USA, 2015; ISBN 978-1-4842-0070-4.
29. Xtensa LX Microprocessor Overview Handbook. 2004. Available online: Loboris.eu/ESP32/Xtensa_lx%20Overview%20handbook.pdf (accessed on 13 June 2022).
30. Wu, H.; Chen, C.; Weng, K. An Energy-Efficient Strategy for Microcontrollers. Appl. Sci. 2021, 11, 2581.
31. EEMBC (The Embedded Microprocessor Benchmark Consortium). Coremark Benchmark. Available online: http://www.eembc.org/coremark (accessed on 13 June 2022).
32. Wang, K.C. Embedded Real-Time Operating Systems. In Embedded and Real-Time Operating Systems; Springer: Cham, Switzerland, 2017; pp. 401–475. ISBN 978-3-319-51517-5.

Figure 1. Exemplary aligned and unaligned arrangement of 1-, 2-, and 4-byte variables in memory.

Figure 2. Example of alignment and unalignment of a complex data structure in memory.

Figure 3. Three software development scenarios for IEC 61131-3 systems: native IEC code (left), translation to C/C++ code (center), and portable binary code for a virtual machine (right).

Figure 4. Normalized program cycle execution time for various types of memory access, for selected microcontrollers processing 4-byte variables.

Figure 5. Normalized program cycle execution time for various types of memory access, for selected microprocessors processing 4-byte variables.

Figure 6. Total normalized execution time of the program cycle for the tested microcontrollers and memory access methods.

Figure 7. Total normalized execution time of the program cycle for the tested microprocessors and memory access methods.

Figure 8. Average normalized execution time for 1-, 2-, 4-, and 8-byte variables for the memory access methods.

Figure 9. Unused memory areas in systems with data alignment.
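The origin of the unused areas referred to in Figure 9 can be illustrated with a short C sketch that compares the space taken by a sequence of variables when they are packed back to back and when each one starts at a multiple of its own size. This is a simplified model under stated assumptions (natural alignment, power-of-two sizes); the function names are hypothetical and the sketch does not reproduce how the figures in Table 4 were obtained.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align.
   Assumes align is a power of two (1, 2, 4, or 8 here). */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Total bytes needed when variables are packed back to back. */
size_t layout_unaligned(const size_t *sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += sizes[i];
    return total;
}

/* Total bytes needed when each variable starts at a multiple of
   its own size; the gaps skipped by align_up stay unused. */
size_t layout_aligned(const size_t *sizes, size_t count)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++)
        offset = align_up(offset, sizes[i]) + sizes[i];
    return offset;
}
```

For example, a BYTE, a DWORD, a WORD, and an LWORD declared in that order occupy 15 bytes when unaligned and 24 bytes when aligned in this model, so 9 bytes remain unused. Reordering the variables from largest to smallest removes the gaps in this particular case.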

Table 1. CPDev representation of the IEC 61131-3 elementary data types.

Name	Data Type	Range
1 byte
BOOL	logic	FALSE/TRUE
BYTE	byte	0 … 255
USINT	unsigned short integer	0 … 255
SINT	signed short integer	−128 … 127
2 bytes
WORD	word	0 … 65,535
INT	integer	−32,768 … 32,767
4 bytes
DWORD	double word	0 … 2³² − 1
UDINT	unsigned double integer	0 … 2³² − 1
DINT	double integer	−2³¹ … 2³¹ − 1
REAL	real	IEEE 754 format
TIME	duration time	−24 days 20 h 31 min 23 s … 24 days 20 h 31 min 23 s
DATE	date	1 January 0001 … 31 December 9999
TIME_OF_DAY	time of day	00:00:00.00 … 23:59:59.99
ADDRESS	pointer in CPDev	like DWORD
8 bytes
LWORD	long word	0 … 2⁶⁴ − 1
LINT	long integer	−2⁶³ … 2⁶³ − 1
ULINT	unsigned long integer	0 … 2⁶⁴ − 1
LREAL	long real	IEEE 754 format
DATE_AND_TIME	date and time	combination of DATE and TIME_OF_DAY
Other
STRING	ASCII character string	variable length (1B character)
WSTRING	Unicode character string	variable length (2B character)

Table 2. List of microcontrollers and microprocessors used in the tests.

Architecture	Bits	Used Freq. (MHz)	MCU/MPU
Microcontroller
Cortex-M0 (ARMv6-M)	32	48	STM32F072
Cortex-M0+ (ARMv6-M)	32	48	SAMD21G18
Cortex-M4 (ARMv7-M)	32	64	NRF52832
Cortex-M4 (ARMv7-M)	32	72	STM32F303
Cortex-M4 (ARMv7-M)	32	180	STM32F446
Cortex-M7 (ARMv7-M)	32	216	STM32F746
MIPS	32	80	PIC32MX7
106Micro	32	80	ESP8266
Xtensa LX6	32	240	ESP32
RISC-V	32	160	ESP32-C3
Intel Quark	32	32	SR2NW
Microprocessor
ARM11 (ARMv6)	32	700	BCM2835
Cortex-A15 (ARMv7-A)	32	400	TI AM5729
Cortex-A35 (ARMv8-A)	64	408	RK3308
Cortex-A53 (ARMv8-A)	64	600	BCM2837
Cortex-A72 (ARMv8-A)	64	600	BCM2711

Table 3. Current consumption for a single cycle of the virtual machine.

MCU	Max Freq. (MHz)	Declared Performance (CoreMark)	Current Consumption (µA/MHz)	Cycle Time (µs)	Result (µA/Cycle)
STM32F072	48	106	270	150,864	1304.8
STM32F303	72	245	380	113,184	1098.3
STM32F446	180	608	178	90,720	437.6
STM32F746	216	1082	500	58,968	824.7

Table 4. Usage of code and data memory for unaligned and aligned modes.

Memory usage in bytes, by variable type.
Mode	BYTE (1B)	WORD (2B)	DWORD (4B)	LWORD (8B)
Unaligned	393	439	531	716
Aligned	498	544	634	818

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

