The Design of a 2D Graphics Accelerator for Embedded Systems



Introduction
Recently, as advances in computer technology and semiconductor process technology have led to processors with higher performance and integration density, the overall performance of embedded systems, in terms of both computing performance and energy efficiency, has increased [1,2]. Owing to this progress, the demand for adopting embedded systems in a variety of applications is also increasing [3][4][5][6][7][8][9]. Some of these applications, such as user-centric applications, require communication with users through 2D graphics [10]. Therefore, an embedded system used in these applications must be able to process graphics data and write the data to a display device. To perform these functions, an embedded system that includes a general-purpose processor (GPP) generally uses either the GPP or an additional graphics processing unit (GPU) together with a graphics library [3]. However, performing graphics processing in real time with these methods requires a high-performance GPP or GPU, because a large number of instructions must be executed in a limited time. For this reason, these methods are not appropriate for applications with tight design constraints such as low power consumption or small area [10][11][12].
To solve these issues, 2D graphics accelerators, which perform 2D graphics processing in hardware, were proposed for embedded systems [13,14]. These accelerators are connected to the processor in the embedded system through various kinds of interfaces, such as PCI Express and the memory bus. Unlike the core of a GPP, which requires a long execution time because each instruction performs only a simple operation, a hardware accelerator can perform complex operations relatively fast [15][16][17][18][19]. Moreover, accelerators have a relatively small area because their execution logic is limited and optimized [20][21][22][23][24]. Therefore, including a 2D graphics accelerator allows a variety of applications that require 2D graphics operations to be implemented with low power and small size. Because adopting an architecture that contains an application-specific accelerator is an efficient way to satisfy the design specifications of an embedded system, research on accelerators for image processing has been performed [25].
Line-drawing is one of the methods for visualizing graphics. As every image can be represented as a collection of lines, line-drawing is a basic means of drawing an image [26,27]. Accordingly, the line-drawing operation can handle various kinds of graphics processing [28,29]. Although this approach is not the most efficient in all situations, it is highly efficient when the data to be displayed are in the form of points and lines. From this point of view, some research has used line-drawing for image processing [27]. Nevertheless, there is little research that uses line-drawing as the core algorithm of a graphics accelerator. Our research is motivated by the idea of applying line-drawing to a graphics accelerator.
In this paper, we present a 2D graphics accelerator for embedded systems. The accelerator performs 2D graphics processing with a line-drawing operation based on Bresenham's algorithm. Furthermore, the accelerator provides anti-aliasing and alpha-blending features. The accelerator is directly connected to the memory bus to communicate with the core of the processor in the embedded system. Based on this structure, the accelerator can be controlled by reading or writing certain memory addresses. Moreover, the accelerator is directly connected to the frame buffer, the memory that holds the 2D graphics data sent to a display device. This architectural characteristic reduces the workload of the processor by relieving it of the burden of accessing the frame buffer. We analyzed the performance of the accelerator by simulating the processor including the 2D graphics accelerator and implementing it on a field-programmable gate array (FPGA). In addition, we ascertained the feasibility of the accelerator by synthesizing it with the Synopsys Design Compiler using a 180 nm CMOS process.
The paper is organized as follows. Section 2 describes the preliminaries that are essential to implementing the features of the accelerator: Bresenham's algorithm, alpha-blending, and anti-aliasing. Section 3 explains the architecture of the 2D graphics accelerator and the reasons for adopting it. Section 4 describes the hardware implementation results, the analysis of the accelerator through a sample application running on the implemented hardware, and the synthesis results from the Synopsys Design Compiler. Section 5 summarizes our work and presents future work.

Preliminaries
A line-drawing algorithm is an essential element of the presented 2D graphics accelerator. As the available algorithms differ in the design architecture and hardware resources they require, choosing an appropriate algorithm is important. We chose Bresenham's algorithm and optimized it for the hardware accelerator [30]. Moreover, to provide advanced visualization, support for additional features such as alpha-blending and anti-aliasing is needed.

Bresenham's Line Algorithm
Bresenham's line algorithm is a line-drawing algorithm that is typically used in raster graphics systems [31,32]. The algorithm calculates the positions of the pixels that make up a line. As it uses only integer arithmetic, it has low complexity and a high calculation speed [33]. In raster graphics, a line is drawn by painting the pixels between the start point and the end point. Figure 1 shows the types of lines handled by Bresenham's algorithm. The two lines in Figure 1a are of the type in which the x coordinate of the painted pixels always increments by one while the line is drawn, and the two lines in Figure 1b are of the type in which the y coordinate always increments by one. The two types are distinguished by the slope of the line (1).

Figure 2 presents the fundamentals of the algorithm for each type of line. The algorithm proceeds by selecting the next point to paint based on the current point, marked as $(x_k, y_k)$. Figure 2a shows the case in which the x coordinate always increments while the line is drawn. In this case, the algorithm must decide whether the y coordinate of the next point changes or stays the same. It does so as follows: determine whether the real value $y$ of the line at $x_k + 1$ is closer to $y_k$ or to $y_k + 1$, and change the y coordinate when $y$ is closer to $y_k + 1$. The algorithm repeats these operations until the current point reaches the end point. In the case in which the y coordinate always increments, the algorithm proceeds with similar operations, as shown in Figure 2b.

Although the algorithm can be implemented in hardware as it is, optimizing it for hardware reduces resource usage. Accordingly, we transformed the pseudo-code of the algorithm for hardware implementation; the transformed pseudo-code is shown in Algorithm 1. The transformation fully eliminates binary division, which is costly to implement in hardware. This optimization allows the hardware implementation of the algorithm to meet design specifications for embedded systems such as low power consumption and small area.
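The selection rule described above can be illustrated with a short Python sketch. It is not the authors' Algorithm 1, which is not reproduced in this text; it is a common division-free, integer-only formulation of Bresenham's line algorithm, shown only to make the two line types concrete. The function name `bresenham_line` and the variable `err` are assumptions introduced for this example.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the pixels of a line using only integer add, subtract, and compare.

    Illustrative sketch (assumed formulation, not the paper's Algorithm 1).
    """
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # step direction along x
    sy = 1 if y0 < y1 else -1   # step direction along y
    x, y = x0, y0
    points = []

    if dx >= dy:
        # Type (a): x always advances by one; decide whether y changes.
        err = 2 * dy - dx       # scaled distance to the midpoint, no division
        for _ in range(dx + 1):
            points.append((x, y))
            if err > 0:         # the real line is closer to y + sy
                y += sy
                err -= 2 * dx
            err += 2 * dy
            x += sx
    else:
        # Type (b): y always advances by one; decide whether x changes.
        err = 2 * dx - dy
        for _ in range(dy + 1):
            points.append((x, y))
            if err > 0:         # the real line is closer to x + sx
                x += sx
                err -= 2 * dy
            err += 2 * dx
            y += sy
    return points
```

For example, `bresenham_line(0, 0, 5, 2)` follows the type (a) branch and `bresenham_line(0, 0, 2, 5)` follows the type (b) branch. The error term is scaled by `2 * dx` (or `2 * dy`) so that the comparison against the midpoint needs no division, which is the kind of transformation the text describes.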

Bresenham's Circle Algorithm
When the line to draw is wider than one pixel, shaping the ends of the line improves the quality of the visualization. A circular shape is one suitable choice. To draw circular shapes, we adopt Bresenham's circle algorithm, which proceeds similarly to Bresenham's line algorithm. Figure 3 outlines the fundamentals of Bresenham's circle algorithm. Based on the current point $(x_k, y_k)$, the algorithm selects the next point to paint from $P_1(x_k + 1, y_k)$ and $P_2(x_k + 1, y_k - 1)$. To select the point, expression (2) is evaluated at the midpoint $(x_k + 1, y_k - 0.5)$ of the two candidates. The next point is $P_2$ when the result is lower than 0. Otherwise, the next point is $P_1$.
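The midpoint test can be sketched in Python as follows. Expression (2) is not reproduced in this text, so the sketch assumes the decision function $F(x, y) = r^2 - x^2 - y^2$, which is negative outside the circle and therefore agrees with the rule stated above ($P_2$ when the result is lower than 0). The function names `circle_octant` and `circle_points`, and the scaling by 4 that keeps the test in integers, are assumptions made for this example.

```python
def circle_octant(r):
    """Return the points of one octant of a circle of radius r, from (0, r) until x > y.

    Assumed decision function: F(x, y) = r*r - x*x - y*y (not taken from the paper).
    """
    x, y = 0, r
    points = []
    while x <= y:
        points.append((x, y))
        # 4 * F(x + 1, y - 0.5), scaled by 4 so that only integers are needed.
        # A hardware version would typically update this value incrementally.
        d4 = 4 * r * r - 4 * (x + 1) * (x + 1) - (2 * y - 1) * (2 * y - 1)
        if d4 < 0:
            y -= 1      # midpoint lies outside the circle: choose P2
        x += 1          # x advances by one in both cases (P1 or P2)
    return points


def circle_points(cx, cy, r):
    """Mirror the octant into a full circle centered at (cx, cy)."""
    result = set()
    for x, y in circle_octant(r):
        for px, py in ((x, y), (y, x)):
            for sx in (1, -1):
                for sy in (1, -1):
                    result.add((cx + sx * px, cy + sy * py))
    return result
```

For a radius of 5, `circle_octant(5)` visits (0, 5), (1, 5), (2, 5), and (3, 4); the remaining pixels of the circle follow from the eight-way symmetry applied in `circle_points`.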

Alpha-Blending
Alpha-blending is needed to draw graphics with transparency and blend them with the original image. Figure 4 illustrates alpha-blending. Each pixel of the image to draw has an alpha value $\alpha$ that expresses its transparency. Alpha-blending reads the color value of each pixel of both the original image and the graphics to draw, and calculates the new pixel value of the image frame by expression (3). As the color of a digital image is composed of three color elements (red, green, and blue), the new color of a pixel must be calculated separately for each of the three color channels.
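As an illustration of the per-channel calculation, the following Python sketch applies the widely used "source over destination" blend, $C_{new} = \alpha C_{src} + (1 - \alpha) C_{dst}$, to each color channel. Expression (3) is not reproduced in this text, so this formula is an assumption rather than the paper's exact expression. The function name `alpha_blend`, the integer alpha range, and the default `alpha_max=7` (chosen to match the three-bit alpha mentioned for anti-aliasing) are likewise assumptions.

```python
def alpha_blend(src, dst, alpha, alpha_max=7):
    """Blend one RGB pixel over another using an integer alpha in [0, alpha_max].

    src: (r, g, b) of the graphics to draw.
    dst: (r, g, b) of the original image in the frame buffer.
    Assumed formula: out = (alpha * src + (alpha_max - alpha) * dst) / alpha_max,
    evaluated per channel with integer (floor) division.
    """
    if not 0 <= alpha <= alpha_max:
        raise ValueError("alpha out of range")
    return tuple(
        (alpha * s + (alpha_max - alpha) * d) // alpha_max
        for s, d in zip(src, dst)
    )
```

With `alpha` equal to `alpha_max` the drawn pixel fully replaces the original, and with `alpha` equal to 0 the original pixel is left unchanged; intermediate values mix the two colors channel by channel.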

Anti-Aliasing
When a graphical object with a higher pixel density than the target graphics system is displayed, aliasing can occur because a raster graphics system has limited pixel density. As a line is an ideal graphical object with unlimited pixel density, aliasing occurs very frequently when lines are drawn. Anti-aliasing is a technique for dealing with this problem. Figure 5 illustrates anti-aliasing. Anti-aliasing improves the appearance of aliased lines, such as the line shown in Figure 5a, by blurring the rough edges at the borders of the line. Blurring can be done by decrementing the alpha value along the rough edges sequentially, as shown in Figure 5b. As with Bresenham's line algorithm, anti-aliasing has two types of lines to process, depending on the slope value. Figure 6 shows the progression of the anti-aliasing process. The process starts by detecting the start point and end point of each border segment. The detection is performed while a line is drawn with Bresenham's line algorithm, by checking the generated coordinates. Next, once the start point and end point of a border segment are determined, the process applies a decreasing alpha value to each point of the border segment. The following pseudo-code presents the process of applying the alpha value when the slope is lower than or equal to one. The alpha value of a pixel is quantized to three bits, with a maximum of seven, to reduce the circuit area by minimizing the arithmetic calculation.
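The pseudo-code referred to above is not reproduced in this text. As a rough illustration only, the following Python sketch assigns a decreasing three-bit alpha value to each pixel of one border segment. The linear ramp, the function name `border_alpha_ramp`, and the use of integer division are assumptions; the accelerator's actual rule for choosing the values may differ.

```python
ALPHA_MAX = 7  # three-bit alpha, maximum of seven, as stated in the text


def border_alpha_ramp(segment_length):
    """Return one alpha value per pixel of a border segment.

    The first pixel gets the maximum alpha and the values decrease toward
    the end of the segment. This linear ramp is an assumed rule, used here
    only to illustrate the idea of a sequentially decremented alpha.
    """
    if segment_length <= 0:
        return []
    return [
        (ALPHA_MAX * (segment_length - i)) // segment_length
        for i in range(segment_length)
    ]
```

For a border segment of four pixels, this sketch yields the alpha values 7, 5, 3, and 1, so the step between adjacent rows of the line fades out gradually instead of ending abruptly.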

2D Graphics Accelerator
The 2D graphics accelerator provides 2D graphics processing features including line-drawing, alpha-blending, and anti-aliasing. To execute these features, the accelerator receives setup data from the core of the processor, such as the start point, end point, line width, bits per pixel (BPP), other configuration options, and a start flag. After the setup data are received and the start instruction is sent, the accelerator operates independently of the core during execution. When the line-drawing process is completed, the accelerator sends an interrupt signal to the interrupt handler of the processor, which lets the core recognize that the line-drawing process is complete. As a result, the workload of the processor is reduced, because the processor does not need to continuously check whether the accelerator has finished.
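The control flow described above (write the setup data, set the start flag, then wait for the interrupt) can be modeled in a few lines of Python. This is a toy software model, not the accelerator's interface: the register names and offsets, the class `AcceleratorModel`, and the helper `draw_line` are all hypothetical, since the text does not give a register map. The model also runs synchronously, whereas the real accelerator works in parallel with the core.

```python
class AcceleratorModel:
    """Toy model of a memory-mapped accelerator with a completion interrupt."""

    # Hypothetical register offsets; not taken from the paper.
    X0, Y0, X1, Y1, WIDTH, START = 0x00, 0x04, 0x08, 0x0C, 0x10, 0x14

    def __init__(self, on_interrupt):
        self.regs = {}
        self.on_interrupt = on_interrupt  # stands in for the interrupt handler
        self.lines_drawn = []

    def write(self, offset, value):
        """Model a store instruction from the core to an accelerator register."""
        self.regs[offset] = value
        if offset == self.START and value == 1:
            self._run()

    def _run(self):
        """Record the configured line, clear the start flag, raise the interrupt."""
        r = self.regs
        self.lines_drawn.append(
            ((r[self.X0], r[self.Y0]), (r[self.X1], r[self.Y1]), r[self.WIDTH])
        )
        r[self.START] = 0
        self.on_interrupt()


def draw_line(acc, start, end, width):
    """Configure one line and start the accelerator, as the core would."""
    acc.write(acc.X0, start[0])
    acc.write(acc.Y0, start[1])
    acc.write(acc.X1, end[0])
    acc.write(acc.Y1, end[1])
    acc.write(acc.WIDTH, width)
    acc.write(acc.START, 1)
```

In this model the core only performs register writes and then reacts to the callback, which mirrors the point made in the text: no polling loop is needed to find out when a line is finished.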

Line-Drawing Process
Figure 7 presents the progression of the line-drawing process. The setup module first receives the line configuration from the core, such as the start point, end point, and line width. From the line configuration, the module generates the aligned coordinates, slope, line width, and edge points, and transfers them to the edge builder. The edge builder sets up the borders of the line by generating their coordinates. The accelerator has three cap modes for drawing line caps: perpendicular, vertical, and circle. Line caps are created by submodules in the edge builder. The submodules transfer the minimum and maximum values of the x and y coordinates to the line detector module. The line detector starts the line-drawing process by determining which coordinates are borders. The painter generates the coordinates to paint, which lie inside the borders, and executes the anti-aliasing process when the anti-aliasing option is set. Finally, the blender paints the pixels with alpha-blending, using the options transferred from the setup module and the coordinates from the painter, by writing the color to the frame buffer.

Optimized Architecture
Figure 8 shows the architecture of the processor including the proposed 2D graphics accelerator. As shown in Figure 8a, the accelerator is connected to the core through the memory bus of the processor. For this reason, the core controls the accelerator through memory access instructions. Moreover, the frame buffer is connected both directly to the accelerator and to the memory bus. Based on this architecture, the core can handle cases in which line-drawing is an inefficient way to process 2D graphics, such as loading a bitmap image into the frame buffer. This characteristic enables the processor to respond flexibly and efficiently to various conditions. Figure 8b presents the architecture of the 2D graphics accelerator.
The accelerator contains six modules: config register, setup, edge builder, line detector, painter, and blender. The config register stores the line configuration and options, such as anti-aliasing and cap mode, written through the memory bus. The other five modules perform the line-drawing process with the options stored in the config register and operate as a pipeline, which gives the accelerator high throughput.

The setup module generates the coordinates of the four edges based on the width of the line and the distance between the start point and end point. These coordinates are used by the edge builder module, which is the next pipeline stage. Figure 9 is a block diagram of the edge builder. The edge builder receives the following data signals: the minimum and maximum (x, y) coordinates of the points, the distance between the start point and end point (dx, dy), the width of the circle to paint when the cap mode is circle, the line width, and the cap mode. The module generates the coordinates of the borders from these signals using its submodules. Figure 10 shows all of the cap modes. The edge builder has three selectable cap modes for painting line caps: perpendicular, vertical, and circle. The circle submodule generates the coordinates of the pixels that form the circular caps. The cap submodule generates the coordinates of the parallelogram-shaped and rectangle-shaped caps. The line submodule generates the borders of the line except for the caps. All of the submodules operate in parallel to provide fast execution. The generated coordinates are sent to the line detector module.

As the circle submodule generates the whole circular edge, the coordinates that lie inside the borders must be removed. This is done by the line detector module. The line detector receives the coordinates from the edge builder and detects which coordinates are valid borders. Then, it transfers the valid borders, together with the minimum and maximum values of the coordinates, to the painter module. The painter module generates the coordinates inside the borders and paints the pixels at those coordinates by writing RGBA data to memory at a configurable address, which is set by writing the address to the config register through the memory bus. In addition, the module smooths the pixels at the borders through anti-aliasing when the anti-aliasing mode is set in the config register.

The written RGBA data are used by the blender module, which draws the line to the display device. As the frame buffer holds the previously drawn image, the line being drawn must be blended with that image. Therefore, the blender performs alpha-blending between the previous image and the pixels of the line to draw. Finally, the blender writes the updated image to the frame buffer, from which the image is provided to the display device.

Implementation and Analysis
To implement and verify the 2D graphics accelerator, we first verified the required algorithms in software. We wrote MATLAB scripts to verify the algorithms for line-drawing, anti-aliasing, alpha-blending, and drawing the various line caps. After the algorithms were verified, we translated them to the register-transfer level (RTL) and designed the accelerator in Verilog HDL.
To evaluate the 2D graphics accelerator, we integrated it into a processor that uses a Cortex-M0 as its core, interfacing the accelerator and the core through an AHB-Lite bus. Furthermore, we added a function that generates an interrupt request signal when the drawing of one line is complete. Next, before synthesizing the processor to hardware, we simulated it in Vivado 2020.1 to verify the functionality of the accelerator, using a customized testbench with a sample program stored in the internal ROM of the processor. The embedded program performs the same work as the earlier MATLAB scripts. The interrupt request signal is generated when the accelerator completes the drawing of one line, and the program then configures the next line.
The synthesis and implementation were performed with the same Vivado tool, targeting a Xilinx xc7z010clg400 FPGA. Table 1 shows the resource utilization of the 2D graphics accelerator and the processor. The results indicate that the resource usage of the accelerator is suitable for embedded systems, as the utilization of the processor containing the accelerator does not exceed eighty percent of the programmable logic. Table 2 presents the performance of the accelerator at a resolution of 1024 × 768 and 30 frames per second. To evaluate the line-drawing performance, we set the start point and end point to (50, 50) and (700, 900), which are close to the top-left and bottom-right corners of the display, and tested various conditions such as operating frequency and line width. The results show that even when the line is as wide as 50 pixels, more than one line can be drawn per frame when the operating frequency is above 50 MHz. Accordingly, the accelerator is suitable for a wide range of applications that have resource limitations and line-drawing-based features, such as a real-time scope. However, as the results in Table 2 indicate that the drawing efficiency decreases when the line is narrow, applying the accelerator to complex graphics applications that are not based on line-drawing can be a challenge.

To test the features of the accelerator, namely line-drawing with various cap modes, anti-aliasing, and alpha-blending, we ran test firmware on the processor that draws various kinds of lines by controlling the 2D graphics accelerator through memory accesses. The processor contains a video graphics array (VGA) controller that sends the image in the frame buffer to a display device using the VGA protocol. Consequently, the 2D graphics features can be visually confirmed on the display device, as shown in Figure 11.

An essential step in verifying the feasibility of the 2D graphics accelerator is to determine the area of the synthesized circuit. To do so, we synthesized the accelerator with Synopsys Design Compiler version N-2017.09-SP2 using a 180 nm CMOS process. Table 3 summarizes the synthesis result. The total area of the accelerator is 742,494 µm², which corresponds to approximately 75K gates. The results in Tables 2 and 3 show that the accelerator can be realized as a chip with acceptable performance, drawing more than one line per frame. Therefore, attaching the 2D graphics accelerator to an embedded processor can be a suitable way to meet design specifications when the application can be composed effectively from line-drawing features.
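As a rough worked estimate based only on the figures stated above (a 30 frames-per-second target and an operating frequency of about 50 MHz), the number of clock cycles available within one frame is as follows. The symbols $N_{\text{frame}}$, $f_{\text{clk}}$, and $f_{\text{frame}}$ are introduced here for illustration and do not appear in the original text.

```latex
N_{\text{frame}} = \frac{f_{\text{clk}}}{f_{\text{frame}}}
                 = \frac{50 \times 10^{6}\ \text{Hz}}{30\ \text{Hz}}
                 \approx 1.67 \times 10^{6}\ \text{cycles}
```

Drawing at least one line per frame at this frequency therefore implies that the 50-pixel-wide test line completes in fewer than about 1.67 million clock cycles. This is an inference from the stated frame rate and frequency, not a cycle count reported in Table 2.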

Conclusions
In this paper, we proposed a 2D graphics accelerator, based on line-drawing, for embedded systems. As line-drawing can be a basic element of image drawing in specific applications, defining the required 2D graphics as a set of lines is an effective way to implement graphics features. The accelerator provides basic line-drawing features and user-centric features that improve visualization, such as alpha-blending and anti-aliasing. To implement these 2D graphics features, we analyzed the line-drawing algorithm and the required functions. Moreover, we optimized the algorithm and functions for hardware realization. By eliminating binary division through transformation and reducing the size of the arithmetic calculations, the algorithm can be implemented with fewer arithmetic units, which enables the hardware to operate with low power and few resources. We also integrated the designed accelerator into a processor for embedded systems to construct a system-on-a-chip. The accelerator is connected to the core through the memory bus of the processor to receive the line configuration and start signals from the core. As the accelerator is directly connected to the frame buffer, it works independently of the core while performing the line-drawing process. Based on these architectural characteristics, the core can execute other jobs while the accelerator performs graphics processing. As a result, the overall performance of the processor in applications that use 2D graphics can be improved. In addition, the results of the FPGA implementation and the synthesis using a 180 nm CMOS process show that the accelerator is feasible to realize.
In future work, we will apply our 2D graphics accelerator to a variety of applications implemented on embedded systems and compare its performance with that of other methods, such as implementations with a GPP or GPU. As the drawing performance of the accelerator is not suitable for complex, fine-grained graphics processing, it is necessary to identify and classify the applications whose conditions suit the accelerator. We expect that adding a line-drawing-based 2D graphics accelerator to the processor can be effective in a variety of embedded systems.

Figure 2. Bresenham's line algorithm according to the line type. (a) Lines when dx > dy; (b) Lines when dx < dy.

Figure 6. Progression of the anti-aliasing process.

Figure 7. Progression of the line-drawing process.

Figure 8. Architecture of the processor and 2D graphics accelerator. (a) Architecture of the processor with 2D graphics accelerator; (b) Architecture of the 2D graphics accelerator.

Figure 9. Block diagram of the edge builder.

Figure 10. Cap modes of the edge builder.

Figure 11. Experimental environment of the field-programmable gate array (FPGA) implementation.

Table 1. Resource utilization of the processor including the 2D graphics accelerator.

Table 2. Performance of the 2D graphics accelerator.

Table 3. Synthesis result of the 2D graphics accelerator.