CFIEE: An Open-Source Critical Metadata Extraction Tool for RISC-V Hardware-Based CFI Schemes

Abstract: Control flow critical metadata play a key role in hardware-based control flow integrity (CFI) mechanisms that effectively monitor and secure program control flow based on pre-extracted metadata. The existing control flow analysis tools exhibit some deficiencies, including inadequate compatibility with the RISC-V architecture, a steep learning curve, limited automation capabilities, and restricted data output formats. CFIEE is an open-source tool with a graphical interface for the automated extraction of control flow critical metadata. The tool can analyze RISC-V binary executables, transforming the binary into an intermediate representation (IR) in the form of disassembled code, and extracting the critical metadata required for studying hardware-based CFI mechanisms through a purpose-built control flow transfer relationship analysis algorithm. The extracted metadata include program basic blocks and their corresponding hash values, control flow graphs, function call relationships, the distribution of forward transfer instructions, etc. We selected 15 embedded system programs with processor adaptation for functional verification. The results demonstrate CFIEE's capability to automatically analyze programs within the supported RISC-V instruction set and generate comprehensive and precise metadata files. This tool can significantly enhance the efficiency of control flow metadata extraction and furnish configurable metadata for hardware-based security mechanisms.


Introduction
The RISC-V architecture has gained considerable attention in recent years as an open and extensible instruction set architecture (ISA). Known for its modular design and support for customized instructions, RISC-V has become a popular choice for various applications, including embedded systems, Internet of Things (IoT) devices, and high-performance computing. However, with the increasing adoption of RISC-V devices, addressing the architecture's potential security vulnerabilities, particularly in terms of control flow security, has emerged as an urgent concern. Control flow hijacking attacks encompass various techniques such as buffer overflow [1] exploits and return-oriented programming [2]. These techniques allow attackers to manipulate a program's control flow by corrupting or overwriting memory locations that store critical information about function calls or returns.
Control Flow Integrity (CFI) [3] mechanisms play a crucial role in modern software security by safeguarding against control-flow hijacking attacks. These mechanisms rely on precise control flow data analysis to ensure the integrity of a program's execution path [4]. The extraction and analysis of control flow metadata facilitate the detection of anomalies, identification of control flow hijacking attempts, and development of effective countermeasures. Techniques such as control flow graphs, basic block identification, and loop detection provide insights into the execution path of a program, aiding in the identification of potential security vulnerabilities. Furthermore, control flow information can be utilized for runtime monitoring and intrusion detection. By comparing the actual control flow with
• The CFIEE tool is a critical metadata extraction tool for RISC-V hardware-based CFI schemes, providing output data files that serve as valuable references for the design of hardware-based CFI mechanisms.
• We have developed an algorithm for analyzing control transfer relationships based on the execution rules of RISC-V programs. Through static analysis, the algorithm can approximate the actual execution path of the program, providing CFIEE with a comprehensive analysis scope, which in turn provides researchers with comprehensive CFI metadata.
• CFIEE will be released as an open-source project [14], providing unrestricted usage and modification of the software to all individuals under an open-source license.
The paper is structured as follows: Section 2 introduces the control transfer instructions in the RISC-V instruction set, explains the concept of the control flow graph, and discusses the working phases of the CFG-based CFI mechanism. Section 3 gives a detailed explanation of the software architecture, internal workflow, functions of the different components, and output files of CFIEE. Section 4 presents an application scenario where CFIEE offers data support for hardware-based CFI mechanisms. Section 5 compares CFIEE with tools with similar functionalities and showcases the analysis results of CFIEE on test programs. Finally, conclusions are presented in Section 6.

Background and Related Works

Control Transfer Instructions in RISC-V ISA
The RISC-V Instruction Set Architecture (ISA) has emerged as a significant force in computer architecture and microprocessor design. It is an open standard instruction set architecture that has gained widespread attention and adoption in academia and industry due to its versatility, extensibility, and flexibility. The RISC-V ISA adheres to the principles of Reduced Instruction Set Computing (RISC), emphasizing simplicity and efficiency. This is evident in its streamlined instruction set, which allows many instructions to complete in a single clock cycle on simple implementations, benefiting performance and energy efficiency [15]. One of the key features of RISC-V is its modularity. The ISA is structured around a base integer instruction set, providing a foundation for various application-specific extensions. This modular design enables tailored customization by incorporating specialized instructions to address specific computational needs while ensuring compatibility with the core ISA. Additionally, the RISC-V ISA supports both 32-bit and 64-bit address spaces [16], accommodating a wide range of computing platforms and applications. This adaptability makes RISC-V suitable for deployment in both resource-constrained embedded systems and high-performance computing environments.
Table 1 showcases the conditional branch instructions present in the RV32IMAFC instruction set [12,13], excluding pseudo-instructions. When a conditional branch instruction is executed, it compares the values of two source registers (rs1 and rs2), and based on the result, the branch may or may not be taken. This decision-making process underpins conditional branch instructions, allowing programs to take different execution paths based on logical conditions. Table 2 illustrates the unconditional jump instructions within the RV32IMAFC instruction set, excluding pseudo-instructions. The "jal" instruction, an essential member of this category, is an abbreviation for "jump and link". Upon execution, it unconditionally jumps to a specific section of the program while simultaneously storing the return address in its destination register, conventionally "x1", which is commonly referred to as the "ra" register. The "j" form that appears in disassembly is another unconditional jump; it is a "jal" whose destination register is "x0". Like "jal", it unconditionally diverts program execution to a designated location. However, unlike "jal" with "ra" as the destination, "j" does not preserve the return address. The "jalr" instruction embodies the concept of indirect jumps, where the target address is not explicitly specified in the disassembly code but is derived from the contents of the rs1 register plus an immediate offset. This flexibility in specifying jump targets lends itself to various programming scenarios where dynamic or indirect addressing is required.
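To make the three categories concrete, the following minimal sketch classifies a disassembled mnemonic as a conditional branch, a direct jump, or an indirect jump. The mnemonic sets cover only the base RV32I instructions and the "j" form discussed above; they are an assumption for illustration and do not reproduce Tables 1 and 2 or CFIEE's internal tables.

```python
# Illustrative mnemonic sets (assumption: base RV32I names only, plus "j").
CONDITIONAL_BRANCHES = {"beq", "bne", "blt", "bge", "bltu", "bgeu"}
DIRECT_JUMPS = {"jal", "j"}      # target is encoded in the instruction
INDIRECT_JUMPS = {"jalr"}        # target comes from rs1 plus an immediate


def classify_transfer(mnemonic: str) -> str:
    """Return the control transfer category of a mnemonic, or "other"."""
    name = mnemonic.strip().lower()
    if name in CONDITIONAL_BRANCHES:
        return "conditional_branch"
    if name in DIRECT_JUMPS:
        return "direct_jump"
    if name in INDIRECT_JUMPS:
        return "indirect_jump"
    return "other"
```

A static analyzer can resolve the targets of the first two categories from the disassembly alone, whereas the third category depends on a register value that is only known at runtime.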

Control Flow Graph
A control flow graph (CFG) is a graphical structure used to represent a program's control flow, typically in the form of a directed graph [17]. The CFG nodes are commonly referred to as basic blocks, which represent uninterrupted code units within the program. Basic blocks are connected by control flow edges, which are directed edges in the CFG. These edges represent the jump or branch relationships during program execution, signifying that upon completion of one basic block's execution, control flow will be transferred to another basic block.
The control flow edges can be categorized into forward and backward edges [18]. A forward edge represents the normal direction of control flow in a program, that is, a directed edge from one basic block to another. This type of edge represents the program's control flow transfer along the normal execution path. For instance, forward edges arise when program execution proceeds sequentially to the next basic block or when the true path of a conditional branch is executed. Backward edges are used to represent loops or conditional branches in a program, enabling the program to backtrack from one basic block to a previously executed basic block. These edges reflect the non-linear control flow of the program. For instance, within a loop structure, a backward edge occurs when an iteration is completed and the program returns to the beginning of the loop, facilitating multiple executions of the code within its body.
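As a small illustration of these definitions, the sketch below stores a CFG as an adjacency dictionary keyed by basic block start address and labels an edge as backward when it leads to the same or a lower address. The address comparison is a simplifying assumption chosen for brevity; it approximates "a previously executed basic block" only for code laid out in execution order.

```python
from collections import defaultdict


def build_cfg(edges):
    """Build an adjacency dictionary {source block: [target blocks]}."""
    cfg = defaultdict(list)
    for src, dst in edges:
        cfg[src].append(dst)
    return dict(cfg)


def classify_edge(src, dst):
    """Label an edge by address order (simplifying assumption)."""
    return "backward" if dst <= src else "forward"


# A two-block loop: 0x100 falls through to 0x110, which branches back.
loop_edges = [(0x100, 0x110), (0x110, 0x100), (0x110, 0x120)]
loop_cfg = build_cfg(loop_edges)
```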

Phases of CFG-Based CFI Mechanisms
Most CFI mechanisms can be categorized into two distinct phases [19], each contributing to the overarching goal of enhancing program security.
CFG Construction and Analysis: In the first phase, the CFI mechanism needs to obtain the CFG of the program through a specific analysis process. The accuracy and comprehensiveness of the CFG directly influence the effectiveness of the control flow policy. There are three approaches to constructing control flow graphs: static, dynamic, and hybrid. Static analysis is a prevalent technique for constructing CFGs. This method involves a meticulous examination of the program, such as its source code or binary executable file [20][21][22]. Static analysis is often conducted during the program's compilation or preprocessing phase, ensuring that the CFG is established before execution. One of the dynamic CFG reconstruction methods was proposed by Yount et al. [23]. Dynamic analysis takes a different approach by constructing the CFG during program execution [24]. This real-time approach allows the mechanism to adapt to the program's actual behavior, ensuring that the CFG accurately reflects runtime conditions. While dynamic analysis can provide a precise representation of the program's control flow, it may introduce some overhead due to the need for continuous monitoring during execution. Nonetheless, it is a valuable technique for scenarios where the control flow structure may change dynamically. V.H. Sahin proposed Turna, a tool for building control flow graphs using a hybrid approach [25]. Hybrid approaches combine static and dynamic analysis: static analysis provides the initial control flow graph, and dynamic analysis is then used to refine the graph or verify the control flow during program execution to account for runtime variations. Once the CFG is established, the mechanism defines the permissible control flow transfer targets based on the CFG.
Runtime Control Flow Verification: The second phase of the CFI mechanism takes place at runtime, while the program is being executed, and plays a crucial role in ensuring the security and integrity of the running program. During this phase, the CFI mechanism continuously monitors the control flow transfers within the running program [26] to verify whether they adhere to the predetermined CFG constructed in the first phase [27]. The CFG serves as a blueprint for legitimate control flow paths within the program. Any attempt to transfer control outside of this predefined range raises suspicion and triggers a security response. When the CFI mechanism detects an unauthorized control flow transfer that deviates from the expected path, it promptly initiates appropriate security measures to mitigate potential threats. These security responses can vary depending on system configurations and requirements but may involve terminating or suspending execution of the program. In addition to halting execution, exception handling procedures become essential when dealing with unauthorized control flow transfers. Exception handling allows for graceful recovery from unexpected events or errors encountered during runtime. By employing proper exception handling techniques, developers can ensure that any abnormal termination caused by unauthorized control flow transfers does not result in data corruption or system instability.
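The two phases can be illustrated with a minimal software model: the first phase yields a table of permitted targets per transfer instruction, and the second phase checks each observed transfer against that table. The names `CFIViolation` and `check_transfer` are hypothetical, and a hardware mechanism would perform this check in logic rather than in Python.

```python
class CFIViolation(Exception):
    """Raised when an observed transfer is not permitted by the CFG."""


def check_transfer(allowed_targets, source, target):
    """Verify one control transfer against the pre-extracted policy.

    allowed_targets maps the address of a transfer instruction to the set
    of target addresses that the CFG permits for it.
    """
    if target not in allowed_targets.get(source, set()):
        raise CFIViolation(
            f"transfer {source:#x} -> {target:#x} is not in the CFG"
        )
    return True
```

Raising an exception here stands in for the security response described above, such as terminating or suspending the program.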

Technical Specifications
CFIEE is a critical metadata extraction tool for RISC-V hardware-based CFI mechanisms, implemented in Python. It is compatible with any computer system that supports Python 3. To use its disassembly functionality, the tool requires the riscv32/64-unknown-elf toolchain to be installed on the user's computer system.

Overview of CFIEE Architecture
The CFIEE architecture is illustrated in Figure 1. The tool accepts either a RISC-V executable or a disassembled file as input, which is then handled by three distinct processes within the CFIEE framework. Ultimately, it generates metadata files pertaining to CFI.

Input Files
In scenarios such as reverse engineering and malware analysis, analysts frequently have only binary files without access to the corresponding source code. The behavior of a program can be better understood by analyzing compiled binaries and obtaining actual execution path information. For this reason, CFIEE uses binary files as the foundation for analysis. CFIEE accepts RISC-V executables as input. Specifically, the tool can analyze ELF files compiled for the RV32IMAFC instruction set. CFIEE ensures proper analysis when the program uses instructions within this range.
Additionally, CFIEE can process disassembled files in TXT format if provided by the user. In such cases, users can pre-disassemble the executable file using the RISC-V toolchain and save the resulting disassembly as a .txt file. This flexibility in input format widens the tool's applicability, catering to varying user preferences and simplifying the analysis process.
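Pre-disassembling an executable can be scripted as in the sketch below, which looks for a GNU objdump binary from the riscv32/64-unknown-elf toolchain and writes its output to a .txt file. The helper names are hypothetical, and the use of the plain `-d` option is an assumption: the options that CFIEE itself passes to the toolchain are not stated here.

```python
import shutil
import subprocess

# Binary names follow the usual <triple>-objdump convention of the toolchain.
CANDIDATE_TOOLS = ("riscv32-unknown-elf-objdump", "riscv64-unknown-elf-objdump")


def find_objdump():
    """Return the path of the first objdump found on PATH, or None."""
    for tool in CANDIDATE_TOOLS:
        path = shutil.which(tool)
        if path:
            return path
    return None


def build_disassemble_command(objdump, elf_path):
    """Command line that disassembles the executable sections of elf_path."""
    return [objdump, "-d", elf_path]


def disassemble_to_txt(elf_path, txt_path):
    """Disassemble elf_path and save the listing as a text file."""
    objdump = find_objdump()
    if objdump is None:
        raise FileNotFoundError("no RISC-V objdump found on PATH")
    result = subprocess.run(
        build_disassemble_command(objdump, elf_path),
        capture_output=True, text=True, check=True,
    )
    with open(txt_path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
```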

Internal Processes
The internal process of CFIEE is illustrated in Figure 1, encompassing three fundamental components: data preprocessing, control flow analysis, and data curation and output. The "data preprocessing" phase is dedicated to formatting the contents of the disassembly file to adhere to CFIEE's processing format. This crucial step aims to eliminate any extraneous content that may result from specific compilation options during program compilation. Preprocessing ensures the extraction of disassembly instructions, enabling smooth subsequent processing.
The core of CFIEE lies in the "control flow analysis" stage. Starting with the initialization function of the program, CFIEE examines and analyzes the control flow of the program. The analysis process includes extracting potentially executable functions, resolving the control flow transfer relationships in these functions, and identifying each basic block.
The "data curation and output" phase primarily concentrates on consolidating the information acquired during the preceding stages and presenting it in either textual or graphical formats. The organized data are then written to appropriate files, facilitating further analysis.

CFI-Related Metadata Files
Table 3 showcases the output files of CFIEE. As of the current version, the tool generates eight output files, including three text files associated with basic blocks, three files regarding control transfer instructions, a control flow diagram represented as a vector diagram, and a function call diagram. Notably, a binary file in the bin format is generated, containing the addresses of all forward transfers. Each line consists of a 32-bit binary number, where the first 16 bits represent the address of the jump instruction and the last 16 bits represent the target address of the jump instruction. These files provide reference data for CFI schemes.
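The record layout just described can be illustrated with a pair of helper functions that pack and unpack one line. This is a sketch under two assumptions that the text does not confirm: each line is stored as 32 characters of "0" and "1", and both addresses are the 16-bit values themselves (rather than, say, word offsets). The function names are hypothetical.

```python
def encode_transfer(source, target):
    """Pack a jump address and its target into one 32-bit binary string."""
    for value in (source, target):
        if not 0 <= value < (1 << 16):
            raise ValueError(f"address {value:#x} does not fit in 16 bits")
    return format(source, "016b") + format(target, "016b")


def decode_transfer(line):
    """Split one 32-bit line into (jump address, target address)."""
    bits = line.strip()
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    return int(bits[:16], 2), int(bits[16:], 2)
```

A 16-bit field limits each address to the range 0x0000 to 0xFFFF, which is worth keeping in mind when the analyzed program is linked at a higher base address.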

Workflow of CFIEE
This section introduces the workflow of CFIEE. Figure 2 showcases the step-by-step workflow process.
It begins with an initial input or data collection stage, followed by multiple analysis and processing steps, and concludes with generating the desired outputs or results. If the input file is a RISC-V executable file, CFIEE will invoke the RISC-V toolchain in the data preprocessing module to disassemble it, generating a disassembly code file in TXT format. Depending on the file format, CFIEE will then proceed to extract it for instruction recognition or retain it in its original format.
The preprocessed disassembly code will be forwarded to the analysis module for control flow analysis. CFIEE initially extracts function-related data from the disassembly code, identifying all potentially executable functions within the program. Subsequently, it analyzes the control flow transfer relationships, which is followed by partitioning the disassembly code into basic blocks within the range of executable functions.
In the data collation and output module, CFIEE computes the hash value of each basic block based on the basic block instructions using the hash algorithm specified by the user and consolidates the basic block information. Additionally, the module handles the sorting and output of control transfer instructions. It gathers control transfer instructions within each function, identifies their corresponding target instructions, and pairs them accordingly. This module also organizes information pertaining to output functions, including the count of forward control transfer instructions within each function and the program's function call relationships.
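The partitioning step can be sketched with the textbook "leader" rule: a new basic block starts at the first instruction, at every transfer target, and at every instruction that follows a transfer. This is a generic illustration under that assumption, not a reproduction of CFIEE's own division rules, and the function name is hypothetical.

```python
def split_basic_blocks(addresses, transfers):
    """Partition instruction addresses into basic blocks.

    addresses: instruction addresses of the analyzed code.
    transfers: {address of a transfer instruction: target address or None}
               (None marks a target that static analysis cannot resolve).
    """
    ordered = sorted(addresses)
    if not ordered:
        return []
    known = set(ordered)
    leaders = {ordered[0]}
    for index, address in enumerate(ordered):
        if address not in transfers:
            continue
        target = transfers[address]
        if target is not None and target in known:
            leaders.add(target)              # a transfer target opens a block
        if index + 1 < len(ordered):
            leaders.add(ordered[index + 1])  # so does the next instruction
    blocks, current = [], []
    for address in ordered:
        if address in leaders and current:
            blocks.append(current)
            current = []
        current.append(address)
    blocks.append(current)
    return blocks
```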

Functions of CFIEE
The detailed presentation of CFIEE's functions is illustrated in Figure 3, showcasing the key functionalities embedded within the tool's source code, encompassing data manipulation, statistical analysis, and data visualization.

Data Preprocessing
During the data preprocessing stage, CFIEE offers disassembly functionality for ELF files, automatically selecting the disassembled file once the disassembly process is finished. When the input files are already in a disassembled format, CFIEE adapts its processing recommendations to the specific format of these files. This ensures that suitable preprocessing decisions are presented to the user regardless of whether they are working with raw binary or pre-disassembled files. The data preprocessing process is jointly managed by the "file_preprocess.py" and "CFIEE.py" scripts.
As shown in Figure 3, the process consists of three main functions. The function "judge_file_type()" is housed in "CFIEE.py" and determines whether the current disassembly file needs additional processing based on pre-established rules. This function provides a tag value to subsequent related functions based on the file format. The functions "extract_disassemble_introduction()" and "rewrite_objdump_file()" are located in the script file "file_preprocess.py". They are responsible for extracting the necessary instructions from the disassembly file and reconstructing it, respectively. The restructured files reside in the same directory as the source files. This systematic approach ensures the efficient and accurate extraction of the required instructions while eliminating redundant information to enable subsequent analysis within the CFIEE framework.
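The kind of filtering performed during preprocessing can be sketched as follows: keep only the lines that carry an instruction and discard headers, section banners, and labels. The regular expression assumes the usual GNU objdump layout (address, colon, 4 or 8 hexadecimal encoding digits, mnemonic, operands), and `extract_instructions` is a hypothetical stand-in, not the code of "extract_disassemble_introduction()".

```python
import re

# address ":" encoding (8 or 4 hex digits) mnemonic [operands]
INSTRUCTION_RE = re.compile(
    r"^\s*([0-9a-f]+):\s+([0-9a-f]{8}|[0-9a-f]{4})\s+(\S+)\s*(.*)$"
)


def extract_instructions(disassembly):
    """Return (address, encoding, mnemonic, operands) for each instruction."""
    rows = []
    for line in disassembly.splitlines():
        match = INSTRUCTION_RE.match(line)
        if match is None:
            continue  # header, blank line, section banner, or label
        address, encoding, mnemonic, operands = match.groups()
        rows.append((int(address, 16), encoding, mnemonic, operands.strip()))
    return rows
```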

Control Flow Analysis
The overall process is divided into three parts: "extract function information", "analyze control transfer relationship", and "divide basic blocks". To begin with, the analysis modules receive the disassembled file as input. The tool starts by extracting various details from the disassembly file, such as function names, start and end addresses, and instruction locations. This initial extraction provides a foundation for further analysis.
Next, CFIEE employs a recursive search algorithm based on program logic to analyze each function. By scrutinizing the transfer instructions within these functions, CFIEE anticipates the target addresses the program will access during execution. Algorithm 1 shows the specific logic of the algorithm. If there are transfer instructions inside the function currently being analyzed, CFIEE will analyze the functions in which their destination addresses are located. If there are no jump instructions within the current function, CFIEE will mark the adjacent next function as a function that may be executed.
After extracting the functions that are likely to be executed, the tool analyzes the control transfer relationships within these functions. It identifies all control transfer instructions and analyzes the target addresses based on the type of transfer instruction. Figure 4 illustrates the analysis logic of CFIEE for function call and return relationships. Prior to analyzing the function call and return relationships, CFIEE has extracted all function address ranges, function call instructions, and function return instruction data. When analyzing the function call relationships, whether a function call occurs is determined by comparing the target address of an unconditional jump instruction with the starting address of each function. If the two match, the call relationship between the current function and the jump target function is established.
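A minimal sketch of the function information extraction is shown below. It assumes GNU objdump output, in which each symbol is introduced by a line such as `00010074 <main>:`, and treats every such label as a function; the dictionary layout and the name `extract_functions` are hypothetical.

```python
import re

LABEL_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:\s*$")
INSTRUCTION_RE = re.compile(r"^\s*([0-9a-f]+):\s")


def extract_functions(disassembly):
    """Collect name, start/end address, and instruction addresses per label."""
    functions = []
    current = None
    for line in disassembly.splitlines():
        label = LABEL_RE.match(line)
        if label:
            start = int(label.group(1), 16)
            current = {"name": label.group(2), "start": start,
                       "end": start, "instructions": []}
            functions.append(current)
            continue
        instruction = INSTRUCTION_RE.match(line)
        if instruction and current is not None:
            address = int(instruction.group(1), 16)
            current["instructions"].append(address)
            current["end"] = address  # address of the last instruction seen
    return functions
```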
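The search described in this paragraph can be sketched as a recursive traversal. The sketch follows the two rules stated in the text (follow the targets of transfer instructions; fall through to the adjacent function when there are none) and is not a transcription of Algorithm 1; the data layout and the name `find_reachable_functions` are assumptions.

```python
def find_reachable_functions(functions, transfer_targets, entry):
    """Return the functions that may execute, in discovery order.

    functions: list of (name, start, end) tuples sorted by start address.
    transfer_targets: {function name: [target addresses of its transfers]}.
    entry: name of the function where the search begins.
    """
    order = [name for name, _, _ in functions]

    def owner(address):
        for name, start, end in functions:
            if start <= address <= end:
                return name
        return None

    seen, reachable = set(), []

    def visit(name):
        if name is None or name in seen:
            return
        seen.add(name)
        reachable.append(name)
        targets = transfer_targets.get(name, [])
        if not targets:
            # No transfer instructions: the adjacent function may run next.
            position = order.index(name)
            if position + 1 < len(order):
                visit(order[position + 1])
            return
        for target in targets:
            visit(owner(target))

    visit(entry)
    return reachable
```

Functions that are never reached by this search are excluded from the later control transfer and basic block analysis.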
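The call relationship rule stated above, a jump whose target equals the start address of a function, can be sketched in a few lines. The input layout and the name `build_call_relationships` are hypothetical.

```python
def build_call_relationships(functions, jumps):
    """Derive (caller, callee) pairs from unconditional jumps.

    functions: {name: (start address, end address)}.
    jumps: list of (instruction address, target address) pairs.
    """
    start_to_name = {start: name for name, (start, _) in functions.items()}

    def owner(address):
        for name, (start, end) in functions.items():
            if start <= address <= end:
                return name
        return None

    calls = set()
    for instruction_address, target in jumps:
        caller = owner(instruction_address)
        callee = start_to_name.get(target)  # only function entries count
        if caller is not None and callee is not None:
            calls.add((caller, callee))
    return sorted(calls)
```

A jump that lands in the middle of a function, such as a loop branch, has no entry in `start_to_name` and is therefore not recorded as a call.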
In the process of analyzing the disassembly, a pattern of function calls similar to nested calls caught our attention. For instance, function 1 contains a "jal" instruction that will unconditionally jump to function 2 after saving the return address. At the end of function 2, there is an unconditional jump instruction of type "j". When the program reaches this point, it will jump directly to function 3. Finally, the program executes the return operation at the return instruction of function 3. In response to this scenario, we developed the corresponding analysis logic and incorporated it into the function call relationship analysis. Once the analysis of the function call relationship is completed, it serves as reference data for analyzing the function return relationship. The analysis approach for the function return relationship is similar to the previous analysis. CFIEE analyzes the function return relationship and determines the target address based on the function call relationship and the address information of the "ret" instruction.
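The chained pattern described here (a "jal" into function 2, a "j" from function 2 into function 3, and a "ret" in function 3) can be resolved by following the "j" links until a function with a return instruction is found. The sketch below is an illustration under a simplifying assumption: a function either returns or ends in a single tail jump. The name `resolve_return_edges` and the data layout are hypothetical.

```python
def resolve_return_edges(calls, tail_jumps, ret_sites):
    """Pair each "ret" instruction with the address it returns to.

    calls: list of (return address saved by "jal", called function).
    tail_jumps: {function: function reached by its final "j"}.
    ret_sites: {function: address of its "ret" instruction}.
    """
    edges = []
    for return_address, callee in calls:
        current = callee
        visited = set()
        # Follow "j" links until a function that actually returns is found.
        while (current in tail_jumps and current not in ret_sites
               and current not in visited):
            visited.add(current)
            current = tail_jumps[current]
        if current in ret_sites:
            edges.append((ret_sites[current], return_address))
    return edges
```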
It is worth noting that the analysis procedures in CFIEE are static, meaning that they do not account for dynamic changes or runtime behavior. As a result, CFIEE is currently unable to analyze the target addresses of indirect jumps, which may limit its effectiveness in certain scenarios.
CFIEE divides basic blocks based on the control transfer relationships obtained from the previous analyses and the division rules specified in Table 4. During the division of basic blocks, we take into consideration the possibility of jump or branch target instructions within certain basic blocks. To address this, we have introduced two functions in the basic block division process: "create_basic_blocks_in_order()" and "create_basic_blocks_start_with_taken_target()". The first function strictly adheres to the basic block division rules, which are based on the disassembly file and the control transfer relationships derived from the previous analysis. It divides the basic blocks in accordance with the address order of the instructions. The second function, "create_basic_blocks_start_with_taken_target()", focuses specifically on creating a new basic block starting at an address where a jump or branch target instruction resides. This captures the changes in control flow caused by these instructions. After executing these two functions, CFIEE sorts the basic blocks according to their starting addresses, ultimately providing accurate and comprehensive basic block information. The sorted basic block information, when combined with the subsequently generated CFG, enables researchers to analyze the program's execution path and identify potential deadlock issues. Furthermore, through analysis of the program's loop structure, researchers can pinpoint loops that may cause performance bottlenecks and optimize them accordingly. Additionally, it helps to understand how different parts of the program interact.
The data sorting and output module of CFIEE gathers comprehensive information on basic blocks and computes their corresponding hash values. The calculation process accepts the binary or hexadecimal instructions of the basic blocks as input, allowing users to select both the hash algorithm and the desired length of the resulting hash value. Currently, CFIEE offers four options for hash algorithms: MD5, SHA-1, SHA256, and SHA512. Users can select any of these algorithms based on their specific requirements. The available hash lengths include 8-bit, 16-bit, 32-bit, and custom length. This feature allows users to balance storage efficiency and accuracy according to their needs. Furthermore, we plan to enhance CFIEE by incorporating support for custom hash algorithms in future updates. In addition, CFIEE organizes and outputs the necessary control transfer instructions and function information, providing researchers with comprehensive and accurate data.
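The hash computation with a selectable algorithm and output length can be sketched with Python's standard hashlib module. How CFIEE shortens a digest to the requested length is not stated here; the sketch assumes that the leading bits of the digest are kept, and the name `basic_block_hash` is hypothetical.

```python
import hashlib

SUPPORTED_ALGORITHMS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512,
}


def basic_block_hash(hex_instructions, algorithm="sha256", bits=32):
    """Hash the instruction encodings of one basic block.

    hex_instructions: encodings as hexadecimal strings, e.g. ["1141", "c606"].
    bits: number of leading digest bits to keep (assumed truncation rule).
    """
    data = bytes.fromhex("".join(hex_instructions))
    digest = SUPPORTED_ALGORITHMS[algorithm](data).digest()
    total_bits = len(digest) * 8
    if not 0 < bits <= total_bits:
        raise ValueError(f"bits must be between 1 and {total_bits}")
    value = int.from_bytes(digest, "big") >> (total_bits - bits)
    return format(value, f"0{(bits + 3) // 4}x")
```

A shorter hash needs less storage in the verification hardware but raises the chance that a tampered basic block produces the same value, which is the trade-off between storage efficiency and accuracy mentioned above.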
To simplify the process and enhance modularity, we encapsulate the main functions of this process in two functions: "export_results()" and "generate_CFG()". The "export_results()" function systematically arranges the data files and presents them in a user-friendly text format. The "generate_CFG()" function builds the control flow graph of a program, which provides researchers with a visual representation of the control flow in a RISC-V executable.
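As a rough illustration of the basic block division step described earlier in this section, the sketch below applies the textbook "leader" rule in a single pass: a control transfer instruction closes the current block, and an address that is a taken target opens a new one. This is an assumption-laden simplification, not the logic of "create_basic_blocks_in_order()" or "create_basic_blocks_start_with_taken_target()", and it does not reproduce the rules of Table 4; the function name and input format are hypothetical.

```python
def split_basic_blocks(instructions, taken_targets):
    """Divide instructions into basic blocks in address order.

    instructions:  list of (address, is_control_transfer) tuples, sorted by
                   address (hypothetical input format).
    taken_targets: set of addresses targeted by some jump or branch.
    Returns a list of blocks, each a list of instruction addresses.
    """
    blocks, current = [], []
    for address, is_control_transfer in instructions:
        # A jump or branch target always starts a new basic block.
        if address in taken_targets and current:
            blocks.append(current)
            current = []
        current.append(address)
        # A control transfer instruction always ends the current block.
        if is_control_transfer:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


program = [(0x100, False), (0x104, True), (0x108, False),
           (0x10C, False), (0x110, True)]
blocks = split_basic_blocks(program, taken_targets={0x10C})
# blocks == [[0x100, 0x104], [0x108], [0x10C, 0x110]]
```

Because the instructions are visited in address order, the resulting blocks are already sorted by starting address.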

Application Scenarios of CFIEE
As a control flow static analysis tool, CFIEE can provide detailed and accurate data for the design and implementation of CFI mechanisms, especially CFG-based CFI. Researchers can develop suitable CFI mechanisms for RISC-V embedded devices using the tool's analysis results, such as basic block information, control flow graphs, and the number of jump instructions within each function. Below, we outline a straightforward way to use these outputs.
"xxx_control_transfer.bin" in Table 3 contains the forward jump instruction and the address information of the current instruction in binary form. Additionally, "xxx_bin_basic_block_info.txt" and "xxx_hex_basic_block_info.txt" contain binary and hexadecimal representations of basic block data alongside their respective hash values. Figure 5 shows the hardware circuit diagram of a basic CFI mechanism constructed using these data. In this mechanism, the hash values of basic blocks and the PCs corresponding to control transfer instructions are stored within designated registers. When the hash verification unit recognizes the last instruction of a basic block, it calculates the hash value of the current basic block and compares it with the pre-obtained hash value. If the two values are the same, this indicates that the instructions in the current basic block have not been tampered with. Simultaneously, the "Target Verification" unit in the CFI verification unit compares the PC of the control transfer instruction with its pre-analyzed target instruction.
The CFI verification unit is equipped with registers for storing interrupt entry addresses and a shadow stack for validating function return addresses, ensuring the integrity of program interrupts and return addresses. Prior to entering an interrupt, the CFI verification unit examines whether the current interrupt entry address is stored in the register; if not, an interrupt entry address exception is raised. During a function call, the program pushes the return address (RA) onto the main stack and updates the stack pointer (SP). Simultaneously, the CFI verification unit copies RA from the main stack to the shadow stack. Upon function return, the program retrieves RA from the main stack in order to perform the return operation. Before the return instruction executes, the CFI verification unit validates this RA against the one on the shadow stack. If the two match, the return proceeds normally; otherwise, the unit identifies an abnormality in the return address. Any mismatch detected by the CFI verification unit implies potential tampering with or alteration of the program's control flow.
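The checks performed by the CFI verification unit can be modeled in software to make the decision logic concrete. The sketch below is a behavioral model only, written under the assumption that each control transfer PC has a single pre-analyzed target; the class and method names are hypothetical and do not correspond to the hardware in Figure 5 or to any CFIEE interface.

```python
class CfiMonitor:
    """Behavioral model of target, interrupt entry, and return checks."""

    def __init__(self, allowed_targets, interrupt_entries):
        # {PC of control transfer instruction: pre-analyzed target address}
        self.allowed_targets = dict(allowed_targets)
        # Interrupt entry addresses held in the unit's registers.
        self.interrupt_entries = set(interrupt_entries)
        # Copy of return addresses, kept separately from the main stack.
        self.shadow_stack = []

    def check_transfer(self, pc, target):
        """True if the transfer at `pc` goes to its pre-analyzed target."""
        return pc in self.allowed_targets and self.allowed_targets[pc] == target

    def check_interrupt(self, entry_address):
        """True if the interrupt entry address is a registered one."""
        return entry_address in self.interrupt_entries

    def on_call(self, return_address):
        """Mirror the return address pushed on the main stack."""
        self.shadow_stack.append(return_address)

    def on_return(self, return_address):
        """True if the return address matches the top of the shadow stack."""
        if not self.shadow_stack:
            return False
        return self.shadow_stack.pop() == return_address


monitor = CfiMonitor(allowed_targets={0x100: 0x200}, interrupt_entries=[0x40])
monitor.on_call(0x104)
normal_return = monitor.on_return(0x104)    # matches the shadow stack
monitor.on_call(0x104)
hijacked_return = monitor.on_return(0x300)  # overwritten return address
```

In this model a `False` result corresponds to the exception conditions described above; how the hardware reacts to such a condition is a design decision outside the scope of the sketch.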

When CFI verification takes longer than the pipeline stages it overlaps with, the pipeline must stall until verification completes; this is a trade-off between system performance and security. This article offers only a basic example of a Control Flow Integrity (CFI) mechanism as an introductory illustration and does not delve into specific CFI design details. Future research will use the data files from CFIEE to develop more specific and efficient CFI solutions tailored to the RISC-V architecture.

Comparison with Other Tools
Currently, several control flow analysis tools are available for the RISC-V architecture. For comparison purposes, our evaluation focuses on two specific tools: angr [9] and Turna [25]. The angr framework has garnered significant attention in the field of reverse engineering, and Turna's hybrid approach enables it to generate a comprehensive Control Flow Graph (CFG). Table 5 compares their usability and their capability to generate control flow information.
Among these three tools, CFIEE is the only one with a GUI. As angr is a Python library, users need to write a Python program in order to invoke it for further analysis. Unlike angr, both CFIEE and Turna streamline user interaction by eliminating the need to write additional application programs. All three tools can output a CFG. Both CFIEE and angr can output the hash values of the program's basic blocks and the function call relationships. Turna, being primarily a CFG reconstruction tool, currently lacks these two functions.
While angr and Turna were chosen for comparison because of their usability and ability to generate control flow information, each tool has its own strengths and weaknesses depending on specific requirements or research objectives. We acknowledge the capabilities of angr in obtaining detailed program execution data through static analysis and simulation. We also appreciate Turna's idea of using a hybrid approach to rebuild the CFG. However, the primary focus of CFIEE remains on offering a straightforward and efficient way to furnish precise and readily accessible metadata for hardware-based CFI mechanisms in RISC-V embedded systems. CFIEE aims to provide crucial data, such as hash values of basic blocks, program control flow graphs, instruction jump relationships, and function call relationships. These data can be obtained easily and quickly once the Python environment and the RISC-V toolchain are configured within CFIEE.

Functional Evaluation
For functional evaluation, we selected 15 programs from the Beebs benchmark [28]. In order to better test the functionality of CFIEE, we made some changes to the code of the test programs. We modified the initial "fputc" function to add the serial port output function of the T-Head Xuantie E906 processor. For the toolchain, we used the Xuantie-900-gcc-elf-newlib-x86_64-V2.6.1 RISC-V toolchain. This toolchain retains the functions of the original RISC-V toolchain and adds optimization options for the T-Head processors. The test platform was CentOS 7. As CFIEE is developed in Python 3, our test environment used Python version 3.11.0.
The quantitative test results of the selected programs are presented in Table 6. These results consist of two sets of data: the number of basic blocks and the number of forward transfer instructions. The count of basic blocks partially reflects the program's complexity, while the count of forward transfer instructions reflects how frequently the program's control flow transfers. Table 7 showcases one of the basic blocks in the "basic_block.txt" file output by CFIEE. Each basic block's metadata include the block number, label, entry address, length, all instructions, and up to two possible transfer targets. The block number serves as a unique identifier for each basic block within the program. The entry address indicates the starting point of the basic block within the program's memory space. The length is the size of the basic block in terms of its instruction count. The instruction list gives a complete view of the operations performed within the block. When the final instruction of a basic block is a conditional branch, the metadata list two transfer targets, because control flow can diverge into two paths depending on whether the branch condition is satisfied. In contrast, when the final instruction is an unconditional jump, control transfers directly to another location without any condition being evaluated, so the basic block has only one transfer target.
The two panels of Figure 6 display the binary and hexadecimal representations of the basic block metadata. Both files contain the same data elements: basic block numbers, binary or hexadecimal instructions and addresses, and hash values obtained from the instructions and the user settings. This standardization ensures uniformity and facilitates efficient analysis and comparison during evaluation.
Figure 7a presents an example of the forward control transfer instructions extracted by CFIEE. To streamline data analysis, we categorize all forward control transfer instructions within the specified analysis range according to their corresponding functions and store them in the data file generated by the tool. Pairing transfer instructions with their respective target instructions facilitates analysis and comparison. Note that for branch instructions, we store only the target instruction for the case in which the branch is "taken". This keeps the relevant information while avoiding duplication and clutter in the data.
The binary metadata associated with the control transfers shown in Figure 7a are provided in Figure 7b, which contains the addresses of all forward transfers. Each line consists of a 32-bit binary number, where the first 16 bits represent the address of the jump instruction and the last 16 bits represent the target address of the jump instruction. These binary data can be used directly in CFI solutions, for example by being stored in the secure memory of the hardware for use by hardware-based CFI mechanisms. In the current format, the hardware overhead caused by storing the data of this file into memory is
$$\text{Overhead} = N \times 32\ \text{bits} = N \times 4\ \text{bytes} \qquad (1)$$
where $N$ is the number of lines in the file.
The current binary file format is not specifically designed for a particular CFI mechanism, and researchers have the flexibility to modify its data format and volume according to their research requirements.
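The 32-bit line format described above (16 bits for the address of the jump instruction followed by 16 bits for its target) can be written and read with a few lines of Python. This is a minimal sketch of the layout as described in the text, not CFIEE's own code; the function names and the example addresses are hypothetical.

```python
def pack_entry(source, target):
    """Encode one forward transfer as a 32-character binary line.

    The first 16 bits hold the address of the jump instruction and the
    last 16 bits hold its target address.
    """
    for address in (source, target):
        if not 0 <= address < (1 << 16):
            raise ValueError("address does not fit in 16 bits")
    return format(source, "016b") + format(target, "016b")


def unpack_entry(line):
    """Decode a 32-character binary line into (source, target)."""
    line = line.strip()
    if len(line) != 32:
        raise ValueError("expected a 32-bit binary line")
    return int(line[:16], 2), int(line[16:], 2)


# Hypothetical transfers: (address of jump instruction, target address).
transfers = [(0x0104, 0x0200), (0x0210, 0x0108)]
lines = [pack_entry(source, target) for source, target in transfers]
storage_bits = 32 * len(lines)  # one 32-bit word per forward transfer
```

A design that needs wider addresses or extra fields per entry would change the line width, which is the kind of format modification the text leaves open to researchers.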
Furthermore, Figure 7c illustrates the count of transfer instructions per function across four selected programs. This visualization offers insight into the control flow behavior and its distribution within the codebase, helping researchers understand the program's structure. By examining the number of transfer instructions per function, researchers can identify patterns and trends in how control moves through different parts of the code. Figure 8 illustrates an example of the function call relationships generated by CFIEE. CFIEE analyzes function call relationships based on unconditional jump instructions within functions. In Figure 8, asterisk labels (*) are appended to specific nodes to mark functions reached through the "j" instruction. This representation incorporates both "jal" and "j" instructions, giving a more precise and detailed view of the function call relationships and the flow of control within the codebase.
The Control Flow Graph (CFG) is crucial metadata for Control Flow Integrity (CFI), so ensuring the output of a complete and accurate CFG was a primary objective during the development of CFIEE. In Figure 9, we present a portion of the control flow graph obtained for the "lcdnum" program. Some basic blocks are labeled with "start with taken target" at the end of their names, indicating that the start address of the basic block is the target address of a control transfer instruction. The solid black arrows in Figure 9 represent unconditional jumps and the "taken" direction of branch instructions, while the red dotted arrows indicate the not-taken direction of branch instructions.
Additionally, in certain basic blocks, a combination of function name and address may appear in the "Taken target" column. This labeling denotes a target address of the ret instruction. Since a function may be called by different functions at various times, the ret instruction within a function may have multiple return target addresses. To help researchers analyze the ret instruction, we include all target addresses and the corresponding functions in the "Taken target" line. This comprehensive representation of the CFG supports the analysis of control flow integrity and helps researchers understand the structure of the codebase.
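The edge conventions of the CFG described above (solid black arrows for unconditional jumps and taken branches, red dotted arrows for not-taken branches) can be reproduced with a short sketch that emits Graphviz DOT text. The use of DOT is an assumption made here for illustration, since the text does not state which format "generate_CFG()" produces; the function name, input structure, and block names are hypothetical.

```python
def cfg_to_dot(blocks):
    """Render basic block edges as Graphviz DOT text.

    blocks: {block name: (list of taken targets, not-taken target or None)}.
    A block ending in ret may list several taken targets, one per caller.
    """
    lines = ["digraph CFG {", "  node [shape=box];"]
    for name, (taken_targets, not_taken) in blocks.items():
        for target in taken_targets:
            # Unconditional jump or the "taken" direction of a branch.
            lines.append(f'  "{name}" -> "{target}" [color=black];')
        if not_taken is not None:
            # Fall-through direction of a conditional branch.
            lines.append(f'  "{name}" -> "{not_taken}" [color=red, style=dotted];')
    lines.append("}")
    return "\n".join(lines)


example_cfg = {
    "bb0": (["bb2"], "bb1"),   # conditional branch: two successors
    "bb1": (["bb2"], None),    # unconditional jump: one successor
    "bb2": ([], None),         # no successor recorded
}
dot_text = cfg_to_dot(example_cfg)
```

The resulting text can be rendered with the Graphviz `dot` command-line tool, for example `dot -Tpng cfg.dot -o cfg.png`, after saving `dot_text` to a file.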
Electronics 2024, 13, x FOR PEER REVIEW

Conclusions
In this paper, we present CFIEE, an open-source critical metadata extraction tool designed to support hardware-based CFI research on the RISC-V architecture. CFIEE implements automatic static analysis of the control flow of RISC-V executable files, and its graphical interface significantly lowers the barrier to use. Researchers can use the program control flow graph, basic block information, and other data output by CFIEE to analyze potential deadlocks, loop exceptions, and other issues within a given program. Furthermore, CFIEE offers valuable metadata for research on hardware-based CFI mechanisms, which can aid in the development of secure and effective RISC-V control flow protection mechanisms.
This software simplifies the extraction of critical metadata and automates control flow analysis, reducing the burden of manual data extraction. This increase in efficiency allows researchers to focus on in-depth analysis and experimentation, and ultimately on designing more efficient CFI mechanisms that better secure RISC-V devices. The visualization of control flow metadata by CFIEE gives researchers an accurate depiction of complex control flow relationships, facilitating quicker comprehension and validation of research findings.
While CFIEE currently offers a relatively comprehensive set of functions, there is still room for improvement in operational performance and scope of application. Because it relies on static analysis, CFIEE currently cannot handle forward register-related jumps [29]. However, it does include analysis logic for indirect jumps of the "ret" type.
Following the initial presentation of this work [30], we aim to delve into indirect control flow analysis. On the software side, we plan to integrate CFIEE with RISC-V compatible simulators so that simulation execution data can enhance the static analysis, moving toward a combined static-dynamic analysis approach. Regarding research into mechanisms, we will build on existing lightweight hardware protection mechanisms [31] and use CFIEE's data support to investigate a more secure and efficient hardware-based RISC-V CFI mechanism. Furthermore, the metadata used in the hardware CFI mechanism has the potential for additional compression [32].
CFIEE is an open-source tool released under an open license, and we encourage users to extend and enhance its capabilities.

This module also organizes information pertaining to output functions, including the count of forward control transfer instructions within each function and the program's function call relationships.

Figure 4. The analysis logic of CFIEE on function call and return.

Figure 5. An example diagram of a CFI mechanism hardware circuit.
The verification of CFI mechanisms typically requires multiple cycles. To ensure a sufficient time margin for CFI verification, we have positioned the CFI mechanism between the IF and MEM phases in this work. The synchronization between the execution time of the processor, the CFI verification module, the user program, and other factors determines the alignment of CFI verification with program execution time. During the IF to MEM phase, it is possible for the EX and MEM phases to execute for multiple cycles; in such cases, the running time of CFI verification may be shorter than that of the IF to MEM phase. If the verification process exceeds this pipeline section's running time, a stall signal from the CFI verification unit is needed to halt pipeline operation until CFI verification completes. This is a trade-off between system performance and security.

Figure 7. (a) An example of metadata for control transfer instructions; (b) binary metadata associated with control transfer instructions in (a); (c) the number of transfer instructions per function in four selected programs.


Figure 8. An example of the function call relationship output by CFIEE.

Figure 9.

Table 4. The division rules of basic blocks.

Table 5. Comparison of CFIEE and other tools.

Table 6. Analysis results of selected programs in Beebs benchmark.