Introduction Link to heading

Angr is a python3 library used for binary analysis that allows us to carry out multitude of tasks. Installation instructions can be found here

One of the core functionalities of angr is Symbolic Execution which is the topic of this article. Symbolic execution is a binary analysis technique where inputs are treated as symbolic variables rather than fixed values. In angr, this allows us to explore all possible execution paths of a program based on symbolic inputs in order to reach a desired state.

Let’s take a look at a simple password validator program:

int main() {
    char input_buffer[100]; 

    printf("Input the password: ");
    fgets(input_buffer, sizeof(input_buffer),stdin);  
    input_buffer[strcspn(input_buffer, "\n")] = '\0'; 

    if (strcmp(input_buffer, "p4ssw00rd") == 0) {
        printf("Good Pass\n");
    } else {
        printf("Failure\n");
    }

    return 0;
}

flowchart TD Title2["Path 2: User enters 'Wrongpass' as password"]:::titleStyle Title2 --> Input2 subgraph subgraph2[" "] Input2[Input = x] Condition2["if Input == 'WrongPass'"] Result2[Failure] Input2 --> Condition2 Condition2 --> Result2 end Title1["Path 1: User enters 'p4ssw00rd' as password"]:::titleStyle Title1 --> Input subgraph subgraph1[" "] Input[Input = x] Condition["if Input == 'p4ssw00rd'"] Result[Good Pass] Input --> Condition Condition --> Result end classDef titleStyle fill:none, stroke:none, color:#6A6A6A; %% Hide arrows from titles linkStyle 0 opacity:0; %% Hides arrow from Title2 to Input2 linkStyle 3 opacity:0; %% Hides arrow from Title1 to Input

Note

x represents our symbolic input

As shown on the diagram above, there are two path that your program can take, one where the user types the correct password and the other when the user types an incorrect password.

By using angr, we can walk through the program’s execution flow and set up constraints to determine which input triggers a specific branch. This process helps to identify the input that triggers Path 1, which corresponds to the ‘good password’ path.

Challenge #1 : rev/ctfdpp Link to heading

Now that you’re familiar with symbolic execution, it’s time to dive into the practical part. For the first part of this article, we will take on a challenge from LACTF2023 called rev/ctfd-plus which can be found on the LACTF official website or via the direct download link

Program analysis Link to heading

First, we’ll examine the program’s behavior.

ctfdpp_firstlook

As you can see, it’s a stripped binary so we will not have the debug symbols for our analysis. By running it, we can guess its a straightforward program that request a flag from user, same behavior as example given in introduction.

The control flow graph is not too complicated and fortunately we don’t have to analyze it to find the flag.

ctfdpp_cfg

By looking at this control flow graph, we can notice two interesting branches, the branch number 1 printing “You got the flag!” which is triggered when user supplies the correct flag and the branch number 2 printing “Incorrect flag” which is triggered when user supplies a wrong flag.

Our goal is clear : trigger branch number 1.

Solution Link to heading

Let’s build our solution step by step. First, we’re gonna import angr and load our binary file

import angr

binary_path="./ctfd_plus"
base_address=0x00100000

proj=angr.Project(binary_path,main_opts={'base_addr': base_address},auto_load_libs=False)

Question

Why do we need to specify a base address ?

Here, the base address needs to be specified to ensure the binary is mapped at the correct address in order to match how it would be loaded during execution or as seen in another tool, like ghidra for example.

We took the first address show at the top of ghidra listing view, which is 0x00100000 ctfdpp_baseaddr

Once the binary is loaded, we will now create a state and a simulation manager

state = proj.factory.entry_state()
sm = proj.factory.simgr(state)

A State represents a state of a program during its execution entry_state represent a symbolic execution state at the entry point of the binary. The Simulation Manager will be responsible of managing and controlling multiple symbolic execution path.

Now the last part, we are going to set up the win condition and what we want to avoid.

good_addr=0x0010112e
avoid_addr=0x00101117
sim_manager.explore(find=good_addr,avoid=avoid_addr)
print(sim_manager.found[0].posix.dumps(0))

good_addr is the address we want to reach which correspond to address where the program will print “You got the flag!” avoid_addr is the address we want to avoid which correspond to address where the program will print “Incorrect flag”

These addresses can be found in the listing view by jumping to it from main function shown earlier.

ctfdpp_find_avoid_addr

Or in the control flow graph but since the addresses are cropped on this view, we have to adapt them to be relative to the base address.

ctfdpp_find_avoid_addr_cfg

One way to make these addresses in the control flow graph relative to the base address is to add them to the base address.

Example with good_addr: relative_address = 0x00100000 + 0x112e = 0x10112E

Once we have both addresses we can use the explore function to perform the symbolic execution.

The address to avoid is not mandatory, its main purpose is to guide the symbolic execution towards more relevant paths and avoid exploring paths that are irrelevant and helps to prevent Path Explosion Path explosion occurs when the number of possible execution paths increases exponentially and can potentially run out of memory because of the sheer number of paths being explored. The key to avoid path explosion is to setup constraint or limit the testing area to specific part or function of the program.

Our final script is the following :

import angr

binary_path="./ctfd_plus"
base_address=0x00100000

proj=angr.Project(binary_path,main_opts={'base_addr': base_address},auto_load_libs=False)

state = proj.factory.entry_state()
sim_manager = proj.factory.simgr(state)

good_addr=0x0010112e
avoid_addr=0x00101117
sim_manager.explore(find=good_addr,avoid=avoid_addr)
print(sim_manager.found[0].posix.dumps(0))

By running it, we let angr automatically explore the binary paths and find a solution:

ctfdpp_solve

This way we easily find an input that satisfies our win condition without reversing the function that generate the flag.

ctfdpp_test_flag

Challenge #2 : unbreakable-enterprise-product-activation Link to heading

Let’s move on to a slightly more complex challenge to explore additional functionalities of Angr. The binary for this challenge can be found on this github repository

Program analysis Link to heading

Overview Link to heading

Like the first challenge, this one also requests a password from the user, which in this case is a license code. If the provided license code is incorrect, the program will return an error code.

unbreakableEPA_firstlook

Here is decompiled view of the main function:

void FUN_00400590(int param_1,char *param_2)
{
  undefined8 uVar1;
  undefined auVar2 [16];
  undefined8 uStack_8;
  
  if (param_1 != 2) {
    param_2 = "./unbreakable_enterprise_product_activation product-key";
    errx(1);
  }
  strncpy(&DAT_006042c0,*(char **)(param_2 + 8),0x43);
  FUN_004027f0();
  FUN_00402830();
  // Several other functions calls
  FUN_00403410();
  FUN_00403440();
  auVar2 = FUN_00400830();
  uVar1 = uStack_8;
  uStack_8 = auVar2._0_8_;
  __libc_start_main(FUN_00400590,uVar1,&stack0x00000000,FUN_00403490,FUN_00403500,auVar2._8_8_,
                    &uStack_8);
  do {
                    /* WARNING: Do nothing block with infinite loop */
  } while( true );
}

We can observe that the program check if there is 2 arguments then copy first 67 characters of user input into DAT_006042c0. After parsing user input, the program calls ALOT of functions (more than 50, only 4 showed above for brevity).

If all check of these functions are passed, the instruction auVar2 = FUN_00400830(); will print out a message for successful product activation.

Digging into the license code validation logic Link to heading

Let’s analyze these check function that will validate the license code provided by user. They all have the same pattern, lets take the first one as an example:

void FUN_004027f0(void)
{
  if (DAT_006042c0 == (byte)(((DAT_006042de ^ DAT_006042e6) - DAT_006042c8) + DAT_006042c6)) {
    return;
  }
  FUN_00400850(0xff); //return error code 255
}

As seen before, DAT_006042c0 correspond to the user input. All variables used in this if statement are part of the user input, This means we cannot determine their values through static analysis alone since it’snot already generated.

unbreakableEPA_gdb_func

By looking at the debugger, we can notice that this data taken from user input is stored into following registers : EDX,EAX,ECX,ESI,EDI. We can also notice XOR,SUB,ADD operation on EAX and then a comparison, exactly the same as decompiled code shown above since both of these code represent the exact same thing but at the debugger its straight in assembly.

To see which part (Like an index on an array) of user input is stored into the registers we can calculate it by looking on the addresses offset OR do a bit of debugging.

We attach gdb to the binary, set a breakpoint at 0x004027f0 (the first instruction of FUN_004027f0), and run then the program with a pattern as input.

python -c "import string; print(''.join([c for c in (string.printable[:-5] * 10)][:67]))"
gdb ./unbreakable-enterprise-product-activation
gef➤  break *0x004027f0
Breakpoint 1 at 0x4027f0
gef➤  run '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%'

By using GDB, we see the instructions executed. We step through the xor instruction and inspect the register values using the ‘si’ (step instruction) command 5 times.

unbreakableEPA_gdb_1

Inspecting registers

Once we reach the XOR instruction, all the values are stored int the register without any transformations that will occur just after so we are free to check their values in this specific point of the execution.

gef➤  print $edx
$1 = 0x43 #'C'
gef➤  print $eax #correspond to $al
$2 = 0x75 #'u'
gef➤  print $ecx
$3 = 0x36 #'6'
gef➤  print $esi
$4 = 0x38 #'8'
gef➤  print $edi #correspond to $dil
$5 = 0x30 #'0'

With information values on register known, we can map it to the user input :

EDX takes its value from userinput[38].
EAX (or al) takes its value from userinput[3].
ECX takes its value from userinput[6].
ESI takes its value from userinput[8].
EDI (or dil) takes its value from userinput[0].

Now, since we know exactly how is generated the data, we can craft a valid input.

To pass the first function, we must satisfy the following equation: userinput[0]== (((userinput[30] ^ userinput[38]) - userinput[8]) + userinput[6])

To solve this, we iterate over all possible values of X XOR Y where X, Y, and the result are within the range of printable characters (33-126). For example:

X = 44 (’ , ‘) → userinput[30].
Y = 111 (‘o’) → userinput[38].
Result = 67 (‘C’) → userinput[0].

Since userinput[8] and userinput[6] are related to addition and substraction we dont need to bother much on these and just make sure to both has the same value so that will result on nothing done :)

With all information seen above, we can craft our input that will pass the first test : Cxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,xxxxxxxoxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Upon testing this input, you will notice a different error code is returned which mean we passed the first test

unbreakableEPA_pass_first_function

We do this again by updating the input manually for every 50 step but its getting tougher on every step because we have to keep the constraints from previous steps into account so that’s take huge amount of time to solve in manual way.

Solution Link to heading

As described in the previous section, solving the challenge in a manual way is quite tedious, making it perfect fit for automation with angr.

Like the previous challenge, we begin by loading the binary:

import angr
import claripy

binary_path="./unbreakable-enterprise-product-activation"
base_address=0x00100000
proj=angr.Project(binary_path,main_opts={'base_addr': base_address},auto_load_libs=False)

Do you remember what we saw in the program overview part?

In that section, we observed that 67 bytes were copied from the user input, which is why we use 67 as the input length. We then generate a bitvector to serve as our symbolic input, with a length of 67 bytes.

key_len=67
product_key=claripy.BVS("product_key",key_len*8)

Next, we create our state, ensuring to add our symbolic input as args.

The LAZY_SOLVES option causes state satisfiability to be checked as infrequently as possible.

We also impose constraints on the symbolic input to make sure the solution found correspond to printable characters because in our case due to wacky behaviour of the program, there are scenarios where non-printable random bits can pass all the tests which is not a solution that we want.

state = proj.factory.entry_state(
    args=[binary_path,product_key],
    add_options={angr.options.LAZY_SOLVES}
)
#increase buffer symbolic bytes, 60 by default.
state.libc.buf_symbolic_bytes=input_size + 1 

for i in range(key_len):
    byte = product_key.get_byte(i)
    state.solver.add(byte >= 0x21, byte <= 0x7e)

For the final part of our script, as in the first challenge we create a simulation manager, define address we want to reach or avoid and then initiate symbolic execution.

The difference lies in the way we extract and print the solution, in the previous challenge,where we used sim_manager.found[0].posix.dumps(0) to retrieve the value of stdin that satisfies the win condition (reaching the desired address). This approach didn’t require manually providing variables like the bitvector because we were in concrete execution mode, where the solver itself directly provides input to stdin during exploration process.

However, in our case, the input is passed via argv1 as a symbolic input, to get a value from it we have to use sim_manager.found[0].solver.eval(product_key, cast_to=bytes) to evaluate the symbolic input and print out a concrete value that satisfies the constraints.

sim_manager = proj.factory.simgr(state)

good_addr=0x00400839
avoid_addr=0x0040085d
sim_manager.explore(find=good_addr,avoid=avoid_addr)


flag=sim_manager.found[0].solver.eval(product_key, cast_to=bytes)
print(flag)

There is our final script:

import angr
import claripy

binary_path="./unbreakable-enterprise-product-activation"
base_address=0x00100000
proj=angr.Project(binary_path,main_opts={'base_addr': base_address},auto_load_libs=False)

key_len=97
product_key=claripy.BVS("product_key",key_len*8)

state = proj.factory.entry_state(
    args=[binary_path,product_key],
    add_options={angr.options.LAZY_SOLVES}
)
#increase buffer symbolic bytes, 60 by default.
state.libc.buf_symbolic_bytes=input_size + 1 

for i in range(key_len):
    byte = product_key.get_byte(i)
    state.solver.add(byte >= 0x21, byte <= 0x7e)
    

sim_manager = proj.factory.simgr(state)

good_addr=0x00400839
avoid_addr=0x0040085d
sim_manager.explore(find=good_addr,avoid=avoid_addr)


flag=sim_manager.found[0].solver.eval(product_key, cast_to=bytes)
print(flag)

We can get our flag :)

unbreakableEPA_flag

As we have seen, Symbolic execution is a powerful approach to program analysis which can be used to solve CTF challenges. By treating inputs as symbolic variables, Angr is able to explore multiple execution paths and determine which input satisfies a win condition without deep knowledge about internal working of the analyzed program. However path explosion remains a significant challenge. As the program grows in complexity, the number of possible execution path increases drastically, making it difficult to tools like angr to explore every possible execution path thats why the area of testing should be focussed on only the most likely relevant paths first.

References Link to heading

https://docs.angr.io/en/latest/ https://github.com/Hustcw/Angr_Tutorial_For_CTF https://github.com/angr/angr-doc/blob/master/docs/more-examples.md https://speakerdeck.com/pyconza/controlling-your-angr-techniques-for-improving-symbolic-execution-with-the-angr-framework-by-keith-makan https://angr.io/blog/angr_symbion/