Thursday, November 1, 2012

Regarding Buffer Overflows

In the network security world, vulnerabilities and exploits are currency. Without vulnerabilities, there would be no exploits. Without exploits, there would be no network attacks. Exploits come in many forms, and recently the user has been the vulnerability: poor password security, phishing emails and other social engineering attacks have become more prevalent. This is due to hardened network defenses, increased patching and a general shortage of new, easily exploitable software vulnerabilities. Years ago, a system could be taken over by simply sending a network packet or two to the target system from halfway around the world. But how did that happen? What is different now?

Today, there is more awareness of Buffer Overflows in the development world. This, along with technical enhancements such as Data Execution Prevention (DEP), Address Space Layout Randomization (ASLR) and Stack Canaries (which detect a corrupted stack and shut the program down before the corruption can be used), limits the impact of this vulnerability. But there are always workarounds, and it is essentially an arms race between the attackers and defenders. Fundamentally, Buffer Overflows are still a problem, but exploiting them is not as easy as it once was.

This post aims to describe Buffer Overflow vulnerabilities in simple terms as well as provide a real world example.  It's not an easy task, mainly because of the technical details, but let's try anyway.

An overflow is what it sounds like: too much of something that doesn't fit in a container will overflow. In programming terms, these are typically stack overflows or heap overflows. The main difference is where in memory the overflow happens. When a program needs to take in information, it allocates memory of a size that the programmer has specified and attempts to write data to that memory space. If the amount of data written is larger than the space provided, an overflow occurs. When this happens, neighboring parts of memory are overwritten. This may or may not cause problems, but typically the extra data lands on something important, causing memory corruption. When the running program later tries to use that part of memory, it usually crashes.
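To make the idea concrete, here is a minimal Python sketch that models memory as a flat array of bytes. The layout (a 4-byte buffer followed directly by 4 bytes of unrelated data) and the names memory and unchecked_write are assumptions made up for illustration. Python itself checks bounds, so this only imitates what an unchecked write in a language like C would do.

```python
# Toy model: 8 bytes of "memory". Bytes 0-3 are our buffer,
# bytes 4-7 hold unrelated data that happens to sit right next to it.
memory = bytearray(b"\x00\x00\x00\x00" + b"SAFE")

def unchecked_write(mem, offset, data):
    """Copy data into mem starting at offset, never checking the buffer size."""
    for i, byte in enumerate(data):
        mem[offset + i] = byte

unchecked_write(memory, 0, b"Mike")    # fits exactly in the 4-byte buffer
fits = bytes(memory)                   # the neighboring data is untouched

unchecked_write(memory, 0, b"Emily")   # one byte too many
overflowed = bytes(memory)             # the "y" has landed on the neighbor

print(fits)
print(overflowed)
```

The second write never asks how big the buffer is, so the fifth byte simply lands on whatever comes next.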

An attacker, after discovering that a program has a buffer overflow problem, can customize the data corruption in an attempt to control the crash. Controlling the crash can allow the attacker to control the program and, through it, the system.

How does an attacker control the crash? By controlling EIP. To illustrate this, we will walk through a stack buffer overflow using a simple, imaginary program:

C:\TEMP>hello.exe Mike
Hi there, Mike!

The imaginary program above, when run, will print out the words "Hi there, " followed by whatever was given as an argument to the program, in this case the name "Mike". This is then followed by an exclamation point and a new line. The program then exits.

The program looks like this at an extremely high level:

Create Name Variable (4 bytes);
Read Name from Command Line;
Print "Hi there, " + Name + "!\n";

When the program starts, it will allocate memory space for the Name variable.  In our example, let's say it allocates 4 bytes.  If we then provide a longer name like "Emily", with 5 characters, when the program reads "Emily" and tries to put it into the Name variable, we have an overflow.  Then we may get a nice little program crash. (For the purposes of this post, I won't go into details like the NULL or CRLF characters at the end of the input string).
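For contrast, the check that our imaginary program is missing could look something like the sketch below. The function name store_name and the choice to reject over-long input (rather than truncate it) are my own assumptions, not part of any real program.

```python
NAME_SIZE = 4  # bytes reserved for the Name variable in our imaginary program

def store_name(name, size=NAME_SIZE):
    """Return the bytes to store, refusing input that would not fit."""
    data = name.encode("ascii")
    if len(data) > size:
        raise ValueError(f"name needs {len(data)} bytes, only {size} available")
    return data

print(store_name("Mike"))
try:
    store_name("Emily")
except ValueError as err:
    print("rejected:", err)
```

The whole difference between the safe and the unsafe program is that one comparison before the copy.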

Even a simple program like the one above will use almost 50 lines of CPU instructions. Any function calls, like the Print command, can easily add to the number of instructions that a program needs. You can imagine how many instructions are needed for more useful programs. Luckily, today's processors can execute billions of instructions a second, although even those seem slow at times.

So where does EIP come in? EIP (the Extended Instruction Pointer on 32-bit x86 processors) is a register: a small, special storage location inside the CPU itself rather than in main memory. It contains the memory address of the next instruction to execute. After an instruction executes, EIP is changed to the memory address of the next instruction to run. Essentially, EIP is where the CPU looks for the next thing to do.
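The role of EIP can be sketched with a toy fetch-and-execute loop. Everything here is made up for illustration: the addresses, the instruction sizes and the names program and run. A real CPU decodes binary instructions rather than looking up text in a dictionary.

```python
# Toy program: address -> (instruction text, size in bytes).
program = {
    0x1: ("create Name", 4),
    0x5: ("read Name", 3),
    0x8: ("print greeting", 2),
    0xA: ("exit", 1),
}

def run(program, start):
    """Follow the instruction pointer until the exit instruction is reached."""
    eip = start                        # address of the next instruction
    trace = []
    while True:
        text, size = program[eip]      # fetch whatever EIP points at
        trace.append((eip, text))      # "execute" it by recording it
        if text == "exit":
            return trace
        eip += size                    # move EIP to the following instruction

for address, text in run(program, 0x1):
    print(hex(address), text)
```

Whoever decides the value of eip in that loop decides what runs next, which is why attackers care about it so much.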

There is another part of memory called the "stack."  This is where programs can store temporary information for use later (like "Mike" or "Emily").  In the example above, the second step uses a function to read user input into the Name variable.  Let's pretend that the function used to write to the Name variable is "strcpy" (String Copy, a common function for copying strings), which has its instructions at memory address 0x08048.  When the CPU gets to this part of the program, it calls that function.  On x86 this is done with a CALL instruction, which behaves like a jump (JMP) with one extra step: before jumping, it copies the address of the next instruction onto the stack so the CPU knows where to go back to when the function is done.  So at a high level, this program looks like:

0x00001: Create Name
0x00005: strcpy (0x08048) Name from Command Line
0x00008: Print (printf, located at 0x080a9) "Hi there, " + Name + "!\n"
0x0000a: exit

When our hello.exe program runs, the CPU executes the instruction at memory address 0x00001.  The next instruction is at memory address 0x00005, so the CPU changes EIP to 0x00005.  The instruction at 0x00005 is then executed, but since it points to a memory location outside the normal flow of execution, the address 0x00008 is first written to the stack as a sort of bookmark for where to go back to when the strcpy function has completed.  The jump to 0x08048 is then made: EIP is changed to 0x08048 and execution continues from there until the function is done.  At that point, the CPU instruction "RET" (short for return) is executed.  This tells the CPU to take the last value written to the stack, in this case 0x00008, and copy it to EIP.  The CPU then continues execution from 0x00008.

Command execution is then:

0x00005 (JMP to 0x08048, copy 0x00008 to the stack)
RET (copy top of stack (0x00008) to EIP)
0x00008 (JMP to 0x080a9, copy 0x0000a to the stack)
RET (copy top of stack (0x0000a) to EIP)
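The bookmark mechanism traced above can be sketched in a few lines of Python. The helper names call and ret are my own, and a plain list stands in for the stack; the addresses are the made-up ones from the listing.

```python
stack = []      # holds the return-address "bookmarks"
visited = []    # every value EIP takes, in order

def call(target, return_to):
    """Save the bookmark on the stack, then hand back the address to jump to."""
    stack.append(return_to)
    return target

def ret():
    """Take the most recent bookmark off the stack; it becomes the new EIP."""
    return stack.pop()

eip = 0x00005
visited.append(eip)
eip = call(0x08048, return_to=0x00008)   # strcpy
visited.append(eip)
eip = ret()                              # back to 0x00008
visited.append(eip)
eip = call(0x080a9, return_to=0x0000a)   # printf
visited.append(eip)
eip = ret()                              # back to 0x0000a
visited.append(eip)

print([hex(a) for a in visited])
```

Note that ret trusts whatever is on top of the stack. It has no way of knowing whether that value is still the bookmark that call left there.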

In our example, we used "Emily" as the Name.  Since Name only has 4 bytes, when "Emily" is written to memory, the last letter "y" (0x79) is written past the 4 bytes.  What is past the 4 bytes?  Who knows?  No one really, at first.  But one thing about computers is that they are consistent.  Unless a defense such as ASLR is shuffling things around, the memory layout is the same every time a program is run.  If the overflow reaches the area of the stack where something important is written, say the address to return to (0x00008), then when the strcpy function is done and RET is executed, the CPU will try to copy 0x00008 into EIP, except that by now it has been changed to something like 0x79008.  The CPU will copy that into EIP and then try to execute the instructions at 0x79008, which is likely garbage, and the program will error out with something like "illegal instruction at 0x79008."
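Here is a rough model of that corruption. It assumes a layout I made up: the 4-byte Name buffer sits directly in front of a 4-byte return address stored in little-endian byte order (low byte first, as on x86). Real stack frames contain more than this, and the exact corrupted value depends on the layout, so the number this model produces differs from the illustrative 0x79008 above. The point is only that the bookmark no longer holds 0x00008.

```python
import struct

NAME_SIZE = 4

def make_frame(return_address):
    """Build a toy stack frame: Name buffer, then the saved return address."""
    return bytearray(b"\x00" * NAME_SIZE + struct.pack("<I", return_address))

def unchecked_copy(frame, data):
    """Write data at the start of the frame without checking NAME_SIZE."""
    for i, byte in enumerate(data):
        frame[i] = byte

def saved_return(frame):
    """Read back the return address that RET would copy into EIP."""
    return struct.unpack_from("<I", frame, NAME_SIZE)[0]

frame = make_frame(0x00008)
unchecked_copy(frame, b"Mike")
after_mike = saved_return(frame)     # still the original bookmark

unchecked_copy(frame, b"Emily")
after_emily = saved_return(frame)    # the "y" (0x79) replaced the low byte

print(hex(after_mike), hex(after_emily))
```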

On the next run, an attacker could simply change "Emily" into different values that mean something, causing EIP to point to a part of memory the attacker controls.  Remember that an overflow is written to memory continuously, one byte after the next.  If we used a very long name instead of Emily, the whole name we chose would end up in memory, overwriting whole swaths of it.  Since the attacker can now control EIP, they can point it at the memory address of their own instructions instead of the normal ones, and at that point the attacker owns your system.
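As a sketch of what "values that mean something" looks like, the snippet below builds an input whose last four bytes are a chosen address. It uses a made-up layout (a 4-byte Name buffer directly followed by a 4-byte little-endian return address), and TARGET is a hypothetical address picked purely for illustration.

```python
import struct

NAME_SIZE = 4
TARGET = 0x0BADF00D   # hypothetical address of attacker-controlled bytes

def build_input(target):
    """Filler to use up the buffer, then the address we want in EIP."""
    padding = b"A" * NAME_SIZE
    return padding + struct.pack("<I", target)

def return_address_after_copy(data):
    """Copy data into a toy frame unchecked and report the saved return address."""
    frame = bytearray(NAME_SIZE + 4)
    struct.pack_into("<I", frame, NAME_SIZE, 0x00008)   # the legitimate bookmark
    for i, byte in enumerate(data):
        frame[i] = byte
    return struct.unpack_from("<I", frame, NAME_SIZE)[0]

print(hex(return_address_after_copy(b"Mike")))
print(hex(return_address_after_copy(build_input(TARGET))))
```

In this toy model the attacker's input decides exactly where RET will send the CPU.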

If you want to see this in the real world, fire up an old Windows XP system.  There is a command line program built-in called "netsetup.exe".  If you have a newer system, you can still try this, but it's been patched sometime since Windows XP, so your mileage will vary.  In any case, it's still worth seeing the process in action.  Note that Windows 7 does not have netsetup.exe.

Step 1: Open a command line prompt by running "cmd.exe"
Step 2: Run the program netsetup.exe (it's in the PATH, so you can run it from anywhere).
Step 3: Give netsetup.exe an argument of AAAA:

C:\TEMP>netsetup.exe AAAA

You will get a box that says "Command Line Syntax Error"

Step 4: Add more AAAAs until you get a program crash (hint: it starts to crash at 271 characters).  
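Typing hundreds of As by hand gets old quickly. A small helper like the one below (the name candidate_commands is my own) only builds the command lines as text so you can paste them into the prompt. It does not run anything.

```python
def candidate_commands(lengths, program="netsetup.exe"):
    """Build one command line per requested argument length, filled with 'A'."""
    return [f"{program} {'A' * n}" for n in lengths]

for command in candidate_commands([4, 100, 271]):
    argument = command.split(" ", 1)[1]
    # Print the length plus a shortened preview of the command.
    print(len(argument), command[:30])
```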


If you examine the technical details of the crash, you should find that one of the reported values (Windows labels it "P7") is 41414141.  A capital "A" has a hex value of 0x41.  What you are seeing is EIP with a value of 0x41414141: the program crashed because it tried to execute instructions at that memory address.  If you change the letters at the end of your 271 characters to, say, ABCD, you will see P7 change to something like 44434241.  The letters appear in reverse order because x86 processors store the lowest byte of a number first.  You are now controlling EIP.
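The reversed order is easy to check with Python's struct module, which can read four bytes as a little-endian 32-bit number the way an x86 CPU would. The helper name as_register_value is just for illustration.

```python
import struct

def as_register_value(four_bytes):
    """Interpret 4 bytes as a little-endian unsigned 32-bit integer."""
    return struct.unpack("<I", four_bytes)[0]

print(hex(as_register_value(b"AAAA")))   # 0x41414141
print(hex(as_register_value(b"ABCD")))   # 0x44434241
```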

At this point, one could fire up a debugger, examine memory when the program crashes and find out where the rest of the As ended up.  The next steps would be to replace those As with actual instructions and then change the last part of the input so that EIP points to where the attacker's instructions sit in memory.

The series of characters that make up the attacker's instructions (usually minimal code to give the attacker some level of control) is called the payload.  The coding problem in netsetup.exe that allows memory to be overwritten is called a vulnerability.  If the attacker can actually run their own code against this vulnerability, it is an exploitable vulnerability.  If not, and all they can do is crash the program, it's still a vulnerability but not an exploitable one.

Since netsetup.exe is a really old program, this is not really a vulnerability disclosure.  I do not know if netsetup is fully exploitable or not.  Exploiting it would require local execution, since netsetup.exe is not a networked program and its input comes from the command line.  But the process for finding vulnerabilities in network-aware programs is the same: keep feeding the program garbage and wait for it to crash.  When it crashes, examine memory to see whether some kind of overflow is involved and whether, as an attacker, you can craft input via network packets that would give you control of EIP.  "Fuzzing" programs automate a lot of this manual process by throwing varying amounts and types of data at a program and recording the crashes.  Vulnerability hunters generally write their own fuzzers to help with this.  Finally, it is important to point out that these are really simple examples.  There is a lot more involved in vulnerability hunting and exploitation, but this is the gist of it.
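To give a feel for what a fuzzer does, here is a bare-bones sketch. The function toy_target is a hypothetical stand-in that "crashes" (raises an exception) once its input passes 270 bytes, loosely imitating the behavior seen above. A real fuzzer would launch the actual program and watch for real crashes, which this sketch does not attempt.

```python
import random

def toy_target(data):
    """Hypothetical program under test: simulates a crash on long input."""
    if len(data) > 270:
        raise MemoryError("simulated crash")
    return "Command Line Syntax Error"

def fuzz(target, rounds=200, max_len=400, seed=0):
    """Feed random bytes of random lengths to target and record every failure."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    crashes = []
    for _ in range(rounds):
        length = rng.randint(1, max_len)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception as exc:
            crashes.append((length, repr(exc)))
    return crashes

crashes = fuzz(toy_target)
print(len(crashes), "crashes; shortest crashing input:", min(n for n, _ in crashes))
```

The shortest crashing input is the interesting one, since it hints at where the buffer ends.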

Hopefully this gives some clarity to some of the terms that are thrown around in the network security world as opposed to adding to the confusion.  In future posts, we'll examine process spawning and the different types of exploits (local, remote, privilege escalation).  Thanks for reading!
