Centrifuge and Smart Policy Toolkit
Smart Policies are data usage control policies written in unambiguous code and enforced by a policy monitor running on the program's execution engine. Smart Policies are therefore more explicit than conventional written policies and can provide data safety guarantees that conventional policies cannot. For example, conditions defining illegal read accesses and method invocations can be specified and actively enforced as they occur, terminating the offending program as soon as a deviation from permitted behavior is detected.
How do Smart Policies work?
Smart Policies are currently implemented as hooks into the execution of the Python Virtual Machine. The hooks identify potentially offending bytecode sequences and run policy code when those sequences are executed.
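To illustrate the idea, the sketch below shows a hook that watches a stream of bytecode events and runs policy code when a given sequence appears. It is a hypothetical, self-contained stand-in, not CovaVM's implementation. The SequenceHook class, the (opname, argval) event format, and the hand-written trace are all assumptions. The trace mimics what an interpreter might dispatch while executing data.to_csv("out.csv").

```python
from collections import deque


class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


class SequenceHook:
    """Hypothetical hook: call `action` when the most recent bytecodes
    match `pattern`. Each pattern entry is (opname, argval); an argval of
    None matches any argument."""

    def __init__(self, pattern, action):
        self.pattern = list(pattern)
        self.action = action
        # Sliding window holding only the last len(pattern) events.
        self.window = deque(maxlen=len(self.pattern))

    def on_bytecode(self, opname, argval):
        self.window.append((opname, argval))
        if len(self.window) == len(self.pattern) and all(
            op == want_op and (want_arg is None or arg == want_arg)
            for (op, arg), (want_op, want_arg) in zip(self.window, self.pattern)
        ):
            self.action(list(self.window))  # run the policy code


def forbid(events):
    """Policy code: reject the matched sequence outright."""
    raise PolicyViolationError(f"forbidden bytecode sequence: {events}")


hook = SequenceHook([("LOAD_NAME", "data"), ("LOAD_ATTR", "to_csv")], forbid)

# Hand-written trace standing in for the interpreter's dispatch events.
trace = [("LOAD_NAME", "data"), ("LOAD_ATTR", "to_csv"), ("LOAD_CONST", "out.csv")]
try:
    for opname, argval in trace:
        hook.on_bytecode(opname, argval)
except PolicyViolationError as err:
    print(err)
```

A fixed-length window keeps the cost of each dispatch event constant, regardless of how long the program runs.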
Smart Policies Process Flow
The Data Owner (DO) writes a Smart Policy in Centrifuge to protect their data, then uses the COVA interface to bind the data to the policy by encrypting the two together as a single package.
The Data User (DU) submits their program to the COVA node responsible for executing it against the policy-protected data. The data, its policy, and the code intended to process it are brought together inside the enclave.
A fresh CovaVM instance is created, and the policy and code are loaded into the VM. CovaVM's policy manager (PM) performs initial precondition checks, such as FileHash, before access to the data is handed over.
CovaVM loads the data and begins executing DU's program. At this stage, CovaVM has already been configured by the Smart Policy and intercepts the program's execution through hooks defined in the policy. If the code violates the policy, a PolicyViolationError halts CovaVM's execution.
CovaVM checks the result of the computation against the postconditions defined in the Smart Policy and sends the results back to DU.
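The flow above can be condensed into a rough sketch of what happens inside the enclave once policy and code are loaded. Everything here is hypothetical. The run_policy_pipeline function, the "code" context key, the "data" and "result" names, and the use of exec are illustrative stand-ins. Encryption, enclave setup, and runtime hooks are omitted, and exec provides no isolation.

```python
class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


def run_policy_pipeline(preconditions, postconditions, code, data):
    """Hypothetical outline: precondition checks, execution, postcondition checks."""
    context = {"code": code}  # hypothetical context key
    # Preconditions run before the program is given the data.
    for check in preconditions:
        check(context)
    # Execute the program with the data exposed under the hypothetical name "data".
    namespace = {"data": data}
    exec(compile(code, "<du-program>", "exec"), namespace)
    result = namespace.get("result")  # hypothetical output convention
    # Postconditions inspect the result before it is sent back.
    for check in postconditions:
        check(result)
    return result


def short_program(context):
    if len(context["code"]) > 1000:
        raise PolicyViolationError("program text too long")


def numeric_result(result):
    if not isinstance(result, (int, float)):
        raise PolicyViolationError("only numeric results may be returned")


print(run_policy_pipeline([short_program], [numeric_result],
                          "result = sum(data) / len(data)", [1, 2, 3]))  # 2.0
```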
A Centrifuge Smart Policy is a JSON file that tells CovaVM how to instantiate a series of Validators (small programs that each enforce a meaningful part of the policy). These Validators run before, during, and after the execution of Data User's program on CovaVM. Validators are written in Python and fall into three categories:
Preconditions: checks made before the data is made available to the program. Preconditions are passed a context dictionary describing the circumstances of the execution.
Runtime Monitors (RMs): checks made during the execution of the program. By monitoring the execution trace online, RMs retain control over data access even after access has been granted, and they allow policy decisions to be made at arbitrary points during execution.
Postconditions: checks made after execution but before the result is packaged and sent back. Postconditions are run against the result.
Validators all inherit from the base class
dict that represents the options and arguments to be used.
- Raises a PolicyViolationError with the message.
CovaVM expects Preconditions to be subclassed from PreconditionValidator. Preconditions are Validators that define checks ensuring certain conditions are met before Data User's program starts to run. Anything that can be determined in advance can be checked here. Examples include policies covering the entire dataset, verifying that the program's hash matches a known one, or running basic static analysis.
- Executes the Validator checks against context. If any violations are found, raise a PolicyViolationError. If no violations are found, simply return without exception.
Preconditions are passed the following context dictionary.
- The entire program text loaded into CovaVM.
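The sketch below shows what a Precondition performing basic static analysis might look like. The text names PreconditionValidator but not the signature of its check method or the context keys. The validate method name, the "code" context key, the "blocked" option, and the NoImports Validator are therefore hypothetical. Minimal stand-in classes are defined so that the sketch runs on its own.

```python
import ast


class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


class PreconditionValidator:
    """Minimal stand-in; the real base class API may differ."""

    def __init__(self, options):
        self.options = options  # dict of options from the Centrifuge policy


class NoImports(PreconditionValidator):
    """Hypothetical Precondition: reject programs that import blocked modules."""

    def validate(self, context):
        tree = ast.parse(context["code"])  # "code" key is an assumption
        blocked = set(self.options.get("blocked", []))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                modules = [node.module or ""]  # relative imports have no module
            else:
                continue
            for name in modules:
                # Compare the top-level package, so "os.path" counts as "os".
                if name.split(".")[0] in blocked:
                    raise PolicyViolationError(f"import of {name!r} is not allowed")
        # No violation found: return normally.


checker = NoImports({"blocked": ["socket", "os"]})
checker.validate({"code": "import numpy as np\nx = np.mean([1, 2])"})  # passes
```

Static checks like this inspect only the program text, so they complement, rather than replace, Runtime Monitors.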
Runtime Monitors are more complex Validators that are enforced after access to the data has been granted. We currently use a Python-based bytecode interpreter to illustrate how CovaVM will implement some important runtime monitoring features. Policies that require runtime monitoring include those involving data-flow analysis, function whitelisting, and read-access level control.
RMs can be simple to write yet express complex behavior by following a finite-state-automaton pattern. Keep a policy state, and define transition functions that change the state on the bytecode dispatch events delivered to receive_bytecode().
receive_bytecode(self, bytecode, args)
bytecode contains the bytecode name (as a string), and args contains the list of positional arguments for the bytecode. The complete list of bytecode names can be found here. If any violations are found, raise a PolicyViolationError. Otherwise, return normally.
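As an illustration of the finite-state-automaton pattern, here is a sketch of a Runtime Monitor with two states. Once the program reads the data, it may no longer load output functions. The class name, the "dataName" and "forbidden" options, and the assumption that args[0] holds the resolved name for LOAD_NAME and LOAD_GLOBAL are all hypothetical. The event list is hand-written rather than produced by CovaVM.

```python
class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


class NoOutputAfterDataAccess:
    """Hypothetical RM with two states: "clean" -> "tainted".

    Transition to "tainted" on the first load of the data name; once
    tainted, loading any forbidden name is a violation.
    Assumes args[0] is the resolved name for LOAD_NAME / LOAD_GLOBAL.
    """

    LOADS = {"LOAD_NAME", "LOAD_GLOBAL"}

    def __init__(self, options):
        self.data_name = options.get("dataName", "data")
        self.forbidden = set(options.get("forbidden", ["print", "open"]))
        self.state = "clean"

    def receive_bytecode(self, bytecode, args):
        if bytecode not in self.LOADS or not args:
            return  # other bytecodes do not affect this policy
        name = args[0]
        if self.state == "clean" and name == self.data_name:
            self.state = "tainted"  # transition: data has been accessed
        elif self.state == "tainted" and name in self.forbidden:
            raise PolicyViolationError(f"{name} used after reading {self.data_name}")


monitor = NoOutputAfterDataAccess({"forbidden": ["print", "open"]})
events = [("LOAD_NAME", ["print"]), ("LOAD_NAME", ["data"]),
          ("BINARY_SUBSCR", []), ("LOAD_NAME", ["print"])]
try:
    for bytecode, args in events:
        monitor.receive_bytecode(bytecode, args)
except PolicyViolationError as err:
    print("halted:", err)  # halted: print used after reading data
```

The first print is allowed because the monitor is still in the clean state. The same load becomes a violation only after the transition.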
Postconditions are additional obligations or conditions that are checked after Data User's program has terminated and before the result of the computation is forwarded back to Data User.
data represents the Python data object to be returned to Data User. Checks can be performed directly on the data, such as its type or information size. If any violations are found, raise a PolicyViolationError. Otherwise, return normally.
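A Postcondition can be as simple as a type and size check on the result. The text does not give the postcondition base class or method name. As a result, ScalarResultOnly, its check method, and the "maxItems" option are hypothetical. The sketch only allows a small number of numeric values to leave the enclave.

```python
import numbers


class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


class ScalarResultOnly:
    """Hypothetical Postcondition: allow at most `maxItems` numeric values."""

    def __init__(self, options):
        self.max_items = options.get("maxItems", 1)

    def check(self, data):
        # Treat a bare value as a one-item result.
        values = data if isinstance(data, (list, tuple)) else [data]
        if len(values) > self.max_items:
            raise PolicyViolationError(
                f"result has {len(values)} items; at most {self.max_items} allowed")
        for value in values:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, numbers.Number):
                raise PolicyViolationError(f"non-numeric result: {type(value).__name__}")


post = ScalarResultOnly({"maxItems": 1})
post.check(42.0)  # a single aggregate passes
```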
CovaVM is currently a bytecode interpreter written in Python. This allows policies to inspect the execution trace of Data User's program in real time through Runtime Monitors and to make policy decisions based on that information. In essence, CovaVM is an execution environment configured by a Centrifuge policy. It knows when to step in by shifting program control to policy Validator code, and it reports detailed information about the execution state to Validators.
PolicyManagers are objects within the CovaVM instance that propagate bytecode dispatch events to loaded Smart Policies using a publish and subscribe model. PolicyManagers also parse Centrifuge JSON policies and configure policy Validators. A PolicyManager may raise a PolicySyntaxError for malformed policies or a PolicyRuntimeError for errors during runtime, such as missing Validators.
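The publish and subscribe relationship between a PolicyManager and its Runtime Monitors might look roughly like the sketch below. MiniPolicyManager, its registry argument, load_runtime, publish, and the toy Recorder monitor are hypothetical. Only PolicyRuntimeError and receive_bytecode come from the text above.

```python
class PolicyRuntimeError(Exception):
    """Stand-in for CovaVM's PolicyRuntimeError (assumption)."""


class MiniPolicyManager:
    """Hypothetical sketch of bytecode event dispatch to Runtime Monitors."""

    def __init__(self, registry):
        self.registry = registry  # maps Validator names to classes
        self.subscribers = []

    def load_runtime(self, runtime_section):
        # Instantiate each configured monitor, in alphabetical order.
        for name in sorted(runtime_section):
            cls = self.registry.get(name)
            if cls is None:
                raise PolicyRuntimeError(f"unknown Validator: {name}")
            self.subscribers.append(cls(runtime_section[name]))

    def publish(self, bytecode, args):
        # Every subscriber sees every event; an exception raised by one
        # of them propagates out of publish and stops the dispatch.
        for monitor in self.subscribers:
            monitor.receive_bytecode(bytecode, args)


class Recorder:
    """Toy monitor that just remembers the bytecode names it receives."""

    def __init__(self, options):
        self.events = []

    def receive_bytecode(self, bytecode, args):
        self.events.append(bytecode)


pm = MiniPolicyManager({"Recorder": Recorder})
pm.load_runtime({"Recorder": {}})
pm.publish("LOAD_NAME", ["data"])
pm.publish("RETURN_VALUE", [])
print(pm.subscribers[0].events)  # ['LOAD_NAME', 'RETURN_VALUE']
```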
CovaVMResult defines the output of a CovaVM execution. The compute node processes it further before it is finally packaged and sent back.
- This represents one of 3 states.
VM_SUCCESS, 0, representing completion of execution
VM_POLICY_VIOLATION, 1, representing a policy violation during execution
VM_PROGRAM_ERROR, 2, representing an uncaught exception during execution of the program
- The payload contains the Python object (of any type) representing the return data of the execution (on success), a PolicyViolationError (on a policy violation), or an Exception (on a program error).
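The three states and their payloads map naturally onto an IntEnum and a small dataclass. Only the state names, their numeric codes, and the notion of a payload come from the text above. The VMStatus enum, the status and payload field names, this stand-in CovaVMResult, and the run_and_wrap helper that classifies outcomes are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable


class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


class VMStatus(IntEnum):
    VM_SUCCESS = 0
    VM_POLICY_VIOLATION = 1
    VM_PROGRAM_ERROR = 2


@dataclass
class CovaVMResult:
    """Stand-in result type; field names are hypothetical."""
    status: VMStatus
    payload: Any  # return data, a PolicyViolationError, or an Exception


def run_and_wrap(program: Callable[[], Any]) -> CovaVMResult:
    """Hypothetical helper: run a program and classify its outcome."""
    try:
        return CovaVMResult(VMStatus.VM_SUCCESS, program())
    except PolicyViolationError as violation:  # must precede the generic handler
        return CovaVMResult(VMStatus.VM_POLICY_VIOLATION, violation)
    except Exception as error:  # any other uncaught program exception
        return CovaVMResult(VMStatus.VM_PROGRAM_ERROR, error)


print(run_and_wrap(lambda: 1 / 0).status.name)  # VM_PROGRAM_ERROR
```

Because VMStatus is an IntEnum, a status still compares equal to its numeric code (0, 1, or 2) if the compute node works with plain integers.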
Data User Interface
CovaVM runs Data User's program in an isolated environment, but provides an interface that is injected directly into the name resolution bytecodes.
- Can be raised to halt execution within CovaVM and report data back.
Data is currently directly accessible to the program through the keyword
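CovaVM performs this injection inside its own name resolution bytecodes. With stock CPython, a comparable effect can be approximated by pre-populating the globals dictionary passed to exec, because LOAD_NAME and LOAD_GLOBAL consult that dictionary. The actual keyword and exception names are not given here, so DATA and ReportResult are placeholders, and run_with_interface is hypothetical. Unlike CovaVM's environment, exec offers no isolation.

```python
class ReportResult(Exception):
    """Hypothetical stand-in for the exception a program raises to halt
    execution and report data back."""

    def __init__(self, payload):
        super().__init__(payload)
        self.payload = payload


def run_with_interface(source, data):
    # Names placed in the globals dict resolve like ordinary globals,
    # so the program can use them without importing anything.
    namespace = {"DATA": data, "ReportResult": ReportResult}
    try:
        exec(compile(source, "<du-program>", "exec"), namespace)
    except ReportResult as report:
        return report.payload  # execution halted; hand the data back
    return None  # program finished without reporting anything


program = """
total = sum(DATA)
raise ReportResult(total)
print("never reached")
"""
print(run_with_interface(program, [4, 5, 6]))  # 15
```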
We chose to base Centrifuge, our Smart Policy language, on JSON because of its hierarchical structure and widespread adoption. Centrifuge should not be considered a programming language. Rather, it is a simple configuration language that instructs CovaVM how to defend against unwanted execution traces by telling it how to load and configure Validators.
CovaVM will ship with several predefined Validators that support the most important use cases. These will receive incremental updates as we discover more effective solutions for Data Policy Enforcement and privacy-preserving policies.
Structure of a Smart Policy
Here is the JSON Schema for Centrifuge policies:
- meta: [optional] An object containing basic metadata for the policy
  - version: String (typically semver)
- pre: An object with Validator names as keys and their options as values
- runtime: An object with Validator names as keys and their options as values
- post: An object with Validator names as keys and their options as values
NOTE: Validators in pre, runtime, and post are run in alphabetical order and are meant to operate independently of the results of other Validators. CovaVM halts execution at the first policy violation.
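The sketch below parses a small Centrifuge policy and runs one stage as the note describes: Validators run in alphabetical order, and execution stops at the first violation. The Forbid and MaxLength Validators, their options, the registry, and the parse_policy and run_stage helpers are hypothetical. Treating pre, runtime, and post as required objects is also an assumption based on the schema above.

```python
import json


class PolicyViolationError(Exception):
    """Stand-in for CovaVM's PolicyViolationError (assumption)."""


class PolicySyntaxError(Exception):
    """Stand-in for CovaVM's PolicySyntaxError (assumption)."""


def parse_policy(text):
    """Parse a Centrifuge policy and check its top-level shape."""
    try:
        policy = json.loads(text)
    except json.JSONDecodeError as err:
        raise PolicySyntaxError(f"invalid JSON: {err}") from err
    if not isinstance(policy, dict):
        raise PolicySyntaxError("policy must be a JSON object")
    for stage in ("pre", "runtime", "post"):
        if not isinstance(policy.get(stage), dict):
            raise PolicySyntaxError(f"'{stage}' must be an object")
    return policy


def run_stage(policy, stage, registry, target, log):
    """Run a stage's Validators in alphabetical order. The first
    PolicyViolationError propagates, so later Validators never run."""
    for name in sorted(policy[stage]):
        log.append(name)
        registry[name](policy[stage][name], target)


def forbid(options, code):  # hypothetical Validator
    for word in options["words"]:
        if word in code:
            raise PolicyViolationError(f"forbidden token: {word}")


def max_length(options, code):  # hypothetical Validator
    if len(code) > options["limit"]:
        raise PolicyViolationError("program too long")


REGISTRY = {"Forbid": forbid, "MaxLength": max_length}
POLICY = """
{
  "meta": {"version": "0.1.0"},
  "pre": {"MaxLength": {"limit": 40}, "Forbid": {"words": ["open("]}},
  "runtime": {},
  "post": {}
}
"""

policy = parse_policy(POLICY)
log = []
try:
    # This program violates both Validators; only the first one runs.
    run_stage(policy, "pre", REGISTRY, "open('secret.csv').read() * 1000 + 'padding'", log)
except PolicyViolationError as err:
    print(err, log)  # forbidden token: open( ['Forbid']
```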
These Validators come bundled with CovaVM.
Checks that the hash of a program's text matches an MD5 hash.
- equalTo: Either a String of the MD5 hash or an Array of Strings of MD5 hashes to check against.
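The equalTo option accepts either a single hash or a list of hashes. Below is a minimal sketch of that comparison using Python's hashlib. The matches_md5 function is hypothetical. Encoding the program text as UTF-8 and comparing hex digests case-insensitively are assumptions.

```python
import hashlib


def matches_md5(program_text, equal_to):
    """Hypothetical check: True if the MD5 of program_text matches equal_to,
    which may be one hex digest (str) or a list of hex digests."""
    digest = hashlib.md5(program_text.encode("utf-8")).hexdigest()
    allowed = [equal_to] if isinstance(equal_to, str) else list(equal_to)
    return digest in {h.lower() for h in allowed}


# "a" is an RFC 1321 test input whose MD5 is 0cc175b9c0f1b6a831c399e269772661.
print(matches_md5("a", ["0cc175b9c0f1b6a831c399e269772661"]))  # True
```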
Simply prints every bytecode executed while a Python program runs inside CovaVM. Takes no options.
- options: none
There are currently no Postcondition validators.
We have created a bytecode interpreter in pure Python that runs inside Graphene-SGX and can execute programs that use the Python scientific stack (NumPy, SciPy, scikit-learn). The current implementation provides advanced inspection and instrumentation of the Data User's program as it executes. Information gathered by CovaVM can be accessed directly by the Smart Policy for expressive data control. In addition, because CovaVM is implemented in Python, Validators can be written in Python with arbitrarily complex logic on an extensible platform suited to further experimentation. For end users, a simple JSON-based language provides an accessible way to write policies and control Python-based Validator code without dealing directly with the CovaVM API.
Currently, Smart Policies can only use the limited set of prebundled Validators, and Runtime Monitors can only detect policy violations at the bytecode level. We are working on a more developer-friendly CovaVM API that will improve the experience of writing custom Validators and the overall extensibility of the Smart Policy Enforcement system.
Data Usage Policies can only be semantically meaningful if the structure of the data is known beforehand and obeys a predefined interface. Our next version of CovaVM will introduce a DataAdapter that serves as a policy-controllable proxy object. It will provide a semantic bridge for writing meaningful policies over structured data such as CSV, SQL, and JSON.
We have used a Python-based bytecode interpreter for its fast experimentation and development cycle, but intend to modify CPython directly for hardened security and improved performance.