Sinan Pehlivanoglu is a software engineer at VMware. For the past few years, he has worked at the intersection of programming languages (PL) and large-scale distributed data. Sinan worked on enterprise data pipelines at Twitter and founded the Vizion project, a data-driven career management platform aimed at musicians. His current interests are in creating programming languages for quantum computing, especially in understanding how different interpretations of quantum mechanics are reflected in the PL world.
Oblivious Privacy in a Statically Typed World.
The internet has become a vital part of our everyday lives. People rely on web applications to buy groceries, book trips, schedule doctor's appointments, pay their taxes and more, so a significant amount of sensitive, private user data flows through servers every second. Facebook alone generates over 4 PB of user data every day. Recent laws such as GDPR give users fine-grained control over their data: with whom it is shared and how.
In large applications, the dynamic privacy logic around user data can become very complex, producing control flow that is difficult to reason about and difficult to maintain. Because GDPR is young (it was adopted by the EU in 2016), its regulations continue to change and evolve. Data licensing deals such as advertising partnerships send user data through many different applications. The evolving and distributed nature of privacy inevitably leads to human error and unintentional data leaks in complex applications.
User preferences that dictate privacy constraints are dynamic and are often stored in external services or databases. This implies not only that an external call is required before a sensitive data flow can complete, but also that these preferences can change mid-flow. The complex, dynamic logic required for correct privacy assertions results in an information flow that is difficult to understand and maintain. The flow is further complicated by common data processing practices such as anonymization of user data, multiple dependent schemas, or the use of third-party libraries. Because the evaluation is dynamic, static information flow control solutions such as JFlow are not sufficient for this problem at scale.
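To make the "preferences can change mid-flow" point concrete, here is a minimal sketch that contrasts a consent decision taken once when a flow starts with one re-evaluated at the moment the data is used. `PreferenceStore` and `MidFlowDemo` are hypothetical names, and the mutable map is only an in-memory stand-in for the external service or database the text describes; this is not code from Harpocrates.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for an external preference service or database.
final class PreferenceStore {
  private val consent = mutable.Map.empty[String, Boolean]
  def set(userId: String, allowed: Boolean): Unit = consent(userId) = allowed
  // Unknown users default to "no consent".
  def allowsSharing(userId: String): Boolean = consent.getOrElse(userId, false)
}

object MidFlowDemo {
  // Decision taken once, when the flow starts, and then carried along.
  def decideUpFront(store: PreferenceStore, userId: String): () => Boolean = {
    val decision = store.allowsSharing(userId)
    () => decision
  }

  // Decision re-evaluated at the moment the data is actually used.
  def decideAtUse(store: PreferenceStore, userId: String): () => Boolean =
    () => store.allowsSharing(userId)

  def main(args: Array[String]): Unit = {
    val store = new PreferenceStore
    store.set("alice", allowed = true)

    val upFront = decideUpFront(store, "alice")
    val atUse = decideAtUse(store, "alice")

    // The user revokes consent while the data is still in flight.
    store.set("alice", allowed = false)

    println(s"up-front check: ${upFront()}") // true: stale decision
    println(s"at-use check: ${atUse()}")     // false: reflects the revocation
  }
}
```

The up-front decision still reports `true` after the revocation, which is exactly the kind of stale result that a check performed once, or verified statically, cannot rule out.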
Harpocrates is a Scala compiler plugin inspired by Racket contracts. Instead of approaching privacy from an information flow verification perspective, it lets data flow freely through the application inside policy membranes and enforces the policies when code attempts to access, mutate, or declassify the data, or to pass it across the application boundary. Harpocrates eliminates raw data constructors from the application, ensuring that data can exist only in protected form, which minimizes misuse and human error. The policy semantics are centralized in user-defined implementations of the Policy trait, so as the law evolves and requirements change, the policy needs to be updated in only one location. The plugin automatically inserts policy enforcement at application boundaries and aligns types as needed. This lets developers write familiar Scala code in their existing ecosystem and use their favorite libraries without changing any external code.
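The membrane idea can be illustrated with a small, hand-written sketch. It is not Harpocrates' actual API: `Context`, `Protected`, `wrap`, `map`, `reveal` and `PurposePolicy` are hypothetical names, and only the existence of a `Policy` trait comes from the text (its `allows` method here is an assumption). The sketch shows a private constructor so values enter only in protected form, free transformation inside the membrane, and a policy check when the value leaves it. Unlike the plugin, which inserts enforcement automatically, this sketch calls `reveal` explicitly.

```scala
// Hypothetical request context: who is asking, and for what purpose.
final case class Context(requester: String, purpose: String)

// Assumed shape of the policy interface: the single place where privacy semantics live.
trait Policy {
  def allows(ctx: Context): Boolean
}

// A value wrapped in a policy "membrane". The constructor is private, so the
// raw value can only enter through the companion's `wrap`, which requires a policy.
final class Protected[A] private (value: A, val policy: Policy) {
  // Transformations are free: the result stays inside the same membrane.
  def map[B](f: A => B): Protected[B] = new Protected(f(value), policy)

  // Leaving the membrane (access or crossing a boundary) consults the policy.
  def reveal(ctx: Context): Either[String, A] =
    if (policy.allows(ctx)) Right(value)
    else Left(s"access denied for ${ctx.requester} (${ctx.purpose})")
}

object Protected {
  def wrap[A](value: A, policy: Policy): Protected[A] = new Protected(value, policy)
}

// Example policy: the data owner allows only the listed purposes.
final class PurposePolicy(allowed: Set[String]) extends Policy {
  def allows(ctx: Context): Boolean = allowed.contains(ctx.purpose)
}

object MembraneDemo {
  def main(args: Array[String]): Unit = {
    val email = Protected.wrap("alice@example.com", new PurposePolicy(Set("billing")))
    val domain = email.map(_.split('@').last) // still protected

    println(domain.reveal(Context("invoicing-service", "billing")))  // Right(example.com)
    println(domain.reveal(Context("ad-partner", "advertising")))     // Left(access denied for ad-partner (advertising))
  }
}
```

Because the privacy rules live only in `PurposePolicy`, changing what is allowed means editing that one class rather than every call site that handles the email address.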