My research is in programming languages and related areas. Generally, I am interested in any problem space where I think mathematical abstractions will lead to practical impact in the safety, correctness, expressiveness, usability, or security of systems. Here are some current potential research topics:
Security for the Rust programming language
How can we make sure that Rust code that you download off the internet is actually safe to run? Rust has grown in popularity because of its built-in safety guarantees – in particular, type safety and memory safety – but these guarantees don’t generally extend to untrusted code written by someone else. With collaborators at UCSD, I’m working on automated and human-assisted tools to automatically ensure the safety of Rust programs. Check out cargo scan for a work-in-progress tool.
Derivatives of regular expressions
If you know Calculus, you may be familiar with rules like the sum rule and product rule, which say that
d(f + g) = df + dg and
d(fg) = d(f)g + fd(g), where
d is the derivative operator and
g are functions. It turns out there is an interesting (and practically useful!) theory of similar rules for regular expressions, but where the rules are a bit different: for example,
d(AB) = d(A)B + (A∩ε)d(B). I’m interested in leveraging and extending derivative-based tools to debug regular expressions that show up in practice, including those with string variables. If you’re interested, check out our PLDI paper on implementing these things in the Z3 theorem prover.
Verified dataflow programming
Programs written in the cloud are built on dataflow abstractions: pipelines of operators that process data streams. For example, operators include things like
aggregate. Can we verify that these pipelines are correct, lack performance bottlenecks, and preserve data privacy and security? My prior work proposes using partially ordered sets to represent streams, and I’m also currently interested in extending existing theories to work for patterns of events in a stream over time and to encode other correctness and security constraints of interest. A good starting point for this work is my PhD thesis or DiffStream (OOPSLA 2020).
In the past, I have also worked on the following projects:
Fuzzing programmable networks. One surprisingly effective technique for finding bugs in systems is called fuzzing, where we just spam the system with random garbage inputs and see if it crashes or produces any errors. In the FP4 project, we worked on applying fuzzing to programmable network switches (used to program things like firewalls, load balancers, and routing programs which are the backbone of the internet). Check out pystate, which is a piece of this project that I wrote, or FP4, the main repository for the network fuzzer.
Models of streaming computation. Programs that process streaming data are fundamentally different mathematical objects than usual programs: in theoretical computer science terms, they are more like DFAs and NFAs (finite state machines) than Turing machines (unbounded memory). However, most streaming computation in practice goes beyond finite state machines to other primitive forms of quantitative computation that aren’t quite finite-space. For some of my work on quantitative state machines, see Modular Quantitative Monitoring (POPL 2019).