Written by Mark Ettinger – Lead Data Scientist
This is part one of a series of articles on the benefits of using lighweight formal methods in the design process of DeFi Protocols.
Decentralized finance (DeFi) has tremendous promise but anyone following current progress is all too aware of the potential financial dangers of interacting with DeFi protocols. Hacks occur on a regular basis which result in loss of crypto-assets. The value of these hacks often exceeds tens of millions of dollars and sometimes exceeds hundreds of millions of dollars. Auditing is the gold standard for reducing vulnerabilities specific to languages used for programming smart contracts. However many times these hacks are executed through attacking vulnerabilities within the logic of the system. Logical flaws can be extremely difficult to find during design because often the tool used to exploit the logic may not yet exist (e.g. the rise of flash loans and their use to exploit protocols).
This highlights a fundamental difficulty in protocol design. How can you mitigate attacks when you might not be able to conceive of the method of attack? At Volatility Group we believe that many logical insecurities can be prevented through the efficient use of lightweight formal methods. Using a language like TLA+ and a model checker allows you to build the system’s logic and verify the reachable states do not exhibit unexpected behavior.
Besides discovering logical insecurities, using a lightweight formal method language like TLA+ prior to coding improves the design process in three ways:
- The act of formalizing clarifies the thinking process. Specifically, in our experience the design and development team moved from thinking procedurally (i.e. how do we design for implementation) to logically (i.e. what are the states of this system).
- Writing a specification in a mathematical language rather than natural language removes all ambiguities.
- With ambiguities removed, the specification serves as a blueprint against which any implementation can be compared and judged to be correct or incorrect.
This series of posts will summarize our recent explorations using TLA+, a formal specification language. We began this exercise when one of us (Ettinger) became uncomfortable trying to understand the very complex and ambiguous “big picture” of how the DAOracle protocol worked. It was only through mathematics that the ambiguity was eliminated and the complexity wrangled. We should note that while this post emphasizes the importance of using TLA+ before writing code, that is not the process that was used for Version 1 of the DAOracle. We designed the protocol iteratively and wrote the code without a lightweight formal specification. Retrospectively using TLA+ uncovered various problematic issues with Version 1 and has helped to improve our process immensely. We plan on using TLA+ prior to writing code for future versions of the protocol.
What Are Formal Methods?
The subfield of computer science called formal methods has been around for decades but is rarely used in software engineering practice due to the requirements of specialized knowledge, extensive training with novel computational tools, and significant amounts of human effort that is difficult to automate. However, the basic idea of formal methods is quite simple and elegant. The basic idea is to treat algorithms as mathematical objects with which we can rigorously reason and mathematically prove that various expected properties hold. This is not unlike when you took your first geometry class in high school and learned to prove properties of triangles, circles, and other geometric objects. However in formal methods, a notion of proof is introduced which goes beyond the proofs typically expressed with a combination of mathematical symbolism and natural language. Proofs in formal methods are called formal proofs because they are expressed in a formal mathematical language and can be checked for correctness by a computer unlike traditional proofs which you would find in a journal. Formal methods have the advantage of bringing the power of mathematics to bear on software engineering but also carry the disadvantages mentioned above. Perhaps we can have the best of both worlds by using so-called lightweight formal methods.
Lightweight formal methods retain the use of a formal mathematical description language for algorithms and protocols but dispense with formal proofs. This significantly reduces the time and resources necessary for creating a specification. You might now be wondering, “If we drop proofs, what remains?” In lightweight formal methods we retain two crucial elements to aid us in our goal of creating secure software:
- A formal language which enables us to specify and model protocols with mathematical precision.
- Tools for exploring the state space of our models called model checkers.
These two affordances of lightweight formal methods are briefly described below and we intend to expound on both topics at length in future entries in this series of articles. But first let us give an overview of the technical aspects of TLA+
The Mathematics of TLA+
The most difficult aspect of learning to use TLA+ is not mathematics or anything technical but rather learning to think logically rather than procedurally. This is especially true for programmers who are accustomed to describing what the computer should do next. In contrast, TLA+ is built on logic and specifies which state transitions are permitted. (Note: There is also a pseudocode-like language, PlusCal, which mimics procedural thinking and compiles to TLA+. We prefer to use TLA+ directly and do not use PlusCal though some of the references below favor it).
The way one uses TLA+ to write a specification is actually straightforward in theory as TLA+ uses state machines as the models for discrete dynamical systems. A state is an assignment of values to temporal variables. These variables are called temporal because the values assigned to them change over time. Using the linguistic elements described below, one first specifies all the valid initialization states. Then one specifies the valid transitions from one state to the next. That’s it! These two declarations serve to define all valid behaviors of the system.
The syntax of TLA+ can be described as combining three components which we express heuristically as:
TLA+ = First Order Logic + (Linear) Temporal Logic + (Naive) Set Theory
The first component means TLA+ uses all the standard logical connectives and quantifiers, e.g. “and”, “or”, “not”, “implies”, “for all”, and “there exists”. The second component means TLA+ has temporal “modal” operators intuitively expressing the temporal concepts of “always” and “eventually.” Finally, the third component means that everything expressed in TLA+ is a set, though the set theory involved in TLA+ is very basic, including notions like “subset” and “element of” and contains none of the sophisticated mathematics that set theorists study.
In traditional formal methods, after writing a specification we would then prove theorems which hold for all behaviors. Indeed, TLA+ has a tool called the TLA Proof System (TLAPS) which facilitates the construction of proofs but since we are currently only describing lightweight applications we will not discuss this aspect further. (See Specifying Systems for further details on TLAPS.) Rather than proving theorems, which is difficult, time consuming, and costly, we are interested in the easy, fast, and inexpensive benefits which come from the clarity that formal specifications provide and finding unexpected behaviors through model checking.
The Benefits of Formal Specification
There are several benefits to composing a formal specification even without the intention to do any theorem proving or model checking down the road. We regard these benefits as the dramatic low-hanging fruits of lightweight formal methods. These low-hanging fruits are why we are so passionate about our use of TLA+ internally at Volatility Group and also about communicating these advantages to the wider DeFi community. There are three benefits to writing a specification of a protocol in a formal language:
- The specification serves as a blueprint against which any implementation can be compared and judged to be correct or incorrect.
- Writing a specification in a mathematical language rather than natural language removes all ambiguities.
- The act of formalizing clarifies the thinking process.
Regarding the first point, it is important to notice that a formal blueprint is a necessary and sufficient condition for evaluating a piece of software for correctness. Without a specification, the code itself serves as it’s own trivial blueprint in the sense that any and all behavior permitted by the implementation is “correct” by definition. Unintended behaviors, especially logical hacks of protocols, reveal this state of affairs to be unacceptable. A formal mathematical specification fixes this issue.
The third point is a bit more subtle than the first two and requires actual experimentation with formal specification to really “get it.” Writing a specification reveals errors in thinking which are extremely difficult to discover. Turing award winner Leslie Lamport, the creator of TLA+, summarized this by saying “Writing is nature’s way of telling us how lousy our thinking is.”
This is perhaps a good time to address the proper place of a formal specification in the software development workflow. Formal specification should ideally be completed before any implementation is begun. At Volatility Group we became involved with TLA+ only after the initial design and implementation of the DAOracle protocol in Solidity. Nonetheless, writing our TLA+ DAOracle specification ex post facto served to clarify ambiguities and shortcomings in the initial implementation and to serve as a blueprint for the next iteration of the protocol. Going forward, Volatility Group intends to implement the proper workflow of writing a specification before any coding for all future protocols.
The Benefits of Model Checking
A model checker is a computational tool that exhaustively searches the state space of a finite state machine for counterexamples of an expected property. If the state machine is not finite, i.e. the number of possible states is infinite, then a model checker may be used to search a finite subset of the infinite state space. If no counterexample is found in the finite case then the model checker has verified the property holds for all behaviors of the system. In the infinite case the model checker can only explore a finite subset X and therefore may have missed a counterexample to the property that falls outside X. In spite of this limitation, there is empirical folk wisdom that small counterexamples often exist for real-world system models and can be discovered with a reasonable amount of computation using model checkers. Modern model checking algorithms are highly optimized to avoid redundant calculations. Very large state space may be exhaustively explored by exploiting symmetries. Furthermore, many model checkers translate the problem into a statement of propositional logic, thus leveraging recent dramatic advances in SAT-solver algorithms. TLA+ has a model checker, TLC, which can be used to perform model checking for models corresponding to a large subset of the TLA+ language. Model checking will be discussed at length in a future entries in this series.
In the next entry in this series on applications of lightweight formal methods to decentralized finance, we explore our specification of the UMA Optimistic Oracle in detail. In particular, we illustrate how sitting down to translate documentation written in natural language into TLA+ revealed to us some issues that we didn’t even realize we didn’t understand thoroughly! Furthermore the resolution of these issues cannot be found in UMA’s user documentation but only by a close examination of the Solidity source code implementing the Optimistic Oracle. This process sheds light on three important principles which we hope to convince you are reasonable.
- A completely unambiguous, mathematically precise description of any decentralized protocol should be made available by the creators.
- Examining source code in order to understand an algorithm is an inherently insecure methodology.
- A formal specification should be developed before any implementation work begins.
We hope you are intrigued by TLA+ and look forward to feedback that will help shape future articles. Meanwhile see below for references to further information on TLA+.
Lamport, Leslie. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. This is the definitive reference for TLA+, written by the inventor.
Wayne, Hillel. Practical TLA+: Planning Driven Development. This is an excellent introduction to TLA+, though it emphasizes PlusCal, a pseudocode-like language that compiles to TLA+. Also see Wayne’s online tutorial at learntla.com.