Formally Verifying WebAssembly - A Soroban Case Study

October 15, 2024

Formally Verifying WebAssembly

A Soroban Case Study

Author:

Editors:

WebAssembly (or Wasm) has recently witnessed significant adoption in blockchains like Arbitrum, Stellar, and Polkadot. For example, in the case of the Stellar blockchain, users write smart contracts on the Soroban platform. These programs are written in Rust and compiled to Wasm which is deployed on the Stellar blockchain within an isolated virtual machine.

Smart contracts are designed to carry out financial transactions involving millions of dollars. Bugs in these programs are known to have severe consequences. Naturally, formal verification plays a critical role in this domain.

In this blog, we present Certora’s recent efforts towards formally verifying Wasm bytecode.

What Is Wasm?

Wasm is a low-level binary instruction format designed for safe and fast execution, primarily in web environments but also increasingly used in non-browser settings like serverless computing and IoT. It acts as a compilation target for higher-level languages like C, C++, and Rust, and is interoperable with JavaScript, making it easy to integrate into existing web applications. Wasm is “platform-agnostic”, meaning that the same binary can run on any device or operating system, as long as the environment supports Wasm (e.g., most modern web browsers do).

Wasm is executed within a safe, sandboxed environment, which ensures that malicious or faulty code does not affect the host system. When compiling Wasm to native code, Wasm compilers (and interpreters) insert checks to make sure that the Wasm program is running in its own sandboxed memory region [1].

Due to the aforementioned features, Wasm is increasingly being used in the DeFi space.

As is common knowledge, vulnerabilities in smart contracts can be exploited to cause massive financial losses. Formal verification can help improve the security of smart contracts. In the past decade, verification of high-level functional correctness of smart contracts has therefore become a heavily studied topic, leading to the development of many tools that rely on techniques like interactive verification (e.g., Runtime Verification), automated verification (e.g., Certora Prover and Halmos), and symbolic execution (e.g., Hevm). There are also many fuzzing and bug-finding tools that target smart contracts, several of which are covered in a recent survey on threat mitigation for smart contracts.

With the increasing use of Wasm in blockchains, ensuring the correctness of Wasm programs is of utmost importance. Formal verification techniques allow us to accomplish this.

Formal Verification At a Glance

“Program testing can be used to show the presence of bugs, but never to show their absence!”

― Edsger W. Dijkstra

The primary distinguishing factor between formal verification and fuzzing is that the former can generate a proof that the program satisfied the specification for all inputs, whereas the latter can only check this on a set of concrete values and is generally not capable of covering all inputs.

In this blog, we will focus on automated formal verification that uses Satisfiability Modulo Theory (SMT) solvers. These tools take two inputs: a program to be verified and a property the program must satisfy.

These properties are typically written using pre- and post-conditions. Informally, they state that if the precondition “P” is true before program “S” executes, then the postcondition “Q” must hold after “S” executes, assuming that the execution of “S” terminates. This style of reasoning is attributed to the legendary computer scientist, Tony Hoare.

Next, the program and the property are compiled to generate a verification condition (or VC) [9]. By construction, the VC is a valid mathematical formula if and only if the program is correct with respect to the property, i.e., it always satisfies the property. If the VC is invalid, that means the program violates the property for some input, i.e., there is a bug.

When using SMT solvers for verification, checking the validity of the VC is typically reduced to a satisfiability problem by negating it. An example of a satisfiable formula is x * 3 + y = 0: this formula holds for x = 0, y = 0. An example of an unsatisfiable formula is (x XOR y) AND (x == y) because both x XOR y and x == y cannot be true at the same time.

In the context of formal verification, if the negated VC is unsatisfiable (UNSAT), that means the original VC is valid. This UNSAT result indicates that the program is correct and behaves as specified by the property on all inputs. If it is satisfiable (SAT), that indicates an input exists where the program violates the property.

Let’s look at an example to make this more concrete. Consider the following simplified Rust function from a Soroban smart contract that transfers some amount between two accounts (from to to). Here, spend_balance reads the current balance of from, and updates it by subtracting amount from it. receive_balance similarly updates the balance of to by adding amount to it. check_nonnegative makes sure that amount >= 0.

   pub fn transfer(e: &Env, from: Address, to: Address, amount: u64) {
       check_nonnegative(amount);
       spend_balance(e, from.clone(), amount);
       receive_balance(e, to.clone(), amount);
   }

What are some properties we would like transfer to satisfy?

For one, after transfer is done executing, the balance of from should be amount less than it was before transfer was called and the balance of to should be amount more than it was before transfer was called.

Another property is that transfer should only affect the balances of from and to. Any address different from either of these should not be impacted by transfer. These are examples of high-level functional correctness properties.

In SMT-based automated formal verification tools, users write these properties in a specification language and the tool then checks the program against the specification as described above.

For the past six years, Certora has been developing tools for checking such properties for smart contracts running on the Ethereum and Solana blockchains. In the case of Ethereum, Certora targets the low-level bytecode that runs on the Ethereum Virtual machine (EVM bytecode) and for Solana, it targets the low-level Solana Binary Format (SBF). Certora provides specification languages for writing properties. In the case of Ethereum, users write their code in Solidity and their specification in CVL (Certora Verification Language) which is designed to be similar to Solidity. In the case of Solana, Certora provides a lightweight Rust library that allows users to write their properties in Rust. The Certora prover then checks that the property is satisfied by the low-level bytecode that runs on these blockchains.

Formal Verification of Wasm Generated From Rust

The formal verification academic research community has studied Wasm in depth, which has led to developing verified compilers from Wasm to native code [2], tools for checking memory isolation [3] and verifying instruction selection [4] when generating native code from Wasm, mechanized Wasm semantics [8], and developing memory safe extensions to Wasm [5].

Recently, we at Certora have been developing Sunbeam, a new tool for verifying the high-level functional correctness of Rust smart contracts. Sunbeam allows users to write correctness properties using a lightweight specification language embedded in Rust, and then verifies that the smart contract satisfies the properties. Crucially, Sunbeam does not check the correctness of the Rust source code; it first compiles the Rust to Wasm bytecode and verifies that the generated Wasm satisfies the property. This way, Sunbeam verifies code that is actually deployed on the blockchain and does not require trusting the Rust compiler.

Figure 1 shows the detailed workflow of Sunbeam. Sunbeam uses the Rust compiler to generate Wasm from the source code and the user written specification. It then compiles the Wasm to Certora’s intermediate representation called TAC, which is a three-address register machine. Once TAC is generated, Sunbeam performs semantics preserving rewrites to simplify the TAC. The TAC is ultimately compiled to a logical formula for SMT solvers to solve (as explained in the previous section). Finally, the outcome of the verification is shared as an interactive web-based report with the users.

Figure 1. Workflow of Certora Sunbeam. It takes as input a smart contract and its specification written in Rust, (2) compiles both to generate Wasm, (3) translates the Wasm to Certora’s internal intermediate representation called TAC, (4) performs semantics preserving transformations on TAC to simplify the code, (5) converts the code and the specification to a VC, (6) uses off-the-shelf SMT solvers to check the validity of the VC by reducing it to a satisfiability problem, and finally (7) shows the result of verification in a report.

Currently, Sunbeam targets Soroban smart contracts that are deployed in the form of Wasm bytecode in the Stellar blockchain. Soroban’s design makes these programs particularly suitable for formal verification—a contract’s environment is separated into a guest and a host where a lot of the complex functionality is offloaded to the host. This makes the contract code simpler and smaller. Verification of these programs can therefore be reduced to verifying that the contract code itself is correct, under the assumption that the host environment is correctly implemented.

Going back to the transfer method we saw above, let’s try to write some of the properties we stated previously. The first we stated can be written for Sunbeam as follows:

#[rule]
fn transfer_is_correct(e: Env, to: Address, from: Address, amount: i64) {
   cvt::require!(to != from, "addresses are different");
   // read balance of `from` before transfer
   let balance_from_before = Token::balance(&e, from.clone());
   // read balance of `to` before transfer
   let balance_to_before = Token::balance(&e, to.clone());
   // call transfer
   Token::transfer(&e, from.clone(), to.clone(), amount);
   // read balance of `from` after transfer
   let balance_from_after = Token::balance(&e, from.clone());
   // read balance of `to` after transfer
   let balance_to_after = Token::balance(&e, to.clone());
   cvt::assert!(balance_to_after == balance_to_before + amount);
   cvt::assert!(balance_from_after == balance_from_before - amount);
}

The second one about transfer not affecting any address other than from and to can be stated as:

#[rule]
fn effect_on_other(e: Env, amount: i64, from: Address, to: Address, other: Address) {
   cvt::require!(to != other && from != other, "addresses are all different");
   // read balance before
   let balance_other_before = Token::balance(&e, other.clone());
   // call transfer
   Token::transfer(&e, from.clone(), to.clone(), amount);
   // read balance after
   let balance_other_after = Token::balance(&e, other.clone());
   cvt::assert!(balance_other_after == balance_other_before);
}

Here, cvt::require! and cvt::assert! are two macros that Sunbeam interprets as pre- and post-conditions. The #[rule] attribute adds the rules to the Wasm binary in a dedicated section and prevents the Rust compiler from mangling or removing it.

Sunbeam then uses the Rust compiler to generate Wasm from the above code and verifies the property by using SMT solvers, as mentioned above. The exact details on how to run Sunbeam to verify the above code can be found in our tutorial to be presented at Meridian 2024: https://github.com/Certora/meridian2024-workshop.

Challenges of Verifying Low-level Wasm

Verifying the low-level Wasm bytecode has the advantage that it prevents us from having to trust the Rust compiler. Compilers, like any other software, are prone to errors, as has been shown by tools like CSmith [6] and RustSmith [7]. Last year, a bug in the Vyper compiler that generates EVM bytecode from Vyper smart contracts led to a severe security vulnerability in the Curve protocol resulting in the loss of millions of dollars. Recently, Certora and collaborators found a bug in an optimization within LLVM that could have been exploited to draw funds from Aave's instance on ZKsync. Therefore, being able to verify the code that will ultimately be deployed and executed is beneficial.

On the other hand, verifying properties at the Wasm level comes with its challenges.

Wasm offers very few data types (32 and 64 bit signed integers are the ones relevant to our current use case). Therefore, Rust programs that have complex data types look very different when compiled to Wasm. As a simple example, if a Rust program performs some computation over signed 128 bit (i128) values, the compiled Wasm will represent the i128 types using two i64s (64 bit signed integer). If the result is also an i128, it will be stored as two i64s representing the higher and lower order bits. Even with tools like wabt, that let users see a textual representation of the Wasm binary, it may not be obvious from looking at the Wasm code, what Rust code it corresponds to.

When a low-level language like Wasm is used in a blockchain setting, this has even deeper consequences. On the one hand, if the bytecode corresponding to the user written smart contract is too big and complex, it will incur additional gas cost which is not desirable. On the other hand, it is also important for smart contract developers to be able to use custom data types like maps, strings, and structs in their programs which inevitably make the bytecode complex.

The Soroban platform handles these two conflicting requirements by separating the contract environment into the guest and the host. The host environment implements a wide range of functionality that allows the creation of complex data types as immutable “host objects” that are only accessible by the guest environment through object handles (represented as integers). These functions are simply “imported” and used in the Wasm module corresponding to the user written smart contract, which significantly reduces smart contract’s bytecode size. Communication between the guest and host happens via “Soroban Value Types” which involves wrapping and unwrapping objects to and from these values.

What all this means is that while Wasm is the binary format used by Soroban, under the hood, there is an entire domain specific language that corresponds to the Soroban platform, that is essentially embedded in Wasm. A verification tool like Sunbeam that targets Wasm, must therefore be able to reason not only about the Wasm semantics, but also about the semantics of the embedded language, in this case the “Soroban language”. Naively encoding Wasm programs into verification conditions without taking into account the semantics of the Soroban environment functionalities is doomed to fail. At the same time, including the entire Soroban environment implementation in the code to be verified would not be scalable. Sunbeam addresses this challenge by faithfully modeling and implementing the Soroban host functionality in Certora’s intermediate representation, TAC (see Figure 1). Whenever a smart contract invokes these functions, Sunbeam replaces them with its model of the Soroban environment in TAC.

Once verification is done, Sunbeam shows the user a report that summarizes the results of verification as shown in Figure 2. The outcome is either “Verified” or “Violated”. Since formal verification is an undecidable problem in general, in some cases, it is possible that Sunbeam will not be able to verify the property and will “Timeout” or return “Unknown”. This happens when the verification query is too complex for the SMT solvers to handle. The report is interactive and shows call trace information and summarizes important metrics about the complexity of the verification task.

Output generated by Sunbeam: it shows a verification report that shows the properties that were checked by Sunbeam and their statuses (Verified, Violated, etc).

Figure 2. Output generated by Sunbeam: it shows a verification report that shows the properties that were checked by Sunbeam and their statuses (Verified, Violated, etc).

Conclusions

In summary, we are presenting Sunbeam, a new formal verification tool that targets Wasm; specifically it checks that high-level functional correctness properties written for Soroban smart contracts in Rust are satisfied by the compiled Wasm. For that, it offers a light-weight specification language in Rust.

We are actively working on improving Sunbeam. We recently used Sunbeam to verify several properties of the Reflector oracle protocol, specifically for the Subscription and DAO contracts. An early version of Sunbeam is available for users to try and can be found by registering here: https://www.certora.com/. Our workshop content from Meridian 2024 demonstrating how to use Sunbeam can be accessed here: https://github.com/Certora/meridian2024-workshop. Since Sunbeam is still under development, we encourage users to reach out and give us feedback. We look forward to hearing from you!

Going Forward

Longer term, we are eager to extend Sunbeam for verifying Wasm generated not only from Soroban programs, but even from domains beyond smart contracts and the blockchain. Due to Wasm’s memory isolation, safety, and near-native performance, it has been adopted in many areas and continues to gain popularity. We are excited to learn about other uses of Wasm and to explore how we can use Sunbeam to formally verify the correctness of Wasm programs for those domains. Stay tuned for more updates and future blog posts!

Acknowledgments

We are grateful to everyone who has contributed to this project. Sunbeam is the result of the collaborative efforts of an amazing team that includes Alexander Bakst, Eric Eilebrecht, Johannes Späth, Rahav Yairi, Sameer Arora, Ondrej Kurák, Netanel Rubin-Blaier, Gadi Auerbach, Anastasia Fedotov, Jaroslav Bendik, Alexander Nutz, Arie Gurfinkel, and Chandrakana Nandi.

References

[1] E. Johnson et al., "WaVe: a verifiably secure WebAssembly sandboxing runtime," 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 2023.

[2] Jay Bosamiya, Wen Shih Lim, and Bryan Parno, “Provably-Safe Multilingual Software Sandboxing using WebAssembly,” USENIX Security 22, Boston, MA, USA, 2022.

[3] Evan Johnson, David Thien, Yousef Alhessi, Shravan Narayan, Fraser Brown, Sorin Lerner, Tyler McMullen, Stefan Savage, Deian Stefan, “Довер´яй, но провер´яй: SFI safety for native-compiled Wasm,” Network and Distributed Systems Security (NDSS) Symposium 2021.

[4] Alexa VanHattum, Monica Pardeshi, Chris Fallin, Adrian Sampson, and Fraser Brown. 2024. Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

[5] Alexandra E. Michael, Anitha Gollamudi, Jay Bosamiya, Evan Johnson, Aidan Denlinger, Craig Disselkoen, Conrad Watt, Bryan Parno, Marco Patrignani, Marco Vassena, and Deian Stefan. 2023. “MSWasm: Soundly Enforcing Memory-Safe Execution of Unsafe Code,” Proc. ACM Program. Lang. 7, POPL, Article 15 (January 2023).

[6] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. SIGPLAN Not. 46, 6 (June 2011), 283–294. https://doi.org/10.1145/1993316.1993532

[7] Mayank Sharma, Pingshi Yu, and Alastair F. Donaldson. 2023. RustSmith: Random Differential Compiler Testing for Rust. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 1483–1486. https://doi.org/10.1145/3597926.3604919

[8] Conrad Watt, “Mechanising and evolving the formal semantics of WebAssembly: the Web's new low-level language”, https://www.isa-afp.org/entries/WebAssembly.html

[9] Mike Barnett and K. Rustan M. Leino. 2005. Weakest-precondition of unstructured programs. SIGSOFT Softw. Eng. Notes 31, 1 (January 2006), 82–87. https://doi.org/10.1145/1108768.1108813