Circuit Design

1. INTRODUCTION

WASM (or WebAssembly) is an open standard binary code format close to assembly. Its initial objective was to provide an alternative to JavaScript with better performance in current web ecosystems. Thanks to its platform independence, front-end flexibility (it can be compiled from the majority of languages, including C, C++, AssemblyScript, Rust, etc.), well-isolated runtime, and speed close to that of native binaries, it is increasingly used in distributed cloud and edge computing. Recently it has become a popular binary format for users to run customized functions on AWS Lambda, OpenYurt, Azure, etc.

The Problem. To implement a ZKSNARK-backed WASM virtual machine, we need to connect the implementation of the WASM runtime with the ZKSNARK proof system. In general, a ZKSNARK system is expressed as arithmetic circuits with polynomial constraints. Therefore we need to abstract the full imperative logic of a WASM virtual machine systematically and rewrite it as arithmetic circuits with constraints. Consider two outputs: one is generated by emulating the WASM bytecode in a WASM runtime that enforces the semantics of the WASM specification, and the other satisfies the constraints imposed on the arithmetic circuits. If the circuits we write preserve the semantics, these two outputs must be the same. Hence a ZKSNARK proof derived from the circuits also shows that the output is a valid result of emulating the bytecode in the WASM runtime.
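As a minimal sketch of this idea, the Python fragment below treats a single toy "add" instruction in two ways: as imperative code that a runtime would execute, and as a polynomial constraint that a candidate output must satisfy. The modulus `P`, the function names, and the instruction itself are assumptions made for illustration; real WASM integer addition wraps modulo $2^{32}$ or $2^{64}$, which this sketch ignores.

```python
# Toy prime modulus standing in for the proof system's field (an assumption).
P = 2**61 - 1


def emulate_add(a: int, b: int) -> int:
    """Imperative view: what a runtime computes for the toy 'add' instruction."""
    return (a + b) % P


def add_constraint(a: int, b: int, out: int) -> bool:
    """Circuit view: the constraint out - (a + b) = 0 over the field."""
    return (out - (a + b)) % P == 0


runtime_output = emulate_add(3, 4)
# The runtime's output satisfies the constraint ...
print(add_constraint(3, 4, runtime_output))
# ... and any other candidate output violates it.
print(add_constraint(3, 4, 8))
```

Because the constraint admits exactly one value of `out` for given `a` and `b`, any output that satisfies it must coincide with the emulated output, which is the property the circuits are meant to preserve for every instruction.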

We consider the WASM virtual machine as a gigantic program whose input is a tuple $(I(C, H), E, IO)$, where $I$ is a WASM executable image that contains a code image $C$ and an initial memory $H$, $E$ is its entry point, and $IO$ represents the (stdin, stdout) firmware. In the serverless setup, the WASM runtime starts with an initial state based on the loaded image $I$, then jumps to the entry point $E$ and starts executing the bytecode according to the WASM specification.

Internally the WASM runtime maintains a state $S$ denoted by a tuple ($iaddr$, $F$, $M$, $G$, $Sp$, $I$, $IO$), where $iaddr$ is the current instruction address, $F$ is the calling frame with a depth field, $M$ is the memory state, $G$ is the set of global variables, $Sp$ is the stack, and $I$ and $IO$ are the image and the input/output defined above. The runtime simulates the semantics of each instruction, starting at $E$, until it reaches the exit. The instructions it simulates form an execution trace $[t_0, t_1, t_2, t_3, \cdots]$, and each transition $t_i$ is a function between states that takes an input $s: S$ and outputs a new state $s': S$.

For simplicity, we use record-field notation to refer to a field of a state $s:S$. For example, $s.iaddr$ denotes the current instruction address of state $s$, $s.IO.stdin$ denotes the input of state $s$, etc. We also use $s.iaddr.op$ to denote the opcode (the operation code that specifies the operation to be performed) at address $s.iaddr$ in the code section $C$ of image $I$.
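The state tuple and the record-field notation can be sketched as plain Python records. This is only an illustration: the class names, the `(opcode, immediate)` instruction encoding, the initial frame depth of 1, and the helper `initial_state` are all assumptions, not part of the WASM specification or of the circuits.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    op: str       # opcode
    arg: int = 0  # immediate operand (hypothetical encoding)


@dataclass(frozen=True)
class Image:
    C: Tuple[Instr, ...]  # code image
    H: Tuple[int, ...]    # initial memory


@dataclass(frozen=True)
class Frame:
    depth: int  # depth of the calling frame


@dataclass(frozen=True)
class Channels:
    stdin: Tuple[int, ...]
    stdout: Tuple[int, ...]


@dataclass(frozen=True)
class State:
    iaddr: int            # current instruction address
    F: Frame              # calling frame
    M: Tuple[int, ...]    # memory state
    G: Tuple[int, ...]    # global variables
    Sp: Tuple[int, ...]   # stack
    I: Image              # loaded image
    IO: Channels          # (stdin, stdout)

    def op(self) -> str:
        """The opcode written s.iaddr.op in the text: look up C at s.iaddr."""
        return self.I.C[self.iaddr].op


def initial_state(image: Image, entry: int, io: Channels) -> State:
    """Build s_0 from the input tuple (I(C, H), E, IO); depth 1 is an assumption."""
    return State(iaddr=entry, F=Frame(depth=1), M=image.H, G=(), Sp=(),
                 I=image, IO=io)


image = Image(C=(Instr("const", 7), Instr("return")), H=(0, 0, 0, 0))
s0 = initial_state(image, entry=0, io=Channels(stdin=(7,), stdout=()))
print(s0.iaddr, s0.op(), s0.IO.stdin, s0.F.depth)
```

A transition $t_i$ is then just a function from `State` to `State`; making the records immutable (`frozen=True`) keeps each state in the trace a separate value.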

Based on the above definition, we define the criteria for a list of state transitions to be valid under $(I(C, H), E, IO)$ as follows.

Definition 2.1 (Valid Execution Trace). Given a WASM machine with input $(I(C, H), E, IO)$, let $s_0$ be the initial state with $s_0.iaddr = E$. A valid execution trace is a list of transition functions $t_i$ such that the following holds: (1) For all $k$, with $s_k = t_{k-1} \circ \cdots \circ t_1 \circ t_0 (s_0)$, the transition $t_k$ enforces the semantics of $s_k.iaddr.op$. (2) If $s_e$ is the last state, then the depth of the calling frame is zero: $s_e.F.depth = 0$.
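The two conditions of the definition can be checked mechanically on a toy machine. The sketch below is an illustration under stated assumptions: the three-opcode instruction set, the code image `CODE`, the reduced `State` record (only `iaddr`, a `depth` standing in for $s.F.depth$, and a `stack` standing in for $s.Sp$), and the reference function `step` are all hypothetical and far simpler than real WASM.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# Hypothetical code image C: (opcode, immediate) pairs.
CODE = (("push", 2), ("push", 3), ("add", 0), ("return", 0))


@dataclass(frozen=True)
class State:
    iaddr: int
    depth: int
    stack: Tuple[int, ...]


Transition = Callable[[State], State]


def step(s: State) -> State:
    """Reference semantics of the opcode at s.iaddr (toy rules)."""
    op, arg = CODE[s.iaddr]
    if op == "push":
        return replace(s, iaddr=s.iaddr + 1, stack=s.stack + (arg,))
    if op == "add":
        *rest, a, b = s.stack
        return replace(s, iaddr=s.iaddr + 1, stack=tuple(rest) + (a + b,))
    if op == "return":
        return replace(s, depth=s.depth - 1)
    raise ValueError(f"unknown opcode {op}")


def is_valid_trace(s0: State, trace: List[Transition]) -> bool:
    s = s0
    for t in trace:
        nxt = t(s)
        if nxt != step(s):  # condition (1): t_k enforces s_k.iaddr.op
            return False
        s = nxt
    return s.depth == 0     # condition (2): last state has frame depth zero


def forged_add(s: State) -> State:
    """A transition that claims the sum of the top two stack values is 99."""
    return replace(s, iaddr=s.iaddr + 1, stack=s.stack[:-2] + (99,))


s0 = State(iaddr=0, depth=1, stack=())
print(is_valid_trace(s0, [step, step, step, step]))        # honest trace
print(is_valid_trace(s0, [step, step, forged_add, step]))  # wrong 'add' result
print(is_valid_trace(s0, [step, step, step]))              # stops before exit
```

The honest trace passes both conditions, the forged transition fails condition (1), and the truncated trace fails condition (2) because the calling frame has not been unwound.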

Organization of the document. After a brief introduction in Section 1 to the basic ideas of how to connect a stateful virtual machine with ZKSNARK, we describe the basic building blocks used to construct ZKWASM circuits in Section 2 and then present the circuit architecture in Section 3. Once the architecture is settled, we discuss the circuits for every category of WASM instructions in Section 4. Finally, we present the partition and proof batching technique used to address the long execution trace problem.

Throughout the document, we use the notation $a:A$ to specify a variable of type $A$, $F$ to specify the number field, and $F^n$ to specify the set of vectors of dimension $n$ over $F$. We denote by $A \rightarrow B$ the function type from $A$ to $B$ and use $\circ$ for function composition. Moreover, we use $G[i][j]$ to specify the value of the cell of matrix $G$ at the $i$th row and $j$th column.

