Circuit Design

1. INTRODUCTION

WASM (or WebAssembly) is an open standard binary code format close to assembly. Its initialobjective is to provide an alternative to java-script with better performance in the current webecosystems. Benefiting from its platform independence, front-end flexibility (can be compiled fromthe majority of languages including C, C++, assembly script, rust, etc.), good isolated runtimeand speed that is close to native binary, its usage starts to arise in the distributed cloud and edgecomputing. Recently it has become a popular binary format for users to run customized functionson AWS Lambda, Open Yurt, AZURE, etc.

The Problem. To implement a ZKSNARK-backed WASM virtual machine, we need to connect the implementation of WASM runtime with the proof system of ZKSNARK. In general, a ZKSNARK system is represented in arithmetic circuits with polynomial constraints. Therefore we need to abstract the full imperative logic of a WASM virtual machine systematically and rewrite it into arithmetic circuits with constraints. Given two outputs, one is generated by emulating the WASM bytecode in WASM runtime that enforces the semantics of WASM specification, and the other satisfies the constraints imposed on the arithmetic circuits. If the circuits we write preserve the semantics, these two outputs must be the same. Hence the proof of the ZKSNARK derived from the circuits also shows that the output is valid as a result of emulating the bytecode in WASM runtime.

corres of execution.

We consider the WASM virtual machine as a gigantic program, with the input as a tuple (I(C,H),E,IO)(I(C, H), E, IO) ,where II is a WASM executable image that contains a code image CC and an initial memory HH, EE is its entry point, and IOIO represents the (stdin, stdout) firmware. In the serverless setup, the WASM run-time starts with an initial state based on the loaded image II, then jumps to the entry point EE and starts executing the bytecode based on the WASM specification.

Internally the WASM run-time maintains a state SS denoted by a tuple (iaddriaddr, FF, MM, GG, SpSp, II, IOIO) where iaddriaddr is the current instruction address, FF is the calling frame with a depth field, MM is the memory state, SpSp is the stack and GG is the set of global variables. The run-time simulates the semantic of each instruction start at EE until it reaches the exit. The instructions it simulates form an execution trace [t0,t1,t2,t3,][t_0, t_1, t_2, t_3, \cdots] and each transition tit_i is a function between states that takes an input s:Ss: S and outputs a new state s:Ss': S.

For simplicity, we will use the notation of record field to specify a field in state s:Ss:S. For example, s.iaddrs.iaddr denotes the current instruction address of state ss, s.IO.stdins.IO.stdin denotes the input of state ss, etc. We also use s.iaddr.ops.iaddr.op to denote the opcode (operation code that specifies the operation to be performed) at address s.iaddrs.iaddr in the code section CC of image II.

Based on the above definition, we define the criteria for a list of state transitions to be validunder (I(C,H),E,IO)(I(C, H), E, IO), as follows.

-- Definition 2.1 (Valid Execution Trace). Given a WASM machine with input (I(C,H),E,IO)(I(C, H), E, IO), and s0s_0 is the initial state with s0.iaddr=Es_0.iaddr = E. A valid execution trace is a list of transition functions tit_i suchthat the following holds: (1) For all kk, sk=tk1t1t0(s0)s_k = t_{k-1} \circ \cdots \circ t_1 \circ t_0 (s_0), tkt_k enforces the semantics of sk.iaddr.ops_k.iaddr.op. (2) If $s_e$ is the last state, then the depth of the calling frame is zero: $se.F.depth = 0$.

Organization of the document. After a brief introduction to the basic ideas about how to connect a stateful virtual machine with ZKSNARK in Section 1, we describe the basic building block and ingredients used to construct ZKWASM circuits in Section 2 and then present the circuits architecture in Section 3. After the architecture is settled, we discuss the circuits of every category of WASM instructions in Section 4. In the end, we present the partition and proof batching technique to solve the long execution trace problem.

Throughout the document, we use the notation a:Aa:A to specify a variable of type AA, FF to specify the number field, and FnF_n to specify a multi-dimensional vector with dimension nn. We denote by ABA \rightarrow B the function type from AA to BB and use \circ for function composition. Moreover, we use G[i][j]G[i][j] to specify the value of the cell of matrix GG at the ii th row and jj th column.

Last updated