τeaƒeλ

timoфey

Understanding GKR

Posted at # ZK # cryptography

Cryptographic work, such as commitments, is often the primary performance bottleneck in SNARKs. This cost is particularly pronounced when committed values are random and necessarily large, as is the case with PLONKish systems.

In recent years, a lot of innovation have been aimed at making commitments as cheap as possible. Circle STARKs and Binius, in particular, are hash-based and sound1 over small fields, M31 and GF[2] respectively. This means lower prover overhead and better hardware compatibility.

However, it should be noted that FRI, the scheme behind these improvements, involves superlinear-time procedures like FFTs, which become a new bottleneck when applied directly to large computations without recursion.

In this note, I will explore GKR, an interactive proof (IP) scheme that addresses cryptographic overhead differently—by nearly avoiding commitments in the first place. My primary aim is to accurately summarize available materials and research. I am very open to changes and improvements, so if anything is unclear or incorrect, please leave comments here.

Background

GKR08 is a relatively old scheme that is based on multivariate polynomials and the sum-check protocol. A technology that was largely ignored in favor of simpler designs featuring univariate polynomials and divisibility check. Lately, however, it has seen renewed interest as projects like Jolt, Expander, and Modulus have demonstrated not only its viability but also impressive performance results.

Notably, modern GKR-based systems demonstrate O(n)O(n) prover complexity, which is linear in the size of the computation and with a constant factor overhead. For some applications, like matrix multiplication, the prover is less than 10x slower than a C++ program that simply evaluates the circuit, Tha13.

Furthermore, the aforementioned Binius is multivariate and it too involves sumcheck. Even StarkWare’s Circle-Stark-based proof system called Stwo used GKR for LogUp lookups. I think it is safe to say that there is currently a broad consensus on the relevance of GKR.

Multilinear functions & MLEs

Multilinear polynomial is a multivariate polynomial that is linear in each variable, meaning it has degree at most one in each variable. When multilinear polynomials defined over the boolean domain, they have a much lower degree compared to the univariate case for the same domain size, T23a.

For any multilinear polynomial P(x1,x2,,xv)P\left(x_1, x_2, \ldots, x_v\right) over the boolean domain {0,1}v\{0,1\}^v (the boolean hypercube), polynomial operations such as addition, multiplication, and evaluation can be performed using only the evaluations of PP on the boolean hypercube. This eliminates the need to explicitly reconstruct the polynomial from its evaluations, so there’s no FFTs.

Multilinear extension (MLE) is used to translate functions into polynomials over the boolean domain {0,1}v\{0,1\}^v. Every function ff and vector a\vec{a} mapping from {0,1}vF\{0,1\}^v \rightarrow \mathbb{F} has exactly one extension polynomial that is multilinear.

The multilinear extension is a multivariate analog of the low-degree extensions (LDE) commonly present in STARKs. One thinks of MLE(f)\operatorname{MLE}(f) as an “extension” of aa, as MLE(a)\operatorname{MLE}(a) “begins” with aa itself but includes a large number of additional entries. This distance-amplifying nature of MLE, combined with the Schwartz-Zippel lemma, forms the first basic building block of multivariate interactive proofs and GKR in particular.

Sumcheck

The Sumcheck protocol allows a prover P\mathbf{P} to convince a verifier V\mathbf{V} that the sum of a multivariate polynomial over boolean inputs [xi{0,1}][x_i \in \{0, 1\}] is computed correctly.

H:=(x1,,xv){0,1}g(x1,x2,,xv)H := \sum_{(x_1,\cdots,x_v) \in \{0,1\}} g(x_1, x_2, \cdots, x_v)

Sumcheck does not require V\mathbf{V} to know anything about the polynomial to which it is being applied. It is only until the final check in the protocol that, depending on the performance/succinctness tradeoff, V\mathbf{V} can choose to either request the polynomial and evaluate it directly or perform an oracle query, i.e., outsource the evaluation to P\mathbf{P} via a polynomial commitment scheme (PCS).

This is an interactive protocol: If \ell is the number of variables in gg, then \ell rounds are required to complete the protocol. By applying the Fiat-Shamir transform, we can render the sumcheck non-interactive.

This post doesn’t aim to explain the general workings of sumcheck. For that, I recommend T23a (section 4.1) or this blog post.

GKR

In GKR, we work with layered circuits. A circuit is layered if it can be partitioned into layers such that every wire in the circuit is between adjacent layers, L23. The number of layers is the depth of the circuit, denoted as dd. Note that many of today’s variations of GKR allow for more arbitrary topologies just as well.

The arithmetic circuit C\mathcal{C} encodes logic with addition and multiplication gates to combine values on the incoming wires. Accordingly, functions addi\operatorname{add}_i and muli\operatorname{mul}_i are gate selectors that together constitute the wiring predicate of layer ii.

We encode wire values on a boolean hypercube {0,1}n\{0,1\}^n, creating multi-linear extensions W~i(x)\widetilde{W}_i(x) for each layer ii. The output is in W~0(x)\widetilde{W}_0(x), and inputs will be encoded in W~d(x)\widetilde{W}_d(x).

Gate selectors depend only on the wiring pattern of the circuit, not on the values, so they can be evaluated by the verifier locally, XZZ19. Each gate aa at layer ii has two unique in-neighbors, namely in1(a)\operatorname{in}_1(a) and in2(a)\operatorname{in}_2(a).

addi(a,b,c)={1 if (b,c)=(in1(a),in2(a))0 otherwise \operatorname{add}_i(a, b, c)= \begin{cases}1 & \text { if }(b, c)=\left(\operatorname{in}_1(a), \operatorname{in}_2(a)\right) \\ 0 & \text { otherwise }\end{cases}

and muli(a,b,c)=0\operatorname{mul}_i(a, b, c)=0 for all b,c{0,1}i+1b, c \in\{0,1\}^{\ell_{i+1}} (the case where gate aa is a multiplication gate is similar), T23a. Selector MLEs are sparse, with at most 222^{2\ell} non-zero elements out of 0,13{0,1}^{3\ell}.

addi(x,y,z)={1W~i(x)=W~i+1(y)+W~i+1(z)0 otherwise muli(x,y,z)={1W~i(x)=W~i+1(y)W~i+1(z)0 otherwise \begin{aligned} \operatorname{add}_i(x, y, z) & = \begin{cases}1 & \widetilde{W}_i(x)=\widetilde{W}_{i+1}(y)+\widetilde{W}_{i+1}(z) \\ 0 & \text { otherwise }\end{cases} \\ \operatorname{mul}_i(x, y, z) & = \begin{cases}1 & \widetilde{W}_i(x)=\widetilde{W}_{i+1}(y) \cdot \widetilde{W}_{i+1}(z) \\ 0 & \text { otherwise }\end{cases} \end{aligned}

Note, the above definition does not reflect how selector MLEs are computed in practice, it is an effective method for illustrating the relationship between selectors and wire values.

The GKR prover starts with the output claim and iteratively applies the sumcheck protocol to reduce it from one layer to the next until it arrives at the input. The values on the layers are related thusly:

W~i(x)=y,z{0,1}i+1(addi(x,y,z)(W~i+1(y)+W~i+1(z))+muli(x,y,z)(W~i+1(y)W~i+1(z)))(1)\widetilde{W}_i(x)=\sum_{y, z \in\{0,1\}^{\ell_{i+1}}}\binom{\operatorname{add}_i(x, y, z) \cdot(\widetilde{W}_{i+1}(y)+\widetilde{W}_{i+1}(z))}{+\operatorname{mul}_i(x, y, z) \cdot(\widetilde{W}_{i+1}(y) \cdot \widetilde{W}_{i+1}(z))} \tag{1}

A few things to note here: First, notice how gate selectors are indexed—this aligns with the convention that selectors at layer ii determine how values from the next layer i+1i+1 are combined. Notice also that the ii-layer sumcheck is over boolean inputs of size i+1\ell_{i+1}, i.e., the number of gates in the next layer.

Protocol

  1. P\mathbf{P} sends the output vector ω\vec{\omega} and claims that ω~=W~0\tilde{\omega} = \widetilde{W}_0.
  2. V\mathbf{V} sends random r0Er_0 \in \mathbb{E} and computes m0:=ω~(r0)m_0 := \tilde{\omega}(r_0).
  3. P\mathbf{P} and V\mathbf{V} apply sumcheck on the relation between W0W_0 and W1W_1 (using fr0f_{r_0} for the summand) y,z{0,1}1fr0(y,z)=?m0\sum_{y, z \in\{0,1\}^{\ell_1}} f_{r_0}(y, z) \stackrel{?}{=} m_0
  4. P\mathbf{P} and V\mathbf{V} reduce two claims Wi+1(b)W_{i+1}(b) and Wi+1(c)W_{i+1}(c) to a single random evaluation/combination mim_i.
  5. P\mathbf{P} and V\mathbf{V} apply sumcheck on the reduced relation mim_i, alternating steps 4-5 for d1d-1 more times.
  6. V\mathbf{V} checks that W~d\widetilde{W}_d is consistent with the inputs vector x\vec{x}.

1. P\mathbf{P} sends the output vector ω\vec{\omega} and claims that ω~=W~0\tilde{\omega} = \widetilde{W}_0

First, the P\mathbf{P} has to evaluate the circuit at the given input vector x\vec{x}. This is why the prover is always at least linear to the size of the computation (circuit).

A common notation would be to write the output vector ω\vec{\omega} as a function D:{0,1}0FD: \{0,1\}^{\ell_0} \rightarrow \mathbb{F} mapping output gate labels to output values, 202^{\ell_0} of them in total. The gate labels are the vertices of the boolean hypercube; for example, for 0=2\ell_0=2, the 4 output labels are 0000, 0101, 1010, 1111.

In practice, though, P\mathbf{P} and V\mathbf{V} work with polynomials, not functions, so they have to compute multilinear extension ω~\tilde{\omega} from the vector ω\vec{\omega}. Same goes for the inputs xW~d\vec{x} \rightarrow \widetilde{W}_d. However, since computation on MLEs can be performed over evaluations, in practice no additional computation, like interpolation/IFFT, is even required!

2. V\mathbf{V} sends random r0Er_0 \in \mathbb{E} and computes m0:=ω~(r0)m_0 := \tilde{\omega}(r_0)

V\mathbf{V} picks a random challenge r0Er_0 \in \mathbb{E}. Crucially, the soundness error of the sumcheck protocol is inversely proportional to the size of the field from which challenges rir_i are drawn (due to the Schwartz-Zippel lemma), T23a. That’s why in practice for challenge values, we use an extension field E:=Fpk\mathbb{E} := \mathbb{F}_{p^k} where kk is the extension degree.

For example, say we opt for M31 as a base field Fp\mathbb{F}_p whose order is Fp=p=2311|\mathbb{F}_p| = p=2^{31}-1 elements. Then, the probability of soundness error in sumcheck for an \ell-variate summand polynomial is 2311\frac{\ell}{2^{31}-1}, which is too large, certainly for the non-interactive setting. Instead, we choose challenges from QM31—the extension field of M31 with k=4k=4. Now, for any reasonable \ell, the soundness error can be considered negligible.

This soundness characteristic is a notable drawback of sumcheck-based systems, as it requires the GKR prover to work over extension fields after the second round, thus making field work the main contributor to the proving overhead. In Part 2, I’ll cover the recent work from Bagad et al describing an algorithm that reduces the number of extension field operations by multiple orders of magnitude.

Also note that V\mathbf{V} cannot yet trust ω~\tilde{\omega} correctly to encode the outputs. The remainder of the protocol is devoted to confirming that the output claim is consistent with the rest of the circuit and its inputs.

3. P\mathbf{P} and V\mathbf{V} apply sumcheck on the relation between W0W_0 and W1W_1

For the first layer, P\mathbf{P} and V\mathbf{V} run a sumcheck on Equation (1)(1) with i=0i=0, between W0W_0 and W1W_1, with xx fixed to r0r_0.

y,z{0,1}1addr0(y,z)(W~1(y)+W~1(z))+mulr0(y,z)(W~1(y)W~1(z))=?m0\sum_{y, z \in\{0,1\}^{\ell_1}} \operatorname{add}_{r_0}(y, z) \cdot\left(\widetilde{W}_1(y)+\widetilde{W}_1(z)\right)+\operatorname{mul}_{r_0}(y, z) \cdot\left(\widetilde{W}_1(y) \cdot \widetilde{W}_1(z)\right) \stackrel{?}{=} m_0

In the first round of sumcheck, P\mathbf{P} uses r0r_0 to fix the variable xx. In the remaining two rounds, V\mathbf{V} picks b,cEb, c \in \mathbb{E} randomly to fix the variables yy and zz, respectively. At the end of the sumcheck, from V\mathbf{V}‘s point of view, the relation on the sum (Equation 11) is reduced to a simple check that the summand fr0(b,c)f_{r_0}(b, c) evaluates to m0m_0.

To compute fr0(b,c)f_{r_0}(b, c), V\mathbf{V} must compute addr0(b,c)\operatorname{add}_{r_0}(b, c) and mulr0(b,c)\operatorname{mul}_{r_0}(b, c) locally. Remember that these depend only on the circuit’s wiring pattern, not on the values. Since fr0f_{r_0} is recursive, V\mathbf{V} also asks P\mathbf{P} for the W~1(b)\widetilde{W}_1(b) and W~1(c)\widetilde{W}_1(c) values and computes fr0(u,v)f_{r_0}(u, v) to complete the sumcheck protocol.

In this way, P\mathbf{P} and V\mathbf{V} reduce a claim about the output to two claims about values in layer 1. While P\mathbf{P} and V\mathbf{V} could recursively invoke two sumcheck protocols on W~1(b)\widetilde{W}_1(b) and W~1(c)\widetilde{W}_1(c) for the layers above, the number of claims and sumcheck protocols would grow exponentially in dd. XZZ19

4. P\mathbf{P} and V\mathbf{V} reduce two claims Wi+1(b)W_{i+1}(b) and Wi+1(c)W_{i+1}(c) to a single random evaluation/combination mim_i

To avoid an exponential blow-up in complexity, P\mathbf{P} and V\mathbf{V} apply a “rand-eval” reduction subroutine.

Here I will describe a method from CFS17 based on random linear combination (RLC), as it is more commonly found in modern implementations. Later, for completeness, I will also include the method based on line restriction, as described in the original paper GKR08.

Given two claims W~1(b)\widetilde{W}_1(b) and W~1(c)\widetilde{W}_1(c), V\mathbf{V} picks random weights αi,βiE\alpha_i, \beta_i \in \mathbb{E} and computes the RLC as

mi=αiW~1(b)+βiW~1(c)m_i = \alpha_i \widetilde{W}_1(b)+\beta_i \widetilde{W}_1(c)

In the next step, V\mathbf{V} would use mim_i as the claim for the ii-th layer sumcheck.

5. P\mathbf{P} and V\mathbf{V} apply sumcheck on the reduced relation mim_i, alternating steps 4-5 for d1d-1 more times

For the layers i=1,,d1i=1, \ldots, d-1, P\mathbf{P} and V\mathbf{V} would execute the sumcheck protocol on Equation (2)(2) instead of Equation (1)(1)

αiW~i(b)+βiW~i(c)=y,z{0,1}n((αiaddi(b,y,z)+βiaddi(c,y,z))(W~i+1(y)+W~i+1(z))+(αimuli(b,y,z)+βimuli(c,y,z))(W~i+1(y)W~i+1(z)))(2)\begin{align} &\alpha_i \widetilde{W}_i(b)+\beta_i \widetilde{W}_i(c) = \\ &\sum_{y, z \in\{0,1\}^n}\binom{(\alpha_i\cdot\operatorname{add}_i(b, y, z)+\beta_i\cdot\operatorname{add}_i(c, y, z)) \cdot(\widetilde{W}_{i+1}(y)+\widetilde{W}_{i+1}(z))}{+(\alpha_i\cdot\operatorname{mul}_i(b, y, z) +\beta_i\cdot\operatorname{mul}_i(c, y, z)) \cdot(\widetilde{W}_{i+1}(y) \cdot \widetilde{W}_{i+1}(z))} \end{align}\tag{2}

At the end of each layer’s sumcheck protocol, V\mathbf{V} still receives two claims about W~i+1\widetilde{W}_{i+1}, computes their random linear combination, and proceeds to the next layer above recursively until the input layer. XZZ19

6. V\mathbf{V} checks that W~d\widetilde{W}_d is consistent with the inputs vector x\vec{x}

At the input layer dd, V\mathbf{V} receives two claims W~d(bi=d)\widetilde{W}_d(b_{i=d}) and W~d(ci=d)\widetilde{W}_d(c_{i=d}) from P\mathbf{P}. Recall that W~d\widetilde{W}_d is claimed to be a multilinear extension of the input vector x\vec{x}.

If V\mathbf{V} knows all the inputs in clear, they can compute W~d\widetilde{W}_d and evaluate it at bi=db_{i=d} and ci=dc_{i=d} themselves.

Alternatively, if V\mathbf{V} doesn’t know all inputs and is instead given an input commitment [wd][w_d] for succinctness or zero-knowledge reasons, then V\mathbf{V} queries the oracle for evaluations of W~d\widetilde{W}_d at bi=db_{i=d} and ci=dc_{i=d}.

Ultimately, V\mathbf{V} outputs accept\mathbf{accept} if the evaluated values are the same as the two claims; otherwise, they output reject\mathbf{reject}. XZZ19

Analysis

Let C:FE\mathcal{C}: \mathbb{F} \rightarrow \mathbb{E} be a layered circuit of depth dd with n:=xn := |\vec{x}| input variables, having SiS_i gates in layer ii. Naturally, C:=i=0dSi|\mathcal{C}| := \sum^d_{i=0} S_i is the total number of gates in the circuit, i.e., the circuit size.

  • Rounds: O(dlogC)O(d\cdot \log |\mathcal{C}|).
  • P\mathbf{P} time: O(ClogC)O(|\mathcal{C}|\cdot\log |\mathcal{C}|)
  • V\mathbf{V} work: O(n+dlogC+t+S0)O(n+dlogC)O(n+d\cdot\log |\mathcal{C}|+t+S_0) \approx O(n+d\cdot\log |\mathcal{C}|) for 1 output.
  • Communication: O(S0+dlogC)O(S_0+d\cdot\log |\mathcal{C}|) field elements.
  • Soundness error: O(dlogCE)O(\frac{d\cdot\log |\mathcal{C}|}{|\mathbb{E}|})

The GKR protocol consists of one execution of the sumcheck protocol per layer. Therefore, the total communication cost (proof size) is O(dlogC)O(d \log |\mathcal{C}|) field elements. The accumulated soundness error is O(dlogCE)O\left(\frac{d \cdot \log |\mathcal{C}|}{|\mathbb{E}|}\right) due to the Schwartz-Zippel lemma. T23a

The prover must first evaluate the circuit, which takes time C|\mathcal{C}|. It must also compute addi\operatorname{add}_i and muli\operatorname{mul}_i at each layer, which, if done trivially, induces logarithmic overhead logC\log |\mathcal{C}|. The resulting time for the prover is O(ClogC)O(|\mathcal{C}| \cdot \log |\mathcal{C}|). XZZ19

The verifier’s work is O(n+dlogC+t+S0)O(n + d \log |\mathcal{C}| + t + S_0), where S0S_0 is the number of outputs of the circuit, tt denotes the optimal time to evaluate all addi\operatorname{add}_i and muli\operatorname{mul}_i, and the nn term is due to the time needed to evaluate W~d\widetilde{W}_d. XZZ19

Part 2 will cover modifications and techniques that result in O(C)O(|\mathcal{C}|) prover.

Zero Knowledge

To make the sumcheck protocol zero-knowledge, the polynomial in the sumcheck protocol is masked by a random polynomial.

To prove the sumcheck for the summand polynomial ff from Equation (1)(1) and a claim WW in zero-knowledge, P\mathbf{P} generates a random polynomial γ\gamma with the same variables and individual degrees as ff, commits to γ\gamma, and sends V\mathbf{V} a claim Γ=x1,,x{0,1}γ(x1,,x)\Gamma = \sum_{x_1, \ldots, x_{\ell} \in \{0,1\}^{\ell}} \gamma(x_1, \ldots, x_{\ell}) V\mathbf{V} picks a random number ρ\rho, and together with P\mathbf{P} executes the sumcheck protocol on

W+ρΓ=x1,,xn{0,1}f(x1,,xn)+ργ(x1,,xn)W+\rho\cdot\Gamma=\sum_{x_1, \ldots, x_{n} \in\{0,1\}^{\ell}} f\left(x_1, \ldots, x_{n}\right)+\rho\cdot\gamma\left(x_1, \ldots, x_{n}\right)

In the last round of this sumcheck, P\mathbf{P} opens the commitment to γ\gamma at γ(r1,,r)\gamma(r_1, \ldots, r_{\ell}), and the verifier computes f(r1,,r)f(r_1, \ldots, r_{\ell}) by subtracting ργ(r1,,r)\rho \cdot \gamma(r_1, \ldots, r_{\ell}) from the last message, and compares it with P\mathbf{P}‘s original claim.

Chiesa et al. showed that as long as the commitment and opening of γ\gamma are zero-knowledge, the protocol is zero-knowledge.

Original method to reduce two claims to one

Let W~\widetilde{W} be a multilinear polynomial over F\mathbb{F} with logn\log n variables. The following is the description of a simple one-round subroutine from GKR08 with communication cost O(logn)O(\log n) that reduces the evaluation of W~(b)\widetilde{W}(b) and W~(c)\widetilde{W}(c) to the evaluation of W~(r)\widetilde{W}(r) for a single point rEr \in \mathbb{E}.

P\mathbf{P} interpolates a unique line \ell passing through bb and cc, such that (0)=b\ell(0)=b and (1)=c\ell(1)=c. The line can be formally defined as (t)=b+t(cb)\ell(t) = b + t \cdot (c - b) using the point-slope form. The points bb and cc are tuples with vv elements for a vv-variate polynomial W~\widetilde{W}.

By substituting bb and cbc - b into (t)\ell(t), we get the tuple (t)=(0(t),,v(t))\ell(t) = (\ell_0(t), \cdots, \ell_{v}(t)) defined by vv linear polynomials over tt. P\mathbf{P} sends a univariate polynomial qq of degree at most ki+1k_{i+1} that is claimed to be W~i+1\widetilde{W}_{i+1} \circ \ell, the restriction of W~i+1\widetilde{W}_{i+1} to the unique line \ell.

q(t):=(W~)(t)=W~(0(t),,logn(t)))q(t) := (\widetilde{W} \circ \ell)(t)=\widetilde{W}(\ell_0(t),\cdots,\ell_{\log n}(t)))

V\mathbf{V} interprets q(0)q(0) and q(1)q(1) as P\mathbf{P}‘s claims for the values of W~(b)\widetilde{W}(b) and W~(c)\widetilde{W}(c). V\mathbf{V} also picks a random point rFr^* \in \mathbb{F}, sets r=(r)r=\ell(r^*), and interprets q(r)q(r^*) as P\mathbf{P}‘s claim for the value of W~(r)\widetilde{W}(r). See the picture and an example from T23a (Section 4.5.2).

References

  • [Tha13] Justin Thaler (2013). “Time-Optimal Interactive Proofs for Circuit Evaluation”
  • [T23a] Justin Thaler (2023). “Proofs, Arguments, and Zero-Knowledge”. See also lecture notes.
  • [L23] Jieyi Long (2023). “Efficient Arguments and Proofs for Batch Arithmetic Circuit Satisfiability”
  • [XZZ+19]: Tiancheng Xie et al. (2019). “Libra: Succinct Zero-Knowledge Proofs with Optimal Prover Computation”
  • [G24] Ariel Gabizon (2024). zkWarsaw talk “The GKR method”.

Footnotes

  1. statistically sound, as in likelihood of error is small, based on statistical measures according to field size and other factors. Also note that Circle STARKs and Binius must use extension fields in certain parts of computation, e.g. M31^4 for Circle-FFT.