The Composition Complexity of Majority
Christian Engels, May 11, 2022 [TCS, Boolean Circuit Complexity]
LRT - The Composition Complexity of Majority Talk
Result
- Can majority be written as $h(g_1,\dots,g_m)$ with each $g_j$ a function that queries only $k\ll n$ many bits and $h$ an arbitrary function?
- They prove an optimal lower bound of $m\geq \Omega((n/k)\log k)$.
- The upper bound is easy to see (a sketch follows after this list).
- Split the variables into $n/k$ disjoint blocks.
- Use $\log k$ of the functions per block to output the block's Hamming weight in binary.
- Use $h$ to sum these into the overall Hamming weight and compute majority from that.
- Also: This implies lower bounds for bounded-width branching programs.
- Also: Lower bounds for small depth circuits.
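To make the upper bound concrete, here is a minimal sketch in Python (my own illustration, not code from the paper; the tie-breaking convention $\lvert x\rvert > n/2$ is a choice of this sketch):

```python
# Majority written as h(g_1, ..., g_m) with every g_j k-local:
# each block of k variables contributes ceil(log2(k+1)) inner functions,
# the bits of the block's Hamming weight; h decodes and sums them.
from math import ceil, log2

def compose_majority(x, k):
    n = len(x)
    bits = ceil(log2(k + 1))  # bits needed for one block's Hamming weight
    # Inner functions: each reads only its own block, hence is k-local.
    g_values = []
    for start in range(0, n, k):
        block_weight = sum(x[start:start + k])
        g_values.extend((block_weight >> b) & 1 for b in range(bits))
    # Outer function h: decode every block weight, sum, compare to n/2.
    total = sum(
        sum(g_values[block * bits + b] << b for b in range(bits))
        for block in range(ceil(n / k))
    )
    return int(total > n / 2)  # m = (n/k) * ceil(log2(k+1)) inner functions

assert compose_majority([1, 0, 1, 1, 0, 1, 1, 0], k=4) == 1  # weight 5 > 4
assert compose_majority([1, 0, 0, 1, 0, 0, 1, 0], k=4) == 0  # weight 3 < 4
```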
Proof
Overview
- Key Insight:
- Suppose $x_i$ is queried by at most $q$ of the inner functions. Then $I[X_i \mid g_1(X),\dots, g_m(X)]\geq 2^{-O(q)}$.
- I.e., the information that $g_1,\dots,g_m$ reveal about $X_i$ is at least $2^{-O(q)}$, roughly what $q$ uniformly distributed bits would reveal about it.
- They prove a lower bound for the Hamming weight function $H:\{0,1\}^n \rightarrow \{0,1\}^{\log n}$ (Section 4).
Preliminaries
- $x^{(i\to b)} = (x_1,\dots,x_{i-1},b,x_{i+1},\dots,x_n)$.
- $x^{\oplus i} = (x_1,\dots,x_{i-1},1\oplus x_i,x_{i+1},\dots,x_n)$.
- We call a function $k$-local if there exists a set $I\subseteq[n]$ with $\lvert I\rvert=k$ such that $f(x)=f(x^{\oplus j})$ for all $x$ and all $j\in [n]\setminus I$, i.e., it depends only on the variables in $I$.
- The $k$-composition complexity of a function $f$ is the minimum integer $m$ such that there exist functions $g_1,\dots,g_m:\{0,1\}^n\rightarrow \{0,1\}$ and $h:\{0,1\}^m \rightarrow D$ with the properties that $h(g_1,\dots,g_m)=f$ and every $g_j$ is $k$-local. We denote this by $C_k(f)$.
- A function family is self-containing if for any set $I$ of variables there is a subfunction of $f_n$ on the variables in $I$ (obtained by fixing the variables outside $I$) that computes $f_{\lvert I\rvert}$.
- Majority and Hamming weight are self-containing.
- If a family is self-containing and $C_k(f_n)\leq m$, then we can write $f_{n/2}$ as such a composition where each $g_j$ is $k$-local and each variable is queried at most $2mk/n$ times.
- Proof:
- Let $q=mk/n$.
- On average each variable is queried at most $q$ times (the $m$ functions make at most $mk$ queries in total).
- By Markov's inequality, at most half of the variables are queried more than $2q$ times.
- The probability distribution here is over a uniformly random choice of variable.
- Now put the variables that are queried at most $2q$ times into a set $I$, so $\lvert I\rvert\geq n/2$.
- As our function is self-containing, restricting to $I$ computes $f_{n/2}$.
- This finishes the proof.
- We need the $g_j$ to be $k$-local as otherwise the functions could use more than $k$ bits; note that they stay $k$-local under the restriction.
- Remark: This can be used to prove a lower bound on $C_k(f)$ by the obvious contrapositive (a quick numeric check of the averaging step follows below).
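As a quick sanity check of the averaging step above, a tiny Python experiment (mine, with arbitrary parameters):

```python
# Markov averaging step: total queries <= m*k, so the number of variables
# queried more than 2*m*k/n times is strictly less than n/2.
import random

n, k, m = 100, 5, 60
supports = [random.sample(range(n), k) for _ in range(m)]  # k-local supports
counts = [0] * n
for S in supports:
    for i in S:
        counts[i] += 1
threshold = 2 * m * k / n  # = 2q in the notes above
I = [i for i in range(n) if counts[i] <= threshold]  # lightly queried variables
assert len(I) >= n // 2
```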
- We denote by $H$ the entropy.
- Mutual information: $I[X\mid Y] = H[X]-H[X\mid Y]$.
Proof of Key Insight Lemma
- Statement:
- Let $f$ be the Hamming weight function and let a variable $i$ be queried by at most $q$ inner functions. Then $H[X_i \mid g_1(X),\dots, g_m(X)] \leq 1+ \Pr[X^{\oplus i}\not\in D] - 2^{-O(q)}$, where $D$ is the domain of the function $f$ and $X$ is distributed uniformly over $D$.
Proof for $q=1$
- Proposition 5.1
- $H[X_i] - H[X_i \mid g_1(X),\dots,g_m(X)]=1$.
- I.e., the output of the inner functions determines $X_i$ completely.
- Proof:
- Assume wlog that the only function querying $x_i$ is $g_1$.
- Now we want to recover $x_i$ from $g_1(x),\dots,g_m(x)$.
- By our assumption $\lvert x\rvert=h(g_1,\dots,g_m)$.
- So $\lvert x\rvert$ is given either by $h(0,g_2,\dots,g_m)$ or by $h(1,g_2,\dots,g_m)$.
- Now look at $x^{\oplus i}$. This does not change $g_2,\dots,g_m$ because $q=1$.
- As flipping $x_i$ changes the Hamming weight, i.e., the output, $g_1$ must flip when $x_i$ flips.
- Hence, as everything else is fixed (because the other $g_j$ do not see $x_i$), $g_1$ carries the only bit of information about $x_i$.
- Notice that we are given the computed values $g_1(x),\dots,g_m(x)$.
- Now compare $h(g_1,\dots,g_m)$ with the Hamming weights of $x^{(i\to 0)}$ and $x^{(i\to 1)}$; the two candidates are $h(0,g_2,\dots,g_m)$ and $h(1,g_2,\dots,g_m)$, and they differ by exactly one.
- The one that fits the computed value $h(g_1,\dots,g_m)$ is the true weight $\lvert x\rvert$; in particular, $x_i=1$ iff the computed value is the larger of the two.
- Hence, we have completely recovered $X_i$ (a toy check follows below).
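A toy check of the $q=1$ recovery in Python (my illustration on a hypothetical tiny instance, not from the paper):

```python
# Recover x_i from the g-values when only g_1 queries x_i and
# h(g_1, ..., g_m) = |x|: the larger of the two candidate weights
# corresponds to x_i = 1 (the weight drops when a 1 flips to 0).
def recover_bit(h, g_values):
    flipped = (1 - g_values[0],) + tuple(g_values[1:])
    return int(h(tuple(g_values)) > h(flipped))  # |x| vs |x^{xor i}|

# Tiny instance: n = 2, k = 1, inner functions g_j(x) = x_j, h = sum, i = 1.
h = lambda g: sum(g)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert recover_bit(h, x) == x[0]  # the g-values coincide with x here
```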
- General Case:
- Let $J_i$ be the set of inner functions that depend on variable $i$.
- Now $H[X_i \mid g_1(X),\dots,g_m(X)]\leq H[X_i \mid \{ g_j(X) \mid j\in \bar J_i\}, \lvert X\rvert]$, i.e., we condition on all the functions that do not depend on $x_i$, together with the Hamming weight.
- This is valid because $g_1,\dots,g_m$ determine $\lvert X\rvert$ completely.
- Hence, the conditioning on the right-hand side is on potentially less information, not more, which can only increase the entropy (and we want an upper bound).
- By definition this is $E_{Y\sim D}[-\log \Pr[X_i=Y_i\mid (\forall j\in \bar J_i:\, g_j(X)=g_j(Y)) \land \lvert X\rvert = \lvert Y\rvert]]$.
- This is equal to $E_{Y\sim D}[H_2(\Pr[X_i=Y_i\mid (\forall j\in \bar J_i:\, g_j(X)=g_j(Y)) \land \lvert X\rvert = \lvert Y\rvert])]$ where $H_2$ is the binary entropy function, i.e., $H_2(p)=-p\log p - (1-p)\log (1-p)$.
- Now we need to bound this by $1+\Pr[X^{\oplus i}\not \in D] -2^{-O(q)}$.
- Now we prove this inequality.
- Fix output values $v\in \{0,1\}^{\bar J_i}$ of the inner functions that do not query variable $i$.
- Suppose these output values are achievable by some input $x\in D$.
- Define $S_v = \{ x \mid \forall j\in \bar J_i:\, g_j(x)=v_j \}$, the set of inputs that produce the values $v$ for these inner functions.
- The sets $S_v$ partition the distribution $D$.
- We call $v$ good if $\Pr[X^{\oplus i}\not\in D \mid X\in S_v]\leq 2\Pr[X^{\oplus i}\not\in D]$.
- By Markov, the sets $S_v$ with good $v$ account for at least half of the probability mass, i.e., $\sum_{v \text{ good}} \Pr[X\in S_v] \geq 1/2$.
- Now we show that our required equation holds for every good $v$.
- Let $W_v$ be the set of possible Hamming weights of $x\in D\cap S_v$.
- The weight $\lvert x\rvert = h(g_1(x),\dots,g_m(x))$ can only depend on the remaining $q$ values $g_j(x)$ for $j\in J_i$, so $\lvert W_v\rvert \leq 2^q$.
- Now we want a Hamming weight that occurs with significant probability within $S_v$ such that, conditioned on it, we can guess $X_i$ with better than random chance.
- To prove (1): $\Pr[\lvert X\rvert =w^* \mid X\in S_v]\geq 2^{-O(q)}$.
- To prove (2): $\Pr[X_i=1 \mid X\in S_v \land \lvert X\rvert=w^*]\leq 1/2 - 2^{-O(q)}$.
- This implies our equation by simple calculations which we skip (a numeric sanity check follows after this list).
- To find this $w^*$ we know:
- If $x\in S_v$ then $x^{\oplus i}\in S_v$, since the functions defining $S_v$ do not query $x_i$. This gives us a natural pairing of the elements of $S_v$.
- Now define $p_w = \Pr[X^{\oplus i}\in D, \lvert X^{(i\to 0)}\rvert = w \mid X\in S_v]$.
- This has at most $2\lvert W_v\rvert$ non-zero entries.
- The total sum is at least $1-2^{-10q+1}$ (for good $v$ the failure probability $\Pr[X^{\oplus i}\not\in D \mid X\in S_v]$ is at most $2\Pr[X^{\oplus i}\not\in D]$, and we may assume $\Pr[X^{\oplus i}\not\in D]\leq 2^{-10q}$ since otherwise the lemma is trivial).
- The maximum entry is at least $2^{-q-2}$.
- So there exists a weight $w^*$ such that $p_{w^*} \geq p_{w^*-1} + 2^{-2q-3}$ (climbing from $0$ up to the maximum over at most $2^{q+1}$ non-zero entries, some step must gain at least $2^{-q-2}/2^{q+1}=2^{-2q-3}$).
- This $w^*$ turns out to satisfy both equations.
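For the skipped calculation, a numeric sanity check in Python (my own reconstruction with illustrative constants): if $\lvert X\rvert=w^*$ has probability at least $p=2^{-O(q)}$ and, conditioned on it, $\Pr[X_i=1]\leq 1/2-\epsilon$ with $\epsilon=2^{-O(q)}$, then the conditional entropy is at most $1-p\epsilon^2=1-2^{-O(q)}$:

```python
# H[X_i | ...] <= (1 - p) * 1 + p * H2(1/2 - eps) <= 1 - p * eps^2,
# using the standard bound 1 - H2(1/2 - eps) >= eps^2.
from math import log2

def H2(p):
    """Binary entropy; H2(0) = H2(1) = 0 by convention."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for q in range(1, 12):
    p, eps = 2.0 ** -q, 2.0 ** -q
    bound = (1 - p) * 1 + p * H2(0.5 - eps)
    assert bound <= 1 - p * eps ** 2
```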
- Notice that the lemma alone does not automatically say anything about whether such a composition is possible or not.
- It only gives a contradiction if the lemma yields more information via $\sum_i I[X_i \mid g_1,\dots,g_m]$ than the $m$ bits $g_1(X),\dots,g_m(X)$ can contain.
- For majority, the lemma gives $\Omega(n)$ bits of information in total, but $n \gg m$, and $m$ bounds the information of the $g_j$ (see the sketch below).
- Looking at Parity, the lemma fails; see the observations below.
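A hedged sketch of that counting (my reconstruction; it assumes $X$ uniform over $\{0,1\}^n$ so the coordinates are independent, and it suppresses constants):

$$m \;\geq\; H[g_1(X),\dots,g_m(X)] \;\geq\; \sum_{i=1}^n I[X_i \mid g_1(X),\dots,g_m(X)] \;\geq\; n\cdot 2^{-O(q)} \;=\; n\cdot 2^{-O(mk/n)},$$

using that $m$ bits carry at most $m$ bits of information, independence of the $X_i$ for subadditivity, the Key Insight Lemma, and that each variable is queried $O(mk/n)$ times after the reduction. For $m = o((n/k)\log k)$ the right-hand side is $n\cdot k^{-o(1)} \gg m$, a contradiction.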
Observations
- Notice that the lemma does not work for Parity and similar things.
- The reason can be seen in the $q=1$ case.
- Here, given $g_1,\dots,g_m$, both $x_i$ and $1\oplus x_i$ are still consistent with the computation.
- In the Hamming weight case I can check whether the Hamming weight goes up or down with the bit flip.
- With Parity it will always flip, regardless of the value of $x_i$ (see the toy check below).
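A toy check in Python (my illustration, mirroring the $q=1$ recovery sketch above):

```python
# With parity as the outer function, flipping g_1 always flips h's output,
# regardless of x_1 -- so the "larger value <=> x_i = 1" rule is useless.
h_parity = lambda g: sum(g) % 2
for x in [(0, 1), (1, 1)]:  # two inputs differing only in x_1
    g = x                   # inner functions g_j(x) = x_j, as before
    flipped = (1 - g[0],) + tuple(g[1:])
    # The observable value pair is {0, 1} in BOTH cases: x_1 is undetermined.
    assert {h_parity(g), h_parity(flipped)} == {0, 1}
```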
Questions
- Can we improve this beyond the Hamming weight function, i.e., to functions that are a bit different than the ones they consider?
- UNCHECKED Can we improve this to other functions?
- UNCHECKED Does this work for monotone symmetric functions?
- One thing I am not sure about is whether I can always reconstruct the bit.
- Can it happen that we do not get any response/difference?
- In the simple case ($q=1$) it cannot, but the general case is more difficult.