A Short Overview of Communication Complexity for Algorithms Designers (Reading Group)
Christian Engels, April 01, 2020 [TCS, Communication Complexity]
Communication Complexity for Algorithm Designers
Communication Complexity Basics (Based on Chapter 4)
- Alice and Bob have $x\in \{0,1\}^a$ and $y\in \{0,1\}^b$, respectively, as private information.
 - Both want to communicate in order to compute a function $f:\{0,1\}^{a+b}\rightarrow \{0,1\}$ on their joint input.
 
One Way Communication Protocol
- Alice computes $A(x)$ and sends it to Bob for a function $A:\{0,1\}^a\rightarrow \{0,1\}^m$.
 - Bob then decides by computing $B(y,A(x))$.
 - Alice and Bob do not have a computation bound.
 - The one-way complexity of a function is the minimum, over all correct one-way protocols, of the worst-case number of bits sent (remember, $A$, $B$ may be randomized in the randomized setting).
 - I.e., $\min_{(A,B)\text{ correct}} \max_{x\in \{0,1\}^a} \lvert A(x)\rvert$. The two-way complexity is defined analogously.
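To make the definitions concrete, here is a minimal Python sketch (my own illustration, not from the source): a one-way protocol is a pair of functions $(A,B)$, its cost is the length of the longest message Alice ever sends, and the trivial protocol in which Alice sends all of $x$ shows that every $f$ has one-way complexity at most $a$.

```python
from itertools import product

def one_way_cost(A, a):
    """Worst-case message length of Alice's map A over all x in {0,1}^a."""
    return max(len(A(x)) for x in product([0, 1], repeat=a))

def is_correct(A, B, f, a, b):
    """Check B(y, A(x)) == f(x, y) on every input pair (feasible only for tiny a, b)."""
    return all(B(y, A(x)) == f(x, y)
               for x in product([0, 1], repeat=a)
               for y in product([0, 1], repeat=b))

def f(x, y):
    """Example target function: disjointness of the supports of x and y."""
    return int(all(xi * yi == 0 for xi, yi in zip(x, y)))

# Trivial protocol: Alice sends x verbatim (cost a), Bob evaluates f himself.
trivial_A = lambda x: x
trivial_B = lambda y, msg: f(msg, y)

a = b = 3
print(is_correct(trivial_A, trivial_B, f, a, b), one_way_cost(trivial_A, a))
```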
 
Disjointness Problem
- Let the universe be $[n]$; Alice and Bob hold vectors $x,y$ that are the characteristic vectors of sets of size $k$.
 - I.e., $[1,0,0,1]$ encodes the set $\{x_1,x_4\}$ while $[1,1,1,0]$ encodes $\{x_1, x_2, x_3\}$.
 - Accept if $S_x\cap S_y=\emptyset$.
 
Deterministic One-way Communication Complexity for Disjointness
- Every deterministic one-way protocol for disjointness has worst-case communication complexity at least $n$.
 - Proof:
- Suppose Alice sends at most $n-1$ bits.
 - Consider the set of possible messages $S=\{ A(x) \mid x\in \{0,1\}^n\}$.
 - Its size is at most $2^{n-1}$, while there are $2^n$ possible inputs $x$.
 - By the pigeonhole principle, there are two distinct inputs $x_1\neq x_2$ on which Alice sends the same message.
 - As $x_1,x_2$ differ in at least one bit, Bob can hold a set that intersects exactly one of them; since he sees the same message in both cases, he must answer one of them incorrectly, violating the correctness of the protocol.
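A small sanity check of this pigeonhole argument (my own illustration): given any message function $A$ that uses fewer than $2^n$ distinct messages, we can exhibit a collision $x_1\neq x_2$ together with a set $y$ for Bob on which no decision function $B$ can be correct for both.

```python
from itertools import product

def fooling_pair(A, n):
    """Find x1 != x2 with A(x1) == A(x2), plus a witness y on which Bob must err.
    Such a pair exists whenever A uses fewer than 2^n distinct messages."""
    seen = {}
    for x in product([0, 1], repeat=n):
        m = A(x)
        if m in seen:
            x1, x2 = seen[m], x
            i = next(j for j in range(n) if x1[j] != x2[j])   # coordinate where they differ
            y = tuple(int(j == i) for j in range(n))           # Bob's set is {i}
            # Exactly one of x1, x2 intersects {i}, so the correct answers differ,
            # yet Bob receives the same message for both -> he errs on one of them.
            return x1, x2, y
        seen[m] = x
    return None  # A is injective, i.e., Alice effectively sends n bits

# Example: a "protocol" in which Alice drops her last bit.
print(fooling_pair(lambda x: x[:-1], n=4))
```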
 
 
Probabilistic One-way Communication Complexity for Disjointness
- Public coins are visible to both Alice and Bob and do not count towards the communication.
 - Private coins: Alice and Bob each have their own private pool of random bits.
 - Public coins are at least as powerful, and strictly more powerful for some problems (e.g., equality).
 - Public-coin protocols are equivalent to distributions over deterministic protocols.
 - Error constants and one- vs. two-sided error are defined as in standard complexity theory.
 - Theorem: Every randomized protocol that decides disjointness correctly with probability $2/3$ uses $\Omega(n)$ communication.
 - Proof:
- [Yao's Lemma]: Let $D$ be a distribution over the space of inputs $(x,y)$. Suppose that every deterministic one-way protocol $P$ with $\Pr_{(x,y)\sim D}[P\text{ is wrong on } (x,y)]\leq \varepsilon$ has cost at least $k$. Then every public-coin randomized protocol with error $\varepsilon$ has communication cost at least $k$.
 - Proving the lemma:
- Let $R$ be a public-coin randomized protocol with error $\varepsilon$ that uses fewer than $k$ bits of communication.
 - Fixing the public coins, $R$ is a distribution over deterministic protocols $P_1,\dots, P_s$, each with communication cost less than $k$.
 - If every $P_i$ had error (over $D$) larger than $\varepsilon$, then, no matter the distribution over the $P_i$, $R$ would also have error larger than $\varepsilon$ on $D$.
 - Hence there exists an $i$ such that $P_i$ has error probability at most $\varepsilon$ over $D$ while using fewer than $k$ bits of communication.
 - This contradicts the assumption of the lemma.
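Spelled out, the averaging step is the following (my phrasing of the standard argument): since $R$ errs with probability at most $\varepsilon$ on every fixed input, it errs with probability at most $\varepsilon$ on inputs drawn from $D$ as well, so

$$\varepsilon \;\geq\; \Pr_{(x,y)\sim D,\ i}[P_i \text{ errs on } (x,y)] \;=\; \sum_i \Pr[R \text{ uses } P_i]\cdot \Pr_{(x,y)\sim D}[P_i \text{ errs on } (x,y)],$$

and an average that is at most $\varepsilon$ must contain a term that is at most $\varepsilon$; that $P_i$ is the protocol used in the contradiction.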
 
 
 
Normal Communication
- Matrix view of input/output behavior $M(f)$.
- In the following, let us look at disjointness with the rows and columns indexed by $\emptyset, \{x_1\}, \{x_2\}, \{x_1, x_2\}$ for $X$ and $Y$ respectively.
 - $\begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\end{pmatrix}$
 
 - Rectangles: take $X,Y$ and choose subsets $A\subseteq X$ of rows and $B\subseteq Y$ of columns such that all $(x,y)\in A\times B$ have the same value (monochromatic).
 - These rectangles do not need to be contiguous!
 - Theorem: Let $f$ be a function such that every partition into monochromatic rectangles requires at least $t$ rectangles. Then the deterministic communication complexity is at least $\log t$.
- Proof: A deterministic protocol of complexity $\log t$ can have only $t$ distinct transcripts.
 - The inputs reaching a fixed transcript form a rectangle, and each such rectangle must be monochromatic, as otherwise the protocol would make a mistake on some input inside it; so the transcripts partition $M(f)$ into at most $t$ monochromatic rectangles.
 
 - Corollary: If every covering of $M(f)$ by monochromatic rectangles requires at least $t$ rectangles, then the deterministic communication complexity is at least $\log t$.
 - Note: $\log r$, where $r$ is the rank of $M(f)$, is a lower bound for the deterministic communication complexity, but it is unknown whether a polynomial in $\log r$ is also an upper bound (the log-rank conjecture).
 - Examples:
- Equality is the identity matrix.
 - $\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{pmatrix}$.
 
 - Disjointness (on sets of size $k$) has deterministic complexity at least $\log \binom n k = \Theta(k\log (n/k))$, as there are $\binom n k$ subsets of size $k$ of a universe of size $n$.
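To play with the matrix view, here is a small Python/numpy sketch (my own, not from the source) that builds $M(f)$ for disjointness over all subsets of a small universe and reports its rank. For this matrix the rank is $2^n$ (it is the $n$-fold tensor power of the $2\times 2$ disjointness matrix), so the $\log r$ bound matches the $\Omega(n)$ bound from before.

```python
from itertools import product
import numpy as np

def disjointness_matrix(n):
    """M[S, T] = 1 iff the subsets S and T of [n] (given as 0/1 vectors) are disjoint."""
    subsets = list(product([0, 1], repeat=n))
    return np.array([[int(all(s * t == 0 for s, t in zip(S, T))) for T in subsets]
                     for S in subsets])

for n in range(1, 5):
    M = disjointness_matrix(n)
    r = np.linalg.matrix_rank(M)
    print(f"n={n}: matrix {M.shape}, rank {r}, log2(rank) = {np.log2(r):.0f}")
```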
 
Randomized Disjointness
- Theorem: There exists a distribution such that Disjointness has communication complexity at least $\Omega(k)$.
 - What is the distribution?
- Under the uniform distribution, the probability that $f(x,y)=1$ (the sets are disjoint) is $(3/4)^n$; hence a protocol that always outputs $0$ is correct with high probability. So both the accept and the reject case need constant probability mass under the hard distribution.
 - The inputs also must not have much less information content than the bound we are aiming for, as otherwise one player could simply send a compressed version of their input.
 
 - Distribution:
- With probability $3/4$: $(x,y)$ is chosen uniformly at random subject to: $x,y$ have exactly $n/4$ ones and there is no index such that $x_i=y_i=1$.
 - With probability $1/4$: $(x,y)$ is chosen uniformly at random subject to: $x,y$ have exactly $n/4$ ones and there is an index such that $x_i=y_i=1$.
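A tiny sampler for this distribution (an illustrative sketch; the rejection sampling in the intersecting case is my shortcut, not from the notes):

```python
import random

def sample_hard_input(n, rng=random):
    """Sample (x, y): both have exactly n/4 ones; with probability 3/4 the
    supports are disjoint, with probability 1/4 they intersect."""
    assert n % 4 == 0
    k = n // 4
    if rng.random() < 3 / 4:
        # Disjoint case: pick x, then pick y inside the complement of x.
        xs = set(rng.sample(range(n), k))
        ys = set(rng.sample(sorted(set(range(n)) - xs), k))
    else:
        # Intersecting case: resample uniformly until the supports overlap.
        while True:
            xs, ys = set(rng.sample(range(n), k)), set(rng.sample(range(n), k))
            if xs & ys:
                break
    return [int(i in xs) for i in range(n)], [int(i in ys) for i in range(n)]

print(sample_hard_input(16))
```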
 
 - Almost monochromatic 1-rectangles $R$ with respect to the distribution $D$:
- $\Pr[(x,y)\in R\text{ and } f(x,y)=0] \leq 8\varepsilon \Pr[(x,y)\in R\text{ and } f(x,y)=1]$
 
 - The plan is to show that every almost monochromatic 1-rectangle contains at most a $2^{-c}$ fraction of the probability mass of the distribution, with $c$ as large as possible.
 - Proof not given.
 - There is also a (public-coin) protocol that essentially achieves this bound (a toy simulation follows after this list):
- Interpret the public randomness as a sequence $Z_1, Z_2,\dots$ of uniformly random subsets of the universe.
 - The current speaker sends the smallest index $i$ such that their current set is contained in $Z_i$.
 - The other player discards all elements of their set that are not in $Z_i$ (such elements cannot lie in the intersection, since the speaker's set is inside $Z_i$).
 - Swap roles and repeat; if either set becomes empty, declare the sets disjoint.
 - Cut off the communication after a suitable (probability-related) number of bits and declare the sets intersecting; this gives one-sided error.
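A toy Python simulation of this protocol (my own sketch following the description above; the cutoff and the bit accounting are simplified):

```python
import random

def public_coin_disjointness(A, B, n, max_bits=200, rng=random):
    """Simulate the protocol above. Returns (declared_disjoint, bits_used).
    One-sided error: intersecting inputs are never declared disjoint, because a
    common element lies in every Z_i the speaker announces and is therefore
    never discarded by the listener."""
    speaker, listener = set(A), set(B)
    bits = 0
    while bits < max_bits:
        if not speaker or not listener:
            return True, bits                       # one set is empty -> disjoint
        # Public coins: draw subsets until one contains the speaker's whole set.
        i = 1
        while True:
            Z = {e for e in range(n) if rng.random() < 0.5}
            if speaker <= Z:
                break
            i += 1
        bits += i.bit_length()                      # cost of announcing the index i
        listener &= Z                               # keep only elements that could still intersect
        speaker, listener = listener, speaker       # swap roles
    return False, bits                              # cutoff reached -> declare intersecting

rng = random.Random(0)
print(public_coin_disjointness({1, 5, 9}, {2, 6, 10}, n=64, rng=rng))   # disjoint sets
print(public_coin_disjointness({1, 5, 9}, {5, 6, 10}, n=64, rng=rng))   # intersect at 5
```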
 
 
Communication Complexity Datastructures (Chapter 6)
- The main point of this section will be to show Datastructure Lower Bounds for $\varepsilon$-Gap Hamming in the Cell Probe Model.
 - Nearest Neighbor:
- Given points $S=\{x_1,\dots,x_n\}$ in the Hamming cube $H^d=\{0,1\}^d$ ($d$ roughly $\Omega(\sqrt{n})$).
 - Build a structure $D$ such that: given a query point $q\in H^d$, we can find the closest point in $S$ to $q$ using $D$.
 - This problem can be transferred to other metric spaces.
 - In the approximate version, the returned point $p'$ only needs to satisfy $l(q,p')\leq (1+\varepsilon)\, l(q,p^*)$, where $p^*$ is the true nearest neighbor.
 
 
$\varepsilon$-Gap Hamming
- Specialization of Nearest Neighbor.
 - $\varepsilon$-Gap Hamming: Decide whether the Hamming distance $l(x,y)$ is at most $L$ or at least $(1+\varepsilon)L$, where the threshold $L$ is part of the input.
 - A communication protocol for this problem:
- Sample random strings $r_1,\dots,r_s$, $r_i\in \{0,1\}^d$, where each $r_{i,j}=1$ independently with probability $\frac{1}{2L}$.
 - Alice sends $h=(\langle x,r_1\rangle \bmod 2, \dots, \langle x,r_s\rangle \bmod 2)$.
 - Bob computes $h'=(\langle y,r_1\rangle \bmod 2,\dots, \langle y,r_s\rangle \bmod 2)$ and accepts if $h$ and $h'$ differ in only a small number of coordinates (a toy simulation follows at the end of this section).
 - Intuitively this is close to testing equality, except that the test strings are biased towards zero, i.e., small differences are often ignored:
- If $x,y$ differed in only a single bit, a uniformly random $r$ would detect this (the parities would differ) with probability $1/2$.
 - With each bit of $r$ being $1$ only with probability $\frac{1}{2L}$, that coordinate is most likely $0$ and the difference goes unnoticed.
 
 - The protocol has two-sided error (because of the "at most $L$").
 
 - Proof:
- See the random process in a different light.
 - Select each coordinate to be relevant independently with probability $1/L$, and set each relevant coordinate to $0$ or $1$ uniformly at random (irrelevant coordinates are $0$).
 - If $l(x,y)=\Delta$, then:
 - $\Pr_j[\langle r_j,x\rangle \mod 2 \not\equiv \langle r_j,y\rangle\mod 2] = 1/2 \cdot \left(1- \left(1-\frac{1}{L}\right)^\Delta\right)$.
 - I.e., the parities differ exactly when at least one of the $\Delta$ differing coordinates is relevant (probability $1-\left(1-\frac{1}{L}\right)^\Delta$) and, conditioned on that, the uniformly random values make the parities differ with probability $1/2$.
 - Now, how much larger is this probability when $\Delta = (1+\varepsilon)L$ instead of $\Delta = L$?
 - Plugging $\Delta=(1+\varepsilon)L$ and $\Delta=L$ into the formula above and subtracting gives a difference of $\frac{1}{2} \left( 1-\frac{1}{L}\right)^L\left(1- \left(1-\frac{1}{L}\right)^{\varepsilon L}\right)$.
 - By $1-x\in [e^{-2x}, e^{-x}]$ for $x\in [0,\tfrac{1}{2}]$ we can bound this difference by
 - $\geq \frac{1}{2} \cdot e^{-2L/L} \cdot \left(1- e^{-\varepsilon L/L}\right)$
 - $\geq \frac{1}{2e^2}(1-e^{-\varepsilon})$.
 - This is a constant (for constant $\varepsilon$).
 - Let $t$ be the probability that $\langle r_i,x\rangle \bmod 2 \not\equiv \langle r_i,y\rangle \bmod 2$ when $l(x,y)=L$.
 - So if $l(x,y)\leq L$, we expect at most $ts$ of the $s$ random parities to differ, while if $l(x,y)\geq (1+\varepsilon)L$, we expect at least $(t+h(\varepsilon))s$ of them to differ, where $h(\varepsilon)$ denotes the constant gap computed above.
 - Chernoff now implies the following:
 - If the distance between $x,y$ is at most $L$, then with high probability the sketches differ in at most $(t+\tfrac12 h(\varepsilon))s$ coordinates.
 - If the distance between $x,y$ is at least $(1+\varepsilon)L$, then with high probability the sketches differ in more than $(t+\tfrac12 h(\varepsilon))s$ coordinates.
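Here is a toy simulation of the sketching protocol above (my own code; the number of samples $s$ and the acceptance threshold are ad-hoc choices for the demo, not the parameters from the source):

```python
import random

def gap_hamming_sketch(x, y, L, eps, s=4000, rng=random):
    """Declare 'distance <= L' iff the fraction of differing parities over s random
    vectors (each bit set with probability 1/(2L)) stays below a threshold placed
    between the expected rates at distance L and at distance (1+eps)*L."""
    d = len(x)
    diff_coords = [j for j in range(d) if x[j] != y[j]]
    diffs = 0
    for _ in range(s):
        r = [1 if rng.random() < 1 / (2 * L) else 0 for _ in range(d)]
        # <r, x> and <r, y> differ mod 2 iff r hits an odd number of differing coordinates.
        diffs += sum(r[j] for j in diff_coords) % 2
    p_low = 0.5 * (1 - (1 - 1 / L) ** L)                  # expected rate at distance L
    p_high = 0.5 * (1 - (1 - 1 / L) ** ((1 + eps) * L))   # expected rate at distance (1+eps)L
    return diffs / s <= (p_low + p_high) / 2

d, L, eps = 200, 20, 0.5
x = [0] * d
y_close = [1] * L + [0] * (d - L)      # distance exactly L
y_far = [1] * 30 + [0] * (d - 30)      # distance (1+eps)L = 30
rng = random.Random(1)
print(gap_hamming_sketch(x, y_close, L, eps, rng=rng))   # expected: True  (small distance)
print(gap_hamming_sketch(x, y_far, L, eps, rng=rng))     # expected: False (large distance)
```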
 
 
Lower Bounds via Asymmetric Communication Complexity
- Cell Probe model:
- A model of computation, so we can prove lower bounds against it for some problems.
 - The data is stored as $s$ cells of $w$ bits each, i.e., roughly $D:\{0,1\}^n \rightarrow \{0,1\}^{sw}$.
 - The goal is to answer a set of queries $Q$, known up front, from the stored cells.
 - To answer a query, the query algorithm adaptively reads (probes) the cells it specifies.
 - Every query in $Q$ must be answered correctly.
 - Query time (roughly): the maximum, over the data and the query, of the number of cells probed.
 - Every query here is of the form "is $q$ in the database?"; hence Alice sending her whole query, about $\log \lvert Q\rvert$ bits, is the trivial upper bound in the communication setting below.
 
 - Index Problem:
- Alice has an $i\in [n]$ and Bob has $y\in \{0,1\}^n$; the goal is to compute $y_i$.
 - In every randomized communication protocol for INDEX, either Alice sends at least $\delta \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits, in the worst case.
 - Miltersen's Lemma: Suppose $M(f)$ has at least $v$ columns that each contain at least $u$ 1-inputs, and there is a deterministic protocol computing $f$ in which Alice and Bob send at most $a$ and $b$ bits respectively. Then $M(f)$ has a 1-rectangle $A\times B$ with $\lvert A\rvert \geq u/2^a$ and $\lvert B\rvert \geq v/2^{a+b}$.
 - A datastructure with query time $t$, space $s$, word size $w$ induces a communication protocol for INDEX in which Alice sends $t\log s$ bits and Bob sends at most $tw$ bits.
- Bob builds the datastructure, Alice queries it.
 - Alice simulates the $t$ probes of the query algorithm, each time sending the address of a cell of the data structure ($\log s$ bits per probe).
 - Bob answers each probe with the $w$-bit content of that cell.
 - If Alice can afterwards decide INDEX, then the communication protocol worked.
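A minimal sketch (my own illustration) of this simulation: Bob holds the built cells, Alice runs the query algorithm and pays $\log s$ bits per probe, Bob pays $w$ bits per answer. The `index_query` example is hypothetical and simply decides INDEX with a single probe.

```python
import math

def simulate_protocol(query_algorithm, cells, w, t):
    """Turn a t-probe cell-probe query algorithm into a communication protocol.
    Bob holds `cells`; Alice's `query_algorithm` maps the answers received so far
    to the next cell address (and finally to the output)."""
    addr_bits = math.ceil(math.log2(len(cells)))
    answers = []
    for _ in range(t):
        probe = query_algorithm(answers)      # Alice announces the next address
        answers.append(cells[probe])          # Bob replies with that cell's content
    return query_algorithm(answers), t * addr_bits, t * w

# Example: Bob stores the bit vector y, one bit per cell (w = 1);
# Alice holds an index i and decides INDEX with a single probe (t = 1).
y = [0, 1, 1, 0, 1, 0, 0, 1]
i = 4
def index_query(answers):
    return i if not answers else answers[0]   # first call: probe cell i; then output y_i
print(simulate_protocol(index_query, y, w=1, t=1))   # (answer y_4, bits from Alice, bits from Bob)
```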
 
 
 - $(k,l)$ Disjointness:
- Alice has a set of size $k$ and Bob a set of size $l$ from a common universe, and they need to decide whether the sets are disjoint.
 - To solve $(1/\varepsilon^2, n)$-Disjointness on a universe of size $2n$, either Alice sends at least $(\delta/\varepsilon^2) \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits, for a suitable constant $\delta$.
 
 - Finding hardness of Gap-Hamming in the cell-probe model via a reduction from $(1/\varepsilon,n)$-Disjointness.
 - I.e., if we can solve Gap-Hamming, we can solve $(1/\varepsilon,n)$-Disjointness.
 - Nearest Neighbor (recap):
- Given points $S=\{x_1,\dots,x_n\}$ in the Hamming cube $H^d=\{0,1\}^d$ ($d$ roughly $\Omega(\sqrt{n})$).
 - Build a structure $D$ such that, given a query point $q\in H^d$, we can find the closest point in $S$ to $q$ using $D$.
 - $(1/\varepsilon,n)$-Disjointness: are two sets of size $1/\varepsilon$ and $n$ disjoint?
 - Solve $(1/\varepsilon,n)$-Disjointness with Gap-Hamming.
 - Easy reduction (a code sketch follows after this block):
- Map to the $2n$-dimensional hypercube $\{0,1\}^{2n}$.
 - Alice has a set of size $1/\varepsilon$ from a universe of size $2n$.
 - Alice maps her input set $S$ to its characteristic vector in $\{0,1\}^{2n}$; this is the query point.
 - Bob maps his input set $T$ to the point set $\{ e_i\mid i\in T\}$, where $e_i$ is the characteristic vector of the singleton set $\{i\}$.
 - If $S,T$ are disjoint, then the query has distance $1/\varepsilon +1$ from every point.
 - If they are not disjoint, there exists a point at distance $1/\varepsilon -1$.
 - Both are easy to see. Remember that $\lvert S\rvert = 1/\varepsilon$, i.e., Alice's query is a vector of Hamming weight $1/\varepsilon$, while every point $e_i$ has weight $1$.
 - An overlap means there is an $i\in S\cap T$; the query and $e_i$ agree on coordinate $i$ and differ only on the remaining $\lvert S\rvert -1 = 1/\varepsilon -1$ coordinates of $S$.
 - Thus we reduce to Gap-Hamming with distance $1/\varepsilon+1$ versus $1/\varepsilon-1$ (this is the smallest possible gap, so the reduction works for all reasonable gap parameters).
 - If we could decide Gap-Hamming with this gap in the cell-probe model with word size $w$, space $s$ and $t$ probes, then we would have a communication protocol of cost $t\log s + tw$:
- Alice queries Bob, who holds the data structure, $t$ times, each probe described by $\log s$ bits.
 - Bob answers the $t$ probes with the contents of the cells, $w$ bits each.
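The easy reduction in code (an illustrative sketch with $1/\varepsilon = 4$ and $n = 6$; the numbers are just for the demo): Alice's set becomes the query point, Bob's set becomes $n$ singleton points, and the nearest-neighbor distance is $1/\varepsilon+1$ exactly when the sets are disjoint and $1/\varepsilon-1$ otherwise.

```python
def reduce_to_nn(S, T, n):
    """Map a Disjointness instance over the universe {0, ..., 2n-1} to a
    nearest-neighbor instance in the hypercube {0,1}^(2n)."""
    d = 2 * n
    q = [int(i in S) for i in range(d)]                 # Alice's query point
    P = [[int(j == i) for j in range(d)] for i in T]    # Bob's point set {e_i : i in T}
    return q, P

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

n, S = 6, {0, 2, 5, 7}                 # |S| = 1/eps = 4
T_disjoint = {1, 3, 4, 6, 8, 9}        # |T| = n = 6, disjoint from S
T_overlap = {1, 3, 4, 6, 8, 5}         # shares the element 5 with S
for T in (T_disjoint, T_overlap):
    q, P = reduce_to_nn(S, T, n)
    print(min(hamming(q, p) for p in P))   # 1/eps + 1 = 5 if disjoint, 1/eps - 1 = 3 otherwise
```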
 
 - From above, $(1/\varepsilon,n)$-Disjointness has the communication lower bounds $(\delta/\varepsilon) \log n$ for Alice and $n^{1-2\delta}$ for Bob.
 - Hence, $t\log s\geq (\delta/\varepsilon) \log n$ and $tw\geq n^{1-2\delta}$.
 
 - Hardness of high-dimensional Gap-Hamming is not very interesting; the interesting regime is small dimension $d$.
 - Lemma 6.6:
- There exists a randomized map $f$ from $\{0,1\}^{2n}$ to $\{0,1\}^d$ such that, for every set $P$ of $n$ points and query $q$ produced by the reduction above, with probability at least $1-1/n$:
 - if the nearest-neighbor distance between $q$ and $P$ is at most $1/\varepsilon -1$, then the nearest-neighbor distance between $f(q)$ and $f(P)$ is at most $\alpha$;
 - if the nearest-neighbor distance between $q$ and $P$ is $1/\varepsilon +1$, then the nearest-neighbor distance between $f(q)$ and $f(P)$ is at least $\alpha(1+h(\varepsilon))$.
 - The map uses $d=\Theta(\varepsilon^{-2}\log n)$ random inner products with $2n$-bit vectors.
 - In essence, this lemma allows us to reduce the dimension in the easy reduction while preserving the gap, with high probability.
 - Corollary: Every lower bound for $(1/\varepsilon,n)$-Disjointness carries over to the Query-Database problem for the $(1+\varepsilon)$-approximate Nearest Neighbor problem in dimension $d=\Omega(\varepsilon^{-2} \log n)$.
 - Corollary: Every data structure for $(1+\varepsilon)$-approximate nearest neighbor with query time $t=O(1)$ and word size $O(n^{1-\delta})$ uses space $s=n^{\Omega(1/\varepsilon)}$.
- Plugging this into our inequalities gives roughly:
- $c\log s \geq (c'/\varepsilon) \log n$ and $c\,n^{1-\delta}\geq n^{1-2\delta}$ (with $t\leq c$ and word size $O(n^{1-\delta})$).
 - The first inequality gives $\log s\geq (c'/(c\varepsilon))\log n$, i.e., $s\geq n^{\Omega(1/\varepsilon)}$, which is what the corollary states.
 
 
 - This can be refined to $s=n^{\Omega(\varepsilon^{-2})}$ with a slightly better reduction.