A Short Overview of Communication Complexity for Algorithms Designers (Reading Group)
Christian Engels, April 01, 2020 [TCS, Communication Complexity]
Communication Complexity for Algorithm Designers
Communication Complexity Basics (Based on Chapter 4)
- Alice and Bob have $x\in \{0,1\}^a$ and $y\in \{0,1\}^b$, respectively, as private information.
 - Both want to communicate in order to compute a function $f:\{0,1\}^{a+b}\rightarrow \{0,1\}$ on their joint input.
 
One Way Communication Protocol
- Alice computes $A(x)$ and sends it to Bob for a function $A:\{0,1\}^a\rightarrow \{0,1\}^m$.
 - Bob then decides by computing $B(y,A(x))$.
 - Alice and Bob do not have a computation bound.
 - The one-way complexity of a function is the minimum, over all correct one-way protocols, of the worst-case number of bits sent (remember, $A$, $B$ may be randomized in the randomized setting).
 - I.e., $\min_{(A,B)\text{ correct}} \max_{x\in \{0,1\}^a} \lvert A(x)\rvert$. The two-way complexity is defined analogously.
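To make the definitions concrete, here is a minimal Python sketch (my own illustration, not from the source): a one-way protocol is a pair of functions $(A,B)$, its cost is the length of the longest message Alice ever sends, and the trivial protocol in which Alice sends all of $x$ shows that every $f$ has one-way complexity at most $a$.

```python
from itertools import product

def one_way_cost(A, a):
    """Worst-case message length of Alice's map A over all x in {0,1}^a."""
    return max(len(A(x)) for x in product([0, 1], repeat=a))

def is_correct(A, B, f, a, b):
    """Check B(y, A(x)) == f(x, y) on every input pair (feasible only for tiny a, b)."""
    return all(B(y, A(x)) == f(x, y)
               for x in product([0, 1], repeat=a)
               for y in product([0, 1], repeat=b))

def f(x, y):
    """Example target function: disjointness of the supports of x and y."""
    return int(all(xi * yi == 0 for xi, yi in zip(x, y)))

# Trivial protocol: Alice sends x verbatim (cost a), Bob evaluates f himself.
trivial_A = lambda x: x
trivial_B = lambda y, msg: f(msg, y)

a = b = 3
print(is_correct(trivial_A, trivial_B, f, a, b), one_way_cost(trivial_A, a))
```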
 
Disjointness Problem
- Let the universe be $[n]$; Alice and Bob hold vectors $x,y$ that are the characteristic vectors of sets of size $k$.
 - I.e., $[1,0,0,1]$ encodes the set $\{x_1,x_4\}$ while $[1,1,1,0]$ encodes $\{x_1, x_2, x_3\}$.
 - Accept if $S_x\cap S_y=\emptyset$.
 
Deterministic One-way Communication Complexity for Disjointness
- Every deterministic one-way protocol for disjointness has worst-case communication complexity at least $n$.
 - Proof:
- Suppose Alice sends at most $n-1$ bits.
 - Consider the set of possible messages $S=\{ A(x) \mid x\in \{0,1\}^n\}$.
 - Its size is at most $2^{n-1}$, while there are $2^n$ possible inputs $x$.
 - By the pigeonhole principle, there are two distinct inputs $x_1\neq x_2$ on which Alice sends the same message.
 - As $x_1,x_2$ differ in at least one bit, Bob can hold a set that intersects exactly one of them; since he sees the same message in both cases, he must answer one of them incorrectly, violating the correctness of the protocol.
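A small sanity check of this pigeonhole argument (my own illustration): given any message function $A$ that uses fewer than $2^n$ distinct messages, we can exhibit a collision $x_1\neq x_2$ together with a set $y$ for Bob on which no decision function $B$ can be correct for both.

```python
from itertools import product

def fooling_pair(A, n):
    """Find x1 != x2 with A(x1) == A(x2), plus a witness y on which Bob must err.
    Such a pair exists whenever A uses fewer than 2^n distinct messages."""
    seen = {}
    for x in product([0, 1], repeat=n):
        m = A(x)
        if m in seen:
            x1, x2 = seen[m], x
            i = next(j for j in range(n) if x1[j] != x2[j])   # coordinate where they differ
            y = tuple(int(j == i) for j in range(n))           # Bob's set is {i}
            # Exactly one of x1, x2 intersects {i}, so the correct answers differ,
            # yet Bob receives the same message for both -> he errs on one of them.
            return x1, x2, y
        seen[m] = x
    return None  # A is injective, i.e., Alice effectively sends n bits

# Example: a "protocol" in which Alice drops her last bit.
print(fooling_pair(lambda x: x[:-1], n=4))
```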
 
 
Probabilistic One-way Communication Complexity for Disjointness
- Public coins are visible to both Alice and Bob and do not count towards the communication.
 - Private coins: Alice and Bob each have their own private pool of random bits.
 - Public coins are at least as powerful, and strictly more powerful for some problems (e.g., equality).
 - Public-coin protocols are equivalent to distributions over deterministic protocols.
 - Error constants and one- vs. two-sided error are defined as in standard complexity theory.
 - Theorem: Every randomized protocol that decides disjointness correctly with probability $2/3$ uses $\Omega(n)$ communication.
 - Proof:
- [Yao's Lemma]: Let $D$ be a distribution over the space of inputs $(x,y)$. Suppose that every deterministic one-way protocol $P$ with $\Pr_{(x,y)\sim D}[P\text{ is wrong on } (x,y)]\leq \varepsilon$ has cost at least $k$. Then every public-coin randomized protocol with error $\varepsilon$ has communication cost at least $k$.
 - Proving the lemma:
- Let $R$ be a public-coin randomized protocol with error $\varepsilon$ that uses fewer than $k$ bits of communication.
 - Fixing the public coins, $R$ is a distribution over deterministic protocols $P_1,\dots, P_s$, each with communication cost less than $k$.
 - If every $P_i$ had error (over $D$) larger than $\varepsilon$, then, no matter the distribution over the $P_i$, $R$ would also have error larger than $\varepsilon$ on $D$.
 - Hence there exists an $i$ such that $P_i$ has error probability at most $\varepsilon$ over $D$ while using fewer than $k$ bits of communication.
 - This contradicts the assumption of the lemma.
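Spelled out, the averaging step is the following (my phrasing of the standard argument): since $R$ errs with probability at most $\varepsilon$ on every fixed input, it errs with probability at most $\varepsilon$ on inputs drawn from $D$ as well, so

$$\varepsilon \;\geq\; \Pr_{(x,y)\sim D,\ i}[P_i \text{ errs on } (x,y)] \;=\; \sum_i \Pr[R \text{ uses } P_i]\cdot \Pr_{(x,y)\sim D}[P_i \text{ errs on } (x,y)],$$

and an average that is at most $\varepsilon$ must contain a term that is at most $\varepsilon$; that $P_i$ is the protocol used in the contradiction.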
 
 
 
Normal Communication
- Matrix view of input/output behavior $M(f)$.
- In the following, let us look at disjointness with the rows and columns indexed by $\emptyset, \{x_1\}, \{x_2\}, \{x_1, x_2\}$ for $X$ and $Y$ respectively.
 - $\begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\end{pmatrix}$
 
 - Rectangles: take $X,Y$ and choose subsets $A\subseteq X$ of rows and $B\subseteq Y$ of columns such that all $(x,y)\in A\times B$ have the same value (monochromatic).
 - These rectangles do not need to be contiguous!
 - Theorem: Let $f$ be a function such that every partition into monochromatic rectangles requires at least $t$ rectangles. Then the deterministic communication complexity is at least $\log t$.
- Proof: A deterministic protocol of complexity $\log t$ can have only $t$ distinct transcripts.
 - The inputs reaching a fixed transcript form a rectangle, and each such rectangle must be monochromatic, as otherwise the protocol would make a mistake on some input inside it; so the transcripts partition $M(f)$ into at most $t$ monochromatic rectangles.
 
 - Corollary: If every covering of $M(f)$ by monochromatic rectangles requires at least $t$ rectangles, then the deterministic communication complexity is at least $\log t$.
 - Note: $\log r$, where $r$ is the rank of $M(f)$, is a lower bound for the deterministic communication complexity, but it is unknown whether a polynomial in $\log r$ is also an upper bound (the log-rank conjecture).
 - Examples:
- Equality is the identity matrix.
 - $\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{pmatrix}$.
 
 - Disjointness (on sets of size $k$) has deterministic complexity at least $\log \binom n k = \Theta(k\log (n/k))$, as there are $\binom n k$ subsets of size $k$ of a universe of size $n$.
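To play with the matrix view, here is a small Python/numpy sketch (my own, not from the source) that builds $M(f)$ for disjointness over all subsets of a small universe and reports its rank. For this matrix the rank is $2^n$ (it is the $n$-fold tensor power of the $2\times 2$ disjointness matrix), so the $\log r$ bound matches the $\Omega(n)$ bound from before.

```python
from itertools import product
import numpy as np

def disjointness_matrix(n):
    """M[S, T] = 1 iff the subsets S and T of [n] (given as 0/1 vectors) are disjoint."""
    subsets = list(product([0, 1], repeat=n))
    return np.array([[int(all(s * t == 0 for s, t in zip(S, T))) for T in subsets]
                     for S in subsets])

for n in range(1, 5):
    M = disjointness_matrix(n)
    r = np.linalg.matrix_rank(M)
    print(f"n={n}: matrix {M.shape}, rank {r}, log2(rank) = {np.log2(r):.0f}")
```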
 
Randomized Disjointness
- Theorem: There exists a distribution such that Disjointness has communication complexity at least $\Omega(k)$.
 - What is the distribution?
- Under the uniform distribution, the probability that $f(x,y)=1$ (the sets are disjoint) is $(3/4)^n$; hence a protocol that always outputs $0$ is correct with high probability. So both the accept and the reject case need constant probability mass under the hard distribution.
 - The inputs also must not have much less information content than the bound we are aiming for, as otherwise one player could simply send a compressed version of their input.
 
 - Distribution:
- With probability $3/4$: $(x,y)$ is chosen uniformly at random subject to: $x,y$ have exactly $n/4$ ones and there is no index such that $x_i=y_i=1$.
 - With probability $1/4$: $(x,y)$ is chosen uniformly at random subject to: $x,y$ have exactly $n/4$ ones and there is an index such that $x_i=y_i=1$.
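A tiny sampler for this distribution (an illustrative sketch; the rejection sampling in the intersecting case is my shortcut, not from the notes):

```python
import random

def sample_hard_input(n, rng=random):
    """Sample (x, y): both have exactly n/4 ones; with probability 3/4 the
    supports are disjoint, with probability 1/4 they intersect."""
    assert n % 4 == 0
    k = n // 4
    if rng.random() < 3 / 4:
        # Disjoint case: pick x, then pick y inside the complement of x.
        xs = set(rng.sample(range(n), k))
        ys = set(rng.sample(sorted(set(range(n)) - xs), k))
    else:
        # Intersecting case: resample uniformly until the supports overlap.
        while True:
            xs, ys = set(rng.sample(range(n), k)), set(rng.sample(range(n), k))
            if xs & ys:
                break
    return [int(i in xs) for i in range(n)], [int(i in ys) for i in range(n)]

print(sample_hard_input(16))
```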
 
 - Almost monochromatic 1-rectangles $R$ with respect to the distribution $D$:
- $\Pr[(x,y)\in R\text{ and } f(x,y)=0] \leq 8\varepsilon \Pr[(x,y)\in R\text{ and } f(x,y)=1]$
 
 - The plan is to show that every almost monochromatic 1-rectangle contains at most a $2^{-c}$ fraction of the probability mass of the distribution, with $c$ as large as possible.
 - Proof not given.
 - There is also a (public-coin) protocol that essentially achieves this bound (a toy simulation follows after this list):
- Interpret the public randomness as a sequence $Z_1, Z_2,\dots$ of uniformly random subsets of the universe.
 - The current speaker sends the smallest index $i$ such that their current set is contained in $Z_i$.
 - The other player discards all elements of their set that are not in $Z_i$ (such elements cannot lie in the intersection, since the speaker's set is inside $Z_i$).
 - Swap roles and repeat; if either set becomes empty, declare the sets disjoint.
 - Cut off the communication after a suitable (probability-related) number of bits and declare the sets intersecting; this gives one-sided error.
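A toy Python simulation of this protocol (my own sketch following the description above; the cutoff and the bit accounting are simplified):

```python
import random

def public_coin_disjointness(A, B, n, max_bits=200, rng=random):
    """Simulate the protocol above. Returns (declared_disjoint, bits_used).
    One-sided error: intersecting inputs are never declared disjoint, because a
    common element lies in every Z_i the speaker announces and is therefore
    never discarded by the listener."""
    speaker, listener = set(A), set(B)
    bits = 0
    while bits < max_bits:
        if not speaker or not listener:
            return True, bits                       # one set is empty -> disjoint
        # Public coins: draw subsets until one contains the speaker's whole set.
        i = 1
        while True:
            Z = {e for e in range(n) if rng.random() < 0.5}
            if speaker <= Z:
                break
            i += 1
        bits += i.bit_length()                      # cost of announcing the index i
        listener &= Z                               # keep only elements that could still intersect
        speaker, listener = listener, speaker       # swap roles
    return False, bits                              # cutoff reached -> declare intersecting

rng = random.Random(0)
print(public_coin_disjointness({1, 5, 9}, {2, 6, 10}, n=64, rng=rng))   # disjoint sets
print(public_coin_disjointness({1, 5, 9}, {5, 6, 10}, n=64, rng=rng))   # intersect at 5
```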
 
 
Communication Complexity Datastructures (Chapter 6)
- The main point of this section will be to show Datastructure Lower Bounds for $\varepsilon$-Gap Hamming in the Cell Probe Model.
 - Nearest Neighbor:
- Given points $S=\{x_1,\dots,x_n\}$ in the Hamming cube $H^d=\{0,1\}^d$ ($d$ roughly $\Omega(\sqrt{n})$).
 - Build a structure $D$ such that: given a query point $q\in H^d$, we can find the closest point in $S$ to $q$ using $D$.
 - This problem can be transferred to other metric spaces.
 - In the approximate version, the returned point $p'$ only needs to satisfy $l(q,p')\leq (1+\varepsilon)\, l(q,p^*)$, where $p^*$ is the true nearest neighbor.
 
 
$\varepsilon$-Gap Hamming
- Specialization of Nearest Neighbor.
 - $\varepsilon$-Gap Hamming: Decide whether the Hamming distance $l(x,y)$ is at most $L$ or at least $(1+\varepsilon)L$, where the threshold $L$ is part of the input.
 - A communication protocol for this problem:
- Sample random strings $r_1,\dots,r_s$, $r_i\in \{0,1\}^d$, where each $r_{i,j}=1$ independently with probability $\frac{1}{2L}$.
 - Alice sends $h=(\langle x,r_1\rangle \bmod 2, \dots, \langle x,r_s\rangle \bmod 2)$.
 - Bob computes $h'=(\langle y,r_1\rangle \bmod 2,\dots, \langle y,r_s\rangle \bmod 2)$ and accepts if $h$ and $h'$ differ in only a small number of coordinates (a toy simulation follows at the end of this section).
 - Intuitively this is close to testing equality, except that the test strings are biased towards zero, i.e., small differences are often ignored:
- If $x,y$ differed in only a single bit, a uniformly random $r$ would detect this (the parities would differ) with probability $1/2$.
 - With each bit of $r$ being $1$ only with probability $\frac{1}{2L}$, that coordinate is most likely $0$ and the difference goes unnoticed.
 
 - The protocol has two-sided error (because of the "at most $L$").
 
 - Proof:
- See the random process in a different light.
 - Select each coordinate to be relevant independently with probability $1/L$, and set each relevant coordinate to $0$ or $1$ uniformly at random (irrelevant coordinates are $0$).
 - If $l(x,y)=\Delta$, then:
 - $\Pr_j[\langle r_j,x\rangle \mod 2 \not\equiv \langle r_j,y\rangle\mod 2] = 1/2 \cdot \left(1- \left(1-\frac{1}{L}\right)^\Delta\right)$.
 - I.e., the parities differ exactly when at least one of the $\Delta$ differing coordinates is relevant (probability $1-\left(1-\frac{1}{L}\right)^\Delta$) and, conditioned on that, the uniformly random values make the parities differ with probability $1/2$.
 - Now, how much larger is this probability when $\Delta = (1+\varepsilon)L$ instead of $\Delta = L$?
 - Plugging $\Delta=(1+\varepsilon)L$ and $\Delta=L$ into the formula above and subtracting gives a difference of $\frac{1}{2} \left( 1-\frac{1}{L}\right)^L\left(1- \left(1-\frac{1}{L}\right)^{\varepsilon L}\right)$.
 - By $1-x\in [e^{-2x}, e^{-x}]$ for $x\in [0,\tfrac{1}{2}]$ we can bound this difference by
 - $\geq \frac{1}{2} \cdot e^{-2L/L} \cdot \left(1- e^{-\varepsilon L/L}\right)$
 - $\geq \frac{1}{2e^2}(1-e^{-\varepsilon})$.
 - This is a constant (for constant $\varepsilon$).
 - Let $t$ be the probability that $\langle r_i,x\rangle \bmod 2 \not\equiv \langle r_i,y\rangle \bmod 2$ when $l(x,y)=L$.
 - So if $l(x,y)\leq L$, we expect at most $ts$ of the $s$ random parities to differ, while if $l(x,y)\geq (1+\varepsilon)L$, we expect at least $(t+h(\varepsilon))s$ of them to differ, where $h(\varepsilon)$ denotes the constant gap computed above.
 - Chernoff now implies the following:
 - If the distance between $x,y$ is at most $L$, then with high probability the sketches differ in at most $(t+\tfrac12 h(\varepsilon))s$ coordinates.
 - If the distance between $x,y$ is at least $(1+\varepsilon)L$, then with high probability the sketches differ in more than $(t+\tfrac12 h(\varepsilon))s$ coordinates.
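Here is a toy simulation of the sketching protocol above (my own code; the number of samples $s$ and the acceptance threshold are ad-hoc choices for the demo, not the parameters from the source):

```python
import random

def gap_hamming_sketch(x, y, L, eps, s=4000, rng=random):
    """Declare 'distance <= L' iff the fraction of differing parities over s random
    vectors (each bit set with probability 1/(2L)) stays below a threshold placed
    between the expected rates at distance L and at distance (1+eps)*L."""
    d = len(x)
    diff_coords = [j for j in range(d) if x[j] != y[j]]
    diffs = 0
    for _ in range(s):
        r = [1 if rng.random() < 1 / (2 * L) else 0 for _ in range(d)]
        # <r, x> and <r, y> differ mod 2 iff r hits an odd number of differing coordinates.
        diffs += sum(r[j] for j in diff_coords) % 2
    p_low = 0.5 * (1 - (1 - 1 / L) ** L)                  # expected rate at distance L
    p_high = 0.5 * (1 - (1 - 1 / L) ** ((1 + eps) * L))   # expected rate at distance (1+eps)L
    return diffs / s <= (p_low + p_high) / 2

d, L, eps = 200, 20, 0.5
x = [0] * d
y_close = [1] * L + [0] * (d - L)      # distance exactly L
y_far = [1] * 30 + [0] * (d - 30)      # distance (1+eps)L = 30
rng = random.Random(1)
print(gap_hamming_sketch(x, y_close, L, eps, rng=rng))   # expected: True  (small distance)
print(gap_hamming_sketch(x, y_far, L, eps, rng=rng))     # expected: False (large distance)
```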
 
 
Lower Bounds via Asymmetric Communication Complexity
- Cell Probe model:
- A model of computation, so we can prove lower bounds against it for some problems.
 - The data is stored as $s$ cells of $w$ bits each, i.e., roughly $D:\{0,1\}^n \rightarrow \{0,1\}^{sw}$.
 - The goal is to answer a set of queries $Q$, known up front, from the stored cells.
 - To answer a query, the query algorithm adaptively reads (probes) the cells it specifies.
 - Every query in $Q$ must be answered correctly.
 - Query time (roughly): the maximum, over the data and the query, of the number of cells probed.
 - Every query here is of the form "is $q$ in the database?"; hence Alice sending her whole query, about $\log \lvert Q\rvert$ bits, is the trivial upper bound in the communication setting below.
 
 - Index Problem:
- Alice has an $i\in [n]$ and Bob has $y\in \{0,1\}^n$; the goal is to compute $y_i$.
 - In every randomized communication protocol for INDEX, either Alice sends at least $\delta \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits, in the worst case.
 - Miltersen's Lemma: Suppose $M(f)$ has at least $v$ columns that each contain at least $u$ 1-inputs, and there is a deterministic protocol computing $f$ in which Alice and Bob send at most $a$ and $b$ bits respectively. Then $M(f)$ has a 1-rectangle $A\times B$ with $\lvert A\rvert \geq u/2^a$ and $\lvert B\rvert \geq v/2^{a+b}$.
 - A datastructure with query time $t$, space $s$, word size $w$ induces a communication protocol for INDEX in which Alice sends $t\log s$ bits and Bob sends at most $tw$ bits.
- Bob builds the datastructure, Alice queries it.
 - Alice simulates the $t$ probes of the query algorithm, each time sending the address of a cell of the data structure ($\log s$ bits per probe).
 - Bob answers each probe with the $w$-bit content of that cell.
 - If Alice can afterwards decide INDEX, then the communication protocol worked.
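A minimal sketch (my own illustration) of this simulation: Bob holds the built cells, Alice runs the query algorithm and pays $\log s$ bits per probe, Bob pays $w$ bits per answer. The `index_query` example is hypothetical and simply decides INDEX with a single probe.

```python
import math

def simulate_protocol(query_algorithm, cells, w, t):
    """Turn a t-probe cell-probe query algorithm into a communication protocol.
    Bob holds `cells`; Alice's `query_algorithm` maps the answers received so far
    to the next cell address (and finally to the output)."""
    addr_bits = math.ceil(math.log2(len(cells)))
    answers = []
    for _ in range(t):
        probe = query_algorithm(answers)      # Alice announces the next address
        answers.append(cells[probe])          # Bob replies with that cell's content
    return query_algorithm(answers), t * addr_bits, t * w

# Example: Bob stores the bit vector y, one bit per cell (w = 1);
# Alice holds an index i and decides INDEX with a single probe (t = 1).
y = [0, 1, 1, 0, 1, 0, 0, 1]
i = 4
def index_query(answers):
    return i if not answers else answers[0]   # first call: probe cell i; then output y_i
print(simulate_protocol(index_query, y, w=1, t=1))   # (answer y_4, bits from Alice, bits from Bob)
```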
 
 
 - $(k,l)$ Disjointness:
- Alice has a set of size $k$ and Bob a set of size $l$ from a common universe, and they need to decide whether the sets are disjoint.
 - To solve $(1/\varepsilon^2, n)$-Disjointness on a universe of size $2n$, either Alice sends at least $(\delta/\varepsilon^2) \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits, for a suitable constant $\delta$.
 
 - Finding hardness of Gap-Hamming in the cell-probe model via a reduction from $(1/\varepsilon,n)$-Disjointness.
 - I.e., if we can solve Gap-Hamming, we can solve $(1/\varepsilon,n)$-Disjointness.
 - Nearest Neighbor (recap):
- Given points $S=\{x_1,\dots,x_n\}$ in the Hamming cube $H^d=\{0,1\}^d$ ($d$ roughly $\Omega(\sqrt{n})$).
 - Build a structure $D$ such that, given a query point $q\in H^d$, we can find the closest point in $S$ to $q$ using $D$.
 - $(1/\varepsilon,n)$-Disjointness: are two sets of size $1/\varepsilon$ and $n$ disjoint?
 - Solve $(1/\varepsilon,n)$-Disjointness with Gap-Hamming.
 - Easy reduction (a code sketch follows after this block):
- Map to the $2n$-dimensional hypercube $\{0,1\}^{2n}$.
 - Alice has a set of size $1/\varepsilon$ from a universe of size $2n$.
 - Alice maps her input set $S$ to its characteristic vector in $\{0,1\}^{2n}$; this is the query point.
 - Bob maps his input set $T$ to the point set $\{ e_i\mid i\in T\}$, where $e_i$ is the characteristic vector of the singleton set $\{i\}$.
 - If $S,T$ are disjoint, then the query has distance $1/\varepsilon +1$ from every point.
 - If they are not disjoint, there exists a point at distance $1/\varepsilon -1$.
 - Both are easy to see. Remember that $\lvert S\rvert = 1/\varepsilon$, i.e., Alice's query is a vector of Hamming weight $1/\varepsilon$, while every point $e_i$ has weight $1$.
 - An overlap means there is an $i\in S\cap T$; the query and $e_i$ agree on coordinate $i$ and differ only on the remaining $\lvert S\rvert -1 = 1/\varepsilon -1$ coordinates of $S$.
 - Thus we reduce to Gap-Hamming with distance $1/\varepsilon+1$ versus $1/\varepsilon-1$ (this is the smallest possible gap, so the reduction works for all reasonable gap parameters).
 - If we could decide Gap-Hamming with this gap in the cell-probe model with word size $w$, space $s$ and $t$ probes, then we would have a communication protocol of cost $t\log s + tw$:
- Alice queries Bob, who holds the data structure, $t$ times, each probe described by $\log s$ bits.
 - Bob answers the $t$ probes with the contents of the cells, $w$ bits each.
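The easy reduction in code (an illustrative sketch with $1/\varepsilon = 4$ and $n = 6$; the numbers are just for the demo): Alice's set becomes the query point, Bob's set becomes $n$ singleton points, and the nearest-neighbor distance is $1/\varepsilon+1$ exactly when the sets are disjoint and $1/\varepsilon-1$ otherwise.

```python
def reduce_to_nn(S, T, n):
    """Map a Disjointness instance over the universe {0, ..., 2n-1} to a
    nearest-neighbor instance in the hypercube {0,1}^(2n)."""
    d = 2 * n
    q = [int(i in S) for i in range(d)]                 # Alice's query point
    P = [[int(j == i) for j in range(d)] for i in T]    # Bob's point set {e_i : i in T}
    return q, P

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

n, S = 6, {0, 2, 5, 7}                 # |S| = 1/eps = 4
T_disjoint = {1, 3, 4, 6, 8, 9}        # |T| = n = 6, disjoint from S
T_overlap = {1, 3, 4, 6, 8, 5}         # shares the element 5 with S
for T in (T_disjoint, T_overlap):
    q, P = reduce_to_nn(S, T, n)
    print(min(hamming(q, p) for p in P))   # 1/eps + 1 = 5 if disjoint, 1/eps - 1 = 3 otherwise
```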
 
 - From above, $(1/\varepsilon,n)$-Disjointness has the communication lower bounds $(\delta/\varepsilon) \log n$ for Alice and $n^{1-2\delta}$ for Bob.
 - Hence, $t\log s\geq (\delta/\varepsilon) \log n$ and $tw\geq n^{1-2\delta}$.
 
 - Hardness of high-dimensional Gap-Hamming is not very interesting; the interesting regime is small dimension $d$.
 - Lemma 6.6:
- There exists a randomized map $f$ from $\{0,1\}^{2n}$ to $\{0,1\}^d$ such that, for every set $P$ of $n$ points and query $q$ produced by the reduction above, with probability at least $1-1/n$:
 - if the nearest-neighbor distance between $q$ and $P$ is at most $1/\varepsilon -1$, then the nearest-neighbor distance between $f(q)$ and $f(P)$ is at most $\alpha$;
 - if the nearest-neighbor distance between $q$ and $P$ is $1/\varepsilon +1$, then the nearest-neighbor distance between $f(q)$ and $f(P)$ is at least $\alpha(1+h(\varepsilon))$.
 - The map uses $d=\Theta(\varepsilon^{-2}\log n)$ random inner products with $2n$-bit vectors.
 - In essence, this lemma allows us to reduce the dimension in the easy reduction while preserving the gap, with high probability.
 - Corollary: Every lower bound for $(1/\varepsilon,n)$-Disjointness carries over to the Query-Database problem for the $(1+\varepsilon)$-approximate Nearest Neighbor problem in dimension $d=\Omega(\varepsilon^{-2} \log n)$.
 - Corollary: Every data structure for $(1+\varepsilon)$-approximate nearest neighbor with query time $t=O(1)$ and word size $O(n^{1-\delta})$ uses space $s=n^{\Omega(1/\varepsilon)}$.
- Plugging this into our inequalities gives roughly:
- $c\log s \geq (c'/\varepsilon) \log n$ and $c\,n^{1-\delta}\geq n^{1-2\delta}$ (with $t\leq c$ and word size $O(n^{1-\delta})$).
 - The first inequality gives $\log s\geq (c'/(c\varepsilon))\log n$, i.e., $s\geq n^{\Omega(1/\varepsilon)}$, which is what the corollary states.
 
 
 - This can be refined to $s=n^{\Omega(\varepsilon^{-2})}$ with a slightly better reduction.