A Short Overview of Communication Complexity for Algorithms Designers (Reading Group)
Christian Engels, April 01, 2020 [TCS, Communication Complexity]
Communication Complexity for Algorithm Designers
Communication Complexity Basics (Based on Chapter 4)
- Alice and Bob have $x\in \{0,1\}^a$ and $y\in \{0,1\}^b$, respectively, as private information.
- They want to communicate in order to compute a function $f:\{0,1\}^{a+b}\rightarrow \{0,1\}$ of their joint input.
One Way Communication Protocol
- Alice computes $A(x)$ and sends it to Bob, for some function $A:\{0,1\}^a\rightarrow \{0,1\}^m$.
- Bob then decides by computing $B(y,A(x))$.
- Alice and Bob are computationally unbounded.
- The one-way communication complexity of $f$ is the minimum, over all correct one-way protocols, of the worst-case number of bits sent (remember, $A$ and $B$ may be randomized).
- I.e., $\min_{(A,B)\text{ correct}}\ \max_{x\in \{0,1\}^a} \lvert A(x)\rvert$. The definition for two-way communication is similar.
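To make the definitions concrete, here is a minimal Python model of a one-way protocol, its worst-case cost, and a brute-force correctness check. Everything here, including the trivial "send everything" protocol, is an illustration and not something from the source.

```python
from itertools import product

def one_way_cost(A, a):
    """Worst-case number of bits Alice sends: max over x of |A(x)|."""
    return max(len(A(x)) for x in product('01', repeat=a))

def is_correct(A, B, f, a, b):
    """Check B(y, A(x)) == f(x, y) on every input pair (brute force, tiny a, b)."""
    return all(B(y, A(x)) == f(x, y)
               for x in product('01', repeat=a)
               for y in product('01', repeat=b))

# Disjointness on characteristic vectors, and the trivial protocol:
# Alice sends x verbatim, Bob evaluates f himself.
f = lambda x, y: int(all(not (int(xi) and int(yi)) for xi, yi in zip(x, y)))
A = lambda x: ''.join(x)
B = lambda y, m: f(tuple(m), y)
assert is_correct(A, B, f, a=3, b=3)
print(one_way_cost(A, a=3))   # 3 bits: the trivial upper bound
```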
Disjointness Problem
- Let the universe be $[n]$; Alice and Bob hold vectors $x,y$ that are characteristic vectors of sets of size $k$.
- E.g., $[1,0,0,1] = \{x_1,x_4\}$ vs. $[1,1,1,0] = \{x_1, x_2, x_3\}$.
- Accept if $S_x\cap S_y=\emptyset$.
Deterministic One-way Communication Complexity for Disjointness
- Every deterministic one-way protocol for Disjointness has worst-case communication complexity at least $n$.
- Proof:
- Suppose Alice sends at most $n-1$ bits.
- Consider the set of possible messages $S=\{A(x)\mid x\in \{0,1\}^n\}$.
- Its size is at most $2^{n-1}$, while there are $2^n$ possible inputs $x$.
- By the pigeonhole principle, there are two distinct inputs $x_1\neq x_2$ on which Alice sends the same message.
- As $x_1,x_2$ differ in at least one bit, Bob can hold a set that is disjoint from one of them but not the other, violating the correctness of the protocol.
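The pigeonhole argument can be checked mechanically for very small $n$. The following brute-force search (illustrative only, and feasible only for tiny $n$) confirms that no deterministic one-way protocol with just $n-1$ message bits computes Disjointness:

```python
from itertools import product

def disj(x, y):
    """1 iff the sets with characteristic vectors x, y are disjoint."""
    return int(all(not (xi and yi) for xi, yi in zip(x, y)))

def no_short_protocol(n):
    """Every map A from n-bit inputs to (n-1)-bit messages forces Bob to err
    on some input pair, exactly as in the pigeonhole argument above."""
    inputs = list(product([0, 1], repeat=n))
    # dist[i][j]: some y of Bob's distinguishes inputs[i] from inputs[j]
    dist = [[any(disj(x1, y) != disj(x2, y) for y in inputs)
             for x2 in inputs] for x1 in inputs]
    for A in product(range(2 ** (n - 1)), repeat=len(inputs)):
        safe = all(not dist[i][j]
                   for i in range(len(inputs))
                   for j in range(i + 1, len(inputs))
                   if A[i] == A[j])
        if safe:
            return False   # a correct (n-1)-bit protocol would exist
    return True

print(no_short_protocol(3))   # True: 2 message bits never suffice for n = 3
```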
Probabilistic One-way Communication Complexity for Disjointness
- Public coins are visible to both Alice and Bob and do not count towards the communication.
- Private coins: Alice and Bob each have their own private pool of randomness.
- Public coins are strictly more powerful.
- Public-coin protocols are equivalent to distributions over deterministic protocols.
- Error constants and one- vs. two-sided error are defined as in standard complexity theory.
- Every randomized protocol that decides Disjointness correctly with probability $2/3$ uses $\Omega(n)$ communication.
- Proof:
- [Yao's Lemma]: Let $D$ be a distribution over the space of inputs $(x,y)$. Suppose that every deterministic one-way protocol $P$ with $\Pr_{(x,y)\sim D}[P\text{ is wrong on } (x,y)]\leq \varepsilon$ has cost at least $k$. Then every public-coin randomized protocol with error $\varepsilon$ has communication cost at least $k$.
- Proving the lemma:
- Let $R$ be a randomized protocol with error $\varepsilon$ that uses less than $k$ communication.
- Then $R$ is a distribution over deterministic protocols $P_1,\dots, P_s$, each with communication cost less than $k$.
- Since $R$ errs with probability at most $\varepsilon$ on every input, it also errs with probability at most $\varepsilon$ when the input is drawn from $D$.
- If every $P_i$ had error larger than $\varepsilon$ with respect to $D$, then $R$ would too, no matter how the $P_i$ are weighted.
- Hence there exists an $i$ such that $P_i$ has error at most $\varepsilon$ with respect to $D$ and uses less than $k$ communication.
- This contradicts the assumption.
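- Written out, the averaging step is a one-line calculation (with $p_i=\Pr[R\text{ chooses }P_i]$): since $R$ errs with probability at most $\varepsilon$ on every fixed input, also $\varepsilon\geq \Pr_{(x,y)\sim D,\, i}[P_i\text{ errs on }(x,y)] = \sum_i p_i\cdot\Pr_{(x,y)\sim D}[P_i\text{ errs}]$, so the distributional errors of the $P_i$ cannot all exceed $\varepsilon$.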
Normal Communication
- Matrix view of input/output behavior $M(f)$.
- Let us look in the following at Disjointness with the rows and columns indexed by $\emptyset, \{x_1\}, \{x_2\}, \{x_1, x_2\}$ for $X$ and $Y$ respectively.
- $\begin{pmatrix} 1& 1& 1& 1\\ 1& 0& 1& 0\\ 1& 1& 0& 0\\ 1& 0& 0& 0\end{pmatrix}$
- Rectangles: take subsets $A\subseteq X$ of rows and $B\subseteq Y$ of columns such that all $(x,y)\in A\times B$ have the same value (monochromatic).
- These rectangles do not need to be contiguous!
- Theorem: Let $f$ be a function such that every partition into monochromatic rectangles requires at least $t$ rectangles. Then the deterministic communication complexity is at least $\log t$.
- Proof: A deterministic protocol of complexity less than $\log t$ has fewer than $t$ different transcripts.
- The transcripts partition the input matrix into rectangles, and each rectangle has to be monochromatic, as otherwise the protocol would make a mistake on some input inside it.
- Corollary: If every covering of $M(f)$ by monochromatic rectangles requires at least $t$ rectangles, then the deterministic communication complexity is at least $\log t$.
- Note: $\log r$, where $r$ is the rank of $M(f)$, is a lower bound for the deterministic communication complexity, but it is unknown whether (a polynomial in) $\log r$ is also an upper bound (the log-rank conjecture).
- Examples:
- Equality is the identity matrix.
- $\begin{pmatrix} 1& 0& 0& 0\\ 0& 1& 0& 0\\ 0& 0& 1& 0\\ 0& 0& 0& 1\end{pmatrix}$.
- Disjointness with sets of size $k$ has the trivial upper bound $\log \binom{n}{k} = \Theta(k\log (n/k))$, as there are $\binom{n}{k}$ size-$k$ subsets of a universe of size $n$ and Alice can simply send her set.
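A quick numerical illustration of the matrix view and the rank bound (a toy script assuming numpy is installed; the universe size $m=3$ is arbitrary): the Disjointness matrix is a tensor power of $\begin{pmatrix}1&1\\ 1&0\end{pmatrix}$ and has full rank, so the $\log r$ bound already gives $m$.

```python
from itertools import product
import numpy as np

def disjointness_matrix(m):
    """M[S, T] = 1 iff the subsets S, T of an m-element universe are disjoint."""
    subsets = list(product([0, 1], repeat=m))
    return np.array([[int(not any(a and b for a, b in zip(S, T)))
                      for T in subsets] for S in subsets])

M = disjointness_matrix(3)
print(M.shape, np.linalg.matrix_rank(M))   # (8, 8) 8, so the log-rank bound is 3
```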
Randomized Disjointness
- Theorem: There exists a distribution over inputs such that Disjointness has distributional communication complexity $\Omega(k)$.
- What is the distribution?
- Take the uniform distribution: the chance that $f(x,y)=1$ (the sets are disjoint) is $(3/4)^n$, hence an algorithm that constantly outputs "not disjoint" already has high success probability. So both the accept and the reject case need constant probability mass.
- The inputs also need enough information content, as otherwise one party could simply send their whole input cheaply.
- Distribution:
- With probability $3/4$: $(x,y)$ is chosen uniformly at random subject to: $x,y$ have exactly $n/4$ ones and there is no index such that $x_i=y_i=1$.
- With probability $1/4$: $(x,y)$ is chosen uniformly at random subject to: $x,y$ have exactly $n/4$ ones and there is an index such that $x_i=y_i=1$.
- Almost monochromatic 1 rectangles $R$ with respect to distribution $D$:
- $\Pr[(x,y)\in R\text{ and } f(x,y)=0] \leq 8\varepsilon \Pr[(x,y)\in R\text{ and } f(x,y)=1]$
- Show that every almost monochromatic 1-rectangle contains at most $2^{-c}$ of the probability mass of the distribution, where $c$ is as large as possible.
- Proof not given.
- There is also a (public-coin) protocol that essentially matches this bound for sets of size $k$:
- Draw uniformly random subsets $Z_1, Z_2, \dots$ of the universe using the public coins.
- The active player sends the smallest index $i$ such that their current set is contained in $Z_i$.
- The other player discards all elements of their set that are not in this $Z_i$.
- Repeat with alternating roles until one set is empty, then output "disjoint".
- Cut off the communication after a suitable (probability-related) number of bits and output "not disjoint"; this introduces a small error probability.
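A toy simulation of this protocol, following the description above; the cutoff of 200 rounds and the fixed seed are arbitrary illustrative choices, and only the rounds are simulated, not the bit-level encoding of the indices:

```python
import random

def small_set_disjointness(S, T, universe_size, rounds=200, seed=0):
    """Random-subset protocol: the active player announces the smallest i with
    their set contained in Z_i, the other player discards everything outside
    Z_i.  Intersection elements are never discarded, so "disjoint" is only
    declared once one set becomes empty."""
    A, B = set(S), set(T)
    rng = random.Random(seed)          # stands in for the public coins
    for r in range(rounds):
        active, passive = (A, B) if r % 2 == 0 else (B, A)
        if not active:
            return True                # one set is empty: certainly disjoint
        while True:                    # draw shared subsets until one contains the active set
            Z = {u for u in range(universe_size) if rng.random() < 0.5}
            if active <= Z:
                break
        passive &= Z                   # discard elements inconsistent with Z
    return False                       # cutoff reached: guess "not disjoint"

print(small_set_disjointness({1, 4, 9}, {2, 6, 8}, 20))   # True (disjoint)
print(small_set_disjointness({1, 4, 9}, {4, 6, 8}, 20))   # False (they intersect)
```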
Communication Complexity and Data Structures (Chapter 6)
- The main point of this section will be to show data structure lower bounds for $\varepsilon$-Gap Hamming in the cell probe model.
- Nearest Neighbor:
- Given points $S=\{x_1,\dots,x_n\}$ in the Hamming cube $H^d=\{0,1\}^d$ ($d$ roughly $\Omega(\sqrt{n})$).
- Build a structure $D$ such that: given a point $x\in H^d$, find the closest point in $S$ to $x$ using $D$.
- This problem can be transferred to other metric spaces.
- In the approximate version, the returned point $p'$ only needs to satisfy $l(q,p')\leq (1+\varepsilon)\cdot\min_{p\in S} l(q,p)$.
$\varepsilon$-Gap Hamming
- Specialization of Nearest Neighbor.
- $\varepsilon$-Gap Hamming: Decide if the hamming distance $l(x,y)$ is at most $L$ or at least $(1+\varepsilon)L$ for some input $L$.
- A communication protocol for this problem:
- Sample random strings $r_1,\dots,r_s$, $r_i\in \{0,1\}^d$, where each bit $r_{i,j}=1$ independently with probability $\frac{1}{2L}$.
- Alice sends $h_A=(\langle x,r_1\rangle \bmod 2, \dots, \langle x,r_s\rangle \bmod 2)$.
- Bob computes $h_B=(\langle y,r_1\rangle \bmod 2,\dots, \langle y,r_s\rangle \bmod 2)$, compares it with $h_A$, and accepts if they differ in only a small number of coordinates.
- Intuitively this is close to testing equality, except that the test strings are biased towards zero, i.e., small differences are deliberately ignored.
- If $x,y$ differ in only a single bit, then for a uniformly random $r$ the inner products would differ (and the test would reject) with probability $1/2$.
- With each bit of $r$ being $1$ only with probability $\frac{1}{2L}$, the differing coordinate is most likely not selected, and hence the difference goes unnoticed.
- The protocol has two-sided error (because of the "at most $L$").
- Proof:
- See the random process in a different light.
- Mark each coordinate as relevant independently with probability $1/L$; relevant coordinates get a uniformly random bit, all other coordinates get $0$.
- If $l(x,y)=\Delta$, then:
- $\Pr_j[\langle r_j,x\rangle \mod 2 \not\equiv \langle r_j,y\rangle\mod 2] = 1/2 \cdot \left(1- \left(1-\frac{1}{L}\right)^\Delta\right)$.
- I.e., with probability $1-\left(1-\frac{1}{L}\right)^\Delta$ at least one of the $\Delta$ differing coordinates is relevant, and conditioned on this the parity is a uniformly random bit, so the difference is detected with probability $1/2$.
- Now, what is the difference between this probability for $\Delta = (1+\varepsilon)L$ and for $\Delta = L$?
- $\Pr_j [\langle r_j,x\rangle \not\equiv \langle r_j,y \rangle \pmod 2 \mid \Delta=(1+\varepsilon)L] - \Pr_j [\cdots \mid \Delta=L] = \frac12 \left( 1-\frac{1}{L}\right)^L\left(1- \left(1-\frac{1}{L}\right)^{\varepsilon L}\right)$, by plugging both values of $\Delta$ into the formula above and subtracting.
- Now by $1-x\in [e^{-2x}, e^{-x}]$ for $x\in [0,1/2]$ we can bound this by
- $\geq \frac12 \cdot e^{-\frac{2L}{L}} \cdot \left(1- e^{-\frac{\varepsilon L}{L}}\right)$
- $= \frac{1}{2e^2}(1-e^{-\varepsilon})$.
- This is now constant.
- Let $t$ be the probability that $\langle r_i,x\rangle \not\equiv \langle r_i,y\rangle \pmod 2$ when $l(x,y)=L$.
- So if $l(x,y)\leq L$, we expect at most $ts$ of the $s$ random inner products to differ, while if $l(x,y)\geq (1+\varepsilon)L$ we expect at least $(t+h(\varepsilon))s$ of them to differ, where $h(\varepsilon)=\frac{1}{2e^2}(1-e^{-\varepsilon})$ is (a lower bound on) the gap computed above.
- Chernoff bounds now imply (for $s$ large enough, with high probability):
- If the distance between $x,y$ is at most $L$, then the resulting vectors differ in at most $(t+\frac12 h(\varepsilon))s$ coordinates.
- If the distance between $x,y$ is at least $(1+\varepsilon)L$, then the resulting vectors differ in at least $(t+\frac12 h(\varepsilon))s$ coordinates.
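A toy simulation of the whole sketch-and-threshold protocol. The sample size $s$, the fixed seed, and the acceptance threshold $t+h(\varepsilon)/2$ follow the analysis above; the concrete numbers in the example are illustrative only.

```python
import random

def sketch(x, rs):
    """The s inner products of x with the random vectors, mod 2."""
    return [sum(xi & ri for xi, ri in zip(x, r)) % 2 for r in rs]

def gap_hamming(x, y, L, eps, s=2000, seed=0):
    """Accept (i.e., guess 'distance <= L') iff the sketches differ in at most
    (t + h/2) * s coordinates."""
    d = len(x)
    rng = random.Random(seed)          # shared randomness
    rs = [[1 if rng.random() < 1 / (2 * L) else 0 for _ in range(d)]
          for _ in range(s)]           # each bit is 1 with probability 1/(2L)
    mismatches = sum(a != b for a, b in zip(sketch(x, rs), sketch(y, rs)))
    t = 0.5 * (1 - (1 - 1 / L) ** L)                              # detection prob. at distance L
    h = 0.5 * (1 - 1 / L) ** L * (1 - (1 - 1 / L) ** (eps * L))   # the gap from above
    return mismatches <= (t + h / 2) * s

x = [1] * 200 + [0] * 200
y_close = [1] * 190 + [0] * 210                  # distance 10 <= L = 20
y_far = [1] * 160 + [0] * 240                    # distance 40 >= (1 + eps) * L
print(gap_hamming(x, y_close, L=20, eps=1.0))    # typically True
print(gap_hamming(x, y_far, L=20, eps=1.0))      # typically False
```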
Lower Bounds via Asymmetric Communication Complexity
- Cell Probe model:
- A computational model, so we can actually prove lower bounds against it for some problems.
- The database $x\in \{0,1\}^n$ is stored as a table of $s$ cells of $w$ bits each, i.e., $D:\{0,1\}^n \rightarrow (\{0,1\}^w)^s$.
- The set of queries $Q$ that has to be supported is known up front.
- A query algorithm adaptively probes cells and gets the contents of every cell it specifies.
- It must answer every query in $Q$ correctly.
- The relevant resources are (roughly) the space $s$, the word size $w$, and the query time $t$, the maximum number of cells probed over all databases and queries.
- E.g., for membership queries "is $q$ in the database?", sending the whole query ($\log \lvert Q\rvert$ bits) is the trivial upper bound in the corresponding communication game.
- Index Problem:
- Alice has an index $i\in [n]$ and Bob has $y\in \{0,1\}^n$. The goal is to compute $y_i$.
- In every randomized communication protocol for INDEX, either Alice sends at least $\delta \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits, in the worst case.
- Miltersen's Lemma: Suppose $M(f)$ has at least $v$ columns, each containing at least $u$ 1-inputs, and there is a deterministic protocol computing $f$ in which Alice and Bob send at most $a$ and $b$ bits respectively. Then $M(f)$ has a 1-rectangle $A\times B$ with $\lvert A\rvert \geq u/2^a$ and $\lvert B\rvert \geq v/2^{a+b}$.
- A data structure with query time $t$, space $s$, and word size $w$ induces a communication protocol for INDEX in which Alice sends $t\log s$ bits and Bob sends at most $tw$ bits (see the code sketch after this list item).
- Bob builds the data structure; Alice simulates the query algorithm.
- Alice makes $t$ probes, each specified by the index of a cell of the data structure ($\log s$ bits).
- Bob answers each probe with the content of the requested cell ($w$ bits).
- If Alice can afterwards decide INDEX, the communication protocol is correct.
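A minimal sketch of this simulation in code; the generator-based interface for the query algorithm is purely an assumption made for illustration.

```python
import math

def run_as_protocol(query_algo, cells, w):
    """Data-structure-to-protocol reduction: Alice runs the cell-probe query
    algorithm; every probe costs her ceil(log2 s) bits (a cell index) and
    costs Bob w bits (the cell's contents)."""
    s = len(cells)
    alice_bits = bob_bits = 0
    gen = query_algo()
    try:
        probe = next(gen)
        while True:
            alice_bits += math.ceil(math.log2(s))   # Alice names a cell
            bob_bits += w                            # Bob returns its contents
            probe = gen.send(cells[probe])
    except StopIteration as stop:
        return stop.value, alice_bits, bob_bits

# INDEX with the trivial data structure: cell i stores y_i, so w = 1 and t = 1.
y = [0, 1, 1, 0, 1, 0, 1, 1]
def index_query(i):
    def algo():
        bit = yield i          # a single probe
        return bit
    return algo

print(run_as_protocol(index_query(5), y, w=1))   # (0, 3, 1): answer, Alice bits, Bob bits
```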
- $(k,l)$ Disjointness:
- Alice has a set of size $k$ and Bob a set of size $l$ from a common universe; they need to decide whether the sets are disjoint.
- For $(1/\varepsilon^2, n)$-Disjointness on a universe of size $2n$, either Alice sends at least $\delta/\varepsilon^2 \cdot \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits, for a suitable constant $\delta$.
- We find hardness of Gap-Hamming in the cell probe model via a reduction from $(1/\varepsilon,n)$-Disjointness.
- I.e., if we can solve Gap-Hamming, we can solve $(1/\varepsilon,n)$-Disjointness.
- Nearest Neighbor:
- Given points $S=\{x_1,\dots,x_n\}$ in the Hamming cube $H^d=\{0,1\}^d$ ($d$ roughly $\Omega(\sqrt{n})$).
- Build a structure $D$ such that, given a point $x\in H^d$, we can find the closest point in $S$ to $x$ using $D$.
- $(1/\varepsilon,n)$-Disjointness: are the sets, of sizes $1/\varepsilon$ and $n$, disjoint?
- Solve $(1/\varepsilon,n)$-Disjointness with Gap-Hamming:
- Easy reduction (a small code sketch follows this list):
- Map to the $2n$-dimensional hypercube $\{0,1\}^{2n}$.
- Alice has a set of size $1/\varepsilon$ from a universe of size $2n$.
- Alice maps her input set $S$ to its characteristic vector in $\{0,1\}^{2n}$.
- Bob maps his input set $T$ to the point set $\{ e_i\mid i\in T\}$, where $e_i$ is the characteristic vector of the singleton set $\{i\}$.
- If $S,T$ are disjoint, then the query has distance $1/\varepsilon +1$ from every point.
- If they are not disjoint, there exists a point at distance at most $1/\varepsilon -1$.
- Both are easy to see. Remember that $\lvert S\rvert = 1/\varepsilon$, i.e., Alice's query is a vector of Hamming weight $1/\varepsilon$.
- Overlap means there is an $i\in S\cap T$; the query and $e_i$ then agree on coordinate $i$ and differ exactly on the remaining coordinates of $S$, giving distance $\lvert S\rvert -1 = 1/\varepsilon -1$.
- Thus we reduce to Gap-Hamming with $1/\varepsilon+1$ vs. $1/\varepsilon-1$ (notice that this is the smallest possible gap, hence hardness for it covers all reasonable gaps).
- If we could decide Gap-Hamming with this gap in the cell probe model with word size $w$, space $s$, and $t$ probes, then we would have a communication protocol with cost $t\log s + tw$.
- Alice simply queries Bob, who holds the data structure, $t$ times, each probe specified by $\log s$ bits.
- Bob answers each of the $t$ probes with the content of the requested cell ($w$ bits).
- The $(1/\varepsilon,n)$-Disjointness lower bound says that either Alice sends at least $\delta/\varepsilon \cdot \log n$ bits or Bob sends at least $n^{1-2\delta}$ bits.
- Hence, $t\log s\geq \delta/\varepsilon \cdot \log n$ or $tw\geq n^{1-2\delta}$ must hold.
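A tiny code sketch of the mapping in the easy reduction; the concrete sets and the universe size are arbitrary examples.

```python
def disjointness_to_nn(S, T, universe_size):
    """Alice's set S (size 1/eps) becomes the query point; Bob's set T becomes
    the point set {e_i : i in T} in the universe_size-dimensional Hamming cube."""
    char_vec = lambda A: [1 if i in A else 0 for i in range(universe_size)]
    hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
    query = char_vec(S)
    points = [char_vec({i}) for i in T]
    return min(hamming(query, p) for p in points)

S = {0, 3, 7}                                 # |S| = 1/eps = 3
print(disjointness_to_nn(S, {2, 5, 9}, 20))   # disjoint:   distance |S| + 1 = 4
print(disjointness_to_nn(S, {3, 5}, 20))      # intersects: distance |S| - 1 = 2
```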
- Hardness of Gap-Hamming in such a high dimension ($2n$) is not very interesting; we want hardness for small $d$.
- Lemma 6.6:
- There exists a randomized map $f:\{0,1\}^{2n}\rightarrow \{0,1\}^d$ such that, for every set $P$ of $n$ points and every query $q$ produced by the reduction above, with probability at least $1-1/n$:
- If the nearest-neighbor distance between $q$ and $P$ is at most $1/\varepsilon -1$, then the nearest-neighbor distance between $f(q)$ and $f(P)$ is at most $\alpha$.
- If the nearest-neighbor distance between $q$ and $P$ is $1/\varepsilon +1$, then the nearest-neighbor distance between $f(q)$ and $f(P)$ is at least $\alpha(1+h(\varepsilon))$.
- The map uses $d=\Theta(\varepsilon^{-2}\log n)$ coordinates, each a random inner product with the $2n$-bit input vector (as in the protocol above).
- In essence, this lemma allows us to reduce the dimension of the easy reduction with some probability.
- Corollary: Every lower bound for $(1/\varepsilon,n)$-Disjointness carries over to the Query-Database problem for the $(1+\varepsilon)$-approximate Nearest Neighbor problem in dimension $d=\Omega(\varepsilon^{-2} \log n)$.
- Corollary: Every data structure for $(1+\varepsilon)$-approximate nearest neighbor with query time $t=\Theta(1)$ and word size $O(n^{1-\delta})$ uses space $s=n^{\Omega(\varepsilon^{-1})}$.
- Plugging $t=c=\Theta(1)$ into our inequalities gives roughly:
- $c\log s \geq c'/\varepsilon \cdot \log n$ or $c\,n^{1-\delta}\geq n^{1-2\delta}$; with the word-size assumption, the right-hand (Bob) side can be ruled out by instantiating the Disjointness lower bound with a small enough constant, so the left-hand (Alice) side must hold.
- This gives us the bound $s\geq n^{c'/\varepsilon}$, which is what the corollary says.
- This can be refined to $s=n^{\Omega(\varepsilon^{-2})}$ with a slightly better reduction.