Factoring vs Identity Testing
Christian Engels, August 14, 2018
Sources
- Shpilka, Volkovich - On the relation between polynomial identity testing and finding variable disjoint factors
- Sudan - Lecture Notes
- Forbes, Shpilka
- Kopparty, Saraf, Shpilka - Equivalence of Polynomial Identity Testing and Deterministic Multivariate Polynomial Factoring
Basics
- Everybody knows what factoring is for univariate polynomials.
- Multivariate polynomials.
- A polynomial is irreducible if it is not constant and cannot be factored into the product of two non-constant polynomials with coefficients in F.
- Multivariate polynomials might not have an explicit algebraic expression (rather, we might have them as zeroes of some equation system).
- Over a so-called *Unique Factorization Domain*, every polynomial can be factored and the factorization is unique.
- There exist randomized algorithms for factoring. (Kaltofen 89)
- Deterministic algorithms are unknown.
Shpilka, Volkovich
- Multilinear Circuits
- Theorem: Assume we have a deterministic PIT for a circuit of the form $C_1 + C_2\times C_3$ where $C_i$ are $(n,s,d)$ circuits with running time $T(s,d)$. Then we have a deterministic factoring algorithm that outputs the factors (in circuit form) in time polynomial in $n,d, T(s,d)$. Additionally, the size of each factor is bounded by $s$.
Preliminaries
- Polynomial sized circuits with polynomial degree on $n$ variables.
- PIT: Is such a circuit identical to the zero polynomial?
- PIT implies circuit lower bounds (boolean or algebraic) and the other way around.
- Depth of a circuit.
- If there is a deterministic algorithm that outputs "true" iff a given circuit is decomposable, then there is a deterministic algorithm that decides PIT. (This is obvious, as for new variables $y,z$, the polynomial $f+yz$ is decomposable iff $f\equiv 0$.)
- Note that it is enough to find out the index sets as we are multilinear.
- Assume the field is large enough.
- $f$ depends on $x_i$ when there exists $a,b$ such that $f(a)\neq f(a_1,\dots,a_{i-1},b,a_{i+1},\dots, a_n)$.
- $var(f) = \{ i \mid \text{$f$ depends on $x_i$} \}$.
- Variable Partition: Let $I_1,\dots,I_k$ be a partition of the variables. Then it is a variable partition of $f$ if $f=h_1(X_{I_1})\cdots h_k(X_{I_k})$.
Justifying Assignments
- Justifying assignment: We say an assignment $a$ is justifying iff for all index sets $I$, $var(f\vert_{x_I \leftarrow a_I}) = var(f)\setminus I$. In essence, replacing the variables with assignments does not kill anything but the assigned variables.
- Let there be a deterministic PIT algorithm for $(n,s,d)$ circuits. Then there exists an algorithm that returns a justifying assignment in time polynomial in $n,s, T(s,d)$ where $T(s,d)$ is the running time of the PIT algorithm.
Proof
- Prerequisites:
- Let $f_{i,a} = f(a_1,\dots,a_{i-1},x_i,a_{i+1}, \dots, a_n)$.
- We define $f_{i,X}$ as above except that $a_j=x_j$ for all $j\neq i$.
- We will study $f_{i,X}(a_i) - f_{i,X}(0)$.
- We say $a_i$ is a witness for $x_i$ if $f_{i,X}(a_i) - f_{i,X}(0) \not\equiv 0$ or $i\not\in var(f)$.
- We say $a$ is a witness for $f$ if $a_i$ is a witness for $x_i$ for all $i$.
- Notice that a witness is not a justifying assignment; it just says that the line stays intact.
- Lemma 2: $a$ is a justifying assignment if $a$ is a witness and for every $i\in var(f)$, $f_{i,a}(a_i)-f_{i,a}(0) \neq 0$.
- Proof:
- Hence, $f_{i,X}(x_i)$ is of the form $x_iP(a) + Q(a)$ where $P(a), Q(a)$ are non-zero.
- Hence, it is "justifying" for all sets of size $n-1$ and hence justifying for all sets.
- Lemma 3: Let $W\subseteq \mathbb{F}$ be a subset of size $d+1$ then $W^n$ contains a witness for $f$.
- Proof:
- Notice that $f_{i,w}(a) - f_{i,w}(0)$ is a univariate polynomial of degree $d$.
- Hence such a set needs to exist, as a univariate with degree $d$ has at most $d$ roots.
- Continue this argument for all $i\in [n]$.
- Lemma 4: For all $i\neq j$, and $\alpha, \beta\in \mathbb{F}$, $(f_{i,X}(\alpha) - f_{i,X}(0))\vert_{x_j \leftarrow \beta} = f_{i,X}(\alpha)\vert_{x_j \leftarrow \beta} - f_{i,X}(0)\vert_{x_j \leftarrow \beta}$.
- Proof:
- Assume w.l.o.g. the left term has $x_j$ in it.
- Then we know that there exists $x_j$ in the final polynomial.
- Hence, replacing it earlier does not cancel it.
- Similarly in the other direction.
- Lemma 5: Given $f$ and a PIT for $\partial f$, we can find a witness for $f$ in time polynomial in $n,d,T(s,d)$.
- Proof:
- For every $i\in [n]$, check for every $c\in [d+1]$ if $f_{i,X}(c) - f_{i,X}(0)\not\equiv 0$.
- Lemma 3 says we will find such an assignment and with a similar argument as in the proof, we will find it in short time.
- Algorithm:
- Find a witness $\alpha$ for $f$ as in Lemma 5.
- Set $g_i=f_{i,X}(\alpha_i) - f_{i,X}(0)$ for all $i$.
- For $j=1,\dots, n$ do
- Find a value $c$ such that for every $k\neq j$ with $g_k\not\equiv 0$, we have $g_k\vert_{x_j \leftarrow c}\not\equiv 0$.
- Set $a_j \leftarrow c$.
- Set $g_k \leftarrow g_k\vert_{x_j \leftarrow c}$ for all $k\neq j$.
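The steps above can be sketched in code. This is a toy illustration, not the paper's implementation: $f$ is assumed to be given as a black-box Python function, and the PIT oracle is replaced by naive evaluation over the grid $\{0,\dots,d\}^n$ (exponential, but a correct identity test for individual degree at most $d$); all function names are hypothetical.

```python
from itertools import product

def line_diff(f, i, c, n):
    # x -> f(x with x_i = c) - f(x with x_i = 0), i.e. f_{i,X}(c) - f_{i,X}(0)
    def g(x):
        y = list(x)
        y[i] = c
        v = f(tuple(y))
        y[i] = 0
        return v - f(tuple(y))
    return g

def find_justifying_assignment(f, n, d):
    """Witness search (Lemma 5) plus sequential fixing, for a multilinear
    black-box f of individual degree <= d."""
    def is_nonzero(g):
        # Naive stand-in for the PIT oracle.
        return any(g(x) != 0 for x in product(range(d + 1), repeat=n))

    # Find a witness alpha coordinate by coordinate.
    alpha = [0] * n
    for i in range(n):
        for c in range(1, d + 2):
            if is_nonzero(line_diff(f, i, c, n)):
                alpha[i] = c
                break  # alpha[i] stays 0 if f does not depend on x_i

    # g_i = f_{i,X}(alpha_i) - f_{i,X}(0); fix x_1, ..., x_n one at a time.
    g = [line_diff(f, i, alpha[i], n) for i in range(n)]
    live = [i for i in range(n) if is_nonzero(g[i])]
    a = [0] * n
    for j in range(n):
        for c in range(d * (n - 1) + 2):  # enough values by the degree bound
            def fixed(h, j=j, c=c):
                return lambda x: h(tuple(c if t == j else x[t] for t in range(n)))
            if all(is_nonzero(fixed(g[k])) for k in live if k != j):
                a[j] = c
                g = [fixed(h) if k != j else h for k, h in enumerate(g)]
                break
    return a
```

For example, for $f(x) = x_0x_1 + x_2$ (individual degree $1$) this returns $(1,1,0)$, which is indeed a justifying assignment.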
- Theorem 2.4: The algorithm finds a justifying assignment in the usual time bounds.
- Proof:
- The proof has two parts; the running time is obvious.
- First:
- Notice that $g_i$ has degree bounded by $d$ and there are at most $n-1$ many values.
- Hence at most $d(n-1)+1$ many values for $c$ have to be searched in every step.
- Second:
- If $g_i$ is non-zero in step $j$ then it is non-zero after step $j$ by construction.
- $g_i$ is initially non-zero as $\alpha$ is a witness.
- After all steps, we found an assignment such that for all $i$ it holds that $f_{i,a}(\alpha_i) - f_{i,a}(0)\neq 0$.
- Lemma 2 finishes the proof.
Decomposition
Algorithm (given a justifying assignment $a$):
- $\mathcal{I}=\emptyset$, $J=[n]$.
- Set $x_n=a_n$ and recursively compute the variable-partition of $C\vert_{x_n \leftarrow a_n}$ or return the singleton set $\{\{1\}\}$.
- Let $\mathcal{I}'$ be the resulting partition.
- For every $I\in \mathcal{I}'$ check if $C(a)\cdot C \equiv C\vert_{x_I \leftarrow a_I} \cdot C\vert_{x_{[n]\setminus I} \leftarrow a_{[n]\setminus I}}$.
- If yes, add $I$ to $\mathcal{I}$ and set $J\leftarrow J\setminus I$.
- Otherwise move to the next $I$.
- Add the remaining elements to $\mathcal{I}$ as a single set ($\mathcal{I}\cup \{J\}$).
- Lemma 3.2: Let $a$ be a justifying assignment for $f$. Then for all $I$: $f(a) \cdot f \equiv f\vert_{x_I\leftarrow a_I} \cdot f\vert_{x_{[n]\setminus I}\leftarrow a_{[n]\setminus I}}$ iff $I$ is a disjoint union of sets from the variable-partition of $f$.
- Meaning:
- Notice, that this says it could be a really bad factoring algorithm (i.e., not factoring at all).
- Notice that in the first recursion step we have $n$ as a special variable. If some variable partition $I$ is not a valid partition by this lemma, then it means it has to be in a set with $n$.
- Proof:
- $\Rightarrow$ (Assume equality holds)
- As $a$ is a justifying assignment $f\vert_{x_I\leftarrow a_I}$ and $f\vert_{x_{[n]\setminus I} \leftarrow a_{[n]\setminus I}}$ are not equivalent to the zero polynomial.
- Hence, $f(a)$ is not equal to zero (by equality).
- Hence, $f(x) = f_{x_{[n]\setminus I}\leftarrow a_{[n]\setminus I}} \cdot f\vert_{x_I \leftarrow a_I}/f(a)$.
- Then the results follows by uniqueness of decomposition.
- Notice that this just says, we found a factorization (a unique factorization but not the smallest).
- $\Leftarrow$ (decomposition to equality)
- We have the decomposition $f(x)=h(X_I)g(X_{[n]\setminus I})$.
- Now plugging in our justifying assignment results in $f\vert_{x_I\leftarrow a_I} = h(a_I) g(X_{[n]\setminus I})$ and $f\vert_{x_{[n]\setminus I} \leftarrow a_{[n]\setminus I}} = h(X_I) g(a_{[n]\setminus I})$.
- Hence, $f\vert_{x_I\leftarrow a_I} \cdot f\vert_{x_{[n]\setminus I} \leftarrow a_{[n]\setminus I}} = h(a_I) g(X_{[n]\setminus I}) \cdot h(X_I) g(a_{[n]\setminus I})$.
- And this is equal to $f(a)f(x)$ by commutativity, as $f(a)=h(a_I)g(a_{[n]\setminus I})$.
- Notice, that we used here again the multilinearity.
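Lemma 3.2 turns directly into a checkable predicate. Here is a toy sketch (names hypothetical, not from the paper), using that a multilinear polynomial is identically zero iff it vanishes on $\{0,1\}^n$, so the identity test becomes a brute-force cube evaluation:

```python
from itertools import product

def is_union_of_classes(f, a, I, n):
    """Test f(a)*f == f|_{x_I <- a_I} * f|_{x_([n]-I) <- a_([n]-I)} on {0,1}^n.
    By Lemma 3.2, this holds iff I is a disjoint union of classes of the
    variable partition of the multilinear polynomial f (a justifying)."""
    def restrict(S):
        # Substitute a_i for x_i on the coordinates in S.
        return lambda x: f(tuple(a[i] if i in S else x[i] for i in range(n)))

    comp = set(range(n)) - set(I)
    fa = f(a)
    return all(
        fa * f(x) == restrict(set(I))(x) * restrict(comp)(x)
        for x in product((0, 1), repeat=n)
    )
```

For example, for $f = (x_0+x_1)(x_2+1)$ with the justifying assignment $a=(1,1,1)$, the set $\{0,1\}$ passes the test and $\{0\}$ does not.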
- Lemma 3.1: The algorithm outputs a variable partition for the polynomial computed by $C$.
- Proof:
- Proof by induction.
- The base case is clear, as a univariate multilinear polynomial has no non-trivial factors.
- Let $C=h_1(X_{I_1}) \cdots h_k(X_{I_k})$, where $\mathcal{I}=\{ I_1,\dots, I_k\}$ is the correct partition.
- Assume w.l.o.g. that $n\in I_k$.
- Now $C_{x_n\leftarrow a_n} = h_1\cdots h_{k-1} h_k\vert_{x_n\leftarrow a_n}$ by justifying.
- Now we can use our induction hypothesis on $C_{x_n\leftarrow a_n}$.
- Notice that this could have factors $h_k\vert_{x_n\leftarrow a_n} = g_1\cdots g_l$
- However, we will use the induction hypothesis on $C_{x_n\leftarrow a_n}$.
- This gives us then a partition (not variable partition) $I_1,\dots, I_{k-1}, I_{k,1},\dots, I_{k,l}$.
- By Lemma 3.2 $I_1,\dots,I_{k-1}$ will be added, however $I_{k,1},\dots,I_{k,l}$ will not be added (as $\mathcal{I}$ was already the partition).
- The runtime is obvious from the algorithm.
Proof of the Theorem
- The proof is now simple.
- By Theorem 2.4 we can compute a justifying assignment (notice that the PIT also works for the derivative) and the number of variables.
- Lemma 3.1 gives us the variable partition.
- Hence, for every $I$ we see $\alpha y^m h_I = C\vert_{x_{[n]\setminus I} \leftarrow a_{[n]\setminus I}}$ for some $m$.
- A simple interpolation gives us $h_I$.
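The "simple interpolation" in the last step is plain univariate Lagrange interpolation. A generic exact version (a sketch, not the paper's routine) might look like:

```python
from fractions import Fraction

def lagrange_coeffs(points):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    < len(points) passing through the given (x, y) pairs."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        # Build the basis polynomial l_i with l_i(xi) = 1 and l_i(xj) = 0.
        basis = [Fraction(1)]
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            denom *= xi - xj
            # Multiply basis by (x - xj).
            new = [Fraction(0)] * (len(basis) + 1)
            for k, b in enumerate(basis):
                new[k + 1] += b
                new[k] -= xj * b
            basis = new
        for k in range(len(basis)):
            coeffs[k] += yi * basis[k] / denom
    return coeffs
```

For example, the points $(0,1),(1,2),(2,5)$ yield the coefficients of $x^2+1$.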
Kopparty, Saraf, Shpilka
Rough Overview of the Kaltofen Factorization Algorithm (reducing multivariate factorization to univariate factorization)
- Initialization:
- Assume $f(x,y)$.
- Assume $f$ is monic, i.e., highest degree coefficient is one.
- Assume $f(x,0)$ is square free.
- Univariate Factoring:
- Factor $f(x,0)$ as a univariate polynomial into $g(x,0)\cdot h(x,0)$.
- Various algorithms such as LLL and others.
- Lifting:
- Lift the factors by the following theorem:
- Let $R=\mathbb{F}(x)[y]$ and $I=y^{2^i}R$ and let $f=g\cdot h\mod I$ and $a,b\in R$ such that $ag+bh=1\mod I$. Then we can efficiently compute $g',h'$ such that $f=g'h' \mod I^2$ and $g=g'\mod I$, $h=h'\mod I$.
- Now we use this to compute polynomials $g_i(x,y), h_i(x,y)$ such that $f(x,y) = g_i(x,y)h_i(x,y) \mod y^{2^i}$.
- Notice that we essentially started with the factors $f(x,y)=g(x,y)h(x,y)\mod y$.
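The quadratic-lifting principle is easiest to see in its root form: lifting a simple root of $f$ from $\bmod\ p$ to $\bmod\ p^{2^k}$ by Newton steps. This is a simplified analogue of the factor-lifting theorem above, not the bivariate version itself (requires Python 3.8+ for the modular inverse via `pow`):

```python
def hensel_lift_root(f, df, r, p, k):
    """Lift a simple root r of f modulo p (df(r) invertible) to a root
    modulo p^(2^k) by k quadratic Newton/Hensel steps."""
    m = p
    for _ in range(k):
        m *= m
        # Newton step: r <- r - f(r) * f'(r)^{-1}  (mod m)
        r = (r - f(r) * pow(df(r), -1, m)) % m
    return r, m
```

For example, $f(x)=x^2+1$ has the root $r=2$ modulo $5$; two lifting steps give the root $182$ of $x^2+1$ modulo $625$.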
- Reducing the degree:
- We continue until we are at $y^{4d^2}$.
- Now $g_k$ divides (modulo $y^{4d^2}$) another low degree polynomial that has a common factor with $f$.
- Compute $g'$ such that $g'=g_k r \mod y^{4d^2}$ such that $\deg_x(g') < \deg_x(f)$ and $\deg_y(g')\leq \deg_y(f)$.
- Do this by solving linear equation system.
- This system is solvable as $g=g_k r_k \mod y^{4d^2}$.
- Computing a non-trivial factor:
- Compute $\gcd(f,g')$.
- One can show (with the resultant) that this has always a non-trivial gcd.
- Continue now by induction on the factors.
- Extending this to multivariate case:
- Take $f(x,y_1,\dots,y_n)$ and consider $f(x,\alpha_1 t + \beta_1,\dots, \alpha_n t+\beta_n)$ in $\mathbb{F}(\alpha_1,\dots,\alpha_n)[x,t]$ for some random subspace.
- Effective Hilbert Theorem: $\Pr_{\alpha,\beta}[f(x,\alpha_1t+\beta_1,\dots,\alpha_nt+\beta_n) \text{ is not irreducible}] \leq O(\frac{d^5}{\lvert S\rvert})$.
- Effective and efficient Hilbert Theorem: We can even compute a polynomial $h(z_1,\dots,z_n,y_1,\dots,y_n)$ such that $h(\alpha,\beta)\neq 0$ iff the restriction of $f$ is irreducible (follows from the proof of the Effective Hilbert Theorem).
- This is where we use our blackbox/hitting set: one of the elements in the hitting set will be enough, without needing to compute $h$.
- With high probability this restriction is enough (Effective Hilbert irreducibility).
Theorem
- Polynomial sized circuits with polynomial degree on $n$ variables.
- PIT: Is such a circuit identical to the zero polynomial?
- PIT implies circuit lower bounds (boolean or algebraic) and the other way around.
- Factoring: Compute irreducible factors of $f$.
- Depth of a circuit.
- $(n,s,d)$ circuit with $n$ variables, size $s$ and degree $d$.
- Degree of multivariate polynomial is the total degree.
- Theorem: Let $\mathbb{F}$ be a finite field or the rationals $\mathbb{Q}$. Suppose polynomial identity testing for $(n,s,d)$ circuits over $\mathbb{F}$ can be solved deterministically in time $\poly(n,s,d)$. Let the factors be given by $f=\prod_{i=1}^k g_i^{p^{e_i} j_i}$. Then we can compute an arithmetic circuit for the factor $g_i^{p^{e_i}}$ in deterministic time polynomial in $n,s,d,t$, where $t=l\cdot p$ if the field has characteristic $p$, and $t$ is the maximum bit complexity of the constants used in the circuit if $\mathbb{F}$ is the rationals.
Explanation of the theorem
- Bit complexity is needed.
- The details in the theorem change if we use black-box or white-box algorithms for PIT.
- Black-box can extract $g_i$ while whitebox only $g_i^{p^{e_i}}$.
- We want to replace the last.
- We know that given a circuit $C$ of size $s$ the factors can be computed by a circuit of size $\poly(s)$.
- There exists a polynomial-size (in the size of $g$) circuit $h$ such that if $h(\alpha,\beta)\neq 0$, then $g$ restricted to a 2-dimensional subspace given by $\alpha,\beta$ is irreducible.
- These two facts are enough for the black-box algorithm.
- Remember that a deterministic PIT is a polynomial size hitting set.
- Hence, a set such that no non-zero circuit of size $s$ evaluates to zero on all the elements in the hitting set.
- Hence, just doing the factoring on all the hitting set points is enough.
- The problem for whitebox is that we do not have such a hitting set!
- For whitebox, we work over the field $\mathbb{F}(a_1,\dots,a_n)$, viewing $f(x,ta_1,\dots,ta_n)$ as an element of $\mathbb{F}(a_1,\dots,a_n)[x,t]$.
- This field is too complicated to factor over directly.
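The blackbox argument above amounts to this: a hitting set turns PIT into a finite check. A toy illustration (the grid used below is a valid but exponentially large hitting set; the theorem assumes a polynomial-size one, and the names are hypothetical):

```python
from itertools import product

def pit(poly, hitting_set):
    # Deterministic PIT given a hitting set: a polynomial in the class is
    # non-zero iff it is non-zero on some point of the set.
    return any(poly(*pt) != 0 for pt in hitting_set)

def grid_hitting_set(n, d):
    # {0,...,d}^n hits every non-zero n-variate polynomial of individual
    # degree at most d.
    return list(product(range(d + 1), repeat=n))
```

For example, `pit(lambda x, y: (x - y) * (x + y), grid_hitting_set(2, 1))` returns `True`, while the identically zero polynomial returns `False`.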
The whitebox case
Overview
- Sadly, the proof that the factors have small circuits uses Hilbert's theorem.
- Idea:
- Reduce $f(x_1,\dots,x_n)$ to $f(x_1,tx_2,\dots,tx_n)$ as a polynomial in $K[x_2,\dots,x_n][x_1,t]$.
- Factor over this field and use the rest of Kaltofen's algorithm.
- Factoring over this field could still be hard but whitebox PIT will help us.
- Algorithm:
- Reduce to bivariate factoring over a large field.
- Define $f'(x,t) = f(x,ta_1,\dots,ta_n)$ and show that the factors correspond to factors of the original polynomial.
- Notice that in all this, we just look at our polynomial as a polynomial in $\mathbb{F}[a_1,\dots,a_n][x,t]$.
- Univariate factorization
- Find an irreducible factor for $f'(x,0)$.
- Write $f'(x,t)=g'(x,t,a_1,\dots,a_n) h'(x,t,a_1,\dots,a_n) \mod T$ and view this as an equation in $T$.
- Hensel Lifting
- Lift to $g'(x,t,a_1,\dots,a_n) h'(x,t,a_1,\dots,a_n) \mod T^{2^k}$.
- The following lemma (3.6) does not need PIT: There exists a single arithmetic circuit that takes inputs $a_1,\dots,a_n$ and computes the coefficients of $g',h'$ of the monomial $x^i t^j$ for all $i,j$.
- Solve linear equation
- Suppose $g=\sum_{i\leq d, j\leq D} c_{i,j}(a_1,\dots,a_n)x^i t^j$
- We know $\sum_{i\leq d, j\leq d} r_{i,j}x^i t^j = \left(\sum_{i\leq D, j\leq D} c_{i,j}x^i t^j\right) \left(\sum_{i\leq D, j\leq D} s_{i,j}x^i t^j\right) \mod T^{2^k}$.
- Solve this to get a non-trivial solution.
- This step needs PIT!
- If there is no solution, $f'$ is irreducible, otherwise find GCD.
- Proof: The proof works by using the Hensel Lifting Theorem.
- Solving linear equations is possible with PIT, main idea:
- We want to find a $j\times j$ minor that has full rank in the matrix (if $j=n$ then we only have the $0$ vector as a solution).
- Get a $1\times 1$ minor by trying all elements in a row.
- Check all extensions of a $j\times j$ minor to a $(j+1)\times (j+1)$ minor by checking if the determinant is non-zero.
- Cramer's formula tells us the following:
- Let $M_j^{(i)}$ be $M_j$ with the $i$th column replaced by $(M_{1,j+1},\dots,M_{j,j+1})$.
- Then our solution is: $(\det(M_j^{(1)}),\dots,\det(M_j^{(j)}), -\det(M_j),0,\dots,0)$.
- Compute GCD
- To get a non-trivial factor of $f$, i.e., $h(x,y_1,\dots,y_n)$ divides $f(x,y_1,\dots,y_n)$.
- Continue the recursion by factoring $f/h$ where $h$ is the resulting factor.
- PIT is needed here to check if $f/h=f$ in a recursion step.
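Cramer's formula above can be checked concretely. A sketch over the rationals, assuming (hypothetically) that the leading $j\times j$ minor is the full-rank one and $\mathrm{rank}(M)=j$; function names are illustrative only:

```python
from fractions import Fraction
from itertools import permutations

def det(M):
    # Exact Leibniz-formula determinant; fine for the small minors used here.
    n = len(M)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = Fraction(-1) ** inversions
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def cramer_null_vector(M, j):
    """Kernel vector (det(M_j^(1)), ..., det(M_j^(j)), -det(M_j), 0, ..., 0)
    built from the leading full-rank j x j minor M_j."""
    cols = len(M[0])
    Mj = [row[:j] for row in M[:j]]
    c = [M[i][j] for i in range(j)]  # the (j+1)-st column, restricted
    v = []
    for i in range(j):
        Mji = [row[:] for row in Mj]
        for t in range(j):
            Mji[t][i] = c[t]  # replace the i-th column by c
        v.append(det(Mji))
    v.append(-det(Mj))
    v.extend([Fraction(0)] * (cols - j - 1))
    return v
```

For the rank-2 matrix $M=\begin{pmatrix}1&2&4\\3&4&10\end{pmatrix}$ and $j=2$ this yields $(-4,-2,2)$, which indeed satisfies $Mv=0$.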
Some more Details
- Let $h'$ be a factor of $f'$. Then there exists $h$ such that $h'(x,t,a_1,\dots,a_n)=h(x,ta_1,\dots,ta_n)$ and $h$ divides $f$.
- Proof:
- Let $h'(x,t,a_1,\dots,a_n) = \sum_{i,j} x^i t^j h'_{i,j}(a_1,\dots,a_n)$.
- $h'_{i,j}$ is homogeneous by the following argument:
- Let $f'=h'\cdot g'$.
- $h'$ and $g'$ are relatively prime (squarefreeness and monicity).
- Note that $f'(x,zt,a_1,\dots,a_n) = f'(x,t,za_1,\dots,za_n)$.
- Hence $h'(x,zt, a_1,\dots,a_n) g'(x,zt,a_1,\dots,a_n) = h'(x,t,za_1,\dots,za_n) g'(x,t,za_1,\dots,za_n)$.
- Hence by substituting $z=1$, i.e., calculate $\mod Z-1$.
- Applying Hensel Lifting, we get that $h'(x,zt,a_1,\dots,a_n) = h'(x,t,za_1,\dots,za_n)$.
- This implies $h'_{i,j}$ is a homogeneous polynomial of degree $j$.
- Define $h(x,y_1,\dots,y_n) = h'(x,1,y_1,\dots,y_n)$.
- $h$ has now the required properties.
- Use normal univariate factoring
- Ignore a bunch of other stuff.
- The polynomials $g_k,h_k$ (remember that $g_k(x,t) h_k(x,t) = f'(x,t) \mod T^{2^k}$) have small circuits in the following sense: There exists a circuit $C$ of size $size(f)+poly(n,k,d)$ such that $C$ computes all the coefficients of $g_k$ and $h_k$ of the monomials $x^j t^{j'}$.
- If $f'$ is reducible in $\mathbb{F}(a_1,\dots,a_n)[x,t]$ then the linear equation system from 4. has a non-trivial solution.
- Consider the polynomial $S$ (with $s_{i,j}$ coefficients) and $R$ (with $r_{i,j}$ coefficients). Then $R$ and $f'$ have non-trivial GCD in the ring $\mathbb{F}(t,a_1,\dots,a_n)[x]$.
- Let $h$ be the gcd of $f'$ and $R$
- Lemma 2.1: Homogeneous component extraction.
- Lemma 2.9: Computing gcd with PIT is easy.
- Lemma 2.3: If $f$ is monic and $g$ is monic and a factor in $\mathbb{F}(y_1,\dots,y_n)[x]$ then it is in $\mathbb{F}[x,y_1,\dots,y_n]$.
- Lemma 3.1: Let $h'(x,t,a_1,\dots,a_n)$ be a factor of $f'(x,t,a_1,\dots,a_n)$. Then there exist $h(x,y_1,\dots,y_n)$ such that $h'(x,t,a_1,\dots,a_n)= h(x,ta_1,\dots,ta_n)$.
- Write $h'=\sum_{i,j} x^i t^j h'_{i,j}(a_1,\dots,a_n)$.
- $h'_{i,j}$ is a homogeneous polynomial in $a_1,\dots,a_n$ of degree $j$. This is obvious by our construction.
- Corollary 3.2: Let $h'(x,t,a_1,\dots,a_n)$ be a factor of $f'(x,t,a_1,\dots,a_n)$. Then there exists $h(x,y_1,\dots,y_n)$ such that $h'(x,t,a_1,\dots,a_n)=h(x,ta_1,\dots,ta_n)$ and $h(x,y_1,\dots,y_n)=h'(x,1,y_1,\dots,y_n)$.
- With this we can do the following after getting the gcd of $f'$.
- Compute a small circuit for $h'$ for every coefficient of $x^i$.
- We have a circuit for $f,h$ and can compute one for $f/h$. Try to factor $f/h$ and test if $f=h$ to stop, i.e., we found a trivial factor.