It is a well-known fact that the permanent polynomial is complete for the complexity class VNP, and it is widely suspected that the determinant does not share this property, despite its similar expression.
We study the question of why the VNP-completeness proof of the permanent fails for the determinant.
We isolate three fundamental properties that are sufficient to prove a polynomial sequence is VNP-hard, of which two are shared by both the permanent and the determinant.
We proceed to show that the permanent satisfies the third property, which we refer to as the "cost of a boolean sum," while the determinant does not, showcasing the fundamental difference between the polynomial families.
We further note that this differentiation also applies in the border complexity setting and that our results carry over to counting complexity.
In the last few years, we have seen a surge of empirical and theoretical works about “scaling laws”, whose goals are to characterize the performance of learning methods based on various problem parameters (e.g., number of observations and parameters, or amount of compute). From a theoretical point of view, this marks a renewed interest in asymptotic equivalents—something the machine learning community had mostly moved away from (and, let’s be honest, kind of looked down on) in favor of non-asymptotic bounds. The two types of bounds have their own pros and cons, and I value both of them, but on a personal note, after having spent close to twenty years proving (sometimes tedious) non-asymptotic results, proving asymptotic results is particularly rewarding, especially when you experience that “aha!” moment where the empirical behavior aligns perfectly with the derived expression (see some examples below).
Scaling laws for quadratic optimization require aggregating the effects of all eigenvalues of the Hessian matrix and not considering only the worst one. This was done with Laplace’s method in an earlier post for gradient descent and partially for Nesterov acceleration. This blog post derives similar asymptotic scaling laws with a different and more general tool, and also considers stochastic algorithms.
I will use a classical tool from applied mathematics, electrical engineering, and computer science: the z-transform (a.k.a. generating function) of a discrete sequence. My goal in this blog post is to present the essentials of the method. For more details, see my recent preprint [1], with lots of potential follow-ups. Note that the z-transform has already been used in the context of stochastic gradient descent for stability analysis [15, 20] and for deriving convergence rates with constant momentum [22].
z-transform method and final value theorem
From Abel’s theorem to the final value theorem. Given a sequence \((b_k)_{k \geqslant 0}\), its z-transform is the function \(B\) defined for a complex argument \(z \in \mathbb{C}\), as \(B(z) = \sum_{k=0}^\infty b_k z^k\) (note the change of convention where \(z^{-1}\) is replaced by \(z\)). The behavior of the series defined by \((b_k)_{k \geqslant 0}\) can be characterized by the behavior of \(B(z)\) for \(z\) tending to \(1\) while remaining in \([0,1)\). This is the classical Abel’s theorem: when the series defined by \((b_k)_{k \geqslant 0}\) is convergent, then $$\sum_{k=0}^{+\infty} b_k = \lim_{z \to 1^-} B(z).$$
When applied to \(b_k = a_k - a_{k-1}\) for \(k \geqslant 1\), with \(b_0 = a_0\), then we have \(\sum_{\ell=0}^k b_\ell = a_k\), and \(B(z) = (1 - z) A(z)\) where \(A\) is the z-transform of \((a_k)_{k \geqslant 0}\). We then obtain the final value theorem, which states $$ \lim_{z \to 1^-} \ ( 1 - z) A(z) = \lim_{k \to +\infty} a_k.$$
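To make the final value theorem concrete, here is a minimal numerical check (my own illustrative sketch, on an assumed toy sequence): for \(a_k = 1 - 2^{-k}\), the z-transform is \(A(z) = 1/(1-z) - 1/(1-z/2)\), and \((1-z)A(z)\) indeed approaches \(\lim_k a_k = 1\) as \(z \to 1^-\).
```python
# Numerical check of the final value theorem on the toy sequence a_k = 1 - 2**(-k)
# (illustrative sketch; the closed-form z-transform below follows from the geometric series).
import numpy as np

def A(z):
    # z-transform of a_k = 1 - 2**(-k): sum_k z**k - sum_k (z/2)**k
    return 1.0 / (1.0 - z) - 1.0 / (1.0 - z / 2.0)

for z in [0.9, 0.99, 0.999, 0.9999]:
    print(z, (1.0 - z) * A(z))  # tends to lim_k a_k = 1 as z -> 1^-
```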
From limits to asymptotic equivalents. In order to obtain equivalents of the sequence \((a_k)_{k \geqslant 0}\) (and not simply its limit), we can use two additional properties:
the z-transform of \(((k+1) a_{k+1})_{k \geqslant 0}\) is \(A'(z)\), leading to $$ \lim_{z \to 1^-} \ ( 1 - z) A'(z)= \lim_{k \to +\infty} k a_k.$$
To treat non-integer powers, we have the following identity, valid for any real \(\nu > 0\), $$ \lim_{z \to 1^-} \ ( 1 \,- z)^{\nu} A(z) = \Gamma(\nu) \cdot \lim_{k \to +\infty} k^{1-\nu} a_k,$$ where \(\Gamma\) is the Gamma function. A classical example is \(a_k = \frac{1}{k!} \nu(\nu+1) \cdots (\nu+k-1) = \frac{ \Gamma(k+\nu)}{\Gamma(\nu)\Gamma(k+1)} \sim \frac{k^{\nu-1}}{\Gamma(\nu)}\), for which \(A(z) = (1-z)^{-\nu}\).
Combining the last two properties (with an extension to higher-order derivatives), when the two limits exist, we have: $$ \lim_{z \to 1^-} \ ( 1 - z)^{\nu} A^{(\mu)}(z) = \Gamma(\nu)\cdot \lim_{k \to +\infty} k^{1+\mu-\nu} a_k.$$ Therefore, the z-transform method consists in deriving asymptotics of the z-transform or its derivatives around \(z = 1^-\). Before showing why this is a simplification, some words on sufficient conditions for the equivalence to hold, through a sequence of so-called “Tauberian” theorems.
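As a quick sanity check of the identity above (a sketch of my own, with \(\mu = 0\) and the classical example just mentioned), one can compare a truncated evaluation of \((1-z)^\nu A(z)\) near \(z=1\) with \(\Gamma(\nu)\, k^{1-\nu} a_k\) for large \(k\); both should be close to \(1\) for \(a_k = \frac{\Gamma(k+\nu)}{\Gamma(\nu)\Gamma(k+1)}\) and \(A(z) = (1-z)^{-\nu}\).
```python
# Sanity check (illustrative sketch) of lim (1-z)^nu A(z) = Gamma(nu) * lim k^(1-nu) a_k
# on the classical example a_k = Gamma(k+nu) / (Gamma(nu) Gamma(k+1)), A(z) = (1-z)^(-nu).
import numpy as np
from scipy.special import gammaln, gamma

nu = 0.7
k = np.arange(0, 200_000)
a = np.exp(gammaln(k + nu) - gammaln(nu) - gammaln(k + 1))  # a_k computed in log-space

z = 0.9999
lhs = (1 - z)**nu * np.sum(a * z**k)       # truncated series for (1-z)^nu * A(z)
rhs = gamma(nu) * k[-1]**(1 - nu) * a[-1]  # Gamma(nu) * k^(1-nu) * a_k at large k

print(lhs, rhs)  # both should be close to 1
```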
Tauberian theory. In order to show that limits and asymptotic equivalents exist, several options are available. When only considering real-valued arguments for the z-transforms, various sufficient conditions have been derived from the work of Hardy and Littlewood [2] (the book from Jacob Korevaar [3] is a great source of detailed results). In a nutshell, oscillating sequences such as \(a_k = (-1)^k\) are the ones where the limits are not equal (or may not exist), and a classical sufficient condition is that there exists a constant \(c\) such that for all \(k \geqslant 1\), \(a_k - a_{k-1} \leqslant c / k^{2+\mu - \nu}\) (in particular, with \(c = 0\), being decreasing is a sufficient condition). Alternative frameworks can also be considered using complex analysis, which only depends on properties of the z-transform \(A(z)\) for complex \(z\) [4, 5].
Properties of the z-transform. What makes the z-transform so versatile is a suite of properties that link modifications of the sequence \((a_k)_{k \geqslant 0}\) to its z-transform \(A(z)\) (see more details in [1, 6]). The most important ones are:
Shift: If \(A(z)\) is the z-transform of the sequence \((a_k)_{k \geqslant 0}\), then \(z A(z)\) is the z-transform of the sequence \((0,a_0,a_1,\dots)\), that is, shifted by 1. This implies that sequences that satisfy a linear difference equation with constant coefficients lead to rational z-transforms.
Multiplication by a polynomial: If \(A\) is the z-transform of \((a_k)_{k \geqslant 0}\), then \(z \mapsto z A'(z)\) is the \(z\)-transform of \((k a_k)_{k \geqslant 0}\). Thus, when we have a difference equation with rational coefficients (in \(k\)), this leads to an ordinary differential equation for the z-transform.
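As a small illustration of these two properties (my own sketch, on the geometric sequence \(a_k = r^k\), whose z-transform is \(A(z) = 1/(1-rz)\)), one can check symbolically that \(zA(z)\) generates the shifted sequence and \(zA'(z)\) generates \((k a_k)_{k \geqslant 0}\).
```python
# Symbolic check of the shift and "multiplication by k" properties on a_k = r**k,
# whose z-transform is A(z) = 1 / (1 - r*z) (illustrative sketch).
import sympy as sp

z, r = sp.symbols('z r')
A = 1 / (1 - r * z)

shifted = sp.series(z * A, z, 0, 6).removeO()               # generates (0, a_0, a_1, ...)
weighted = sp.series(z * sp.diff(A, z), z, 0, 6).removeO()  # generates (k * a_k)

print(sp.expand(shifted))    # z + r*z**2 + r**2*z**3 + ...
print(sp.expand(weighted))   # r*z + 2*r**2*z**2 + 3*r**3*z**3 + ...
print([sp.simplify(weighted.coeff(z, n) - n * r**n) for n in range(6)])  # all zeros
```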
Classical use of the z-transform method to derive asymptotic equivalents. The z-transform method is a classical tool with many applications, e.g., in combinatorics, analysis of recurrence relations, queuing theory, signal processing, and control (see [6, 7]). In this blog post, I will focus on the optimization of quadratic functions.
Gradient descent and spectral dimension
Like in an earlier post, we consider an idealized quadratic optimization problem in infinite dimension, which can be rewritten as the minimization of $$F(\eta) = \frac{1}{2} \langle \eta-\eta_\ast, H ( \eta-\eta_\ast)\rangle, $$ where \(\eta_\ast\) is a global optimum of \(F\) and \(H\) the positive semi-definite Hessian operator. Gradient descent leads to the recursion $$ \eta_k = \eta_{k-1} - \gamma F'(\eta_{k-1}) = \eta_{k-1} - \gamma H ( \eta_{k-1} - \eta_\ast), $$ where \(\gamma > 0\) is a step-size which we choose so that the operator norm of \(\gamma H\) is less than one (e.g., \(\gamma \leqslant 1/L\), where \(L\) is the largest eigenvalue of the operator \(H\)). With \(\theta_k = \eta_k - \eta_\ast\), this leads to the following linear iteration $$ \theta_k = (I - \gamma H) \theta_{k-1} = ( I - \gamma H)^k \delta, $$ where \(\delta = \theta_0 = \eta_0 - \eta_\ast\).
Thus, given a spectral decomposition \(\gamma H = \sum_{i=1}^{+\infty} \lambda_i u_i \otimes u_i\) with eigenvalues \((\lambda_i)_{i \geqslant 1}\) in \([0,1]\) and eigenvectors \((u_i)_{i \geqslant 1}\), we get the following performance criterion \(a_k = F(\eta_k) - F(\eta_\ast)\): $$ a_k = \frac{1}{2\gamma} \sum_{i=1}^{+\infty} \lambda_i (1 - \lambda_i)^{2k} \langle u_i,\delta \rangle^2 = \int_{0}^1 (1 - \lambda)^{2k} d\sigma(\lambda), $$ where $$ d\sigma(\lambda) = \frac{1}{2\gamma} \sum_{i=1}^{+\infty} \lambda_i \langle u_i,\delta \rangle^2 {\rm Dirac}(\lambda|\lambda_i) $$ is a weighted spectral measure, as defined in [8] in the gossip context (see this earlier post), with a z-transform equal to $$ A(z) = \sum_{k=0}^{+\infty} z^k \int_0^{1 } (1-\lambda)^{2k} d\sigma(\lambda) = \int_0^{1 } \frac{1}{1 - z (1-\lambda)^{2}} d\sigma(\lambda), $$ which is a rational function in \(z\) and \(\lambda\). Using a partial fraction decomposition in the variable \(\lambda\), it can thus be rewritten as $$A(z) = \frac{1}{2 \sqrt{z} } \int_0^{1 } \frac{d\sigma(\lambda) }{ \lambda + z^{-1/2} - 1 } - \frac{1}{2\sqrt{z}} \int_0^{1 } \frac{d\sigma(\lambda) }{ \lambda - z^{-1/2} - 1 }.$$ We thus get $$A(z) = \frac{1}{2 \sqrt{z} } S( z^{-1/2}-1 ) - \frac{1}{2 \sqrt{z}}S ( - z^{-1/2}-1) ,$$ where \(\displaystyle S(u) = \int_0^1 \! \frac{d\sigma(\lambda)}{\lambda + u}\) is the Stieltjes transform of the measure \(\sigma\) (see Chapter 8 of [10]), a classical tool in random matrix theory. We see above that what matters is the equivalent of \(S(u)\) around \(u=0\) (since the value at \(-2\) is bounded).
The spectral dimension defined by [8] corresponds to a measure whose behavior around zero has density \(c \lambda^{\omega - 1}\), or in almost equivalent terms so that \(S(u) \sim c \Gamma(\omega) \Gamma(1-\omega) u^{\omega -1}\) around \(u=0\), with a similar expansion for derivatives. This leads directly to an explicit scaling law in \({1}/{k^\omega}\) which exactly recovers an earlier post (which was using Laplace’s method) with a different proof.
The spectral dimension depends explicitly on problem parameters:
For least-squares regression, as described in an earlier post and in many papers, this corresponds to \(\lambda_i \sim L / i^\alpha\) (“capacity” conditions) and \(\langle \delta, u_i \rangle \sim \Delta / i^{\beta/2}\) (“source” conditions), leading to \(\omega = \frac{\beta-1}{\alpha} + 1 \), assumed to be positive, and \(c = \frac{L \Delta^2 }{2 \alpha (\gamma L)^{\frac{\beta-1}{\alpha}+1}}\). The equivalent is then $$ a_k \sim c \frac{\Gamma(\omega)}{(2k)^{\omega}},$$ with nice connections to non-asymptotic bounds for the same problem (see also the numerical sketch after this list).
For gossip, as shown in [8], we get that \(\omega\) is the underlying dimension of the set on which local messages are sent.
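Here is the numerical sketch referred to above (my own illustration, with assumed values \(L=\Delta=1\), \(\alpha=1.5\), \(\beta=1.8\), \(\gamma=1/L\), and a large but finite dimension standing in for the infinite-dimensional setting); the ratio between the exact performance of gradient descent and the equivalent \(c\, \Gamma(\omega)/(2k)^\omega\) should approach one as \(k\) grows.
```python
# Gradient descent on a quadratic with power-law spectrum (illustrative sketch):
# h_i = L / i^alpha, delta_i = Delta / i^(beta/2), compared with c * Gamma(omega) / (2k)^omega.
import numpy as np
from scipy.special import gamma

L, Delta, alpha, beta = 1.0, 1.0, 1.5, 1.8
d = 1_000_000                    # large finite dimension as a proxy for infinite dimension
i = np.arange(1, d + 1)
h = L / i**alpha                 # eigenvalues of H
delta2 = Delta**2 / i**beta      # squared coefficients <u_i, delta>^2
gam = 1.0 / L                    # step-size

omega = (beta - 1.0) / alpha + 1.0
c = L * Delta**2 / (2.0 * alpha * (gam * L)**omega)

for k in [100, 1_000, 10_000]:
    a_k = 0.5 * np.sum(h * delta2 * (1.0 - gam * h)**(2 * k))  # exact performance a_k
    equiv = c * gamma(omega) / (2.0 * k)**omega                 # asymptotic equivalent
    print(k, a_k, equiv, a_k / equiv)
```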
Summary. Linear iterations lead to rational z-transforms along all eigenvectors, and then asymptotics of the Stieltjes transform (based on the spectral dimension) can be used after a partial fraction decomposition. For gradient descent (a simple linear iteration with constant coefficients), this can be easily achieved in other ways. However, for time-dependent coefficients (Nesterov acceleration) and for stochastic iterations, it will provide a new direct way that avoids lengthy computations.
Nesterov acceleration vs. heavy-ball
A classical way to accelerate the convergence of the gradient iteration is to use an extrapolation step due to Nesterov [11]. This can be formulated with two sequences (and a first-order recursion) as: $$ \begin{array}{rcl} \eta_{k+1} & = & \zeta_{k} - \gamma F'(\zeta_{k}) \\ \zeta_{k+1} & = & \eta_{k+1} + ( 1 - \tau_{k+1}) ( \eta_{k+1} - \eta_k), \end{array} $$ and with a single sequence (and a second-order recursion) as $$ \eta_{k+1} = \eta_{k} + ( 1 - \tau_{k}) ( \eta_{k} - \eta_{k-1}) - \gamma F' \big[ \eta_{k} + ( 1 - \tau_{k}) ( \eta_{k} - \eta_{k-1})\big]. $$
In the original formulation with convex functions, \(\tau_k\) is chosen asymptotically proportional to \(2\rho/(k+1)\) with \(\rho = 3/2\). For quadratic functions, [9] argues that the choice \(\rho = 1\) is more natural as it leads to a simple rescaled reformulation, and [12, 13] consider a general \(\rho\). We now show that the z-transform allows us to analyze all integers \(\rho\), with a conjecture that is empirically valid for all \(\rho >0\).
With \(\theta_k = \eta_k - \eta_\ast\), we obtain the following recursion for quadratic functions $$ \theta_{k+1} = ( I- \gamma H) \bigg[ \theta_{k} + \Big( 1 - \frac{ 2\rho }{k+2\rho-1} \Big) ( \theta_{k} - \theta_{k-1}) \bigg]. $$ I show in [1] how the performance measure leads to an explicit rational function in \((z,\lambda)\) of order \(2 \rho\), obtained through differentiations of the z-transform (to take into account the rational dependence of \(\tau_k\) on \(k\)). For \(\rho=1\), we can easily get a partial fraction decomposition by hand, while for larger integer \(\rho\), this can be readily done by symbolic computation tools such as Mathematica. Overall, the final equivalents for \(a_k = \frac{1}{2} \langle \eta_k - \eta_\ast, H ( \eta_k - \eta_\ast) \rangle\) are essentially (up to logarithmic terms) proportional to \({1}/{k^{\min\{2\omega, \omega + \rho\}}}\). More precisely, we get fully explicit equivalents:
\(\displaystyle a_k \sim c \frac{\Gamma(2 \rho)^2}{\Gamma(\rho)^2} \cdot \frac{\Gamma(\rho-\omega) \Gamma(\omega)}{\Gamma(4\rho-1-2\omega)} \frac{\Gamma(2\rho-1/2-\omega)}{\Gamma(\rho+1/2-\omega)} \frac{2^{2\rho-1}}{4^\omega } \cdot \frac{1}{k^{ 2\omega}}\) for \(\omega \in (0,\rho)\).
\(\displaystyle a_k \sim c \frac{\Gamma(2 \rho)^2}{\Gamma(\rho)^2} \cdot\frac{1}{2^{2\rho-1}} \cdot \frac{ \log k}{k^{2\rho}}\) for \(\omega = \rho\).
\(\displaystyle a_k \sim c \frac{\Gamma(2 \rho)^2}{\Gamma(\rho)^2} \cdot \frac{1}{2^{2\rho-1}} \Gamma(\omega-\rho)\cdot \frac{1}{k^{\omega+\rho}}\) for \(\omega > \rho\).
And amazingly, they do match in our experiments even beyond integer \(\rho\), as shown below.
Nesterov acceleration for \(\omega = 1\) and several \(\rho\)’s. Note the classical vanishing oscillations for small \(\rho\). Moreover, the best performance is achieved around \(\rho \approx \omega\).
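As a rough reproduction of this behavior (my own sketch with assumed parameters and a diagonal quadratic; it is not the author's code), one can run the recursion above with \(\tau_k = 2\rho/(k+2\rho-1)\) and fit the empirical decay exponent of \(a_k\) on a log-log scale; it should be close to \(\min\{2\omega, \omega+\rho\}\), up to the oscillations and logarithmic terms mentioned above.
```python
# Nesterov recursion with time-varying momentum 1 - 2*rho/(k+2*rho-1) on a diagonal
# quadratic (illustrative sketch), measuring the empirical decay exponent of a_k.
import numpy as np

L, Delta, alpha, beta, rho = 1.0, 1.0, 1.5, 1.8, 1.0
d = 200_000
i = np.arange(1, d + 1)
h = L / i**alpha                  # eigenvalues of H
delta = Delta / i**(beta / 2.0)   # coordinates of eta_0 - eta_*
gam = 1.0 / L
omega = (beta - 1.0) / alpha + 1.0

theta_prev = delta.copy()         # theta_0
theta = delta.copy()              # convention theta_{-1} = theta_0 for the first step
ks, a = [], []
for k in range(1, 4001):
    mom = 1.0 - 2.0 * rho / (k + 2.0 * rho - 1.0)
    extrap = theta + mom * (theta - theta_prev)
    theta_prev, theta = theta, (1.0 - gam * h) * extrap
    if k >= 1000 and k % 250 == 0:
        ks.append(k)
        a.append(0.5 * np.sum(h * theta**2))   # a_k = (1/2) <theta_k, H theta_k>

slope = np.polyfit(np.log(ks), np.log(a), 1)[0]
print("empirical exponent:", -slope, " predicted:", min(2 * omega, omega + rho))
```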
What about heavy-ball? The heavy-ball method is very similar to Nesterov acceleration, with the following iteration $$ \eta_{k+1} = \eta_{k} - \gamma F'(\eta_k) + (1-\tau_{k+1}) ( \eta_k - \eta_{k-1}),$$ leading, with the same notations, to $$\theta_{k+1} = \Big( I- \frac{k}{k+2\rho-1} \gamma H\Big) \theta_{k} + \Big( 1 - \frac{ 2\rho }{k+2\rho-1} \Big) ( \theta_{k} - \theta_{k-1}).$$ The associated z-transform is also the integral with respect to the spectral measure of a rational function, leading to an asymptotic equivalent around \(z=1^-\). Then, one can plot the actual performance and compare it to the equivalent, as shown below.
Heavy-ball method with \(\rho=1\) for two values of the spectral dimension \(\omega\). Top: performance \(a_k\) vs. equivalent \(\bar{a}_k\), bottom: ratio.
What’s happening? The two do not match for \(\omega\) large enough (right plot above). It turns out that oscillations do not vanish, and the sufficient Tauberian conditions are not satisfied. What can then be shown in terms of asymptotic equivalents remains open.
Stochastic gradient descent
We now consider minimizing the expected risk for a linear predictor with the square loss: $$ F(\eta) = \frac{1}{2}\mathbb{E} \big[ ( y - \langle x, \eta \rangle )^2 \big]$$ (keeping in mind that we can use a feature vector). Single-pass stochastic gradient descent (SGD) leads to the iteration $$\eta_k = \eta_{k-1} - \gamma ( \langle x_k,\eta_{k-1} \rangle - y_k) x_k, $$ with \((x_k,y_k)\) an independent sample from the distribution defining \(F\). We can write it using a minimizer \(\eta_\ast\) of \(F\) as $$\eta_k -\eta_\ast =( I - \gamma x_k \otimes x_k) (\eta_{k-1} - \eta_\ast) + \gamma \varepsilon_k x_k, $$ where \(\varepsilon_k = y_k - \langle x_k, \eta_\ast \rangle\) is the optimal residual. We denote \(\theta_k = \eta_k - \eta_\ast\). In this context, SGD is referred to as the least-mean-squares (LMS) algorithm [14, 15, 16, 17].
The simplest but incomplete analysis uses that \(\mathbb{E} [ \varepsilon_k x_k ] =0\) (which is the optimality condition characterizing \(\varepsilon_k\)), to get: $$ \mathbb{E}[\theta_k] = ( I - \gamma H) \mathbb{E}[\theta_{k-1}],$$ leading to the classical linear iteration of gradient descent. However, it does not lead to any result on the performance measure \(\frac{1}{2} \mathbb{E} [ \langle \theta_k, H \theta_k \rangle] = \mathbb{E} [ F(\eta_k) - F(\eta_\ast) ]\).
Traditionally, there are two terms in the analysis: “bias” and “variance” (see a thorough discussion in [27]). For simplicity, let’s assume that \(\varepsilon_k\) is independent of \(x_k\), with \(\mathbb{E} [ \varepsilon_k]=0\) and \(\mathbb{E} [ \varepsilon_k^2]=\sigma^2\) (not to be confused with the notation for the spectral measure defined above). We then consider the operator $$\Theta_k = (\eta_k - \eta_\ast)\otimes (\eta_k - \eta_\ast) = \theta_k \otimes \theta_k, $$ for which we have $$F(\eta_k) - F(\eta_\ast) = \frac{1}{2} \langle \Theta_k,H \rangle = \frac{1}{2}{\rm tr}( \Theta_k H), $$ and $$ \mathbb{E} [ \Theta_k | \Theta_{k-1}] = \mathbb{E} \big[ ( I - \gamma x_k \otimes x_k) \otimes ( I - \gamma x_k \otimes x_k) \big] \Theta_{k-1} + \gamma^2 \sigma^2 H,$$ leading to: $$ \mathbb{E} [ \Theta_k ] = \big( I - \gamma H \otimes I - \gamma I \otimes H + \gamma^2 \mathbb{E} [ x \otimes x \otimes x \otimes x] \big) \mathbb{E} [ \Theta_{k-1} ] + \gamma^2 \sigma^2 H.$$
We consider the operator (defined on self-adjoint operators) $$ T = H \otimes I + I \otimes H - \gamma \mathbb{E} [ x \otimes x \otimes x \otimes x],$$ so that $$\tag{1} \mathbb{E} [ \Theta_k ] = \big( I - \gamma T \big) \mathbb{E} [ \Theta_{k-1} ] + \gamma^2 \sigma^2 H, $$ leading to, after summation of the geometric series (see an earlier post): $$\mathbb{E} [ \Theta_k ] = \big( I - \gamma T \big)^k \Theta_{0} + \gamma \sigma^2 \big[ I- \big( I- \gamma T \big)^k \big] T^{-1} H .$$
A sufficient condition for convergence is that the operator \(I - \gamma T\) (as an operator on symmetric operators) has eigenvalues in \([-1,1]\). In [17], we define and characterize the largest step-size with few assumptions. We consider instead specific models for asymptotic computations.
Simplest model. In order to obtain simpler formulas, several models have been proposed that are compatible with the Hessian operator \(H\) having eigenvalues \((h_i)_{i \geqslant 1}\) and eigenvectors \((u_i)_{i \geqslant 1}\). The simplest model [18] is \(x = r u_j\) with \(r \in \mathbb{R}\) and \(j \in \mathbb{N}^\ast\) random and independent, with \(\mathbb{E}[r^2] = \sum_{i \geqslant 1} h_i = {\rm tr}(H)\), and \(\mathbb{P}(j = i) = h_i / {\rm tr}(H)\). What makes this model special is that, like in the deterministic case, there are no interactions between eigensubspaces. While this includes one-hot encoded discrete data as recently used in [19], this remains limited compared to the model below (which includes Gaussian data and is often used in random matrix theory).
Simplified high-dimensional model. We consider $$x= \sum_{i = 1}^{+\infty} h_i^{1/2} z_i u_i,$$ where the variables \(z_i \in \mathbb{R}\), \(i \geqslant 1\), are independent (and not merely uncorrelated), and have zero mean and unit variance. The only parameter that we will need to characterize the operator \(T\) is the fourth-order moment, \(\mathbb{E} [ z_i^4] = 3 + \kappa\), where \(\kappa\) is the excess kurtosis.
The Gaussian model [15] where all \(z_i\) are Gaussians corresponds to \(\kappa=0\), while the Rademacher model where \(z_i \in \{-1,1\}\) are Rademacher random variables corresponds to the smallest possible value \(\kappa=-2\). Other models could also be considered, particularly when using kernels [25].
With this simplified model, as already shown in the 1980’s in the Gaussian context [20, 15], in the basis of eigenvectors of \(H\), the operator \(T\) has a simple block-diagonal structure: the off-diagonal elements of \(\mathbb{E} \big[ \Theta_k \big] \) evolve independently (and converge to zero linearly), while the diagonal elements that are needed to compute the performance measure evolve according to a linear iteration with a matrix that is the sum of a diagonal and a rank-one matrix. More precisely, assuming without loss of generality that the operator \(H\) is diagonal and equal to \({\rm Diag}(h)\), Eq. \((1)\) projected on diagonal elements becomes $${\rm diag}( \mathbb{E}[ \Theta_k]) = ( I - \gamma V) {\rm diag}( \mathbb{E}[ \Theta_{k-1}]) + \gamma^2 \sigma^2 h,$$ with $$V = {\rm Diag}\big( 2 h - \gamma(\kappa+2) h \circ h\big) - \gamma h \otimes h .$$
The performance \(a_k = \mathbb{E} [ F(\eta_k) ] - F(\eta_\ast)\) is thus $$ a_k = \frac{1}{2}\langle \delta \circ \delta, ( I - \gamma V )^k h \rangle + \frac{\gamma \sigma^2}{2} \langle h, \big(I - ( I - \gamma V )^k \big) V^{-1} h \rangle. $$ In order to study convergence, we could look at an eigenvalue decomposition of the matrix \(V\) using “secular” equations [21]. Instead, we will use the z-transform method, which avoids explicit computations of eigenvalues.
For simplicity, we consider here only the interpolation regime where \(\sigma^2=0\) (i.e., there is no noise on top of optimal predictions, see [1] for all details). Then, the z-transform is equal to $$A(z) = \sum_{k=0}^{+\infty} a_k z^k = \frac{1}{2}\langle \delta \circ \delta, ( (1-z) I \, + z \gamma V )^{-1} h \rangle.$$ The matrix \((1-z) I \, + z \gamma V\) that needs to be inverted has a “diagonal + rank-one” structure, and can thus be inverted using the matrix inversion lemma, leading to an explicit rational function in \(z\), that can be directly analyzed around \(z = 1\) (see [1]).
Asymptotic equivalent. Following this post and the gradient descent analysis above, we consider \(h_i = {L}/{i^\alpha}\) and \(\delta_i = {\Delta}/{i^{\beta/2}}\). If we assume \(\kappa = -2\) (Rademacher case, which leads to the simplest formulas), the largest step-size is \(\gamma = 2/ {\rm tr}(H)\) (which can be significantly smaller than \(2 / \lambda_\max(H)\) when \(\alpha\) is close to 1), and we get the asymptotic equivalent $$a_k \sim \frac{L \Delta^2}{2 \alpha} \frac{1}{1-\gamma {\rm tr}(H)/2 } \frac{\Gamma(\omega)}{(2 \gamma L k)^{\omega}},$$ with \(\omega= (\beta-1)/\alpha + 1\), and under the condition \(\omega \in (0, 2- 1/\alpha)\) (see [1] for a result for all cases). Up to constants, this leads to the same equivalent as for gradient descent. See below for an illustration.
Comparison of asymptotic equivalent and performance of SGD in the interpolation regime for several replications for \(\omega=1\).
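In the same spirit as the figure above (a sketch of my own, not the author's code, with assumed parameters \(L=\Delta=1\), \(\alpha=1.5\), \(\beta=1.3\), a finite dimension, Rademacher coordinates, and half of the largest step-size \(2/{\rm tr}(H)\)), one can average single-pass SGD over a few replications and compare with the equivalent above; the ratio should hover around one for large \(k\), up to stochastic fluctuations and finite-dimension effects.
```python
# Single-pass SGD in the interpolation regime (y = <x, eta_*>) under the simplified
# model with Rademacher coordinates (kappa = -2), compared with the asymptotic equivalent.
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(0)
L, Delta, alpha, beta = 1.0, 1.0, 1.5, 1.3
d = 1_000
i = np.arange(1, d + 1)
h = L / i**alpha                 # eigenvalues of H (diagonal in this basis)
delta = Delta / i**(beta / 2.0)  # eta_0 - eta_*
trH = h.sum()
gam = 1.0 / trH                  # half of the largest admissible step-size 2/tr(H)
omega = (beta - 1.0) / alpha + 1.0

def equivalent(k):
    return (L * Delta**2 / (2 * alpha)) / (1 - gam * trH / 2) \
        * gamma(omega) / (2 * gam * L * k)**omega

K, n_rep = 10_000, 4
a = np.zeros(K + 1)
for _ in range(n_rep):
    theta = delta.copy()
    for k in range(1, K + 1):
        x = np.sqrt(h) * rng.choice([-1.0, 1.0], size=d)  # feature vector x_k
        theta -= gam * (x @ theta) * x                    # SGD step on theta = eta - eta_*
        a[k] += 0.5 * np.sum(h * theta**2) / n_rep        # average of F(eta_k) - F(eta_*)

for k in [1_000, 5_000, 10_000]:
    print(k, a[k], equivalent(k), a[k] / equivalent(k))
```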
We recover the results from [22] with a simpler and more direct proof, thereby obtaining asymptotic results that complement the non-asymptotic results from [23]. This is also a subcase of results from [24] with a simpler proof. Various extensions are possible, e.g., using averaging of iterates, or computing estimates of the variance of the performance (and not only its expectation); see [1] for details.
Conclusion
In this blog post, I have shown how the z-transform can be used to derive asymptotic equivalents of sequences that naturally appear in the optimization of quadratic forms, with classical tools such as acceleration and stochastic approximation. In all cases, the z-transform method allows simple computations of asymptotic equivalents (e.g., scaling laws) through the expansion of rational functions.
Where does the “z” come from? For those questioning the naming of the z-transform, the Wikipedia page has some interesting historical notes. The idea of using generating functions to study sequences dates back at least to De Moivre around 1730, while its first use in discrete engineering systems was done by Witold Hurewicz in 1947 using the letter “z” (to make it different from the letter “s” used for the Laplace transform, its continuous-time counterpart). The name z-transform was later coined in 1952 by John R. Ragazzini and Lotfi Zadeh. As confirmed by a personal communication from Lotfi Zadeh to Murat Arcak, it has nothing to do with the first letter of Lotfi Zadeh’s name (“I used the letter z not because of my name but because in the mathematical literature on difference equations, z was the symbol that was used”). However, I didn’t find any formal confirmation that Zorro had nothing to do with it.
Acknowledgements. I thank Adrien Taylor, Ayoub Melliti, Nicolas Flammarion, Baptiste Goujaud, and Raphaël Berthier for insightful discussions related to this work.
Authors: Gal Beniamini, Yuval Dor, Alon Vinnikov, Shir Granot Peled, Or Weinstein, Or Sharir, Noam Wies, Tomer Nussbaum, Ido Ben Shaul, Tomer Zekharya, Yoav Levine, Shai Shalev-Shwartz, Amnon Shashua
Frontier AI models demonstrate formidable breadth of knowledge. But how close
are they to true human -- or superhuman -- expertise? Genuine experts can
tackle the hardest problems and push the boundaries of scientific
understanding. To illuminate the limits of frontier model capabilities, we turn
away from contrived competitive programming puzzles, and instead focus on
real-life research problems.
We construct FormulaOne, a benchmark that lies at the intersection of graph
theory, logic, and algorithms, all well within the training distribution of
frontier models. Our problems are incredibly demanding, requiring an array of
reasoning steps. The dataset has three key properties. First, it is of
commercial interest and relates to practical large-scale optimisation problems,
such as those arising in routing, scheduling, and network design. Second, it is
generated from the highly expressive framework of Monadic Second-Order (MSO)
logic on graphs, paving the way toward automatic problem generation at scale;
ideal for building RL environments. Third, many of our problems are intimately
related to the frontier of theoretical computer science, and to central
conjectures therein, such as the Strong Exponential Time Hypothesis (SETH). As
such, any significant algorithmic progress on our dataset, beyond known
results, could carry profound theoretical implications.
Remarkably, state-of-the-art models like OpenAI's o3 fail entirely on
FormulaOne, solving less than 1% of the questions, even when given 10 attempts
and explanatory few-shot examples -- highlighting how far they remain from
expert-level understanding in some domains. To support further research, we
additionally curate FormulaOne-Warmup, offering a set of simpler tasks, from
the same distribution. We release the full corpus along with a comprehensive
evaluation framework.
Authors: Ali Fattahi, Ali Eshragh, Babak Aslani, Meysam Rabiee
We study the problem of clustering ranking vectors, where each vector
represents preferences as an ordered list of distinct integers. Specifically,
we focus on the k-centroids ranking vectors clustering problem (KRC), which
aims to partition a set of ranking vectors into k clusters and identify the
centroid of each cluster. Unlike classical k-means clustering (KMC), KRC
constrains both the observations and centroids to be ranking vectors. We
establish the NP-hardness of KRC and characterize its feasible set. For the
single-cluster case, we derive a closed-form analytical solution for the
optimal centroid, which can be computed in linear time. To address the
computational challenges of KRC, we develop an efficient approximation
algorithm, KRCA, which iteratively refines initial solutions from KMC, referred
to as the baseline solution. Additionally, we introduce a branch-and-bound
(BnB) algorithm for efficient cluster reconstruction within KRCA, leveraging a
decision tree framework to reduce computational time while incorporating a
controlling parameter to balance solution quality and efficiency. We establish
theoretical error bounds for KRCA and BnB. Through extensive numerical
experiments on synthetic and real-world datasets, we demonstrate that KRCA
consistently outperforms baseline solutions, delivering significant
improvements in solution quality with fast computational times. This work
highlights the practical significance of KRC for personalization and
large-scale decision making, offering methodological advancements and insights
that can be built upon in future studies.
Authors: Yuxi Liu, Konpat Preechakul, Kananart Kuwaranancharoen, Yutong Bai
While machine learning has advanced through massive parallelization, we
identify a critical blind spot: some problems are fundamentally sequential.
These "inherently serial" problems-from mathematical reasoning to physical
simulations to sequential decision-making-require dependent computational steps
that cannot be parallelized. Drawing from complexity theory, we formalize this
distinction and demonstrate that current parallel-centric architectures face
fundamental limitations on such tasks. We argue that recognizing the serial
nature of computation holds profound implications for machine learning, model
design, and hardware development. As AI tackles increasingly complex reasoning,
deliberately scaling serial computation, not just parallel computation, is
essential for continued progress.
Authors: Éric Colin de Verdière, Vincent Despré, Loïc Dubois
Tutte's celebrated barycentric embedding theorem describes a natural way to
build straight-line embeddings (crossing-free drawings) of a (3-connected)
planar graph: map the vertices of the outer face to the vertices of a convex
polygon, and ensure that each remaining vertex is in convex position, namely, a
barycenter with positive coefficients of its neighbors. Actually computing an
embedding then boils down to solving a system of linear equations. A
particularly appealing feature of this method is the flexibility given by the
choice of the barycentric weights. Generalizations of Tutte's theorem to
surfaces of nonpositive curvature are known, but due to their inherently
continuous nature, they do not lead to an algorithm.
In this paper, we propose a purely discrete analog of Tutte's theorem for
surfaces (with or without boundary) of nonpositive curvature, based on the
recently introduced notion of reducing triangulations. We prove a Tutte theorem
in this setting: every drawing homotopic to an embedding such that each vertex
is harmonious (a discrete analog of being in convex position) is a weak
embedding (arbitrarily close to an embedding). We also provide a
polynomial-time algorithm to make an input drawing harmonious without
increasing the length of any edge, in a similar way as a drawing can be put in
convex position without increasing the edge lengths.
Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan
A central question in computer science and statistics is whether efficient
algorithms can achieve the information-theoretic limits of statistical
problems. Many computational-statistical tradeoffs have been shown under
average-case assumptions, but since statistical problems are average-case in
nature, it has been a challenge to base them on standard worst-case
assumptions.
In PAC learning, where such tradeoffs were first studied, the question is
whether computational efficiency can come at the cost of using more samples
than information-theoretically necessary. We base such tradeoffs on
$\mathsf{NP}$-hardness and obtain:
$\circ$ Sharp computational-statistical tradeoffs assuming $\mathsf{NP}$
requires exponential time: For every polynomial $p(n)$, there is an $n$-variate
class $C$ with VC dimension $1$ such that the sample complexity of
time-efficiently learning $C$ is $\Theta(p(n))$.
$\circ$ A characterization of $\mathsf{RP}$ vs. $\mathsf{NP}$ in terms of
learning: $\mathsf{RP} = \mathsf{NP}$ iff every $\mathsf{NP}$-enumerable class $C$
is learnable with $O(\mathrm{VCdim}(C))$ samples in polynomial time. The
forward implication has been known since (Pitt and Valiant, 1988); we prove the
reverse implication.
Notably, all our lower bounds hold against improper learners. These are the
first $\mathsf{NP}$-hardness results for improperly learning a subclass of
polynomial-size circuits, circumventing formal barriers of Applebaum, Barak,
and Xiao (2008).
Authors: Alex Conway, Laxman Dhulipala, Martin Farach-Colton, Rob Johnson, Ben Landrum, Christopher Musco, Yarin Shechter, Torsten Suel, Richard Wen
Graph-based nearest neighbor search methods have seen a surge of popularity
in recent years, offering state-of-the-art performance across a wide variety of
applications. Central to these methods is the task of constructing a sparse
navigable search graph for a given dataset endowed with a distance function.
Unfortunately, doing so is computationally expensive, so heuristics are
universally used in practice.
In this work, we initiate the study of fast algorithms with provable
guarantees for search graph construction. For a dataset with $n$ data points,
the problem of constructing an optimally sparse navigable graph can be framed
as $n$ separate but highly correlated minimum set cover instances. This yields
a naive $O(n^3)$ time greedy algorithm that returns a navigable graph whose
sparsity is at most an $O(\log n)$ factor higher than optimal. We improve significantly
on this baseline, taking advantage of correlation between the set cover
instances to leverage techniques from streaming and sublinear-time set cover
algorithms. Combined with problem-specific pre-processing techniques, we
present an $\tilde{O}(n^2)$ time algorithm for constructing an $O(\log
n)$-approximate sparsest navigable graph under any distance function.
The runtime of our method is optimal up to logarithmic factors under the
Strong Exponential Time Hypothesis via a reduction from Monochromatic Closest
Pair. Moreover, we prove that, as with general set cover, obtaining better than
an $O(\log n)$-approximation is NP-hard, despite the significant additional
structure present in the navigable graph problem. Finally, we show that our
techniques can also beat cubic time for the closely related and practically
important problems of constructing $\alpha$-shortcut reachable and
$\tau$-monotonic graphs, which are also used for nearest neighbor search. For
such graphs, we obtain $\tilde{O}(n^{2.5})$ time or better algorithms.
A rounding scheme for set cover has served as an important component in the
design of approximation algorithms for the problem, and there exists an
H_s-approximate rounding scheme, where s denotes the maximum subset size,
directly implying an approximation algorithm with the same approximation
guarantee. A rounding scheme has also been considered under some online models,
and in particular, under the element arrival model used as a crucial subroutine
in algorithms for online set cover, an O(log s)-competitive rounding scheme is
known [Buchbinder, Chen, and Naor, SODA 2014]. On the other hand, under a more
general model, called the subset arrival model, only a simple O(log
n)-competitive rounding scheme is known, where n denotes the number of elements
in the ground set.
In this paper, we present an O(log^2 s)-competitive rounding scheme under the
subset arrival model, under the mild assumption that s is known upfront. Using
our rounding scheme, we immediately obtain an O(log^2 s)-approximation
algorithm for multi-stage stochastic set cover, improving upon the existing
algorithms [Swamy and Shmoys, SICOMP 2012; Byrka and Srinivasan, SIDMA 2018]
when s is small enough compared to the number of stages and the number of
elements. Lastly, for set cover with s = 2, also known as edge cover, we
present a 1.8-competitive rounding scheme under the edge arrival model.
For a fixed graph $H$, the $H$-Coloring problem asks whether a given graph
admits an edge-preserving function from its vertex set to that of $H$. A
seminal theorem of Hell and Ne\v{s}et\v{r}il asserts that the $H$-Coloring
problem is NP-hard whenever $H$ is loopless and non-bipartite. A result of
Jansen and Pieterse implies that for every graph $H$, the $H$-Coloring problem
parameterized by the vertex cover number $k$ admits a kernel with
$O(k^{\Delta(H)})$ vertices and bit-size bounded by $O(k^{\Delta(H)} \cdot \log
k)$, where $\Delta(H)$ denotes the maximum degree in $H$. For the case where
$H$ is a complete graph on at least three vertices, this kernel size nearly
matches conditional lower bounds established by Jansen and Kratsch and by
Jansen and Pieterse.
This paper presents new upper and lower bounds on the kernel size of
$H$-Coloring problems parameterized by the vertex cover number. The upper
bounds arise from two kernelization algorithms. The first is purely
combinatorial, and its size is governed by a structural quantity of the graph
$H$, called the non-adjacency witness number. As applications, we obtain
kernels whose size is bounded by a fixed polynomial for natural classes of
graphs $H$ with unbounded maximum degree. More strikingly, we show that for
almost every graph $H$, the degree of the polynomial that bounds the size of
our combinatorial kernel grows only logarithmically in $\Delta(H)$. Our second
kernel leverages linear-algebraic tools and involves the notion of faithful
independent representations of graphs. It strengthens the general bound from
prior work and, among other applications, yields near-optimal kernels for
problems concerning the dimension of orthogonal graph representations over
finite fields. We complement these results with conditional lower bounds,
thereby nearly settling the kernel complexity of the problem for various target
graphs $H$.
Expanders are powerful algorithmic structures with two key properties: they
are
a) routable: for any multi-commodity flow unit demand, there exists a routing
with low congestion over short paths, where a demand is unit if the amount of
demand sent / received by any vertex is at most the number of edges adjacent to
it.
b) stable / prunable: for any (sequence of) edge failures, there exists a
proportionally small subset of vertices that can be disabled, such that the
graph induced on the remaining vertices is an expander.
Two natural algorithmic problems correspond to these two existential
guarantees: expander routing, i.e. computing a low-congestion routing for a
unit multi-commodity demand on an expander, and expander pruning, i.e.,
maintaining the subset of disabled vertices under a sequence of edge failures.
This paper considers the combination of the two problems: maintaining a
routing for a unit multi-commodity demand under pruning steps. This is done
through the introduction of a family of expander graphs that, like hypercubes,
are easy to route in, and are self-pruning: for an online sequence of edge
deletions, a simple self-contained algorithm can find a few vertices to prune
with each edge deletion, such that the remaining graph always remains an
easy-to-route-in expander in the family.
Notably, and with considerable technical work, this self-pruning can be made
worst-case, i.e., such that every single adversarial deletion only causes a
small number of additional deletions. Our results also allow tight
constant-factor control over the length of routing paths (with the usual
trade-offs in congestion and pruning ratio) and therefore extend to
constant-hop and length-constrained expanders in which routing over constant
length paths is crucial.
Authors: Mark de Berg, Andrés López Martínez, Frits Spieksma
This paper introduces the concept of the "Price of Diversity" (PoD) in
discrete optimization problems, quantifying the trade-off between solution
diversity and cost. For a minimization problem, the PoD is defined as the
worst-case ratio, over all instances, of the minimum achievable cost of a
diverse set of $k$ solutions to the cost of a single optimal solution for the
same instance. Here, the cost of a $k$-solution set is determined by the most
expensive solution within the set. Focusing on the Traveling Salesman Problem
(TSP) as a key example, we study the PoD in the setting where $k$ edge-disjoint
tours are required. We establish that, asymptotically, the PoD of finding two
edge-disjoint tours is $\frac{8}{5}$ in a special one-dimensional case and 2 in
a general metric space. We obtain these results from analyzing a related
fundamental problem: the Shortest Hamiltonian Path problem (SHP), for which we
establish similar results.
Breadth-first search (BFS) is a basic strategy for exploring graph
properties. As the scale of graph databases has increased tremendously in
recent years, large-scale graphs G are often disk-resident, and computing the
BFS results of G in the semi-external memory model becomes unavoidable: the
in-memory BFS algorithm has to keep the entire G in main memory, while
external BFS algorithms incur high computational costs. As a good trade-off
between the internal and external memory models, the semi-external memory
model assumes that the main memory can hold at least a spanning tree of G.
Nevertheless, the semi-external BFS problem remains open due to its
difficulty. This paper therefore presents a comprehensive study of processing
BFS in the semi-external memory model. After discussing naive solutions based
on the basic framework of semi-external graph algorithms, it presents an
efficient algorithm, named EP-BFS, with a small minimum memory space
requirement, an important factor in evaluating semi-external algorithms.
Extensive experiments are conducted on both real and synthetic large-scale
graphs, where graph WDC-2014 contains over 1.7 billion nodes and graph
eu-2015 has over 91 billion edges. Experimental results confirm that EP-BFS
is up to 10 times faster.
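To make the semi-external regime concrete, here is a minimal Python sketch of the basic repeated-scan framework mentioned above: only O(n) per-vertex state (levels plus parent pointers, i.e., a BFS spanning tree) is kept in memory, while the edge list is only ever scanned sequentially, as if it were disk-resident. This is a sketch in the spirit of the naive framework the paper discusses, not the EP-BFS algorithm itself.

```python
def semi_external_bfs(num_vertices, edge_stream, source):
    """
    BFS that keeps only O(n) state in memory (levels plus parent pointers,
    i.e., a BFS spanning tree), while the edge list is only scanned
    sequentially.  One scan is needed per BFS level.
    """
    INF = float("inf")
    level = [INF] * num_vertices
    parent = [None] * num_vertices
    level[source] = 0
    current = 0
    while True:
        grew = False
        for u, v in edge_stream():  # one sequential pass over the edge list
            if level[u] == current and level[v] == INF:
                level[v], parent[v] = current + 1, u
                grew = True
            elif level[v] == current and level[u] == INF:
                level[u], parent[u] = current + 1, v
                grew = True
        if not grew:
            return level, parent
        current += 1

# The callable re-reads the edges on every call, mimicking a disk scan.
edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
print(semi_external_bfs(5, lambda: iter(edges), 0))
```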
A $k$-submodular function naturally generalizes submodular functions by
taking as input $k$ disjoint subsets, rather than a single subset. Unlike
standard submodular maximization, which only requires selecting elements for
the solution, $k$-submodular maximization adds the challenge of determining the
subset to which each selected element belongs. Prior research has shown that
the greedy algorithm is a 1/2-approximation for the monotone $k$-submodular
maximization problem under cardinality or matroid constraints. However, whether
a firm 1/2-approximation exists for the budgeted version (i.e., with a knapsack
constraint) has remained open for several years. We resolve this question
affirmatively by proving that the 1-Guess Greedy algorithm, which first guesses
an appropriate element from an optimal solution before proceeding with the
greedy algorithm, achieves a 1/2-approximation. This result is asymptotically
tight as $((k+1)/(2k)+\epsilon)$-approximation requires exponentially many
value oracle queries even without constraints (Iwata et al., SODA 2016). We
further show that 1-Guess Greedy is a 1/3-approximation for the non-monotone
problem. This algorithm is both simple and parallelizable, making it
well-suited for practical applications. Using the thresholding technique from
(Badanidiyuru and Vondrak, SODA 2014), it runs in $\tilde O(n^2k^2)$ time.
The proof idea is simple: we introduce a novel continuous transformation from
an optimal solution to a greedy solution, using the multilinear extension to
evaluate every fractional solution during the transformation. This continuous
analysis approach yields two key extensions. First, it enables improved
approximation ratios of various existing algorithms. Second, our method
naturally extends to $k$-submodular maximization problems under broader
constraints, offering a more flexible and unified analysis framework.
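To make the 1-Guess Greedy idea described above concrete, here is a hedged Python sketch for the budgeted (knapsack-constrained) setting. The coverage-style objective (a standard example of a monotone k-submodular function), the plain marginal-gain greedy rule, and all names are my illustrative choices; the paper's algorithm and analysis may use a different greedy rule.

```python
import itertools

def coverage_value(assignment, cover):
    """assignment: element -> type; cover[e][t]: set covered when e gets type t."""
    covered = set()
    for e, t in assignment.items():
        covered |= cover[e][t]
    return len(covered)

def one_guess_greedy(elements, k, cover, cost, budget):
    """Guess one (element, type) pair, then run a marginal-gain greedy under
    the knapsack budget; return the best run over all guesses."""
    best_val, best_assignment = 0, {}
    for e0, t0 in itertools.product(elements, range(k)):
        if cost[e0] > budget:
            continue
        assignment, spent = {e0: t0}, cost[e0]
        while True:
            base = coverage_value(assignment, cover)
            gains = [
                (coverage_value({**assignment, e: t}, cover) - base, e, t)
                for e, t in itertools.product(elements, range(k))
                if e not in assignment and spent + cost[e] <= budget
            ]
            if not gains or max(gains)[0] <= 0:
                break
            _, e, t = max(gains)
            assignment[e] = t
            spent += cost[e]
        val = coverage_value(assignment, cover)
        if val > best_val:
            best_val, best_assignment = val, assignment
    return best_val, best_assignment

# Toy instance: 3 elements, k = 2 types, knapsack budget 3.
cover = {"a": [{1, 2}, {2, 3, 4}], "b": [{4, 5}, {1}], "c": [{5, 6, 7}, {2, 6}]}
cost = {"a": 1, "b": 1, "c": 2}
print(one_guess_greedy(list(cover), 2, cover, cost, budget=3))
```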
We propose an $O(\log n)$-approximation algorithm for the bipartiteness ratio
for undirected graphs introduced by Trevisan (SIAM Journal on Computing, vol.
41, no. 6, 2012), where $n$ is the number of vertices. Our approach extends the
cut-matching game framework for sparsest cut to the bipartiteness ratio. Our
algorithm requires only $\mathrm{poly}\log n$ many single-commodity undirected
maximum flow computations. Therefore, with the current fastest undirected
max-flow algorithms, it runs in nearly linear time. Along the way, we introduce
the concept of well-linkedness for skew-symmetric graphs and prove a novel
characterization of the bipartiteness ratio in terms of well-linkedness in an
auxiliary skew-symmetric graph, which may be of independent interest.
As an application, we devise an $\tilde{O}(mn)$-time algorithm that, given a
graph whose maximum cut deletes a $1-\eta$ fraction of edges, finds a cut that
deletes a $1 - O(\log n \log(1/\eta)) \cdot \eta$ fraction of edges, where $m$
is the number of edges.
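For intuition, here is an exhaustive Python check of the bipartiteness ratio on tiny graphs, assuming I am recalling Trevisan's definition correctly as beta(G) = min over nonzero y in {-1,0,1}^V of (sum_{(u,v) in E} |y_u + y_v|) / (sum_v d_v |y_v|); worth verifying against the paper before relying on it.

```python
import itertools

def bipartiteness_ratio(n, edges):
    """Exhaustive search over all nonzero y in {-1, 0, 1}^n (tiny graphs only)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    best = float("inf")
    for y in itertools.product((-1, 0, 1), repeat=n):
        vol = sum(deg[v] * abs(y[v]) for v in range(n))
        if vol == 0:
            continue
        num = sum(abs(y[u] + y[v]) for u, v in edges)
        best = min(best, num / vol)
    return best

print(bipartiteness_ratio(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 4-cycle: 0.0
print(bipartiteness_ratio(3, [(0, 1), (1, 2), (2, 0)]))          # triangle: 1/3
```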
We revisit the well-known online traveling salesman problem (OLTSP) and its
extension, the online dial-a-ride problem (OLDARP). A server starting at a
designated origin in a metric space is required to serve online requests and
return to the origin so that the completion time is minimized. The SmartStart
algorithm, introduced by Ascheuer et al., incorporates a waiting approach into
an online schedule-based algorithm and attains the optimal upper bound of 2 for
the OLTSP and the OLDARP if each schedule is optimal. Using Christofides'
heuristic to approximate each schedule leads to the currently best upper bound
of (7 + sqrt(13)) / 4, approximately 2.6514, in polynomial time.
In this study, we investigate how an online algorithm with predictions, a
recently popular framework (the so-called learning-augmented algorithms),
can be used to improve the best competitive ratio in polynomial time. In
particular, we develop a waiting strategy with online predictions, each of
which is only a binary decision for each schedule in a whole route, rather
than a forecast of the entire set of requests in the beginning (i.e., offline
predictions). That is, it does not require knowing the number of requests in
advance. The proposed online schedule-based algorithm achieves
(1.1514 * lambda + 1.5)-consistency and (1.5 + 1.5 / (2.3028 * lambda -
1))-robustness in polynomial time, where lambda lies in the interval (1/theta,
1] and theta is set to (1 + sqrt(13)) / 2, approximately 2.3028. The best
consistency approaches 2 as lambda approaches 1/theta. Meanwhile, we show that
no online schedule-based algorithm can achieve a competitive ratio of less
than 2, even with perfect online predictions.
We introduce a new method for analyzing midpoint discretizations of
stochastic differential equations (SDEs), which are frequently used in Markov
chain Monte Carlo (MCMC) methods for sampling from a target measure $\pi
\propto \exp(-V)$. Borrowing techniques from Malliavin calculus, we compute
estimates for the Radon-Nikodym derivative for processes on $L^2([0, T);
\mathbb{R}^d)$ which may anticipate the Brownian motion, in the sense that they
need not be adapted to its filtration. Applying these estimates to
various popular midpoint discretizations, we are able to improve the regularity
and cross-regularity results in the literature on sampling methods. We also
obtain a query complexity bound of $\widetilde{O}(\frac{\kappa^{5/4}
d^{1/4}}{\varepsilon^{1/2}})$ for producing an $\varepsilon^2$-accurate sample
in $\mathsf{KL}$ divergence, under log-concavity and strong smoothness
assumptions for $\nabla^2 V$.
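As a rough illustration of what a midpoint discretization looks like (not the specific randomized schemes the paper analyzes), here is a sketch of one deterministic-time midpoint step for overdamped Langevin dX = -grad V(X) dt + sqrt(2) dB, with the half-step and full-step Brownian increments coupled consistently; all names and parameter choices are mine.

```python
import numpy as np

def midpoint_langevin_step(x, grad_V, h, rng):
    """
    One midpoint step for dX = -grad V(X) dt + sqrt(2) dB over [0, h].
    The Brownian increment over [0, h/2] is reused inside the increment over
    [0, h], so both sub-steps see the same Brownian path.
    """
    d = x.shape[0]
    b_half = rng.normal(scale=np.sqrt(h / 2), size=d)           # B_{h/2}
    b_full = b_half + rng.normal(scale=np.sqrt(h / 2), size=d)  # B_h
    x_mid = x - (h / 2) * grad_V(x) + np.sqrt(2) * b_half
    return x - h * grad_V(x_mid) + np.sqrt(2) * b_full

# Sanity check on a standard Gaussian target, V(x) = ||x||^2 / 2.
rng = np.random.default_rng(0)
grad_V = lambda x: x
x = np.zeros(2)
samples = []
for _ in range(20000):
    x = midpoint_langevin_step(x, grad_V, h=0.1, rng=rng)
    samples.append(x.copy())
print(np.cov(np.array(samples).T))  # should be close to the identity
```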
Authors: David Gillman, Jacob Platnick, Dana Randall
Weighted equitable partitioning of a graph has been of interest lately due to
several applications, including redistricting, network algorithms, and image
decomposition. Weighting a partition according to the spanning-tree metric has
been of mathematical and practical interest because it typically favors
partitions with more compact pieces. An appealing algorithm suggested by
Charikar et al. is to sample a random spanning tree and remove k-1 edges,
producing a random forest. If the components of the forest form a balanced
partition, the partition is accepted with an easily computed acceptance
probability. Cannon et al. recently showed that spanning trees on grid graphs
and grid-like graphs on $n$ vertices are splittable into $k$ equal-sized pieces
with probability at least $n^{-2k}$, leading to the first rigorous sampling
algorithm for a class of graphs. We present complementary results showing that
spanning trees on dense random graphs also have inverse polynomial probability
of being splittable, giving another class of graphs where equitable partitions
can be efficiently sampled exactly. These proofs also guarantee fast
almost-uniform sampling for the up-down walk on forests, giving another
provably efficient randomized method for generating equitable partitions.
Further, we show that problems with the well-studied ReCom algorithm for
equitable partitioning are more extensive than previously known, even in
special cases that were believed to be more promising. We present a family of
graphs where the Markov chain fails to be irreducible when it must keep the
components perfectly equitable; yet when the chain is allowed an imbalance of
just one vertex between components, the rejection sampling step may take
exponential time. This is true even when the graph satisfies desirable
properties that have been conjectured to be sufficient for fast sampling.
Authors: Hamidreza Akef, Minki Hhan, David Soloveichik
Computing equilibrium concentrations of molecular complexes is generally
analytically intractable and requires numerical approaches. In this work we
focus on the polymer-monomer level, where indivisible molecules (monomers)
combine to form complexes (polymers). Rather than employing free-energy
parameters for each polymer, we focus on the athermic setting where all
interactions preserve enthalpy. This setting aligns with the strongly bonded
(domain-based) regime in DNA nanotechnology -- when strands can bind in
different ways, but always with maximum overall bonding -- and is consistent with the
saturated configurations in the Thermodynamic Binding Networks (TBNs) model.
Within this context, we develop an iterative algorithm for assigning polymer
concentrations satisfying detailed balance, where on-target (desired) polymers
are in high concentrations and off-target (undesired) polymers are in low
concentrations. Even
if not directly executed, our algorithm provides effective insights into upper
bounds on concentration of off-target polymers, connecting combinatorial
arguments about discrete configurations such as those in the TBN model to
real-valued concentrations. We conclude with an application of our method to
decreasing leak in DNA logic and signal propagation. Our results offer a new
framework for design and verification of equilibrium concentrations when
configurations are distinguished by entropic forces.
We study the multiprocessor scheduling with rejection problem under a machine
cost constraint. In this problem, each job is either rejected, incurring a
rejection penalty, or accepted and scheduled on one of the machines for processing. The
machine cost is proportional to the total processing time of the jobs scheduled
on it. The problem aims to minimize the makespan of accepted jobs plus the
total rejection penalty of rejected jobs while the total machine cost does not
exceed a given upper bound. We present a simple $2$-approximation algorithm for
the problem and we achieve an EPTAS when the number $m$ of machines is a fixed
constant.
Suppose that a group test operation is available for checking order relations
in a set; can this speed up problems such as finding the minimum/maximum
element, rank determination, and selection? We consider a one-sided group test to be
available, where queries are of the form $u \le_Q V$ or $V \le_Q u$, and the
answer is `yes' if and only if there is some $v \in V$ such that $u \le v$ or
$v \le u$, respectively. We restrict attention to total orders and focus on
query complexity. For min or max finding, we give a Las Vegas algorithm that
makes $\mathcal{O}(\log^2 n)$ expected queries. We also give randomized
approximate algorithms for rank determination and selection; we allow a
relative error of $1 \pm \delta$ for $\delta > 0$ in the estimated rank or
selected element. In this case, we give a Monte Carlo algorithm for approximate
rank determination with expected query complexity
$\tilde{\mathcal{O}}(1/\delta^2 - \log \epsilon)$, where $1-\epsilon$ is the
probability that the algorithm succeeds. We also give a Monte Carlo algorithm
for approximate selection that has expected query complexity
$\tilde{\mathcal{O}}(-\log( \epsilon \delta^2) / \delta^4)$; it has probability
at least $\frac{1}{2}$ to output an element $x$, and if so, $x$ has the desired
approximate rank with probability $1-\epsilon$.
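The class below mimics the one-sided group-test oracle from the abstract; the climbing routine is emphatically not the paper's O(log^2 n) Las Vegas algorithm, just a naive illustration of how one group query on a subset can stand in for many pairwise comparisons. All names are mine.

```python
import random

class GroupTestOrder:
    """One-sided group test over a hidden total order: answers u <=_Q V."""
    def __init__(self, rank):
        self.rank = rank      # hidden ranks; larger rank = larger element
        self.queries = 0

    def le(self, u, V):
        self.queries += 1
        return any(self.rank[u] <= self.rank[v] for v in V)

def find_max(elements, oracle):
    """
    Naive max finding: while some element above the current candidate exists
    (one group query), locate one such element by halving the set and climb
    to it.  Illustrates the query model only; it does not achieve the
    O(log^2 n) expected query bound of the paper's algorithm.
    """
    u = elements[0]
    while True:
        others = [x for x in elements if x != u]
        if not others or not oracle.le(u, others):
            return u
        pool = others
        while len(pool) > 1:
            half = pool[: len(pool) // 2]
            pool = half if oracle.le(u, half) else pool[len(pool) // 2:]
        u = pool[0]

items = list(range(50))
hidden = dict(zip(items, random.sample(range(50), 50)))
oracle = GroupTestOrder(hidden)
best = find_max(items, oracle)
print(best, hidden[best] == max(hidden.values()), oracle.queries)
```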
Authors: Yury Makarychev, Madhusudhan Reddy Pittu, Ali Vakilian
We study the classic Max-Cut problem under multiple cardinality constraints,
which we refer to as the Constrained Max-Cut problem. Given a graph $G=(V, E)$,
a partition of the vertices into $c$ disjoint parts $V_1, \ldots, V_c$, and
cardinality parameters $k_1, \ldots, k_c$, the goal is to select a set $S
\subseteq V$ such that $|S \cap V_i| = k_i$ for each $i \in [c]$, maximizing
the total weight of edges crossing $S$ (i.e., edges with exactly one endpoint
in $S$).
By designing an approximate kernel for Constrained Max-Cut and building on
the correlation rounding technique of Raghavendra and Tan (2012), we present a
$(0.858 - \varepsilon)$-approximation algorithm for the problem when $c =
O(1)$. The algorithm runs in time $O\left(\min\{k/\varepsilon,
n\}^{\mathrm{poly}(c/\varepsilon)} + \mathrm{poly}(n)\right)$, where $k = \sum_{i \in [c]} k_i$
and $n=|V|$. This improves upon the $(\frac{1}{2} +
\varepsilon_0)$-approximation of Feige and Langberg (2001) for Max-Cut$_k$ (the
special case when $c=1, k_1 = k$), and generalizes the $(0.858 -
\varepsilon)$-approximation of Raghavendra and Tan (2012), which only applies
when $\min\{k,n-k\}=\Omega(n)$ and does not handle multiple constraints.
We also establish that, for general values of $c$, it is NP-hard to determine
whether a feasible solution exists that cuts all edges. Finally, we present a
$1/2$-approximation algorithm for Max-Cut under an arbitrary matroid
constraint.
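To pin down the objective and the cardinality constraints, here is a brute-force Python check for tiny instances; the toy instance is mine.

```python
import itertools

def constrained_max_cut(edges, parts, ks):
    """
    Brute-force Constrained Max-Cut: choose S with |S ∩ V_i| = k_i for every
    part V_i, maximizing the weight of edges with exactly one endpoint in S.
    Exponential; only for tiny sanity checks.
    """
    best_cut, best_S = -1, None
    for choice in itertools.product(
        *(itertools.combinations(part, k) for part, k in zip(parts, ks))
    ):
        S = set(itertools.chain.from_iterable(choice))
        cut = sum(w for u, v, w in edges if (u in S) != (v in S))
        if cut > best_cut:
            best_cut, best_S = cut, sorted(S)
    return best_cut, best_S

# Toy instance: 6 vertices split into two parts, unit weights.
edges = [(0, 3, 1), (0, 4, 1), (1, 4, 1), (2, 5, 1), (1, 2, 1), (3, 5, 1)]
parts = [[0, 1, 2], [3, 4, 5]]
print(constrained_max_cut(edges, parts, ks=[1, 2]))
```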
Scrolling through the deluge of people advertising their ICML papers on social media, I found myself bedeviled by a philosophical question about the logical reconstruction of machine learning. In the most famous machine learning papers, the central claims take the form:
Y is predictable from X, and method M demonstrates a particular accuracy on this prediction problem.
The AlexNet paper showed that ImageNet image classes were predictable from images, and a convolutional neural network achieved high accuracy. The GPT3 paper achieved high accuracy on a variety of natural language processing prediction problems with one-shot learning and transformer-based large language models. The Support Vector Network paper showed that support vector machines could recognize handwritten digits with high accuracy.
In these descriptions, I am being vague today about what precisely “X” and “Y” are. “X” could be streams of text, images, or electronic health records. “Y” could be translations, image classes, or cancer diagnoses. How papers build up these constructs and concepts also bugs me, but I will leave a full logical reconstruction of machine learning for another blog post.1
Today, I am interested in understanding the papers that argue the negative claim and whether they ever provide useful evidence. That is, the papers that assert:
Y is not predictable from X.
I call claims of this form “unpredictability arguments.” Papers making unpredictability arguments can get a lot of temporary traction in machine learning discourse. They give fuel to the petty battles inside the community. In our current landscape, they give ammunition for lay critiques of industry. They can even help bring on AI Winters if people take them seriously enough. The problem is they are much harder to justify as stated.
Unpredictability arguments can be purely theoretical. These might say something like “under this list of assumptions about how data is generated and this list of assumptions about how you build the prediction function, Y is not predictable from X.” Such arguments can be helpful for conceptualization, but they are too strongly tied to their assumptions. Yes, simple linear perceptrons can’t build an XOR of input bits. But simple multilayer perceptrons can. Moreover, if your prediction function is allowed to include simple numerical transformations of the input data, all of a sudden XOR is predictable by linear functions (that is, just add XOR as a feature).
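To make that concrete, here is a tiny numpy toy: a least-squares linear classifier is stuck at chance on the XOR of two bits, and adding the single product feature x1*x2 makes the problem trivially linear.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)  # XOR of the two bits

def linear_fit_accuracy(features, y):
    """Least-squares linear classifier (with bias), thresholded at 1/2."""
    A = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    preds = (A @ w > 0.5).astype(float)
    return (preds == y).mean()

print(linear_fit_accuracy(X, y))                    # stuck at 0.5
X_aug = np.hstack([X, X[:, :1] * X[:, 1:2]])        # add the x1*x2 feature
print(linear_fit_accuracy(X_aug, y))                # 1.0
```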
Theory bounds are weak because practitioners (and theorists for that matter) can simply change the rules. Yes, you can tell me that under some very contrived scenario, I can’t build a particular computer program. But machine learning practice is the apotheosis of computer science, fully embracing a Feyerabendian zeal for anything goes. I can always disagree with your assessment of how data is collected, generated, and processed. And there are an infinite number of computer programs I can write in PyTorch to disprove your unpredictability assertion.
Minsky and Papert’s arguments that I’m alluding to above—about whether perceptrons can compute parity—are a notable exception that proves the rule. No one really understands their arguments, and if you believed them without really digging into the details or trying some simple alternative prediction programs, you would have abandoned supervised learning for decades (oh wait, that’s actually what happened).
Now, what about experimental bounds? If someone writes a paper saying, “I did procedure P and found transformers were really bad at predicting Y from X,” what does this tell us? I’ve seen a lot of papers make such claims concerning LLMs and transformers. You’ll have people saying, “See, LLMs are stochastic parrots,” or “See, LLMs just learn epicycles.” In these papers, the predictability of the Y value is supposed to be obvious to the reader, and we’re supposed to be shocked that some machine learning model can’t predict it.
Sure, people can say all they want. As someone who wants better critiques of predictions, I’m never convinced by these papers, no matter how frequently they capture a viral zeitgeist. In the spirit of a Lakatosian offense, I can always counter by tweeting “your procedure is pretty arbitrary, my friend,” or “i don’t think that baseline is fair,” or “skill issue.” There is no reason to believe that the code shared in such papers is the only way for a method to predict Y from X. There is an infinite regress of hyperparameters, optimization variants, and so on. You only have so many forking paths you can explore in the garden before the conference deadline.
Most people would find competitions to be more compelling evidence. Machine learning is built upon competitive testing, leading to breakthroughs in proving Y predictable from X. Everyone loves the ImageNet competition. Benchmark scores are a central part of how companies showcase their latest language models. Can competitive testing “prove” Y is unpredictable?
What do we conclude when machine learning competitions fail? A notable example of failure is the Future of Families Challenge (formerly known as the Fragile Families Challenge). The goal here was to predict certain socioeconomic outcomes from complex, unstructured data recorded about a large birth cohort. After hundreds of researchers tried to predict the outcomes, the study concluded that the best predictions were not very accurate and were only slightly better than those from a simple benchmark model.
What should we conclude from this competition? We could conclude that the Ys in this paper (including “material hardship,” GPA, “grit,” and eviction) are not predictable from the Xs. I could also conclude that poor predictability arises because the data is far worse than advertised (e.g., there is a lot of missing data in the dataset). I could conclude that the targets studied have poor construct validity. There’s a long line of objections that I can string together even when a competitive test fails to find predictability.
In any event, I don’t have good answers for how to think about this aspect of the philosophy of machine learning yet. I’m very much thinking out loud today! But I’m posting because I’d love to hear your thoughts on what would constitute compelling evidence to prove the impossibility of prediction.
We give new lower bounds for the fragments of the Ideal Proof System (IPS) introduced by Grochow and Pitassi (JACM 2018). The Ideal Proof System is a central topic in algebraic proof complexity developed in the context of Nullstellensatz refutation (Beame, Impagliazzo, Krajicek, Pitassi, Pudlak, FOCS 1994) and simulates Extended Frege efficiently. Our main results are as follows.
1. mult-IPS_{Lin'}: We prove a nearly quadratic-size formula lower bound for multilinear refutation (over the Boolean hypercube) of a variant of the subset-sum axiom polynomial. Extending this, we obtain a nearly matching qualitative statement for a constant-degree target polynomial.
2. IPS_{Lin'}: Over fields of characteristic zero, we prove an exponential-size sum-of-ROABPs lower bound for the refutation of a variant of the subset-sum axiom polynomial. The result also extends to fields of positive characteristic when the target polynomial is suitably modified. The modification is inspired by the recent results (Hakoniemi, Limaye, Tzameret, STOC 2024 and Behera, Limaye, Ramanathan, Srinivasan, ICALP 2025).
The mult-IPS_{Lin'} lower bound is obtained by combining the quadratic-size formula lower bound technique of Kalorkoti (SICOMP 1985) with some additional ideas. The proof technique for the IPS_{Lin'} lower bound is inspired by the recent lower bound result of Chatterjee, Kush, Saraf and Shpilka (CCC 2024).
Authors: Markus Bläser, Radu Curticapean, Julian Dörfler, Christian Ikenmeyer
For a fixed graph H, the function #IndSub(H,*) maps graphs G to the count of
induced H-copies in G; this function obviously "counts something" in that it
has a combinatorial interpretation. Linear combinations of such functions are
called graph motif parameters and have recently received significant attention
in counting complexity after a seminal paper by Curticapean, Dell and Marx
(STOC'17). We show that, among linear combinations of functions #IndSub(H,*)
involving only graphs H without isolated vertices, precisely those with
positive integer coefficients maintain a combinatorial interpretation. It is
important to note that graph motif parameters can be nonnegative for all inputs
G, even when some coefficients are negative.
Formally, we show that evaluating any graph motif parameter with a negative
coefficient is impossible in an oracle variant of #P, where an implicit graph
is accessed by oracle queries. Our proof follows the classification of the
relativizing closure properties of #P by Hertrampf, Vollmer, and Wagner
(SCT'95) and the framework developed by Ikenmeyer and Pak (STOC'22), but our
application of the required Ramsey theorem turns out to be more subtle, as
graphs do not have the required Ramsey property.
Our techniques generalize from graphs to relational structures, including
colored graphs. Vastly generalizing this, we introduce motif parameters over
categories that count occurrences of sub-objects in the category. We then prove
a general dichotomy theorem that characterizes which such parameters have a
combinatorial interpretation. Using known results in Ramsey theory for
categories, we obtain a dichotomy for motif parameters of finite vector spaces
as well as parameter sets.
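For concreteness, a brute-force sketch of #IndSub(H, G) and of a linear combination of such counts (a graph motif parameter), using networkx; exponential and purely illustrative, with the toy combination chosen by me.

```python
import itertools
import networkx as nx

def count_ind_sub(H, G):
    """#IndSub(H, G): number of vertex subsets of G inducing a copy of H."""
    h = H.number_of_nodes()
    return sum(
        1
        for S in itertools.combinations(G.nodes(), h)
        if nx.is_isomorphic(G.subgraph(S), H)
    )

def graph_motif_parameter(coeffs, G):
    """A linear combination sum_H a_H * #IndSub(H, G) of induced-subgraph counts."""
    return sum(a * count_ind_sub(H, G) for H, a in coeffs)

G = nx.petersen_graph()
P3 = nx.path_graph(3)          # induced paths on 3 vertices
K3 = nx.complete_graph(3)      # triangles
# e.g. "#induced P3 minus 3 * #triangles" is a motif parameter with a negative coefficient
print(graph_motif_parameter([(P3, 1), (K3, -3)], G))
```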
Authors: Artur Riazanov, Anastasia Sofronova, Dmitry Sokolov, Weiqiang Yuan
We show that for a randomly sampled unsatisfiable $O(\log n)$-CNF over $n$
variables the randomized two-party communication cost of finding a clause
falsified by the given variable assignment is linear in $n$.
We consider the algorithmic challenge that is faced by blockchains that have
multidimensional block constraints and serve quasi-patient bidders. We provide
online approximation algorithms for this problem, thus solving open problems
left by [Babaioff and Nisan, EC 2025].
Authors: Sophia Heimann, Hung P. Hoang, Stefan Hougardy
The $k$-opt algorithm is one of the simplest and most widely used heuristics
for solving the traveling salesman problem. Starting from an arbitrary tour,
the $k$-opt algorithm improves the current tour in each iteration by exchanging
up to $k$ edges. The algorithm continues until no further improvement of this
kind is possible. For a long time, it remained an open question how many
iterations the $k$-opt algorithm might require for small values of $k$,
assuming the use of an optimal pivot rule. In this paper, we resolve this
question for the cases $k = 3$ and $k = 4$ by proving that in both these cases
an exponential number of iterations may be needed even if an optimal pivot rule
is used. Combined with a recent result from Heimann, Hoang, and Hougardy (ICALP
2024), this provides a complete answer for all $k \geq 3$ regarding the number
of iterations the $k$-opt algorithm may require under an optimal pivot rule. In
addition we establish an analogous exponential lower bound for the 2.5-opt
algorithm, a variant that generalizes 2-opt and is a restricted version of
3-opt. All our results hold for both the general and the metric traveling
salesman problem.
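For readers who have not seen k-opt spelled out, here is a plain 2-opt local search (the k = 2 special case) with a first-improvement pivot rule; the pivot rule and the toy instance are mine and unrelated to the optimal-pivot-rule constructions in the paper.

```python
import math

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """
    2-opt local search with a first-improvement pivot rule: keep exchanging
    two edges (i.e., reversing a segment) while that strictly shortens the
    tour, and stop at a local optimum.
    """
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n if i > 0 else n - 1):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour

pts = [(0, 0), (2, 1), (5, 0), (6, 3), (3, 4), (1, 3)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
tour = two_opt(list(range(len(pts))), dist)
print(tour, tour_length(tour, dist))
```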
Authors: Zihan Zhu, Dongchao Wu, Zhanbang Zhang, Jian Yang
Ever since Clos topologies were used in datacenter networks (DCNs), a
practical centralized scheduling algorithm that supports dynamic scheduling has
been absent. The introduction of optical switches in DCNs as a future-proof
solution exacerbates this problem due to several properties of optical
switches, such as the fact that they are generally bufferless and therefore
rely on centralized scheduling, and that they have long switching times and
therefore require the number of rearrangements to be minimized.
In this paper, we propose a centralized scheduling algorithm that achieves
theoretical maximum throughput even in one-rate bidirectional Clos networks,
while producing schemes with near-minimal numbers of rearrangements. It is the
only algorithm that directly supports bidirectional Clos networks and has a
time efficiency high enough to support dynamic scheduling to date. For static
minimal rewiring, its running time ranges from a fraction to a few hundredths
of other algorithms, and the number of rearrangements has also been steadily
improved, allowing for more frequent adjustments and less impact on ongoing
communications. In addition, the algorithm is very flexible and can support
various functional requirements in real-world environments. We achieve this
result through the replacement chain concept and bitset optimization.
The weighted $k$-server problem is a variant of the $k$-server problem in which the
cost of moving a server is the server's weight times the distance through which
it moves. The problem is famous for its intriguing properties and for evading
standard techniques for designing and analyzing online algorithms. Even on
uniform metric spaces with sufficiently many points, the deterministic
competitive ratio of weighted $k$-server is known to increase doubly
exponentially with respect to $k$, while the behavior of its randomized
competitive ratio is not fully understood. Specifically, no upper bound better
than doubly exponential is known, while the best known lower bound is singly
exponential in $k$. In this paper, we close the exponential gap between these
bounds by giving an $\exp(O(k^2))$-competitive randomized online algorithm for
the weighted $k$-server problem on uniform metrics, thus breaking the doubly
exponential barrier for deterministic algorithms for the first time. This is
achieved by a recursively defined notion of a phase which, on the one hand,
forces a lower bound on the cost of any offline solution, while, on the other
hand, also admits a randomized online algorithm with bounded expected cost. The
algorithm is also recursive; it involves running several algorithms virtually
and in parallel and following the decisions of one of them in a random order.
We also show that our techniques can be lifted to construct an
$\exp(O(k^2))$-competitive randomized online algorithm for the generalized
$k$-server problem on weighted uniform metrics.
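To make the cost model concrete, here is a brute-force offline optimum for weighted k-server on a uniform metric (every move of server i costs w_i, since all distances are 1); it is exponential and unrelated to the online algorithm in the paper beyond sharing the cost model.

```python
from functools import lru_cache

def offline_weighted_k_server(weights, start, requests):
    """
    Exact offline optimum for weighted k-server on a uniform metric:
    each request must be covered by some server when it arrives, and moving
    server i costs weights[i].  State = tuple of server positions; only for
    tiny instances.
    """
    @lru_cache(maxsize=None)
    def solve(t, positions):
        if t == len(requests):
            return 0
        r = requests[t]
        if r in positions:
            return solve(t + 1, positions)
        return min(
            w + solve(t + 1, positions[:i] + (r,) + positions[i + 1:])
            for i, w in enumerate(weights)
        )
    return solve(0, tuple(start))

# A light and a heavy server on a 3-point uniform metric.
print(offline_weighted_k_server([1, 10], start=(0, 0), requests=[1, 2, 1, 2, 1, 2]))
```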
Authors: Michal Dvořák, Dušan Knop, Michal Opler, Jan Pokorný, Ondřej Suchý, Krisztina Szilágyi
In this paper, we study the problem of pathfinding on traversal-dependent
graphs, i.e., graphs whose edges change depending on the previously visited
vertices. In particular, we study \emph{self-deleting graphs}, introduced by
Carmesin et al. (Sarah Carmesin, David Woller, David Parker, Miroslav Kulich,
and Masoumeh Mansouri. The Hamiltonian cycle and travelling salesperson
problems with traversal-dependent edge deletion. J. Comput. Sci.), which
consist of a graph $G=(V, E)$ and a function $f\colon V\rightarrow 2^E$, where
$f(v)$ is the set of edges that will be deleted after visiting the vertex $v$.
In the \textsc{(Shortest) Self-Deleting $s$-$t$-path} problem we are given a
self-deleting graph and its vertices $s$ and $t$, and we are asked to find a
(shortest) path from $s$ to $t$, such that it does not traverse an edge in
$f(v)$ after visiting $v$ for any vertex $v$.
We prove that \textsc{Self-Deleting $s$-$t$-path} is NP-hard even if the
given graph is outerplanar, bipartite, has maximum degree $3$, bandwidth $2$
and $|f(v)|\leq 1$ for each vertex $v$. We show that \textsc{Shortest
Self-Deleting $s$-$t$-path} is W[1]-complete parameterized by the length of the
sought path and that \textsc{Self-Deleting $s$-$t$-path} is W[1]-complete
parameterized by the vertex cover number, feedback vertex set number and
treedepth. We also show that the problem becomes FPT when we parameterize by
the maximum size of $f(v)$ and several structural parameters. Lastly, we show
that the problem does not admit a polynomial kernel even for parameterization
by the vertex cover number and the maximum size of $f(v)$ combined already on
2-outerplanar graphs.
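The following brute-force sketch just operationalizes the definition of a Self-Deleting s-t-path on tiny instances; the toy instance is mine.

```python
from collections import deque

def self_deleting_path(n, edges, f, s, t):
    """
    Brute-force search for a Self-Deleting s-t-path: once a vertex v has been
    visited, no edge in f(v) may be traversed anymore.  Returns a valid path
    with the fewest edges, or None.  Exponential in general.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    start = (s, frozenset([s]))
    queue = deque([(start, [s])])
    seen = {start}
    while queue:
        (u, visited), path = queue.popleft()
        if u == t:
            return path
        deleted = {frozenset(e) for w in visited for e in f.get(w, [])}
        for v in adj[u]:
            if v in visited or frozenset((u, v)) in deleted:
                continue
            state = (v, visited | {v})
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [v]))
    return None

# Toy instance: visiting vertex 1 deletes the edge {1, 2}, so the direct
# route 0-1-2 is forbidden while the detour 0-3-2 is fine.
edges = [(0, 1), (1, 2), (0, 3), (3, 2)]
f = {1: [(1, 2)]}
print(self_deleting_path(4, edges, f, s=0, t=2))
```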
Authors: Marta Piecyk, Astrid Pieterse, Paweł Rzążewski, Magnus Wahlström
For a fixed graph $H$, in the List $H$-Coloring problem, we are given a graph
$G$ along with list $L(v) \subseteq V(H)$ for every $v \in V(G)$, and we have
to determine if there exists a list homomorphism $\varphi$ from $(G,L)$ to $H$,
i.e., an edge preserving mapping $\varphi: V(G)\to V(H)$ that satisfies
$\varphi(v)\in L(v)$ for every $v\in V(G)$. Note that if $H$ is the complete
graph on $q$ vertices, the problem is equivalent to List $q$-Coloring. We
investigate the kernelization properties of List $H$-Coloring parameterized by
the vertex cover number of $G$: given an instance $(G,L)$ and a vertex cover of
$G$ of size $k$, can we reduce $(G,L)$ to an equivalent instance $(G',L')$ of
List $H$-Coloring where the size of $G'$ is bounded by a low-degree polynomial
$p(k)$ in $k$? This question has been investigated previously by Jansen and
Pieterse [Algorithmica 2019], who provided an upper bound, which turns out to
be optimal if $H$ is a complete graph, i.e., for List $q$-Coloring. This result
was one of the first applications of the method of kernelization via
bounded-degree polynomials. We define two new integral graph invariants,
$c^*(H)$ and $d^*(H)$, with $d^*(H) \leq c^*(H) \leq d^*(H)+1$, and show that
for every graph $H$, List $H$-Coloring
-- has a kernel with $\mathcal{O}(k^{c^*(H)})$ vertices,
-- admits no kernel of size $\mathcal{O}(k^{d^*(H)-\varepsilon})$ for any
$\varepsilon > 0$, unless the polynomial hierarchy collapses.
-- Furthermore, if $c^*(H) > d^*(H)$, then there is a kernel with
$\mathcal{O}(k^{c^*(H)-\varepsilon})$ vertices where $\varepsilon \geq
2^{1-c^*(H)}$.
Additionally, we show that for some classes of graphs, including powers of
cycles and graphs $H$ where $\Delta(H) \leq c^*(H)$ (which in particular
includes cliques), the bound $d^*(H)$ is tight, using the polynomial method. We
conjecture that this holds in general.
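A brute-force list-homomorphism checker makes the List H-Coloring definition concrete; the toy instance is mine and the code is exponential by design.

```python
import itertools

def list_h_coloring(G_edges, H_edges, lists):
    """
    Brute-force List H-Coloring: find an edge-preserving phi with
    phi(v) in L(v) for every vertex of G, or report None.
    """
    H_adj = {frozenset(e) for e in H_edges}
    vertices = sorted(lists)
    for choice in itertools.product(*(lists[v] for v in vertices)):
        phi = dict(zip(vertices, choice))
        if all(frozenset((phi[u], phi[v])) in H_adj for u, v in G_edges):
            return phi
    return None

# H is the path a-b-c; G is a path on 4 vertices with restricted lists.
H_edges = [("a", "b"), ("b", "c")]
G_edges = [(0, 1), (1, 2), (2, 3)]
lists = {0: ["a"], 1: ["b"], 2: ["a", "c"], 3: ["b"]}
print(list_h_coloring(G_edges, H_edges, lists))
```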
We provide new high-accuracy randomized algorithms for solving linear systems
and regression problems that are well-conditioned except for $k$ large singular
values. For solving such a $d \times d$ positive definite system, our algorithms
succeed w.h.p. and run in time $\tilde O(d^2 + k^\omega)$. For solving such
regression problems in a matrix $\mathbf{A} \in \mathbb{R}^{n \times d}$, our
methods succeed w.h.p. and run in time $\tilde O(\mathrm{nnz}(\mathbf{A}) + d^2 +
k^\omega)$ where $\omega$ is the matrix multiplication exponent and
$\mathrm{nnz}(\mathbf{A})$ is the number of non-zeros in $\mathbf{A}$. Our
methods nearly match a natural complexity limit under dense inputs for these
problems and improve upon a trade-off in prior approaches that obtain running
times of either $\tilde O(d^{2.065}+k^\omega)$ or $\tilde O(d^2 +
dk^{\omega-1})$ for $d\times d$ systems. Moreover, we show how to obtain these
running times even under the weaker assumption that all but $k$ of the singular
values have a suitably bounded generalized mean. Consequently, we give the
first nearly-linear time algorithm for computing a multiplicative approximation
to the nuclear norm of an arbitrary dense matrix. Our algorithms are built on
three general recursive preconditioning frameworks, where matrix sketching and
low-rank update formulas are carefully tailored to the problems' structure.
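For intuition about why a handful of large singular values need not be an obstacle, the NumPy sketch below combines a toy deflation preconditioner with standard preconditioned conjugate gradient. It only illustrates the problem setup: it uses a full eigendecomposition (cubic time), so it is emphatically not the paper's sketching-based recursive preconditioning, and all names are illustrative.

```python
import numpy as np

def deflated_pcg(A, b, k, tol=1e-10, max_iter=500):
    """Toy solver for a PD matrix that is well conditioned except for k large
    eigenvalues: deflate the top-k directions, then run preconditioned CG.
    Uses a full eigendecomposition for simplicity, so it is only an
    illustration of the setting, not the paper's algorithm."""
    eigvals, eigvecs = np.linalg.eigh(A)            # ascending eigenvalues
    lam, V = eigvals[-k:], eigvecs[:, -k:]          # the k "large" directions
    sigma = eigvals[-k - 1] if k < len(eigvals) else 1.0  # scale of the tail

    def apply_Pinv(r):
        # P^{-1} acts as A^{-1} on the top-k directions and as (1/sigma) I elsewhere,
        # so P^{-1} A has condition number about sigma / lambda_min(tail).
        return V @ ((1.0 / lam - 1.0 / sigma) * (V.T @ r)) + r / sigma

    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Pinv(r)
    p = z.copy()
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol * np.linalg.norm(b):
            break
        z_new = apply_Pinv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# Example: well-conditioned PD matrix plus a rank-k spike of large eigenvalues.
rng = np.random.default_rng(0)
d, k = 200, 3
B = rng.standard_normal((d, d))
A = np.eye(d) + 0.01 * (B @ B.T) / d      # well-conditioned bulk
U = rng.standard_normal((d, k))
A += 1e4 * (U @ U.T) / d                  # k large eigenvalues
b = rng.standard_normal(d)
x = deflated_pcg(A, b, k)
print(np.linalg.norm(A @ x - b))          # small residual after few iterations
```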
Pinwheel Scheduling is a fundamental scheduling problem, in which each task
$i$ is associated with a positive integer $d_i$, and the objective is to
schedule one task per time slot, ensuring each task perpetually appears at
least once in every $d_i$ time slots. Although conjectured to be
PSPACE-complete, it remains open whether Pinwheel Scheduling is NP-hard (unless
a compact input encoding is used) or even contained in NP.
We introduce k-Visits, a finite version of Pinwheel Scheduling, where given n
deadlines, the goal is to schedule each task exactly k times. While we observe
that the 1-Visit problem is trivial, we prove that 2-Visits is strongly
NP-complete through a surprising reduction from Numerical 3-Dimensional
Matching (N3DM). As intermediate steps in the reduction, we define NP-complete
variants of N3DM which may be of independent interest. We further extend our
strong NP-hardness result to a generalization of k-Visits for $k\geq 2$ in which
the deadline of each task may vary throughout the schedule, as well as to a
similar generalization of Pinwheel Scheduling, thus making progress towards
settling the complexity of Pinwheel Scheduling.
Additionally, we prove that 2-Visits can be solved in linear time if all
deadlines are distinct, rendering it one of the rare natural problems which
exhibit the interesting dichotomy of being in P if their input is a set and
NP-complete if the input is a multiset. We achieve this through a Turing
reduction from 2-Visits to a variation of N3DM, which we call Position
Matching. Based on this reduction, we also show an FPT algorithm for 2-Visits
parameterized by a value related to how close the input deadlines are to each
other, as well as a linear-time algorithm for instances with up to two distinct
deadlines.
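As a concrete reference point for the definitions, here is a small Python sketch that verifies whether a finite sequence, repeated forever, is a valid Pinwheel schedule for given deadlines; the instance in the example is a standard toy case and is unrelated to the reductions above.

```python
def is_valid_pinwheel_schedule(schedule, deadlines):
    """Check whether repeating `schedule` forever satisfies Pinwheel Scheduling:
    every task i appears at least once in every deadlines[i] consecutive slots.

    schedule: list of task ids, one per time slot (one period of the cycle).
    deadlines: dict mapping task id -> positive integer d_i.
    """
    T = len(schedule)
    positions = {task: [] for task in deadlines}
    for slot, task in enumerate(schedule):
        positions[task].append(slot)
    for task, d in deadlines.items():
        occ = positions[task]
        if not occ:
            return False
        # Cyclic gaps between consecutive occurrences; the condition holds
        # iff no gap exceeds the deadline.
        gaps = [b - a for a, b in zip(occ, occ[1:])] + [occ[0] + T - occ[-1]]
        if max(gaps) > d:
            return False
    return True

# The classic feasible instance with deadlines {2, 4, 4}.
print(is_valid_pinwheel_schedule(["A", "B", "A", "C"], {"A": 2, "B": 4, "C": 4}))  # True
```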
Interactive proofs of proximity (IPPs) are a relaxation of interactive proofs, analogous to property testing, in which soundness is required to hold only for inputs that are far from the property being verified. In such proof systems, the verifier has oracle access to the input, and it engages in two types of activities before making its decision: querying the input oracle and communicating with the prover. The main objective is to achieve protocols where both the query and communication complexities are extremely low.
Of particular interest are IPPs in which the querying and the interacting activities are performed independently, with no information flow from one activity to the other. Such IPPs were systematically studied by Goldreich, Rothblum, and Skverer (ITCS 2023), who introduced two variants: the pre-coordinated model, where the querying and interacting activities may use a common source of randomness, and the isolated model, where the two activities are fully independent, each operating with a separate source of randomness.
We focus on what is possible under these models when soundness is relaxed to computational soundness. Our previous work (ECCC, TR24-131) showed that the pre-coordinated model becomes significantly more powerful under this relaxation. In this work, we consider the isolated model under the same relaxation and show a separation between the two models. We consider a property that, by our previous work, has a computationally sound IPP in the pre-coordinated model with poly-logarithmic complexities (assuming the existence of collision-resistant hashing functions), and show that any computationally sound IPP in the isolated model for this property must have either query complexity or communication complexity that is $n^{\Omega(1)}$, where $n$ is the length of the input.
Douglas Hofstadter first published Gödel, Escher, Bach: an Eternal Golden Braid in 1979 and my then high school self tried, and failed, to read through the entire book. It focused on the contradictions, with Kurt Gödel's incompleteness theorems, M. C. Escher's Drawing Hands and Johann Sebastian Bach's Canon a 2 per tonos, a piece that keeps rising until it ends a whole tone higher than it started.
I'd prefer to focus less on the paradoxes and self-reference and more on the true beauty and complexity of computation. So now, having had a long career in the field, who would I call on to capture the power and beauty of computing?
It has to start with Alan Turing. His seminal paper On Computable Numbers, with an Application to the Entscheidungsproblem gives a clean model for computation and still the best argument (Section 9) for why this simple model captures everything computable. The Entscheidungsproblem, that you can't mechanize all of mathematics, comes as a consequence, not as a goal of the paper. In a much later paper, The Chemical Basis of Morphogenesis, he shows how the beauty of nature can emerge naturally from computation, which of course we now know much better arises from discrete DNA sequences.
For music, instead of Bach's abstract works, I prefer to focus on the emotional power of music that still arises from a musical score that is not unlike a computer program in the way it lays out the composition. Take for example Richard Wagner's Prelude and Liebestod (the beginning and the end of his opera Tristan and Isolde). It captures the tension of the two lovers from the very first notes and keeps that tension going until it resolves at the very end.
While soccer and basketball have mostly continuous play, I prefer that great American game of baseball, which after each pitch has a very discrete state space that stadiums would capture with a few light bulbs (strikes, balls, outs), and yet keeps the excitement and tension on every one of those pitches. No one dominated the game more in his time than George Herman "Babe" Ruth, whom I admittedly may also have chosen to keep the same syllable cadence as Hofstadter.
So let's thank Turing, Wagner, Ruth, and the many others who showed how incredible complexity and beauty can arise from the simplicity of computation. So many stories to tell.
By Lance Fortnow
We show that for a randomly sampled unsatisfiable $O(\log n)$-CNF over $n$ variables the randomized two-party communication cost of finding a clause falsified by the given variable assignment is linear in $n$.
Authors: Vincent Cheval, Florian Horn, Soumyajit Paul, Mahsa Shirmohammadi
A major open question in algorithmic game theory is whether normal-form
correlated equilibria (NFCE) can be computed efficiently in succinct games such
as extensive-form games [DFF+25,6PR24,FP23,HvS08,VSF08,PR08]. Motivated by this
question, we study the associated Threshold problem: deciding whether there
exists a correlated equilibrium whose value exceeds a given threshold. We prove
that this problem is PSPACE-hard for NFCE in multiplayer extensive-form games
with perfect recall, even for fixed thresholds. To contextualize this result,
we also establish the complexity of the Threshold problem for Nash equilibria
in this setting, showing it is $\exists\mathbb{R}$-complete. These results uncover a surprising
complexity reversal: while optimal correlated equilibria are computationally
simpler than optimal Nash in normal-form games, the opposite holds in
extensive-form games, where computing optimal correlated equilibria is provably
harder. Building on this line of inquiry, we also address a related question by
[VSF08], who introduced the notions of extensive-form correlated equilibrium
(EFCE) and agent-form correlated equilibrium (AFCE). They asked how difficult
the Threshold problem is for AFCE; we answer this question by proving that it
is NP-hard, even in two-player games without chance nodes. Complementing our
hardness results, we establish tight complexity classifications for the
Threshold problem across several correlated equilibrium concepts - including
EFCE, AFCE, normal-form coarse, extensive-form coarse, and agent-form coarse
correlated equilibria. For each of these solution concepts in multiplayer
stochastic extensive-form games with perfect recall, we prove NP-completeness
by providing matching NP upper bounds to the previously known hardness results.
Together, our results provide the most complete landscape to date for the
complexity of optimal equilibrium computation in extensive-form games.
Authors: Piotr Bacik, Joël Ouaknine, James Worrell
The Skolem Problem asks to determine whether a given linear recurrence
sequence (LRS) $\langle u_n \rangle_{n=0}^\infty$ over the integers has a zero
term, that is, whether there exists $n$ such that $u_n = 0$. Decidability of
the problem is open in general, with the most notable positive result being a
decision procedure for LRS of order at most 4.
In this paper we consider a bounded version of the Skolem Problem, in which
the input consists of an LRS $\langle u_n \rangle_{n=0}^\infty$ and a bound $N
\in \mathbb N$ (with all integers written in binary), and the task is to
determine whether there exists $n\in\{0,\ldots,N\}$ such that $u_n=0$. We give
a randomised algorithm for this problem that, for all $d\in \mathbb N$, runs in
polynomial time on the class of LRS of order at most $d$. As a corollary we
show that the (unrestricted) Skolem Problem for LRS of order at most 4 lies in
$\mathsf{coRP}$, improving the best previous upper bound of
$\mathsf{NP}^{\mathsf{RP}}$.
The running time of our algorithm is exponential in the order of the LRS -- a
dependence that appears necessary in view of the $\mathsf{NP}$-hardness of the
Bounded Skolem Problem. However, even for LRS of a fixed order, the problem
involves detecting zeros within an exponentially large range. For this, our
algorithm relies on results from $p$-adic analysis to isolate polynomially many
candidate zeros and then test in randomised polynomial time whether each
candidate is an actual zero by reduction to arithmetic-circuit identity
testing.
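For readers who want the problem statement in executable form, the sketch below decides the Bounded Skolem Problem by simply iterating the recurrence up to $N$; this takes time linear in $N$, i.e. exponential in its binary encoding, so it is only a specification of the problem and not the randomised polynomial-time algorithm of the paper.

```python
def bounded_skolem_naive(initial, coeffs, N):
    """Naive decision procedure for the Bounded Skolem Problem:
    is u_n = 0 for some 0 <= n <= N?

    The LRS is u_{n+d} = coeffs[0]*u_{n+d-1} + ... + coeffs[d-1]*u_n,
    with u_0, ..., u_{d-1} given in `initial`.
    """
    window = list(initial)          # the last d terms seen so far
    for n in range(N + 1):
        if n < len(window):
            u_n = window[n]         # initial terms; window not yet shifted
        else:
            u_n = sum(c * w for c, w in zip(coeffs, reversed(window)))
            window = window[1:] + [u_n]
        if u_n == 0:
            return n                # index of the first zero term
    return None                     # no zero term up to N

# Order-2 example: u_{n+2} = u_{n+1} - u_n with u_0 = u_1 = 1 gives 1, 1, 0, ...
print(bounded_skolem_naive([1, 1], [1, -1], 10))  # 2
```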
Authors: Mika Göös, Nathaniel Harms, Artur Riazanov
We exhibit an $n$-bit communication problem with a constant-cost randomized
protocol but which requires $n^{\Omega(1)}$ deterministic (or even
non-deterministic) queries to an Equality oracle. Therefore, even constant-cost
randomized protocols cannot be efficiently "derandomized" using Equality
oracles. This improves on several recent results and answers a question from
the survey of Hatami and Hatami (SIGACT News 2024). It also gives a
significantly simpler and quantitatively superior proof of the main result of
Fang, G\"o\"os, Harms, and Hatami ( STOC 2025), that constant-cost
communication does not reduce to the $k$-Hamming Distance hierarchy.
Authors: Prashanth Amireddy, Amik Raj Behera, Srikanth Srinivasan, Madhu Sudan
We consider random walks on ``balanced multislices'' of any ``grid'' that
respect the ``symmetries'' of the grid, and show that a broad class of such
walks are good spectral expanders. (A grid is a set of points of the form
$\mathcal{S}^n$ for finite $\mathcal{S}$, and a balanced multi-slice is the
subset that contains an equal number of coordinates taking every value in
$\mathcal{S}$. A walk respects symmetries if the probability of going from $u =
(u_1,\ldots,u_n)$ to $v = (v_1,\ldots,v_n)$ is invariant under simultaneous
permutations of the coordinates of $u$ and $v$.) Our main theorem shows that,
under some technical conditions, every such walk where a single step leads to
an almost $\mathcal{O}(1)$-wise independent distribution on the next state,
conditioned on the previous state, satisfies a non-trivially small singular
value bound.
We give two applications of our theorem to error-correcting codes: (1) We
give an analog of the Ore-DeMillo-Lipton-Schwartz-Zippel lemma for polynomials,
and junta-sums, over balanced multislices. (2) We also give a local
list-correction algorithm for $d$-junta-sums mapping an arbitrary grid
$\mathcal{S}^n$ to an Abelian group, correcting from a near-optimal
$(\frac{1}{|\mathcal{S}|^{d}} - \varepsilon)$ fraction of errors for every
$\varepsilon > 0$, where a $d$-junta-sum is a sum of (arbitrarily many)
$d$-juntas (and a $d$-junta is a function that depends on only $d$ of the $n$
variables).
Our proofs are obtained by exploring the representation theory of the
symmetric group and merging it with some careful spectral analysis.
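To make the setting concrete, the following Python sketch samples a point of a balanced multislice and runs a few steps of the random-transposition walk, one simple example of a walk that respects the symmetries of the grid; it is illustrative only and is not claimed to satisfy the paper's technical conditions.

```python
import random

def random_balanced_multislice_point(symbols, n):
    """Uniform point of the balanced multislice of symbols^n:
    each symbol appears exactly n/len(symbols) times (n must be divisible)."""
    assert n % len(symbols) == 0
    word = list(symbols) * (n // len(symbols))
    random.shuffle(word)
    return word

def transposition_step(word):
    """One step of the random-transposition walk: swap two uniformly random
    distinct coordinates.  The transition probability from u to v is invariant
    under simultaneous permutations of the coordinates of u and v, so the walk
    'respects the symmetries of the grid' in the sense of the abstract."""
    i, j = random.sample(range(len(word)), 2)
    word[i], word[j] = word[j], word[i]
    return word

w = random_balanced_multislice_point("abc", 12)
for _ in range(5):
    w = transposition_step(w)
print("".join(w))  # still contains each of a, b, c exactly 4 times
```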
We define a class of $C^k$-regular surfaces, $k \geq 1$, \emph{tileable
surfaces}, that admit geometric tilings by a finite number of congruence
classes of tiles. We show how to construct many examples, and examine the
relationship with the well known tilings of the plane and sphere, as well as
monohedral polyhedral surfaces.
Authors: Matthias Lanzinger, Igor Razgon, Daniel Unterberger
We present the first fixed-parameter tractable (fpt) algorithms for precisely
determining several central hypergraph decomposition parameters, including
generalized hypertree width, fractional hypertree width, and adaptive width.
Despite the recognized importance of these measures in complexity theory,
databases, and constraint satisfaction, no exact fpt algorithms for any of them
had previously been known. Our results are obtained for hypergraph classes of
bounded rank and bounded degree.
Our approach extends a recent algorithm for treewidth (Boja\'nczyk &
Pilipczuk, LMCS 2022) utilizing monadic second-order (MSO) transductions.
Leveraging this framework, we overcome the significant technical hurdles
presented by hypergraphs, whose structural decompositions are technically much
more intricate than their graph counterparts.
Hypergraph $2$-colorability is one of the classical NP-hard problems. Person
and Schacht [SODA'09] designed a deterministic algorithm whose expected running
time is polynomial over a uniformly chosen $2$-colorable $3$-uniform
hypergraph. Lee, Molla, and Nagle recently extended this to $k$-uniform
hypergraphs for all $k\geq 3$. Both papers relied heavily on the regularity
lemma, hence their analysis was involved and their running time hid tower-type
constants.
Our first result in this paper is a new simple and elementary deterministic
$2$-coloring algorithm that reproves the theorems of Person-Schacht and
Lee-Molla-Nagle while avoiding the use of the regularity lemma. We also show
how to turn our new algorithm into a randomized one with average expected
running time of only $O(n)$.
Our second and main result gives what we consider to be the ultimate evidence
of just how easy it is to find a $2$-coloring of an average $2$-colorable
hypergraph. We define a coloring oracle to be an algorithm which, given vertex
$v$, assigns color red/blue to $v$ while inspecting as few edges as possible,
so that the answers to any sequence of queries to the oracle are consistent
with a single legal $2$-coloring of the input. Surprisingly, we show that there
is a coloring oracle that, on average, can answer every vertex query in time
$O(1)$.
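For concreteness, here is a brute-force Python check of hypergraph $2$-colorability (property B), applied to the Fano plane as a small non-$2$-colorable example; it only pins down the problem being solved and has nothing in common with the average-case algorithms or the coloring oracle above.

```python
from itertools import product

def is_two_colorable(n, edges):
    """Brute-force test of hypergraph 2-colorability: is there a red/blue
    colouring of vertices 0..n-1 with no monochromatic edge?  Exponential in n,
    so this merely specifies the problem."""
    for colouring in product((0, 1), repeat=n):
        # An edge is properly coloured iff both colours appear on it.
        if all(len({colouring[v] for v in e}) == 2 for e in edges):
            return colouring
    return None

# The Fano plane: the smallest 3-uniform hypergraph that is not 2-colourable.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_two_colorable(7, fano))  # None
```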
Heegaard splittings provide a natural representation of closed 3-manifolds by
gluing handlebodies along a common surface. These splittings can be
equivalently given by two finite sets of meridians lying in the surface, which
define a Heegaard diagram. We present a data structure that effectively represents
Heegaard diagrams as normal curves with respect to a triangulation of the surface,
with complexity measured by the space required to express the normal coordinates'
vectors in binary. This structure can be significantly more compressed than
triangulations of 3-manifolds, giving exponential gains for some families. Even
with this succinct definition of complexity, we establish polynomial time
algorithms for comparing and manipulating diagrams, performing stabilizations,
detecting trivial stabilizations and reductions, and computing topological
invariants of the underlying manifolds, such as their fundamental and first
homology groups. We also contrast early implementations of our techniques with
standard software programs for 3-manifolds, achieving better precision and
faster algorithms in the average case and exponential gains in speed for some
particular presentations of the inputs.
Authors: Lingxiao Huang, Zhenyu Jiang, Yi Li, Xuan Wu
This paper considers coresets for the robust $k$-medians problem with $m$
outliers, and new constructions in various metric spaces are obtained.
Specifically, for metric spaces with a bounded VC or doubling dimension $d$,
the coreset size is $O(m) + \tilde{O}(kd\varepsilon^{-2})$, which is optimal up
to logarithmic factors. For Euclidean spaces, the coreset size is
$O(m\varepsilon^{-1}) +
\tilde{O}(\min\{k^{4/3}\varepsilon^{-2},k\varepsilon^{-3}\})$, improving upon a
recent result by Jiang and Lou (ICALP 2025). These results also extend to
robust $(k,z)$-clustering, yielding, for VC and doubling dimension, a coreset
size of $O(m) + \tilde{O}(kd\varepsilon^{-2z})$ with the optimal linear
dependence on $m$. This extended result improves upon the earlier work of Huang
et al. (SODA 2025). The techniques introduce novel dataset decompositions,
enabling chaining arguments to be applied jointly across multiple components.
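As a quick reminder of the objective these coresets approximate, the NumPy sketch below evaluates the robust $k$-medians cost of a point set with $m$ outliers removed; the data and centres are synthetic, and the coreset constructions themselves are not shown.

```python
import numpy as np

def robust_kmedians_cost(points, centers, m):
    """Objective of robust k-medians with m outliers: sum, over all but the m
    points farthest from their nearest centre, of the (Euclidean) distance to
    that centre.  A coreset is a small weighted point set whose cost is within
    a (1 +/- eps) factor of this value for every choice of centres."""
    # Distance of each point to its nearest centre.
    d = np.min(np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2), axis=1)
    d_sorted = np.sort(d)
    return d_sorted[: len(points) - m].sum()  # drop the m largest distances

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:5] += 50.0                     # five far-away outliers
C = rng.normal(size=(3, 2))       # three candidate centres
print(robust_kmedians_cost(X, C, m=5))
```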
Authors: Yasushi Kawase, Kazuhisa Makino, Vinh Long Phan, Hanna Sumita
In this study, we investigate a scheduling problem on identical machines in
which jobs require initial setup before execution. We assume that an algorithm
can dynamically form a batch (i.e., a collection of jobs to be processed
together) from the remaining jobs. The setup time is modeled as a known
monotone function of the set of jobs within a batch, while the execution time
of each job remains unknown until completion. This uncertainty poses
significant challenges for minimizing the makespan. We address these challenges
by considering two scenarios: each job batch must be assigned to a single
machine, or a batch may be distributed across multiple machines. For both
scenarios, we analyze settings with and without preemption. Across these four
settings, we design online algorithms that achieve asymptotically optimal
competitive ratios with respect to both the number of jobs and the number of
machines.
Permutation patterns and pattern avoidance are central, well-studied concepts
in combinatorics and computer science. Given two permutations $\tau$ and $\pi$,
the pattern matching problem (PPM) asks whether $\tau$ contains $\pi$. This
problem arises in various contexts in computer science and statistics and has
been studied extensively in exact-, parameterized-, approximate-,
property-testing- and other formulations.
In this paper, we study pattern matching in a \emph{streaming setting}, when
the input $\tau$ is revealed sequentially, one element at a time. There is
extensive work on the space complexity of various statistics in streams of
integers. The novelty of our setting is that the input stream is \emph{a
permutation}, which allows inferring some information about future inputs. Our
algorithms crucially take advantage of this fact, while existing lower bound
techniques become difficult to apply.
We show that the complexity of the problem changes dramatically depending on
the pattern~$\pi$. The space requirement is: $\Theta(k\log{n})$ for the
monotone patterns $\pi = 12\dots k$, or $\pi = k\dots21$, $O(\sqrt{n\log{n}})$
for $\pi \in \{312,132\}$, $O(\sqrt{n} \log n)$ for $\pi \in \{231,213\}$, and
$\widetilde{\Theta}_{\pi}(n)$ for all other $\pi$. If $\tau$ is an arbitrary
sequence of integers (not necessarily a permutation), we show that the complexity
is $\widetilde{\Theta}_{\pi}(n)$ in all except the first (monotone) cases.
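For the monotone case, the stated $\Theta(k\log{n})$ bound is matched by a standard patience-sorting-style routine that keeps, for each length, the smallest possible last element of an increasing subsequence; the sketch below is that textbook routine, not necessarily the algorithm analysed in the paper.

```python
from bisect import bisect_left

def stream_contains_increasing_k(stream, k):
    """Detect the monotone pattern 12...k in a stream: does the stream contain
    a (strictly) increasing subsequence of length k?  Maintains at most k
    'tails' values, each of O(log n) bits, matching the Theta(k log n) space
    bound stated for monotone patterns."""
    tails = []  # tails[j] = smallest possible last element of an increasing
                # subsequence of length j+1 seen so far
    for x in stream:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
            if len(tails) == k:
                return True
        else:
            tails[i] = x
    return False

print(stream_contains_increasing_k([5, 1, 6, 2, 7, 3, 8], 3))  # True, e.g. 1, 2, 7
print(stream_contains_increasing_k([3, 2, 1], 2))              # False
```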
We study the $k$-edge connectivity problem on undirected graphs in the
distributed sketching model, where we have $n$ nodes and a referee. Each node
sends a single message to the referee based on its 1-hop neighborhood in the
graph, and the referee must decide whether the graph is $k$-edge connected by
taking into account the received messages.
We present the first lower bound for deciding a graph connectivity problem in
this model with a deterministic algorithm. Concretely, we show that the worst
case message length is $\Omega( k )$ bits for $k$-edge connectivity, for any
super-constant $k = O(\sqrt{n})$. Previously, only a lower bound of $\Omega(
\log^3 n )$ bits was known for ($1$-edge) connectivity, due to Yu (SODA 2021).
In fact, our result is the first super-polylogarithmic lower bound for a
connectivity decision problem in the distributed graph sketching model.
To obtain our result, we introduce a new lower bound graph construction, as
well as a new 3-party communication complexity problem that we call
UniqueOverlap. As this problem does not appear to be amenable to reductions to
existing hard problems such as set disjointness or indexing due to correlations
between the inputs of the three players, we leverage results from
cross-intersecting set families to prove the hardness of UniqueOverlap for
deterministic algorithms. Finally, we obtain the sought lower bound for
deciding $k$-edge connectivity via a novel simulation argument that, in
contrast to previous works, does not introduce any probability of error and
thus works for deterministic algorithms.
We study the problem of sampling from a distribution $\mu$ with density
$\propto e^{-V}$ for some potential function $V:\mathbb R^d\to \mathbb R$ with
query access to $V$ and $\nabla V$. We start with the following standard
assumptions:
(1) The potential function $V$ is $L$-smooth.
(2) The second moment $\mathbf{E}_{X\sim \mu}[\|X\|^2]\leq M$.
Recently, He and Zhang (COLT'25) showed that the query complexity of sampling
from such distributions is at least
$\left(\frac{LM}{d\epsilon}\right)^{\Omega(d)}$ where $\epsilon$ is the desired
accuracy in total variation distance, and the Poincar\'e constant can be
arbitrarily large.
Meanwhile, another common assumption in the study of diffusion-based samplers
(see e.g., the work of Chen, Chewi, Li, Li, Salim and Zhang (ICLR'23))
strengthens the smoothness condition (1) to the following:
(1*) The potential function of *every* distribution along the
Ornstein-Uhlenbeck process starting from $\mu$ is $L$-smooth.
We show that under the assumptions (1*) and (2), the query complexity of
sampling from $\mu$ can be $\mathrm{poly}(L,d)\cdot
\left(\frac{Ld+M}{\epsilon^2}\right)^{\mathcal{O}(L+1)}$, which is polynomial
in $d$ and $\frac{1}{\epsilon}$ when $L=\mathcal{O}(1)$ and
$M=\mathrm{poly}(d)$. This improves the algorithm with quasi-polynomial query
complexity developed by Huang et al. (COLT'24). Our results imply that the
seemingly moderate strengthening of the smoothness condition (1) to (1*) can lead
to an exponential gap in the query complexity of sampling algorithms.
Moreover, we show that together with the assumption (1*) and the stronger
moment assumption that $\|X\|$ is $\lambda$-sub-Gaussian for $X\sim\mu$, the
Poincar\'e constant of $\mu$ is at most $\mathcal{O}(\lambda)^{2(L+1)}$. As an
application of our technique, we obtain an improved estimate of the Poincar\'e
constant for mixtures of Gaussians with the same covariance.
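To fix ideas about the query model, the sketch below runs the unadjusted Langevin algorithm, a textbook sampler that only queries $\nabla V$; it approximately targets the density proportional to $e^{-V}$ and is shown purely to illustrate the access assumptions, not as the algorithm of the paper.

```python
import numpy as np

def unadjusted_langevin(grad_V, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: x_{t+1} = x_t - step * grad_V(x_t)
    + sqrt(2 * step) * N(0, I).  Only gradient queries are used, matching the
    access model of the abstract; the bias depends on the step size."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_V(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

# Standard Gaussian target: V(x) = ||x||^2 / 2, so grad V(x) = x (1-smooth).
rng = np.random.default_rng(0)
samples = np.array([unadjusted_langevin(lambda x: x, np.zeros(2), 0.05, 500, rng)
                    for _ in range(200)])
print(samples.mean(axis=0), samples.var(axis=0))  # roughly zero mean, unit variance
```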