Theory of Computing Report

Friday, July 18

TR25-099 | The Algebraic Cost of a Boolean Sum | Ian Orzel, Srikanth Srinivasan, Sébastien Tavenas, Amir Yehudayoff

from ECCC Papers

It is a well-known fact that the permanent polynomial is complete for the complexity class VNP, and it is largely suspected that the determinant does not share this property, despite its similar expression. We study the question of why the VNP-completeness proof of the permanent fails for the determinant. We isolate three fundamental properties that are sufficient to prove a polynomial sequence is VNP-hard, of which two are shared by both the permanent and the determinant. We proceed to show that the permanent satisfies the third property, which we refer to as the “cost of a boolean sum,” while the determinant does not, showcasing the fundamental difference between the polynomial families. We further note that this differentiation also applies in the border complexity setting and that our results apply for counting complexity.

Revisiting scaling laws via the z-transform

from Francis Bach

In the last few years, we have seen a surge of empirical and theoretical works about “scaling laws”, whose goals are to characterize the performance of learning methods based on various problem parameters (e.g., number of observations and parameters, or amount of compute). From a theoretical point of view, this marks a renewed interest in asymptotic equivalents—something the machine learning community had mostly moved away from (and, let’s be honest, kind of looked down on) in favor of non-asymptotic bounds. The two types of bounds have their own pros and cons, and I value both of them, but on a personal note, after having spent close to twenty years in proving (sometimes tedious) non-asymptotic results, proving asymptotic results is particularly rewarding, especially when you experience that “aha!” moment where the empirical behavior aligns perfectly with the derived expression (see some examples below).

Scaling laws for quadratic optimization require aggregating the effects of all eigenvalues of the Hessian matrix and not considering only the worst one. This was done with Laplace’s method in an earlier post for gradient descent and partially for Nesterov acceleration. This blog post explores similar asymptotic scaling laws and will consider stochastic algorithms, but there is more to it.

I will use a classical tool from applied mathematics, electrical engineering, and computer science: the z-transform (a.k.a. generating function) of a discrete sequence. My goal in this blog post is to present the essentials of the method. For more details, see my recent preprint [1], with lots of potential follow-ups. Note that the z-transform has already been used in the context of stochastic gradient descent for stability analysis [15, 20] and for deriving convergence rates with constant momentum [22].

z-transform method and final value theorem

From Abel’s theorem to the final value theorem. Given a sequence \((b_k)_{k \geqslant 0}\), its z-transform is the function \(B\) defined for a complex argument \(z \in \mathbb{C}\) as \(B(z) = \sum_{k=0}^\infty b_k z^k\) (note the change of convention where \(z^{-1}\) is replaced by \(z\)). The behavior of the series defined by \((b_k)_{k \geqslant 0}\) can be characterized by the behavior of \(B(z)\) for \(z\) tending to \(1\) while remaining in \([0,1)\). This is the classical Abel’s theorem: when the series defined by \((b_k)_{k \geqslant 0}\) is convergent, then $$\sum_{k=0}^{+\infty} b_k = \lim_{z \to 1^-} B(z).$$

When applied to \(b_k = a_k - a_{k-1}\) for \(k \geqslant 1\), with \(b_0 = a_0\), we have \(\sum_{\ell=0}^k b_\ell = a_k\), and \(B(z) = (1 - z) A(z)\), where \(A\) is the z-transform of \((a_k)_{k \geqslant 0}\). We then obtain the final value theorem, which states $$ \lim_{z \to 1^-} \ (1 - z) A(z) = \lim_{k \to +\infty} a_k.$$
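
As a quick numerical sanity check of the final value theorem (a minimal sketch; the sequence \(a_k = 1 - 0.6^k\) and the truncation point are my own illustrative choices, not from the post):

```python
import numpy as np

# Final value theorem check on the hypothetical sequence a_k = 1 - 0.6^k,
# whose limit is 1; A(z) is approximated by a long truncated sum.
k = np.arange(0, 10_000)
a = 1.0 - 0.6 ** k                  # a_k -> 1
for z in [0.9, 0.99, 0.999]:
    A = np.sum(a * z ** k)          # truncated z-transform A(z)
    print(z, (1 - z) * A)           # approaches lim_k a_k = 1 as z -> 1^-
```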

From limits to asymptotic equivalents. In order to obtain equivalents of the sequence \((a_k)_{k \geqslant 0}\) (and not simply its limit), we can use two additional properties:

  • The z-transform of \(((k+1) a_{k+1})_{k \geqslant 0}\) is \(A'(z)\), leading to $$ \lim_{z \to 1^-} \ (1 - z) A'(z) = \lim_{k \to +\infty} k a_k.$$
  • To treat non-integer powers, we have the following identity, valid for any real \(\nu > 0\): $$ \lim_{z \to 1^-} \ (1 - z)^{\nu} A(z) = \Gamma(\nu) \cdot \lim_{k \to +\infty} k^{1-\nu} a_k,$$ where \(\Gamma\) is the Gamma function. A classical example is \(a_k = \frac{1}{k!} \nu(\nu+1) \cdots (\nu+k-1) = \frac{ \Gamma(k+\nu)}{\Gamma(\nu)\Gamma(k+1)} \sim \frac{k^{\nu-1}}{\Gamma(\nu)}\), for which \(A(z) = (1-z)^{-\nu}\) (see the numerical check after this list).
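
Here is a small numerical check of the non-integer-power identity on the classical example above (a sketch; the value of \(\nu\) and the truncation point are arbitrary choices of mine):

```python
import numpy as np
from scipy.special import gammaln, gamma

# Check that (1-z)^nu * A(z) matches Gamma(nu) * k^(1-nu) * a_k for
# a_k = Gamma(k+nu) / (Gamma(nu) Gamma(k+1)), whose z-transform is (1-z)^(-nu).
nu = 0.7
k = np.arange(0, 200_000)
a = np.exp(gammaln(k + nu) - gammaln(nu) - gammaln(k + 1))

z = 1 - 1e-4
A = np.sum(a * z ** k)                        # truncated z-transform
print((1 - z) ** nu * A)                      # ~ 1 (exactly 1 without truncation)
print(gamma(nu) * k[-1] ** (1 - nu) * a[-1])  # ~ 1, since k^(1-nu) a_k -> 1/Gamma(nu)
```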

Combining the last two properties (with an extension to higher-order derivatives), when the two limits exist, we have: $$ \lim_{z \to 1^-} \ ( 1 \, – z)^{\nu} A^{(\mu)}(z) = \Gamma(\nu)\cdot \lim_{k \to +\infty} k^{1+\mu-\nu} a_k.$$ Therefore, the z-transform method consists in deriving asymptotics of the z-transform or its derivatives around \(z = 1^-\). Before showing why this is a simplification, some words on sufficient conditions for the equivalence to hold, through a sequence of so-called “Tauberian” theorems.

Tauberian theory. In order to show that limits and asymptotic equivalents exist, several options are available. When only considering real-valued arguments for the z-transforms, various sufficient conditions have been derived from the work of Hardy and Littlewood [2] (the book by Jacob Korevaar [3] is a great source of detailed results). In a nutshell, oscillating sequences such as \(a_k = (-1)^k\) are the ones where the limits are not equal (or may not exist), and a classical sufficient condition is that there exists a constant \(c\) such that for all \(k \geqslant 1\), \(a_k - a_{k-1} \leqslant c / k^{2+\mu-\nu}\) (in particular, with \(c = 0\), being decreasing is a sufficient condition). Alternative frameworks can also be considered using complex analysis, which only depend on properties of the z-transform \(A(z)\) for complex \(z\) [4, 5].

Properties of the z-transform. What makes the z-transform so versatile is a suite of properties that link modifications of the sequence \((a_k)_{k \geqslant 0}\) to its z-transform \(A(z)\) (see more details in [1, 6]). The most important ones are the following (see the symbolic sketch after this list):

  • Shift: If \(A(z)\) is the z-transform of the sequence \((a_k)_{k \geqslant 0}\), then \(z A(z)\) is the z-transform of the sequence \((0,a_0,a_1,\dots)\), that is, shifted by 1. This implies that sequences that satisfy a linear difference equation lead to rational z-transforms.
  • Multiplication by a polynomial: If \(A\) is the z-transform of \((a_k)_{k \geqslant 0}\), then \(z \mapsto z A'(z)\) is the \(z\)-transform of \((k a_k)_{k \geqslant 0}\). Thus, when we have a difference equation with rational coefficients (in \(k\)), this leads to an ordinary differential equation for the z-transform.
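
Both properties are easy to verify symbolically; here is a minimal sympy sketch on the geometric sequence \(a_k = r^k\) (my own example, with z-transform \(A(z) = 1/(1-rz)\)):

```python
import sympy as sp

# Shift and multiplication-by-k properties on a_k = r^k, A(z) = 1/(1 - r z).
z, r = sp.symbols('z r')
A = 1 / (1 - r * z)

print(sp.series(z * A, z, 0, 5))              # z + r z^2 + r^2 z^3 + ...: shifted sequence
print(sp.series(z * sp.diff(A, z), z, 0, 5))  # r z + 2 r^2 z^2 + ...: coefficients k r^k
```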

Classical use of the z-transform method to derive asymptotic equivalents. The z-transform method is a classical tool, with already many applications, e.g., in combinatorics, analysis of recurrence relations, queuing theory, signal processing, and control (see [6, 7]). In this blog post, I will focus on optimization of quadratic functions.

Gradient descent and spectral dimension

Like in an earlier post, we consider an idealized quadratic optimization problem in infinite dimension, which can be rewritten as the minimization of $$F(\eta) = \frac{1}{2} \langle \eta-\eta_\ast, H ( \eta-\eta_\ast)\rangle, $$ where \(\eta_\ast\) is a global optimum of \(F\) and \(H\) the positive semi-definite Hessian operator. Gradient descent leads to the recursion $$ \eta_k = \eta_{k-1} - \gamma F'(\eta_{k-1}) = \eta_{k-1} - \gamma H ( \eta_{k-1} - \eta_\ast), $$ where \(\gamma > 0\) is a step-size which we choose so that the operator norm of \(\gamma H\) is less than one (e.g., \(\gamma \leqslant 1/L\), where \(L\) is the largest eigenvalue of the operator \(H\)). With \(\theta_k = \eta_k - \eta_\ast\), this leads to the following linear iteration $$ \theta_k = (I - \gamma H) \theta_{k-1} = (I - \gamma H)^k \delta, $$ where \(\delta = \theta_0 = \eta_0 - \eta_\ast\).

Thus, given a spectral decomposition \(\gamma H = \sum_{i=1}^{+\infty} \lambda_i u_i \otimes u_i\) with eigenvalues \((\lambda_i)_{i \geqslant 1}\) in \([0,1]\) and eigenvectors \((u_i)_{i \geqslant 1}\), we get the following performance criterion $$ a_k = \frac{1}{2\gamma} \sum_{i=1}^{+\infty} \lambda_i (1 - \lambda_i)^{2k} \langle u_i,\delta \rangle^2 = \int_{0}^1 (1 - \lambda)^{2k} d\sigma(\lambda), $$ where $$ d\sigma(\lambda) = \frac{1}{2\gamma} \sum_{i=1}^{+\infty} \lambda_i \langle u_i,\delta \rangle^2 {\rm Dirac}(\lambda|\lambda_i) $$ is a weighted spectral measure, as defined in [8] in the gossip context (see this earlier post), with a z-transform equal to $$ A(z) = \sum_{k=0}^{+\infty} z^k \int_0^{1} (1-\lambda)^{2k} d\sigma(\lambda) = \int_0^{1} \frac{1}{1 - z (1-\lambda)^{2}} d\sigma(\lambda), $$ which is a rational function in \(z\) and \(\lambda\). Using a partial fraction decomposition in the variable \(\lambda\), it can thus be rewritten as $$A(z) = \frac{1}{2 \sqrt{z}} \int_0^{1} \frac{d\sigma(\lambda)}{\lambda + z^{-1/2} - 1} - \frac{1}{2\sqrt{z}} \int_0^{1} \frac{d\sigma(\lambda)}{\lambda - z^{-1/2} - 1}.$$ We thus get $$A(z) = \frac{1}{2 \sqrt{z}} S(z^{-1/2}-1) - \frac{1}{2 \sqrt{z}} S(-z^{-1/2}-1),$$
where \(\displaystyle S(u) = \int_0^1 \! \frac{d\sigma(\lambda)}{\lambda + u}\) is the Stieltjes transform of the measure \(\sigma\) (see Chapter 8 of [10]), a classical tool in random matrix theory. We see above that what matters is the equivalent of \(S(u)\) around \(u=0\) (since the value at \(-2\) is bounded).

The spectral dimension defined by [8] corresponds to a measure whose behavior around zero has density \(c \lambda^{\omega - 1}\), or, in almost equivalent terms, such that \(S(u) \sim c \Gamma(\omega) \Gamma(1-\omega) u^{\omega-1}\) around \(u=0\), with a similar expansion for derivatives. This leads directly to an explicit scaling law in \({1}/{k^\omega}\), which exactly recovers an earlier post (which used Laplace’s method) with a different proof.

The spectral dimension depends explicitly on problem parameters:

  • For least-squares regression, as described in an earlier post and in many papers, this corresponds to \(\lambda_i \sim L / i^\alpha\) (“capacity” conditions) and \(\langle \delta, u_i \rangle \sim \Delta / i^{\beta/2}\) (“source” conditions), leading to \(\omega = \frac{\beta-1}{\alpha} + 1\), assumed to be positive, and \(c = \frac{L \Delta^2}{2 \alpha (\gamma L)^{\frac{\beta-1}{\alpha}+1}}\). The equivalent is then $$ a_k \sim c \frac{\Gamma(\omega)}{(2k)^{\omega}},$$ with nice connections with non-asymptotic bounds for the same problem (see the numerical check after this list).
  • For gossip, as shown in [8], we get that \(\omega\) is the underlying dimension of the set on which local messages are sent.
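
As a sanity check of the least-squares equivalent above, one can evaluate the gradient descent performance on the spectral model directly (a sketch; the values of \(\alpha\), \(\beta\), \(L\), \(\Delta\) and the truncation of the spectrum are illustrative assumptions of mine):

```python
import numpy as np
from scipy.special import gamma

# Gradient descent on the spectral model h_i = L / i^alpha, delta_i = Delta / i^(beta/2),
# compared with the asymptotic equivalent a_k ~ c * Gamma(omega) / (2k)^omega.
L, Delta, alpha, beta = 1.0, 1.0, 1.5, 2.0
gam = 1.0 / L                                  # step-size gamma <= 1/L
omega = (beta - 1) / alpha + 1
c = L * Delta**2 / (2 * alpha * (gam * L) ** omega)

i = np.arange(1, 2_000_001, dtype=float)
h = L / i ** alpha                             # eigenvalues of H
d2 = Delta**2 / i ** beta                      # squared coefficients of delta

for k in [100, 1_000, 10_000]:
    a_k = 0.5 * np.sum(h * d2 * (1 - gam * h) ** (2 * k))
    print(k, a_k, c * gamma(omega) / (2 * k) ** omega)
```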

Summary. Linear iterations lead to rational z-transforms along all eigenvectors, and then asymptotics of the Stieltjes transform (based on the spectral dimension) can be used after partial fraction decomposition. For gradient descent (a simple linear iteration with constant coefficients), this can be easily achieved in other ways. However, for time-dependent coefficients (Nesterov acceleration) and for stochastic iterations, it provides a new direct way that avoids lengthy computations.

Nesterov acceleration vs. heavy-ball

A classical way to accelerate the convergence of the gradient iteration is to use an extrapolation step due to Nesterov [11]. This can be formulated with two sequences (and a first-order recursion) as:
$$ \begin{array}{rcl} \eta_{k+1} & = & \zeta_{k} - \gamma F'(\zeta_{k}) \\ \zeta_{k+1} & = & \eta_{k+1} + (1 - \tau_{k+1}) (\eta_{k+1} - \eta_k), \end{array} $$ and with a single sequence (and a second-order recursion) as $$ \eta_{k+1} = \eta_{k} + (1 - \tau_{k}) (\eta_{k} - \eta_{k-1}) - \gamma F' \big[ \eta_{k} + (1 - \tau_{k}) (\eta_{k} - \eta_{k-1})\big]. $$

In the original formulation with convex functions, \(\tau_k\) is chosen asymptotically proportional to \(2\rho/(k+1)\) with \(\rho = 3/2\). For quadratic functions, [9] argues that the choice \(\rho = 1\) is more natural as it leads to a simple rescaled reformulation, and [12, 13] consider a general \(\rho\). We now show that the z-transform allows us to analyze all integer values of \(\rho\), with a conjecture that is empirically valid for all \(\rho > 0\).

With \(\theta_k = \eta_k - \eta_\ast\), we obtain the following recursion for quadratic functions $$ \theta_{k+1} = (I - \gamma H) \bigg[ \theta_{k} + \Big( 1 - \frac{2\rho}{k+2\rho-1} \Big) (\theta_{k} - \theta_{k-1}) \bigg]. $$ I show in [1] how the performance measure leads to an explicit rational function in \((z,\lambda)\) of order \(2\rho\), obtained through differentiations of the z-transform (to take into account the rational dependence of \(\tau_k\)). For \(\rho=1\), we can easily get a partial fraction decomposition by hand, while for larger integer \(\rho\), this can be readily done by symbolic computation tools such as Mathematica. Overall, the final equivalents for \(a_k = \frac{1}{2} \langle \eta_k - \eta_\ast, H (\eta_k - \eta_\ast) \rangle\) are essentially (up to logarithmic terms) proportional to \({1}/{k^{\min\{2\omega, \omega + \rho\}}}\). More precisely, we get totally explicit equivalents:

  • \(\displaystyle a_k \sim c \frac{\Gamma(2 \rho)^2}{\Gamma(\rho)^2} \cdot \frac{\Gamma(\rho-\omega) \Gamma(\omega)}{\Gamma(4\rho-1-2\omega)} \frac{\Gamma(2\rho-1/2-\omega)}{\Gamma(\rho+1/2-\omega)} \frac{2^{2\rho-1}}{4^\omega } \cdot \frac{1}{k^{ 2\omega}}\) for \(\omega \in (0,\rho)\).
  • \(\displaystyle a_k \sim c \frac{\Gamma(2 \rho)^2}{\Gamma(\rho)^2} \cdot\frac{1}{2^{2\rho-1}} \cdot \frac{ \log k}{k^{2\rho}}\) for \(\omega = \rho\).
  • \(\displaystyle a_k \sim c \frac{\Gamma(2 \rho)^2}{\Gamma(\rho)^2} \cdot \frac{1}{2^{2\rho-1}} \Gamma(\omega-\rho)\cdot \frac{1}{k^{\omega+\rho}}\) for \(\omega > \rho\).

And amazingly, they do match in our experiments even beyond integer \(\rho\), as shown below.
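To reproduce this behavior, one can simply run the recursion on the spectral model (a sketch under my own parameter choices; the initialization \(\theta_0 = \theta_{-1} = \delta\), the truncation, and the indexing of the first steps are assumptions):

```python
import numpy as np

# Nesterov recursion theta_{k+1} = (I - gamma H)[theta_k + (1 - tau_k)(theta_k - theta_{k-1})]
# on the spectral model lam_i = 1 / i^alpha (eigenvalues of gamma H, with gamma = 1/L, L = 1),
# delta_i = Delta / i^(beta/2).
rho, Delta, alpha, beta = 1.0, 1.0, 1.5, 2.0
i = np.arange(1, 100_001, dtype=float)
lam = 1.0 / i ** alpha
delta = Delta / i ** (beta / 2)

th_prev, th = delta.copy(), delta.copy()        # theta_{-1} = theta_0 = delta (assumed)
for k in range(1, 2001):
    tau = 2 * rho / (k + 2 * rho - 1)
    th_prev, th = th, (1 - lam) * (th + (1 - tau) * (th - th_prev))
    if k % 500 == 0:
        a_k = 0.5 * np.sum(lam * th ** 2)       # a_k = (1/(2 gamma)) sum_i lam_i theta_i^2
        print(k, a_k)                           # decays roughly as k^{-min(2 omega, omega + rho)}
```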

Nesterov acceleration for \(\omega = 1\) and several \(\rho\)’s. Note the classical vanishing oscillations for small \(\rho\). Moreover, the best performance is achieved around \(\rho \approx \omega\).

What about heavy-ball? The heavy-ball method is very similar to Nesterov acceleration, with the following iteration $$ \eta_{k+1} = \eta_{k} - \gamma F'(\eta_k) + (1-\tau_{k+1}) (\eta_k - \eta_{k-1}),$$ leading, with the same notation, to $$\theta_{k+1} = \Big( I - \frac{k}{k+2\rho-1} \gamma H\Big) \theta_{k} + \Big( 1 - \frac{2\rho}{k+2\rho-1} \Big) (\theta_{k} - \theta_{k-1}).$$ The associated z-transform is also the integral, with respect to the spectral measure, of a rational function, leading to an asymptotic equivalent around \(z=1^-\). One can then plot the actual performance and compare it to the equivalent, as shown below.
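
For completeness, here is a direct simulation of this recursion (a sketch with my own illustrative parameters, not the exact setup behind the figure below):

```python
import numpy as np

# Heavy-ball recursion from the display above, on the spectral model
# lam_i = 1 / i^alpha (eigenvalues of gamma H, with gamma = 1/L, L = 1).
rho, alpha, beta = 1.0, 1.5, 2.0
i = np.arange(1, 100_001, dtype=float)
lam = 1.0 / i ** alpha
delta = 1.0 / i ** (beta / 2)

th_prev, th = delta.copy(), delta.copy()        # theta_{-1} = theta_0 = delta (assumed)
for k in range(1, 2001):
    w = 2 * rho / (k + 2 * rho - 1)
    th_prev, th = th, (1 - k / (k + 2 * rho - 1) * lam) * th + (1 - w) * (th - th_prev)
    if k % 500 == 0:
        print(k, 0.5 * np.sum(lam * th ** 2))   # performance a_k, with gamma = 1
```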

Heavy-ball method with \(\rho=1\) for two values of the spectral dimension \(\omega\). Top: performance \(a_k\) vs. equivalent \(\bar{a}_k\), bottom: ratio.

What’s happening? The two do not match for \(\omega\) large enough (right plot above). It turns out that oscillations do not vanish, and the sufficient Tauberian conditions are not satisfied. What can then be shown in terms of asymptotic equivalents remains open.

Stochastic gradient descent (a.k.a. least-mean-squares algorithm)

We now consider minimizing the expected risk for a linear predictor with the square loss: $$ F(\eta) = \frac{1}{2}\mathbb{E} \big[ (y - \langle x, \eta \rangle)^2 \big]$$ (keeping in mind that we can use a feature vector). Single-pass stochastic gradient descent (SGD) leads to the iteration $$\eta_k = \eta_{k-1} - \gamma (\langle x_k,\eta_{k-1} \rangle - y_k) x_k, $$ with \((x_k,y_k)\) an independent sample from the distribution defining \(F\). We can write it using a minimizer \(\eta_\ast\) of \(F\) as $$\eta_k - \eta_\ast = (I - \gamma x_k \otimes x_k) (\eta_{k-1} - \eta_\ast) + \gamma \varepsilon_k x_k, $$ where \(\varepsilon_k = y_k - \langle x_k, \eta_\ast \rangle\) is the optimal residual. We denote \(\theta_k = \eta_k - \eta_\ast\). In this context, SGD is referred to as the least-mean-squares (LMS) algorithm [14, 15, 16, 17].
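
For concreteness, here is a minimal single-pass SGD/LMS sketch on synthetic Gaussian data (the data model, dimensions, and step-size are illustrative assumptions of mine, not the post's experimental setup):

```python
import numpy as np

# Single-pass SGD / LMS: eta_k = eta_{k-1} - gamma (<x_k, eta_{k-1}> - y_k) x_k.
rng = np.random.default_rng(0)
d, n, sigma = 100, 50_000, 0.1
gam = 0.01                                  # below 2 / tr(H) = 2 / d for stability
eta_star = rng.standard_normal(d) / np.sqrt(d)

eta = np.zeros(d)
for _ in range(n):
    x = rng.standard_normal(d)              # here H = E[x x^T] = I
    y = x @ eta_star + sigma * rng.standard_normal()
    eta -= gam * (x @ eta - y) * x
print(np.linalg.norm(eta - eta_star))       # distance to the minimizer
```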

The simplest but incomplete analysis uses the fact that \(\mathbb{E} [ \varepsilon_k x_k ] = 0\) (which is the optimality condition characterizing \(\eta_\ast\)), to get: $$ \mathbb{E}[\theta_k] = (I - \gamma H) \mathbb{E}[\theta_{k-1}],$$ leading to the classical linear iteration of gradient descent. However, it does not lead to any result on the performance measure \(\frac{1}{2} \mathbb{E} [ \langle \theta_k, H \theta_k \rangle] = \mathbb{E} [ F(\eta_k) - F(\eta_\ast) ]\).

Traditionally, there are two terms in the analysis: “bias” and “variance” (see a thorough discussion in [27]). For simplicity, let’s assume that \(\varepsilon_k\) is independent of \(x_k\), with \(\mathbb{E} [ \varepsilon_k]=0\) and \(\mathbb{E} [ \varepsilon_k^2]=\sigma^2\) (not to be confused with the notation for the spectral measure defined above). We then consider the operator $$\Theta_k = (\eta_k - \eta_\ast)\otimes (\eta_k - \eta_\ast) = \theta_k \otimes \theta_k, $$ for which we have $$F(\eta_k) - F(\eta_\ast) = \frac{1}{2} \langle \Theta_k,H \rangle = \frac{1}{2}{\rm tr}( \Theta_k H), $$ and $$ \mathbb{E} [ \Theta_k | \Theta_{k-1}] = \mathbb{E} \big[ (I - \gamma x_k \otimes x_k) \otimes (I - \gamma x_k \otimes x_k) \big] \Theta_{k-1} + \gamma^2 \sigma^2 H,$$ leading to: $$ \mathbb{E} [ \Theta_k ] = \big( I - \gamma H \otimes I - \gamma I \otimes H + \gamma^2 \mathbb{E} [ x \otimes x \otimes x \otimes x] \big) \mathbb{E} [ \Theta_{k-1} ] + \gamma^2 \sigma^2 H.$$

We consider the operator (defined on self-adjoint operators) $$ T = H \otimes I + I \otimes H - \gamma \mathbb{E} [ x \otimes x \otimes x \otimes x],$$ so that $$\tag{1} \mathbb{E} [ \Theta_k ] = \big( I - \gamma T \big) \mathbb{E} [ \Theta_{k-1} ] + \gamma^2 \sigma^2 H, $$ leading to, after summation of the geometric series (see an earlier post): $$\mathbb{E} [ \Theta_k ] = \big( I - \gamma T \big)^k \Theta_{0} + \gamma \sigma^2 \big[ I - \big( I - \gamma T \big)^k \big] T^{-1} H.$$

A sufficient condition for convergence is that the operator \(I - \gamma T\) (as an operator on symmetric operators) has eigenvalues in \([-1,1]\). In [17], we define and characterize the largest step-size with few assumptions. We consider instead specific models for asymptotic computations.

Simplest model. In order to obtain simpler formulas, several models have been proposed that are compatible with the Hessian operator \(H\) having eigenvalues \((h_i)_{i \geqslant 1}\) and eigenvectors \((u_i)_{i \geqslant 1}\). The simplest model [18] is \(x = r u_j\) with \(r \in \mathbb{R}\) and \(j \in \mathbb{N}^\ast\) random and independent, with \(\mathbb{E}[r^2] = \sum_{i \geqslant 1} h_i = {\rm tr}(H)\), and \(\mathbb{P}(j = i) = h_i / {\rm tr}(H)\). What makes this model special is that, like in the deterministic case, there are no interactions between eigensubspaces. While this includes one-hot encoded discrete data as recently used in [19], it remains limited compared to the model below (which includes Gaussian data and is often used in random matrix theory).

Simplified high-dimensional model. We consider $$x = \sum_{i = 1}^{+\infty} h_i^{1/2} z_i u_i,$$ where the variables \(z_i \in \mathbb{R}\), \(i \geqslant 1\), are independent (and not merely uncorrelated), with zero mean and unit variance. The only parameter that we will need to characterize the operator \(T\) is the fourth-order moment \(\mathbb{E} [ z_i^4] = 3 + \kappa\), where \(\kappa\) is the excess kurtosis.

The Gaussian model [15] where all \(z_i\) are Gaussians corresponds to \(\kappa=0\), while the Rademacher model where \(z_i \in \{-1,1\}\) are Rademacher random variables corresponds to the smallest possible value \(\kappa=-2\). Other models could also be considered, particularly when using kernels [25].

With this simplified model, as already shown in the 1980s in the Gaussian context [20, 15], in the basis of eigenvectors of \(H\), the operator \(T\) has a simple block-diagonal structure: the off-diagonal elements of \(\mathbb{E} [\Theta_k]\) evolve independently (and converge to zero linearly), while the diagonal elements, which are needed to compute the performance measure, evolve according to a linear iteration with a matrix that is the sum of a diagonal and a rank-one matrix. More precisely, assuming without loss of generality that the operator \(H\) is diagonal, equal to \({\rm Diag}(h)\), Eq. \((1)\) projected on diagonal elements becomes $${\rm diag}(\mathbb{E}[\Theta_k]) = (I - \gamma V)\, {\rm diag}(\mathbb{E}[\Theta_{k-1}]) + \gamma^2 \sigma^2 h,$$ with $$V = {\rm Diag}\big( 2h - \gamma(\kappa+2)\, h \circ h\big) - \gamma h \otimes h.$$

The performance \(a_k = \mathbb{E} [ F(\eta_k) ] - F(\eta_\ast)\) is thus $$ a_k = \frac{1}{2}\langle \delta \circ \delta, (I - \gamma V)^k h \rangle + \gamma \sigma^2 \langle h, \big(I - (I - \gamma V)^k \big) V^{-1} h \rangle. $$ In order to study convergence, we could look at an eigenvalue decomposition of the matrix \(V\) using “secular” equations [21]. Instead, we will use the z-transform method, which avoids explicit computations of eigenvalues.
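
The finite-dimensional recursion is easy to simulate directly (a sketch; the spectral model and constants are assumptions carried over from the gradient descent example above):

```python
import numpy as np

# Diagonal recursion for SGD second moments: v_k = (I - gamma V)^k h, with
# V = Diag(2h - gamma (kappa + 2) h∘h) - gamma h h^T, and, in the noiseless
# case sigma^2 = 0, a_k = (1/2) <delta∘delta, v_k>.
n, L, Delta, alpha, beta, kappa = 1000, 1.0, 1.0, 1.5, 2.0, -2.0
i = np.arange(1, n + 1, dtype=float)
h = L / i ** alpha
delta2 = Delta**2 / i ** beta
gam = 0.9 * 2.0 / np.sum(h)                 # slightly below the largest step-size 2/tr(H)

V = np.diag(2 * h - gam * (kappa + 2) * h * h) - gam * np.outer(h, h)
v = h.copy()
for k in range(1, 501):
    v -= gam * (V @ v)                      # v <- (I - gamma V) v
    if k % 100 == 0:
        print(k, 0.5 * delta2 @ v)          # performance a_k
```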

For simplicity, we consider here only the interpolation regime where \(\sigma^2=0\) (i.e., there is no noise on top of optimal predictions; see [1] for all details). Then, the z-transform is equal to $$A(z) = \sum_{k=0}^{+\infty} a_k z^k = \frac{1}{2}\langle \delta \circ \delta, ((1-z) I + z \gamma V)^{-1} h \rangle.$$ The matrix \((1-z) I + z \gamma V\) that needs to be inverted has a “diagonal + rank-one” structure, and can thus be inverted using the matrix inversion lemma, leading to an explicit rational function in \(z\) which can be directly analyzed around \(z = 1\) (see [1]).
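
The matrix inversion lemma step can be made concrete as follows (a sketch; `A_of_z` is a hypothetical helper of mine evaluating the z-transform with the Sherman-Morrison formula):

```python
import numpy as np

# Evaluate A(z) = (1/2) <delta∘delta, ((1-z) I + z gamma V)^{-1} h> using
# Sherman-Morrison: (1-z) I + z gamma V = D - z gamma^2 h h^T with D diagonal.
def A_of_z(z, h, delta2, gam, kappa):
    d = (1 - z) + z * gam * (2 * h - gam * (kappa + 2) * h * h)  # diagonal D
    c = z * gam**2                                               # rank-one weight
    Dinv_h = h / d
    s = h @ Dinv_h
    Minv_h = Dinv_h / (1 - c * s)      # (D - c h h^T)^{-1} h by Sherman-Morrison
    return 0.5 * delta2 @ Minv_h

# Example: behavior near z = 1^- on the same spectral model as above.
i = np.arange(1, 1001, dtype=float)
h, delta2, kappa = 1.0 / i ** 1.5, 1.0 / i ** 2.0, -2.0
gam = 0.9 * 2.0 / np.sum(h)
for z in [0.9, 0.99, 0.999]:
    print(z, A_of_z(z, h, delta2, gam, kappa))
```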

Asymptotic equivalent. Following this post and the gradient descent analysis above, we consider \(h_i = {L}/{i^\alpha}\) and \(\delta_i = {\Delta}/{i^{\beta/2}}\). If we assume \(\kappa = -2\) (the Rademacher case, which leads to the simplest formulas), the largest step-size is \(\gamma = 2/{\rm tr}(H)\) (which can be significantly smaller than \(2 / \lambda_{\max}(H)\) when \(\alpha\) is close to 1), and we get the asymptotic equivalent $$a_k \sim \frac{L \Delta^2}{2 \alpha} \frac{1}{1-\gamma {\rm tr}(H)/2} \frac{\Gamma(\omega)}{(2 \gamma L k)^{\omega}},$$ with \(\omega = (\beta-1)/\alpha + 1\), under the condition \(\omega \in (0, 2 - 1/\alpha)\) (see [1] for a result covering all cases). Up to constants, this leads to the same equivalent as for gradient descent. See below for an illustration.

Comparison of asymptotic equivalent and performance of SGD in the interpolation regime for several replications for \(\omega=1\).

We recover the results from [22] with a simpler and more direct proof, thereby obtaining asymptotic results that complement the non-asymptotic results from [23]. This is also a subcase of results from [24] with a simpler proof. Various extensions are possible, e.g., using averaging of iterates, or computing estimates of the variance of the performance (and not only its expectation); see [1] for details.

Conclusion

In this blog post, I have shown how the z-transform can be used to derive asymptotic equivalents of sequences that naturally appear in the optimization of quadratic forms, with classical tools such as acceleration and stochastic approximation. In all cases, the z-transform method allows simple computations of asymptotic equivalents (e.g., scaling laws) through the expansion of rational functions.

Since quadratic functions are at the heart of optimization, many recent developments can be considered as candidates for the z-transform method, such as Richardson extrapolation, non-linear iterations for generic convex functions, time-varying coefficients beyond rational functions, mixing momentum techniques and SGD [26], primal-dual algorithms, variance-reduced methods, or sampling techniques. There are many situations where the z-transform can prove useful!

Where does the “z” come from? For those questioning the naming of the z-transform, the Wikipedia page has some interesting historical notes. The idea of using generating functions to study sequences dates back at least to De Moivre around 1730, while its first use for discrete engineering systems was by Witold Hurewicz in 1947, using the letter “z” to distinguish it from the letter “s” used for the Laplace transform (its continuous-time counterpart). The name z-transform was later coined in 1952 by John R. Ragazzini and Lotfi Zadeh. As confirmed by a personal communication from Lotfi Zadeh to Murat Arcak, it has nothing to do with the first letter of Zadeh’s name (“I used the letter z not because of my name but because in the mathematical literature on difference equations, z was the symbol that was used”). However, I didn’t find any formal confirmation that Zorro had nothing to do with it.

Acknowledgements. I thank Adrien Taylor, Ayoub Melliti, Nicolas Flammarion, Baptiste Goujaud, and Raphaël Berthier for insightful discussions related to this work.

References

[1] Francis Bach. On the effectiveness of the z-transform method in quadratic optimization. Technical report, arXiv:2507.03404, 2025.
[2] Godfrey H. Hardy and John E. Littlewood. Abel’s theorem and its converse. Proceedings of the London Mathematical Society, 2(1):205–235, 1920.
[3] Jacob Korevaar. Tauberian Theory: A Century of Developments. Springer, 2004.
[4] Philippe Flajolet and Andrew Odlyzko. Singularity analysis of generating functions. SIAM Journal on Discrete Mathematics, 3(2):216–240, 1990.
[5] Edward A. Bender. Asymptotic methods in enumeration. SIAM Review, 16(4):485–515, 1974.
[6] Eliahu Ibrahim Jury. Theory and Application of the z-Transform Method. Robert E. Krieger Publishing Company, 1964.
[7] Herbert S. Wilf. Generatingfunctionology. CRC Press, 2005.
[8] Raphaël Berthier, Francis Bach, and Pierre Gaillard. Accelerated gossip in networks of given dimension using Jacobi polynomial iterations. SIAM Journal on Mathematics of Data Science, 2(1):24–47, 2020.
[9] Nicolas Flammarion and Francis Bach. From averaging to acceleration, there is only a step-size. Conference on Learning Theory, 2015.
[10] David Vernon Widder. Laplace Transform. Princeton University Press, 1942.
[11] Yurii Nesterov. A method for solving a convex programming problem with rate of convergence \(O(1/k^2)\). Soviet Mathematics. Doklady, 269(3):543–547, 1983.
[12] Antonin Chambolle and Charles Dossal. On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. Journal of Optimization theory and Applications, 166:968–982, 2015.
[13] Jean-François Aujol, Charles Dossal, Hippolyte Labarrière, and Aude Rondepierre. Strong convergence of FISTA iterates under Hölderian and quadratic growth conditions. Technical Report 2407.17063, arXiv, 2024.
[14] Neil Bershad. Analysis of the normalized LMS algorithm with Gaussian inputs. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4):793–806, 1986.
[15] Arie Feuer and Ehud Weinstein. Convergence analysis of LMS filters with uncorrelated Gaussian data. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(1):222–230, 1985.
[16] Francis Bach and Eric Moulines. Non-strongly-convex smooth stochastic approximation with convergence rate \(O(1/n)\). Advances in Neural Information Processing Systems, 2013.
[17] Alexandre Défossez and Francis Bach. Averaged least-mean-squares: Bias-variance trade-offs and optimal sampling distributions. International Conference on Artificial Intelligence and Statistics, 2015.
[18] Dirk T. M. Slock. On the convergence behavior of the LMS and the normalized LMS algorithms. IEEE Transactions on Signal Processing, 41(9):2811–2825, 1993.
[19] Frederik Kunstner and Francis Bach. Scaling laws for gradient descent and sign descent for linear bigram models under Zipf’s law. Technical report, arXiv:2505.19227, 2025.
[20] Larry L. Horowitz and Kenneth D. Senne. Performance advantage of complex LMS for controlling narrow-band adaptive arrays. IEEE Transactions on Circuits and Systems, 28(6):562–576, 1981.
[21] Gene H. Golub. Some modified matrix eigenvalue problems. SIAM Review, 15(2):318–334, 1973.
[22] Maksim Velikanov, Denis Kuznedelev, and Dmitry Yarotsky. A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta. International Conference on Learning Representations, 2023.
[23] Raphaël Berthier, Francis Bach, and Pierre Gaillard. Tight nonparametric convergence rates for stochastic gradient descent under the noiseless linear model. Advances in Neural Information Processing Systems, 2020.
[24] Elliot Paquette, Courtney Paquette, Lechao Xiao, and Jeffrey Pennington. 4+3 phases of compute-optimal neural scaling laws. Advances in Neural Information Processing Systems, 2024.
[25] Aditya Varre and Nicolas Flammarion. Accelerated SGD for non-strongly-convex least squares. In Conference on Learning Theory, 2022.
[26] Damien Ferbach, Katie Everett, Gauthier Gidel, Elliot Paquette, and Courtney Paquette. Dimension-adapted momentum outscales SGD. Technical Report 2505.16098, arXiv, 2025.
[27] Aymeric Dieuleveut, Nicolas Flammarion, and Francis Bach. Harder, better, faster, stronger convergence rates for least-squares regression. Journal of Machine Learning Research, 18(101):1–51, 2017.

By Francis Bach

FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming

from arXiv: Computational Complexity

Authors: Gal Beniamini, Yuval Dor, Alon Vinnikov, Shir Granot Peled, Or Weinstein, Or Sharir, Noam Wies, Tomer Nussbaum, Ido Ben Shaul, Tomer Zekharya, Yoav Levine, Shai Shalev-Shwartz, Amnon Shashua

Frontier AI models demonstrate formidable breadth of knowledge. But how close are they to true human -- or superhuman -- expertise? Genuine experts can tackle the hardest problems and push the boundaries of scientific understanding. To illuminate the limits of frontier model capabilities, we turn away from contrived competitive programming puzzles, and instead focus on real-life research problems. We construct FormulaOne, a benchmark that lies at the intersection of graph theory, logic, and algorithms, all well within the training distribution of frontier models. Our problems are incredibly demanding, requiring an array of reasoning steps. The dataset has three key properties. First, it is of commercial interest and relates to practical large-scale optimisation problems, such as those arising in routing, scheduling, and network design. Second, it is generated from the highly expressive framework of Monadic Second-Order (MSO) logic on graphs, paving the way toward automatic problem generation at scale; ideal for building RL environments. Third, many of our problems are intimately related to the frontier of theoretical computer science, and to central conjectures therein, such as the Strong Exponential Time Hypothesis (SETH). As such, any significant algorithmic progress on our dataset, beyond known results, could carry profound theoretical implications. Remarkably, state-of-the-art models like OpenAI's o3 fail entirely on FormulaOne, solving less than 1% of the questions, even when given 10 attempts and explanatory fewshot examples -- highlighting how far they remain from expert-level understanding in some domains. To support further research, we additionally curate FormulaOne-Warmup, offering a set of simpler tasks, from the same distribution. We release the full corpus along with a comprehensive evaluation framework.

Ranking Vectors Clustering: Theory and Applications

from arXiv: Computational Complexity

Authors: Ali Fattahi, Ali Eshragh, Babak Aslani, Meysam Rabiee

We study the problem of clustering ranking vectors, where each vector represents preferences as an ordered list of distinct integers. Specifically, we focus on the k-centroids ranking vectors clustering problem (KRC), which aims to partition a set of ranking vectors into k clusters and identify the centroid of each cluster. Unlike classical k-means clustering (KMC), KRC constrains both the observations and centroids to be ranking vectors. We establish the NP-hardness of KRC and characterize its feasible set. For the single-cluster case, we derive a closed-form analytical solution for the optimal centroid, which can be computed in linear time. To address the computational challenges of KRC, we develop an efficient approximation algorithm, KRCA, which iteratively refines initial solutions from KMC, referred to as the baseline solution. Additionally, we introduce a branch-and-bound (BnB) algorithm for efficient cluster reconstruction within KRCA, leveraging a decision tree framework to reduce computational time while incorporating a controlling parameter to balance solution quality and efficiency. We establish theoretical error bounds for KRCA and BnB. Through extensive numerical experiments on synthetic and real-world datasets, we demonstrate that KRCA consistently outperforms baseline solutions, delivering significant improvements in solution quality with fast computational times. This work highlights the practical significance of KRC for personalization and large-scale decision making, offering methodological advancements and insights that can be built upon in future studies.

The Serial Scaling Hypothesis

from arXiv: Computational Complexity

Authors: Yuxi Liu, Konpat Preechakul, Kananart Kuwaranancharoen, Yutong Bai

While machine learning has advanced through massive parallelization, we identify a critical blind spot: some problems are fundamentally sequential. These "inherently serial" problems -- from mathematical reasoning to physical simulations to sequential decision-making -- require dependent computational steps that cannot be parallelized. Drawing from complexity theory, we formalize this distinction and demonstrate that current parallel-centric architectures face fundamental limitations on such tasks. We argue that recognizing the serial nature of computation holds profound implications for machine learning, model design, and hardware development. As AI tackles increasingly complex reasoning, deliberately scaling serial computation -- not just parallel computation -- is essential for continued progress.

A Discrete Analog of Tutte's Barycentric Embeddings on Surfaces

from arXiv: Computational Geometry

Authors: Éric Colin de Verdière, Vincent Despré, Loïc Dubois

Tutte's celebrated barycentric embedding theorem describes a natural way to build straight-line embeddings (crossing-free drawings) of a (3-connected) planar graph: map the vertices of the outer face to the vertices of a convex polygon, and ensure that each remaining vertex is in convex position, namely, a barycenter with positive coefficients of its neighbors. Actually computing an embedding then boils down to solving a system of linear equations. A particularly appealing feature of this method is the flexibility given by the choice of the barycentric weights. Generalizations of Tutte's theorem to surfaces of nonpositive curvature are known, but due to their inherently continuous nature, they do not lead to an algorithm. In this paper, we propose a purely discrete analog of Tutte's theorem for surfaces (with or without boundary) of nonpositive curvature, based on the recently introduced notion of reducing triangulations. We prove a Tutte theorem in this setting: every drawing homotopic to an embedding such that each vertex is harmonious (a discrete analog of being in convex position) is a weak embedding (arbitrarily close to an embedding). We also provide a polynomial-time algorithm to make an input drawing harmonious without increasing the length of any edge, in a similar way as a drawing can be put in convex position without increasing the edge lengths.

Computational-Statistical Tradeoffs from NP-hardness

from arXiv: Data Structures and Algorithms

Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

A central question in computer science and statistics is whether efficient algorithms can achieve the information-theoretic limits of statistical problems. Many computational-statistical tradeoffs have been shown under average-case assumptions, but since statistical problems are average-case in nature, it has been a challenge to base them on standard worst-case assumptions. In PAC learning where such tradeoffs were first studied, the question is whether computational efficiency can come at the cost of using more samples than information-theoretically necessary. We base such tradeoffs on $\mathsf{NP}$-hardness and obtain: $\circ$ Sharp computational-statistical tradeoffs assuming $\mathsf{NP}$ requires exponential time: For every polynomial $p(n)$, there is an $n$-variate class $C$ with VC dimension $1$ such that the sample complexity of time-efficiently learning $C$ is $\Theta(p(n))$. $\circ$ A characterization of $\mathsf{RP}$ vs. $\mathsf{NP}$ in terms of learning: $\mathsf{RP} = \mathsf{NP}$ iff every $\mathsf{NP}$-enumerable class is learnable with $O(\mathrm{VCdim}(C))$ samples in polynomial time. The forward implication has been known since (Pitt and Valiant, 1988); we prove the reverse implication. Notably, all our lower bounds hold against improper learners. These are the first $\mathsf{NP}$-hardness results for improperly learning a subclass of polynomial-size circuits, circumventing formal barriers of Applebaum, Barak, and Xiao (2008).

Efficiently Constructing Sparse Navigable Graphs

from arXiv: Data Structures and Algorithms

Authors: Alex Conway, Laxman Dhulipala, Martin Farach-Colton, Rob Johnson, Ben Landrum, Christopher Musco, Yarin Shechter, Torsten Suel, Richard Wen

Graph-based nearest neighbor search methods have seen a surge of popularity in recent years, offering state-of-the-art performance across a wide variety of applications. Central to these methods is the task of constructing a sparse navigable search graph for a given dataset endowed with a distance function. Unfortunately, doing so is computationally expensive, so heuristics are universally used in practice. In this work, we initiate the study of fast algorithms with provable guarantees for search graph construction. For a dataset with $n$ data points, the problem of constructing an optimally sparse navigable graph can be framed as $n$ separate but highly correlated minimum set cover instances. This yields a naive $O(n^3)$ time greedy algorithm that returns a navigable graph whose sparsity is at most $O(\log n)$ higher than optimal. We improve significantly on this baseline, taking advantage of correlation between the set cover instances to leverage techniques from streaming and sublinear-time set cover algorithms. Combined with problem-specific pre-processing techniques, we present an $\tilde{O}(n^2)$ time algorithm for constructing an $O(\log n)$-approximate sparsest navigable graph under any distance function. The runtime of our method is optimal up to logarithmic factors under the Strong Exponential Time Hypothesis via a reduction from Monochromatic Closest Pair. Moreover, we prove that, as with general set cover, obtaining better than an $O(\log n)$-approximation is NP-hard, despite the significant additional structure present in the navigable graph problem. Finally, we show that our techniques can also beat cubic time for the closely related and practically important problems of constructing $\alpha$-shortcut reachable and $\tau$-monotonic graphs, which are also used for nearest neighbor search. For such graphs, we obtain $\tilde{O}(n^{2.5})$ time or better algorithms.

Online Rounding for Set Cover under Subset Arrivals

from arXiv: Data Structures and Algorithms

Authors: Jarosław Byrka, Yongho Shin

A rounding scheme for set cover has served as an important component in design of approximation algorithms for the problem, and there exists an H_s-approximate rounding scheme, where s denotes the maximum subset size, directly implying an approximation algorithm with the same approximation guarantee. A rounding scheme has also been considered under some online models, and in particular, under the element arrival model used as a crucial subroutine in algorithms for online set cover, an O(log s)-competitive rounding scheme is known [Buchbinder, Chen, and Naor, SODA 2014]. On the other hand, under a more general model, called the subset arrival model, only a simple O(log n)-competitive rounding scheme is known, where n denotes the number of elements in the ground set. In this paper, we present an O(log^2 s)-competitive rounding scheme under the subset arrival model, with one mild assumption that s is known upfront. Using our rounding scheme, we immediately obtain an O(log^2 s)-approximation algorithm for multi-stage stochastic set cover, improving upon the existing algorithms [Swamy and Shmoys, SICOMP 2012; Byrka and Srinivasan, SIDMA 2018] when s is small enough compared to the number of stages and the number of elements. Lastly, for set cover with s = 2, also known as edge cover, we present a 1.8-competitive rounding scheme under the edge arrival model.

Kernelization for $H$-Coloring

from arXiv: Data Structures and Algorithms

Authors: Yael Berkman, Ishay Haviv

For a fixed graph $H$, the $H$-Coloring problem asks whether a given graph admits an edge-preserving function from its vertex set to that of $H$. A seminal theorem of Hell and Ne\v{s}et\v{r}il asserts that the $H$-Coloring problem is NP-hard whenever $H$ is loopless and non-bipartite. A result of Jansen and Pieterse implies that for every graph $H$, the $H$-Coloring problem parameterized by the vertex cover number $k$ admits a kernel with $O(k^{\Delta(H)})$ vertices and bit-size bounded by $O(k^{\Delta(H)} \cdot \log k)$, where $\Delta(H)$ denotes the maximum degree in $H$. For the case where $H$ is a complete graph on at least three vertices, this kernel size nearly matches conditional lower bounds established by Jansen and Kratsch and by Jansen and Pieterse. This paper presents new upper and lower bounds on the kernel size of $H$-Coloring problems parameterized by the vertex cover number. The upper bounds arise from two kernelization algorithms. The first is purely combinatorial, and its size is governed by a structural quantity of the graph $H$, called the non-adjacency witness number. As applications, we obtain kernels whose size is bounded by a fixed polynomial for natural classes of graphs $H$ with unbounded maximum degree. More strikingly, we show that for almost every graph $H$, the degree of the polynomial that bounds the size of our combinatorial kernel grows only logarithmically in $\Delta(H)$. Our second kernel leverages linear-algebraic tools and involves the notion of faithful independent representations of graphs. It strengthens the general bound from prior work and, among other applications, yields near-optimal kernels for problems concerning the dimension of orthogonal graph representations over finite fields. We complement these results with conditional lower bounds, thereby nearly settling the kernel complexity of the problem for various target graphs $H$.

Maintaining Routing Structures under Deletions via Self-Pruning

from arXiv: Data Structures and Algorithms

Authors: Bernhard Haeupler, Antti Roeyskoe

Expanders are powerful algorithmic structures with two key properties: they are a) routable: for any unit multi-commodity demand, there exists a routing with low congestion over short paths, where a demand is unit if the amount of demand sent or received by any vertex is at most the number of edges adjacent to it; and b) stable/prunable: for any (sequence of) edge failures, there exists a proportionally small subset of vertices that can be disabled, such that the graph induced on the remaining vertices is an expander. Two natural algorithmic problems correspond to these two existential guarantees: expander routing, i.e., computing a low-congestion routing for a unit multi-commodity demand on an expander, and expander pruning, i.e., maintaining the subset of disabled vertices under a sequence of edge failures. This paper considers the combination of the two problems: maintaining a routing for a unit multi-commodity demand under pruning steps. This is done through the introduction of a family of expander graphs that, like hypercubes, are easy to route in, and are self-pruning: for an online sequence of edge deletions, a simple self-contained algorithm can find a few vertices to prune with each edge deletion, such that the remaining graph always remains an easy-to-route-in expander in the family. Notably, and with considerable technical work, this self-pruning can be made worst-case, i.e., such that every single adversarial deletion only causes a small number of additional deletions. Our results also allow tight constant-factor control over the length of routing paths (with the usual trade-offs in congestion and pruning ratio) and therefore extend to constant-hop and length-constrained expanders, in which routing over constant-length paths is crucial.

The Price of Diversity of the Traveling Salesman Problem

from arXiv: Data Structures and Algorithms

Authors: Mark de Berg, Andrés López Martínez, Frits Spieksma

This paper introduces the concept of the "Price of Diversity" (PoD) in discrete optimization problems, quantifying the trade-off between solution diversity and cost. For a minimization problem, the PoD is defined as the worst-case ratio, over all instances, of the minimum achievable cost of a diverse set of $k$ solutions to the cost of a single optimal solution for the same instance. Here, the cost of a $k$-solution set is determined by the most expensive solution within the set. Focusing on the Traveling Salesman Problem (TSP) as a key example, we study the PoD in the setting where $k$ edge-disjoint tours are required. We establish that, asymptotically, the PoD of finding two edge-disjoint tours is $\frac{8}{5}$ in a special one-dimensional case and 2 in a general metric space. We obtain these results from analyzing a related fundamental problem: the Shortest Hamiltonian Path problem (SHP), for which we establish similar results.
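
In symbols, and consistent with the abstract's wording (notation mine), the PoD of a minimization problem $\Pi$ with instance set $\mathcal{I}$ is

$$\mathrm{PoD}_k(\Pi) \;=\; \sup_{I \in \mathcal{I}} \; \frac{\min\bigl\{\, \max_{1 \le i \le k} c_I(S_i) \;:\; S_1,\dots,S_k \text{ feasible and pairwise diverse for } I \,\bigr\}}{\mathrm{OPT}(I)},$$

where, for the TSP variant studied here, "pairwise diverse" means pairwise edge-disjoint tours; the stated results then read, asymptotically, $\mathrm{PoD}_2 = 8/5$ in the one-dimensional special case and $\mathrm{PoD}_2 = 2$ in general metric spaces.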

Efficient Semi-External Breadth-First Search

from arXiv: Data Structures and Algorithms

Authors: Xiaolong Wan, Xixian Han

Breadth-first search (BFS) is a basic strategy for exploring graph properties. As the scale of graph databases has increased tremendously in recent years, large-scale graphs G are often disk-resident. Computing the BFS results of G in the semi-external memory model is then unavoidable, because in-memory BFS algorithms must keep all of G in main memory, while external BFS algorithms incur high computational costs. As a good trade-off between the internal and external memory models, the semi-external memory model assumes that main memory can hold at least a spanning tree of G. Nevertheless, semi-external BFS remains an open issue due to its difficulty. This paper therefore presents a comprehensive study of BFS in the semi-external memory model. After discussing naive solutions based on the basic framework of semi-external graph algorithms, it presents an efficient algorithm, named EP-BFS, with a small minimum memory space requirement, an important factor in evaluating semi-external algorithms. Extensive experiments are conducted on both real and synthetic large-scale graphs, where graph WDC-2014 contains over 1.7 billion nodes and graph eu-2015 has over 91 billion edges. Experimental results confirm that EP-BFS is up to 10 times faster.
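
For contrast with EP-BFS (whose details are in the paper), the naive semi-external baseline is easy to state: keep one level value per vertex in memory and stream the edge list from disk in passes until a fixpoint is reached. A sketch under that assumption, with all names hypothetical:

def semi_external_bfs(n, edge_stream, source):
    # Naive semi-external baseline (not the paper's EP-BFS): only O(n)
    # state (one level per vertex) stays in memory, while the edge list
    # is streamed sequentially, one full scan per pass. Levels are
    # relaxed Bellman-Ford style; the number of passes is bounded by
    # the BFS depth.
    INF = float("inf")
    level = [INF] * n
    level[source] = 0
    changed = True
    while changed:
        changed = False
        for u, v in edge_stream():          # one sequential disk scan
            for a, b in ((u, v), (v, u)):
                if level[a] + 1 < level[b]:
                    level[b] = level[a] + 1
                    changed = True
    return level

edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
print(semi_external_bfs(5, lambda: iter(edges), 0))  # [0, 1, 2, 3, 1]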

A 1/2-Approximation for Budgeted $k$-Submodular Maximization

from arXiv: Data Structures and Algorithms

Authors: Chenhao Wang

A $k$-submodular function naturally generalizes submodular functions by taking as input $k$ disjoint subsets, rather than a single subset. Unlike standard submodular maximization, which only requires selecting elements for the solution, $k$-submodular maximization adds the challenge of determining the subset to which each selected element belongs. Prior research has shown that the greedy algorithm is a 1/2-approximation for the monotone $k$-submodular maximization problem under cardinality or matroid constraints. However, whether a true 1/2-approximation exists for the budgeted version (i.e., with a knapsack constraint) has remained open for several years. We resolve this question affirmatively by proving that the 1-Guess Greedy algorithm, which first guesses an appropriate element from an optimal solution before proceeding with the greedy algorithm, achieves a 1/2-approximation. This result is asymptotically tight, as a $((k+1)/(2k)+\epsilon)$-approximation requires exponentially many value oracle queries even without constraints (Iwata et al., SODA 2016). We further show that 1-Guess Greedy is a 1/3-approximation for the non-monotone problem. This algorithm is both simple and parallelizable, making it well-suited for practical applications. Using the thresholding technique from (Badanidiyuru and Vondrak, SODA 2014), it runs in $\tilde O(n^2k^2)$ time. The proof idea is simple: we introduce a novel continuous transformation from an optimal solution to a greedy solution, using the multilinear extension to evaluate every fractional solution during the transformation. This continuous analysis approach yields two key extensions. First, it enables improved approximation ratios for various existing algorithms. Second, our method naturally extends to $k$-submodular maximization problems under broader constraints, offering a more flexible and unified analysis framework.
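
A rough sketch of the 1-Guess Greedy idea as described above: try every (element, part) pair as the guessed first choice, then extend greedily by marginal gain while the budget permits, and keep the best of all runs. This is an illustration under assumed interfaces (a value oracle f on partial assignments and a cost dictionary); the paper's precise greedy rule and the thresholding speed-up are not reproduced here.

def one_guess_greedy(ground, k, cost, budget, f):
    # f is a value oracle on partial assignments {element: part};
    # cost maps elements to knapsack costs. For each guessed first
    # (element, part) pair, extend greedily by best marginal gain
    # while the budget permits; return the best solution found.
    best_sol, best_val = {}, f({})
    for g in ground:
        for p in range(k):
            if cost[g] > budget:
                continue
            sol, spent = {g: p}, cost[g]
            while True:
                base = f(sol)
                cands = [(e, q) for e in ground for q in range(k)
                         if e not in sol and spent + cost[e] <= budget]
                if not cands:
                    break
                e, q = max(cands, key=lambda eq: f({**sol, eq[0]: eq[1]}))
                if f({**sol, e: q}) - base <= 0:
                    break
                sol[e] = q
                spent += cost[e]
            if f(sol) > best_val:
                best_sol, best_val = dict(sol), f(sol)
    return best_sol, best_val

# Toy monotone coverage-style oracle with k = 2 parts.
topics = {("a", 0): {1}, ("a", 1): {2}, ("b", 0): {1, 2}, ("b", 1): {3},
          ("c", 0): {4}, ("c", 1): {1}}
def f(sol):
    covered = set()
    for e, p in sol.items():
        covered |= topics[(e, p)]
    return len(covered)
print(one_guess_greedy(["a", "b", "c"], 2, {"a": 1, "b": 2, "c": 2}, 3, f))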

Cut-Matching Games for Bipartiteness Ratio of Undirected Graphs

from arXiv: Data Structures and Algorithms

Authors: Tasuku Soma, Mingquan Ye, Yuichi Yoshida

We propose an $O(\log n)$-approximation algorithm for the bipartiteness ratio of undirected graphs introduced by Trevisan (SIAM Journal on Computing, vol. 41, no. 6, 2012), where $n$ is the number of vertices. Our approach extends the cut-matching game framework for sparsest cut to the bipartiteness ratio. Our algorithm requires only $\mathrm{poly}\log n$ many single-commodity undirected maximum flow computations. Therefore, with the current fastest undirected max-flow algorithms, it runs in nearly linear time. Along the way, we introduce the concept of well-linkedness for skew-symmetric graphs and prove a novel characterization of the bipartiteness ratio in terms of well-linkedness in an auxiliary skew-symmetric graph, which may be of independent interest. As an application, we devise an $\tilde{O}(mn)$-time algorithm that, given a graph whose maximum cut deletes a $1-\eta$ fraction of edges, finds a cut that deletes a $1 - O(\log n \log(1/\eta)) \cdot \eta$ fraction of edges, where $m$ is the number of edges.

Waiting is worth it and can be improved with predictions

from arXiv: Data Structures and Algorithms

Authors: Ya-Chun Liang, Meng-Hsi Li, Chung-Shou Liao, Clifford Stein

We revisit the well-known online traveling salesman problem (OLTSP) and its extension, the online dial-a-ride problem (OLDARP). A server starting at a designated origin in a metric space is required to serve online requests and return to the origin so that the completion time is minimized. The SmartStart algorithm, introduced by Ascheuer et al., incorporates a waiting approach into an online schedule-based algorithm and attains the optimal upper bound of 2 for the OLTSP and the OLDARP if each schedule is optimal. Using Christofides' heuristic to approximate each schedule leads to the currently best upper bound of (7 + sqrt(13)) / 4, approximately 2.6514, in polynomial time. In this study, we investigate how an online algorithm with predictions, a recently popular framework (the so-called learning-augmented algorithms), can be used to improve the best competitive ratio in polynomial time. In particular, we develop a waiting strategy with online predictions, each of which is a single binary decision for each schedule in a whole route, rather than a forecast of the entire set of requests in advance (i.e., offline predictions). That is, it does not require knowing the number of requests in advance. The proposed online schedule-based algorithm achieves (1.1514 * lambda + 1.5)-consistency and (1.5 + 1.5 / (2.3028 * lambda - 1))-robustness in polynomial time, where lambda lies in the interval (1/theta, 1] and theta is set to (1 + sqrt(13)) / 2, approximately 2.3028. The best consistency approaches 2 as lambda tends to 1/theta. Meanwhile, we show that no online schedule-based algorithm can achieve a competitive ratio of less than 2, even with perfect online predictions.
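
Plugging numbers into the stated guarantees makes the trade-off visible: consistency degrades and robustness improves as lambda grows. A quick tabulation of the abstract's formulas, nothing more:

import math

theta = (1 + math.sqrt(13)) / 2            # approximately 2.3028
for lam in [0.45, 0.6, 0.8, 1.0]:          # lambda ranges over (1/theta, 1]
    consistency = 1.1514 * lam + 1.5
    robustness = 1.5 + 1.5 / (2.3028 * lam - 1)
    print(f"lambda={lam:.2f}: consistency={consistency:.4f}, robustness={robustness:.4f}")
# As lam -> 1/theta (about 0.4343) the consistency tends to 2, while at
# lam = 1 both expressions meet near (7 + sqrt(13)) / 4, about 2.6514.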

Analysis of Langevin midpoint methods using an anticipative Girsanov theorem

from arXiv: Data Structures and Algorithms

Authors: Matthew S. Zhang

We introduce a new method for analyzing midpoint discretizations of stochastic differential equations (SDEs), which are frequently used in Markov chain Monte Carlo (MCMC) methods for sampling from a target measure $\pi \propto \exp(-V)$. Borrowing techniques from Malliavin calculus, we compute estimates for the Radon-Nikodym derivative for processes on $L^2([0, T); \mathbb{R}^d)$ which may anticipate the Brownian motion, in the sense that they may not be adapted to the filtration at the same time. Applying these to various popular midpoint discretizations, we are able to improve the regularity and cross-regularity results in the literature on sampling methods. We also obtain a query complexity bound of $\widetilde{O}(\frac{\kappa^{5/4} d^{1/4}}{\varepsilon^{1/2}})$ for obtaining an $\varepsilon^2$-accurate sample in $\mathsf{KL}$ divergence, under log-concavity and strong smoothness assumptions for $\nabla^2 V$.

Splittable Spanning Trees and Balanced Forests in Dense Random Graphs

from arXiv: Data Structures and Algorithms

Authors: David Gillman, Jacob Platnick, Dana Randall

Weighted equitable partitioning of a graph has been of interest lately due to several applications, including redistricting, network algorithms, and image decomposition. Weighting a partition according to the spanning-tree metric has been of mathematical and practical interest because it typically favors partitions with more compact pieces. An appealing algorithm suggested by Charikar et al. is to sample a random spanning tree and remove k-1 edges, producing a random forest. If the components of the forest form a balanced partition, the partition is equitable under an easily computed acceptance probability. Cannon et al. recently showed that spanning trees on grid graphs and grid-like graphs on $n$ vertices are splittable into $k$ equal sized pieces with probability at least $n^{-2k}$, leading to the first rigorous sampling algorithm for a class of graphs. We present complementary results showing that spanning trees on dense random graphs also have inverse polynomial probability of being splittable, giving another class of graphs where equitable partitions can be efficiently sampled exactly. These proofs also guarantee fast almost-uniform sampling for the up-down walk on forests, giving another provably efficient randomized method for generating equitable partitions. Further, we show that problems with the well-studied ReCom algorithm for equitable partitioning are more extensive than previously known, even in special cases that were believed to be more promising. We present a family of graphs where the Markov chain fails to be irreducible when it must keep the components perfectly equitable; yet when the chain is allowed an imbalance of just one vertex between components, the rejection sampling step may take exponential time. This is true even when the graph satisfies desirable properties that have been conjectured to be sufficient for fast sampling.
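
The sampling recipe attributed to Charikar et al. in the abstract is short enough to sketch directly. This is a simplified illustration: the acceptance-probability reweighting needed for exact uniformity over partitions is omitted, and networkx's spanning-tree sampler is assumed.

import random
import networkx as nx

def sample_balanced_forest(G, k, tries=5000):
    # Sample a uniform random spanning tree, delete k-1 of its edges
    # uniformly at random, and accept only if all k components have
    # equal size. Returns the forest on success, None otherwise.
    n = G.number_of_nodes()
    assert n % k == 0
    for _ in range(tries):
        T = nx.random_spanning_tree(G)
        T.remove_edges_from(random.sample(list(T.edges()), k - 1))
        sizes = [len(c) for c in nx.connected_components(T)]
        if all(s == n // k for s in sizes):
            return T
    return None

forest = sample_balanced_forest(nx.gnp_random_graph(20, 0.5, seed=1), k=2)
print(forest is not None and
      [len(c) for c in nx.connected_components(forest)])  # [10, 10]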

Computing and Bounding Equilibrium Concentrations in Athermic Chemical Systems

from arXiv: Data Structures and Algorithms

Authors: Hamidreza Akef, Minki Hhan, David Soloveichik

Computing equilibrium concentrations of molecular complexes is generally analytically intractable and requires numerical approaches. In this work we focus on the polymer-monomer level, where indivisible molecules (monomers) combine to form complexes (polymers). Rather than employing free-energy parameters for each polymer, we focus on the athermic setting where all interactions preserve enthalpy. This setting aligns with the strongly bonded (domain-based) regime in DNA nanotechnology when strands can bind in different ways, but always with maximum overall bonding -- and is consistent with the saturated configurations in the Thermodynamic Binding Networks (TBNs) model. Within this context, we develop an iterative algorithm for assigning polymer concentrations to satisfy detailed-balance, where on-target (desired) polymers are in high concentrations and off-target (undesired) polymers are in low. Even if not directly executed, our algorithm provides effective insights into upper bounds on concentration of off-target polymers, connecting combinatorial arguments about discrete configurations such as those in the TBN model to real-valued concentrations. We conclude with an application of our method to decreasing leak in DNA logic and signal propagation. Our results offer a new framework for design and verification of equilibrium concentrations when configurations are distinguished by entropic forces.

An EPTAS for multiprocessor scheduling with rejection under a machine cost constraint

from arXiv: Data Structures and Algorithms

Authors: Mingyang Gong, Brendan Mumey

We study the multiprocessor scheduling with rejection problem under a machine cost constraint. In this problem, each job is either rejected, incurring a rejection penalty, or accepted and scheduled on one of the machines for processing. The machine cost is proportional to the total processing time of the jobs scheduled on it. The goal is to minimize the makespan of the accepted jobs plus the total rejection penalty of the rejected jobs, subject to the total machine cost not exceeding a given upper bound. We present a simple $2$-approximation algorithm for the problem and achieve an EPTAS when the number $m$ of machines is a fixed constant.

Fast Approximate Rank Determination and Selection with Group Testing

from arXiv: Data Structures and Algorithms

Authors: Adiesha Liyanage, Braeden Sopp, Brendan Mumey

Suppose that a group test operation is available for checking order relations in a set; can this speed up problems like finding the minimum/maximum element, rank determination, and selection? We consider a one-sided group test to be available, where queries are of the form $u \le_Q V$ or $V \le_Q u$, and the answer is `yes' if and only if there is some $v \in V$ such that $u \le v$ or $v \le u$, respectively. We restrict attention to total orders and focus on query complexity; for min or max finding, we give a Las Vegas algorithm that makes $\mathcal{O}(\log^2 n)$ expected queries. We also give randomized approximate algorithms for rank determination and selection; we allow a relative error of $1 \pm \delta$ for $\delta > 0$ in the estimated rank or selected element. In this case, we give a Monte Carlo algorithm for approximate rank determination with expected query complexity $\tilde{\mathcal{O}}(1/\delta^2 - \log \epsilon)$, where $1-\epsilon$ is the probability that the algorithm succeeds. We also give a Monte Carlo algorithm for approximate selection that has expected query complexity $\tilde{\mathcal{O}}(-\log( \epsilon \delta^2) / \delta^4)$; it has probability at least $\frac{1}{2}$ to output an element $x$, and if so, $x$ has the desired approximate rank with probability $1-\epsilon$.
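
One way such a bound can arise for min-finding (a plausible reconstruction under the stated query model, not necessarily the authors' algorithm): keep a candidate u, and in each round shuffle the remaining elements and binary-search with group tests for an element below u. The shuffle makes the found witness uniform among all elements below u, so the rank of the candidate halves in expectation, giving O(log n) rounds of O(log n) queries each.

import random

def group_le(V, u):
    # One-sided group test 'V <=_Q u': yes iff some v in V has v <= u.
    # Stand-in oracle; each call counts as one query.
    return any(v <= u for v in V)

def find_min(S):
    # Assumes distinct elements of a total order.
    u = random.choice(list(S))
    while True:
        rest = [x for x in S if x != u]
        random.shuffle(rest)
        if not rest or not group_le(rest, u):
            return u                     # nothing below u remains
        cand = rest                      # invariant: cand holds a witness <= u
        while len(cand) > 1:
            mid = len(cand) // 2
            cand = cand[:mid] if group_le(cand[:mid], u) else cand[mid:]
        u = cand[0]

print(find_min([7, 3, 9, 1, 4]))  # 1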

Max-Cut with Multiple Cardinality Constraints

from arXiv: Data Structures and Algorithms

Authors: Yury Makarychev, Madhusudhan Reddy Pittu, Ali Vakilian

We study the classic Max-Cut problem under multiple cardinality constraints, which we refer to as the Constrained Max-Cut problem. Given a graph $G=(V, E)$, a partition of the vertices into $c$ disjoint parts $V_1, \ldots, V_c$, and cardinality parameters $k_1, \ldots, k_c$, the goal is to select a set $S \subseteq V$ such that $|S \cap V_i| = k_i$ for each $i \in [c]$, maximizing the total weight of edges crossing $S$ (i.e., edges with exactly one endpoint in $S$). By designing an approximate kernel for Constrained Max-Cut and building on the correlation rounding technique of Raghavendra and Tan (2012), we present a $(0.858 - \varepsilon)$-approximation algorithm for the problem when $c = O(1)$. The algorithm runs in time $O\left(\min\{k/\varepsilon, n\}^{\mathrm{poly}(c/\varepsilon)} + \mathrm{poly}(n)\right)$, where $k = \sum_{i \in [c]} k_i$ and $n=|V|$. This improves upon the $(\frac{1}{2} + \varepsilon_0)$-approximation of Feige and Langberg (2001) for Max-Cut$_k$ (the special case when $c=1, k_1 = k$), and generalizes the $(0.858 - \varepsilon)$-approximation of Raghavendra and Tan (2012), which only applies when $\min\{k,n-k\}=\Omega(n)$ and does not handle multiple constraints. We also establish that, for general values of $c$, it is NP-hard to determine whether a feasible solution exists that cuts all edges. Finally, we present a $1/2$-approximation algorithm for Max-Cut under an arbitrary matroid constraint.
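
A brute-force reference implementation pins down the problem definition (exponential time, purely illustrative):

from itertools import combinations, product

def constrained_max_cut(edges, parts, ks):
    # parts = disjoint vertex lists V_1..V_c; ks = required |S ∩ V_i|.
    # Enumerate every feasible S and return the maximum cut weight.
    best, best_S = -1, None
    for pick in product(*(combinations(P, k) for P, k in zip(parts, ks))):
        S = {v for group in pick for v in group}
        cut = sum(w for u, v, w in edges if (u in S) != (v in S))
        if cut > best:
            best, best_S = cut, S
    return best, best_S

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 1.0)]
print(constrained_max_cut(edges, parts=[[0, 1], [2, 3]], ks=[1, 1]))
# (5.0, {0, 2})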

Thursday, July 17

The unpredictability conundrum

from Ben Recht

How do we establish the limits of prediction?

Scrolling through the deluge of people advertising their ICML papers on social media, I found myself bedeviled by a philosophical question about the logical reconstruction of machine learning. In the most famous machine learning papers, the central claims take the form:

Y is predictable from X, and method M demonstrates a particular accuracy on this prediction problem.

The AlexNet paper showed that ImageNet image classes were predictable from images, and a convolutional neural network achieved high accuracy. The GPT-3 paper achieved high accuracy on a variety of natural language processing prediction problems with few-shot learning and transformer-based large language models. The Support Vector Network paper showed that support vector machines could recognize handwritten digits with high accuracy.

In these descriptions, I am being vague today about what precisely “X” and “Y” are. “X” could be streams of text, images, or electronic health records. “Y” could be translations, image classes, or cancer diagnoses. How papers build up these constructs and concepts also bugs me, but I will leave a full logical reconstruction of machine learning for another blog post.1

Today, I am interested in understanding the papers that argue the negative claim and whether they ever provide useful evidence. That is, the papers that assert:

Y is not predictable from X.

I call claims of this form “unpredictability arguments.” Papers making unpredictability arguments can get a lot of temporary traction in machine learning discourse. They give fuel to the petty battles inside the community. In our current landscape, they give ammunition for lay critiques of industry. They can even help bring on AI Winters if people take them seriously enough. The problem is they are much harder to justify as stated.

Unpredictability arguments can be purely theoretical. These might say something like “under this list of assumptions about how data is generated and this list of assumptions about how you build the prediction function, Y is not predictable from X.” Such arguments can be helpful for conceptualization, but they are too strongly tied to their assumptions. Yes, simple linear perceptrons can’t build an XOR of input bits. But simple multilayer perceptrons can. Moreover, if your prediction function is allowed to include simple numerical transformations of the input data, all of a sudden XOR is predictable by linear functions (that is, just add XOR as a feature).
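
If you want to see this in a few lines of code: a perceptron on the four XOR points never fits them, but append the XOR itself as a third coordinate and the very same learner becomes trivially perfect.

import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                       # XOR of the two bits

print(Perceptron(max_iter=1000, tol=None).fit(X, y).score(X, y))   # < 1.0: not linearly separable

X_aug = np.hstack([X, (X[:, 0] ^ X[:, 1]).reshape(-1, 1)])         # add XOR as a feature
print(Perceptron(max_iter=1000, tol=None).fit(X_aug, y).score(X_aug, y))  # 1.0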

Theory bounds are weak because practitioners (and theorists for that matter) can simply change the rules. Yes, you can tell me that under some very contrived scenario, I can’t build a particular computer program. But machine learning practice is the apotheosis of computer science, fully embracing a Feyerabendian zeal for anything goes. I can always disagree with your assessment of how data is collected, generated, and processed. And there are an infinite number of computer programs I can write in PyTorch to disprove your unpredictability assertion.

Minsky and Papert’s arguments that I’m alluding to above—about whether perceptrons can compute parity—are a notable exception that proves the rule. No one really understands their arguments, and if you believed them without really digging into the details or trying some simple alternative prediction programs, you would have abandoned supervised learning for decades (oh wait, that’s actually what happened).

Now, what about experimental bounds? If someone writes a paper saying, “I did procedure P and found transformers were really bad at predicting Y from X,” what does this tell us? I’ve seen a lot of papers make such claims concerning LLMs and transformers. You’ll have people saying, “See, LLMs are stochastic parrots,” or “See, LLMs just learn epicycles.” In these papers, the predictability of the Y value is supposed to be obvious to the reader, and we’re supposed to be shocked that some machine learning model can’t predict it.

Sure, people can say all they want. As someone who wants better critiques of predictions, I’m never convinced by these papers, no matter how frequently they capture a viral zeitgeist. In the spirit of a Lakatosian offense, I can always counter by tweeting “your procedure is pretty arbitrary, my friend,” or “i don’t think that baseline is fair,” or “skill issue.” There is no reason to believe that the code shared in such papers is the only way for a method to predict Y from X. There is an infinite regress of hyperparameters, optimization variants, and so on. You only have so many forking paths you can explore in the garden before the conference deadline.

Most people would find competitions to be more compelling evidence. Machine learning is built upon competitive testing, leading to breakthroughs in proving Y predictable from X. Everyone loves the ImageNet competition. Benchmark scores are a central part of how companies showcase their latest language models. Can competitive testing “prove” Y is unpredictable?

What do we conclude when machine learning competitions fail? A notable example of failure is the Future of Families Challenge (formerly known as the Fragile Families Challenge). The goal here was to predict certain socioeconomic outcomes from complex, unstructured data recorded about a large birth cohort. After hundreds of researchers tried to predict the outcomes, the study concluded that the best predictions were not very accurate and were only slightly better than those from a simple benchmark model.

What should we conclude from this competition? We could conclude that the Ys in this paper (including “material hardship,” GPA, “grit,” and eviction) are not predictable from the Xs. I could also conclude that poor predictability arises because the data is far worse than advertised (e.g., there is a lot of missing data in the dataset). I could conclude that the targets studied have poor construct validity. There’s a long line of objections that I can string together even when a competitive test fails to find predictability.

In any event, I don’t have good answers for how to think about this aspect of the philosophy of machine learning yet. I’m very much thinking out loud today! But I’m posting because I’d love to hear your thoughts on what would constitute compelling evidence to prove the impossibility of prediction.

Subscribe now

1

Really, it should be a paper. Let me know if you are soliciting invitations for special issues on the philosophy of engineering.

By Ben Recht

TR25-098 | IPS Lower Bounds for Formulas and Sum of ROABPs | Utsab Ghosal, Prerona Chatterjee, Partha Mukhopadhyay, Amit Sinhababu

from ECCC Papers

We give new lower bounds for fragments of the Ideal Proof System (IPS) introduced by Grochow and Pitassi (JACM 2018). The Ideal Proof System is a central topic in algebraic proof complexity, developed in the context of Nullstellensatz refutation (Beame, Impagliazzo, Krajicek, Pitassi, Pudlak, FOCS 1994), and it simulates Extended Frege efficiently. Our main results are as follows. 1. mult-IPS_{Lin'}: We prove a nearly quadratic-size formula lower bound for multilinear refutation (over the Boolean hypercube) of a variant of the subset-sum axiom polynomial. Extending this, we obtain a nearly matching qualitative statement for a constant-degree target polynomial. 2. IPS_{Lin'}: Over fields of characteristic zero, we prove an exponential-size sum-of-ROABPs lower bound for the refutation of a variant of the subset-sum axiom polynomial. The result also extends to fields of positive characteristic when the target polynomial is suitably modified. The modification is inspired by recent results (Hakoniemi, Limaye, Tzameret, STOC 2024 and Behera, Limaye, Ramanathan, Srinivasan, ICALP 2025). The mult-IPS_{Lin'} lower bound is obtained by combining the quadratic-size formula lower bound technique of Kalorkoti (SICOMP 1985) with some additional ideas. The proof technique for the IPS_{Lin'} lower bound is inspired by the recent lower bound of Chatterjee, Kush, Saraf and Shpilka (CCC 2024).

Which graph motif parameters count?

from arXiv: Computational Complexity

Authors: Markus Bläser, Radu Curticapean, Julian Dörfler, Christian Ikenmeyer

For a fixed graph H, the function #IndSub(H,*) maps graphs G to the count of induced H-copies in G; this function obviously "counts something" in that it has a combinatorial interpretation. Linear combinations of such functions are called graph motif parameters and have recently received significant attention in counting complexity after a seminal paper by Curticapean, Dell and Marx (STOC'17). We show that, among linear combinations of functions #IndSub(H,*) involving only graphs H without isolated vertices, precisely those with positive integer coefficients maintain a combinatorial interpretation. It is important to note that graph motif parameters can be nonnegative for all inputs G, even when some coefficients are negative. Formally, we show that evaluating any graph motif parameter with a negative coefficient is impossible in an oracle variant of #P, where an implicit graph is accessed by oracle queries. Our proof follows the classification of the relativizing closure properties of #P by Hertrampf, Vollmer, and Wagner (SCT'95) and the framework developed by Ikenmeyer and Pak (STOC'22), but our application of the required Ramsey theorem turns out to be more subtle, as graphs do not have the required Ramsey property. Our techniques generalize from graphs to relational structures, including colored graphs. Vastly generalizing this, we introduce motif parameters over categories that count occurrences of sub-objects in the category. We then prove a general dichotomy theorem that characterizes which such parameters have a combinatorial interpretation. Using known results in Ramsey theory for categories, we obtain a dichotomy for motif parameters of finite vector spaces as well as parameter sets.
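
The classic identity behind the "combinatorial interpretation" question is easy to check by brute force: the number of (not necessarily induced) 3-vertex paths equals #IndSub(P3, G) + 3 #IndSub(K3, G), a graph motif parameter with positive integer coefficients. An illustration only, with networkx assumed:

from itertools import combinations
import networkx as nx

def count_induced(H, G):
    # #IndSub(H, G): vertex subsets of G that induce a copy of H.
    k = H.number_of_nodes()
    return sum(1 for S in combinations(G.nodes(), k)
               if nx.is_isomorphic(G.subgraph(S), H))

G = nx.erdos_renyi_graph(8, 0.5, seed=42)
ind_p3 = count_induced(nx.path_graph(3), G)
ind_k3 = count_induced(nx.complete_graph(3), G)
sub_p3 = sum(d * (d - 1) // 2 for _, d in G.degree())  # one P3 per pair of edges at a center
print(sub_p3 == ind_p3 + 3 * ind_k3)                   # True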

Searching for Falsified Clause in Random (log n)-CNFs is Hard for Randomized Communication

from arXiv: Computational Complexity

Authors: Artur Riazanov, Anastasia Sofronova, Dmitry Sokolov, Weiqiang Yuan

We show that for a randomly sampled unsatisfiable $O(\log n)$-CNF over $n$ variables the randomized two-party communication cost of finding a clause falsified by the given variable assignment is linear in $n$.

Online Block Packing

from arXiv: Data Structures and Algorithms

Authors: Ariel Ben Eliezer, Noam Nisan

We consider the algorithmic challenge that is faced by blockchains that have multidimensional block constraints and serve quasi-patient bidders. We provide online approximation algorithms for this problem, thus solving open problems left by [Babaioff and Nisan, EC 2025].

A near-complete resolution of the exponential-time complexity of k-opt for the traveling salesman problem

from arXiv: Data Structures and Algorithms

Authors: Sophia Heimann, Hung P. Hoang, Stefan Hougardy

The $k$-opt algorithm is one of the simplest and most widely used heuristics for solving the traveling salesman problem. Starting from an arbitrary tour, the $k$-opt algorithm improves the current tour in each iteration by exchanging up to $k$ edges. The algorithm continues until no further improvement of this kind is possible. For a long time, it remained an open question how many iterations the $k$-opt algorithm might require for small values of $k$, assuming the use of an optimal pivot rule. In this paper, we resolve this question for the cases $k = 3$ and $k = 4$ by proving that in both these cases an exponential number of iterations may be needed even if an optimal pivot rule is used. Combined with a recent result from Heimann, Hoang, and Hougardy (ICALP 2024), this provides a complete answer for all $k \geq 3$ regarding the number of iterations the $k$-opt algorithm may require under an optimal pivot rule. In addition we establish an analogous exponential lower bound for the 2.5-opt algorithm, a variant that generalizes 2-opt and is a restricted version of 3-opt. All our results hold for both the general and the metric traveling salesman problem.
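
For readers less familiar with the heuristic, the elementary 2-opt move (whose k = 3, 4 generalizations the paper analyzes) is a short loop; a standard sketch:

def two_opt(tour, dist):
    # Repeatedly replace tour edges (a, b) and (c, d) by (a, c) and
    # (b, d) -- i.e. reverse a segment -- whenever this shortens the
    # tour; stop at a local optimum. The paper shows that 3-opt and
    # 4-opt may need exponentially many such improving iterations
    # even under an optimal pivot rule.
    n, improved = len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n - (i == 0)):   # skip adjacent edge pairs
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

import random
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(8)]
dist = [[((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for (bx, by) in pts]
        for (ax, ay) in pts]
print(two_opt(list(range(8)), dist))   # a 2-opt-locally-optimal tour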

FastReChain: Highly Responsive and Low-Overhead Centralized Route Scheduling in Clos Datacenter Networks

from arXiv: Data Structures and Algorithms

Authors: Zihan Zhu, Dongchao Wu, Zhanbang Zhang, Jian Yang

Ever since Clos topologies were used in datacenter networks (DCNs), a practical centralized scheduling algorithm that supports dynamic scheduling has been absent. The introduction of optical switches in DCNs as a future-proof solution exacerbates this problem due to several properties of optical switches, such as the fact that they are generally bufferless and therefore rely on centralized scheduling, and that they have long switching times and therefore require the number of rearrangements to be minimized. In this paper, we propose a centralized scheduling algorithm that achieves theoretical maximum throughput even in one-rate bidirectional Clos networks, while producing schemes with near-minimal numbers of rearrangements. It is the only algorithm to date that directly supports bidirectional Clos networks and has a time efficiency high enough to support dynamic scheduling. For static minimal rewiring, its running time ranges from a few hundredths to a fraction of that of other algorithms, and the number of rearrangements is also consistently improved, allowing for more frequent adjustments and less impact on ongoing communications. In addition, the algorithm is very flexible and can support various functional requirements in real-world environments. We achieve this result through the replacement-chain concept and a bitset optimization.

Weighted $k$-Server Admits an Exponentially Competitive Algorithm

from arXiv: Data Structures and Algorithms

Authors: Adithya Bijoy, Ankit Mondal, Ashish Chiplunkar

The weighted $k$-server is a variant of the $k$-server problem, where the cost of moving a server is the server's weight times the distance through which it moves. The problem is famous for its intriguing properties and for evading standard techniques for designing and analyzing online algorithms. Even on uniform metric spaces with sufficiently many points, the deterministic competitive ratio of weighted $k$-server is known to increase doubly exponentially with respect to $k$, while the behavior of its randomized competitive ratio is not fully understood. Specifically, no upper bound better than doubly exponential is known, while the best known lower bound is singly exponential in $k$. In this paper, we close the exponential gap between these bounds by giving an $\exp(O(k^2))$-competitive randomized online algorithm for the weighted $k$-server problem on uniform metrics, thus breaking the doubly exponential barrier for deterministic algorithms for the first time. This is achieved by a recursively defined notion of a phase which, on the one hand, forces a lower bound on the cost of any offline solution, while, on the other hand, also admits a randomized online algorithm with bounded expected cost. The algorithm is also recursive; it involves running several algorithms virtually and in parallel and following the decisions of one of them in a random order. We also show that our techniques can be lifted to construct an $\exp(O(k^2))$-competitive randomized online algorithm for the generalized $k$-server problem on weighted uniform metrics.

Pathfinding in Self-Deleting Graphs

from arXiv: Data Structures and Algorithms

Authors: Michal Dvořák, Dušan Knop, Michal Opler, Jan Pokorný, Ondřej Suchý, Krisztina Szilágyi

In this paper, we study the problem of pathfinding on traversal-dependent graphs, i.e., graphs whose edges change depending on the previously visited vertices. In particular, we study \emph{self-deleting graphs}, introduced by Carmesin et al. (Sarah Carmesin, David Woller, David Parker, Miroslav Kulich, and Masoumeh Mansouri. The Hamiltonian cycle and travelling salesperson problems with traversal-dependent edge deletion. J. Comput. Sci.), which consist of a graph $G=(V, E)$ and a function $f\colon V\rightarrow 2^E$, where $f(v)$ is the set of edges that will be deleted after visiting the vertex $v$. In the \textsc{(Shortest) Self-Deleting $s$-$t$-path} problem we are given a self-deleting graph and its vertices $s$ and $t$, and we are asked to find a (shortest) path from $s$ to $t$, such that it does not traverse an edge in $f(v)$ after visiting $v$ for any vertex $v$. We prove that \textsc{Self-Deleting $s$-$t$-path} is NP-hard even if the given graph is outerplanar, bipartite, has maximum degree $3$, bandwidth $2$ and $|f(v)|\leq 1$ for each vertex $v$. We show that \textsc{Shortest Self-Deleting $s$-$t$-path} is W[1]-complete parameterized by the length of the sought path and that \textsc{Self-Deleting $s$-$t$-path} is W[1]-complete parameterized by the vertex cover number, feedback vertex set number and treedepth. We also show that the problem becomes FPT when we parameterize by the maximum size of $f(v)$ and several structural parameters. Lastly, we show that the problem does not admit a polynomial kernel even for parameterization by the vertex cover number and the maximum size of $f(v)$ combined already on 2-outerplanar graphs.
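
For orientation, the feasibility condition is easy to state in code. The sketch below uses an assumed encoding (edges as frozensets, $f$ as a dict) and checks whether a given vertex sequence is a valid self-deleting path; the results above say that finding one is the hard part.

# Check whether `path` is valid in the self-deleting graph (edges, f):
# once a vertex v is visited, no edge of f(v) may be traversed anymore.

def is_valid_sd_path(edges, f, path):
    deleted = set(f.get(path[0], ()))         # the start vertex counts as visited
    for u, v in zip(path, path[1:]):
        e = frozenset((u, v))
        if e not in edges or e in deleted:
            return False
        deleted |= set(f.get(v, ()))          # v has now been visited
    return True

E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3)]}
f = {2: [frozenset((1, 3))]}                  # visiting 2 deletes edge {1,3}
print(is_valid_sd_path(E, f, [1, 2, 3]))      # True
print(is_valid_sd_path(E, f, [2, 1, 3]))      # False: {1,3} was deleted at 2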

Kernelization for list $H$-coloring for graphs with small vertex cover

from arXiv: Data Structures and Algorithms

Authors: Marta Piecyk, Astrid Pieterse, Paweł Rzążewski, Magnus Wahlström

For a fixed graph $H$, in the List $H$-Coloring problem, we are given a graph $G$ along with list $L(v) \subseteq V(H)$ for every $v \in V(G)$, and we have to determine if there exists a list homomorphism $\varphi$ from $(G,L)$ to $H$, i.e., an edge preserving mapping $\varphi: V(G)\to V(H)$ that satisfies $\varphi(v)\in L(v)$ for every $v\in V(G)$. Note that if $H$ is the complete graph on $q$ vertices, the problem is equivalent to List $q$-Coloring. We investigate the kernelization properties of List $H$-Coloring parameterized by the vertex cover number of $G$: given an instance $(G,L)$ and a vertex cover of $G$ of size $k$, can we reduce $(G,L)$ to an equivalent instance $(G',L')$ of List $H$-Coloring where the size of $G'$ is bounded by a low-degree polynomial $p(k)$ in $k$? This question has been investigated previously by Jansen and Pieterse [Algorithmica 2019], who provided an upper bound, which turns out to be optimal if $H$ is a complete graph, i.e., for List $q$-Coloring. This result was one of the first applications of the method of kernelization via bounded-degree polynomials. We define two new integral graph invariants, $c^*(H)$ and $d^*(H)$, with $d^*(H) \leq c^*(H) \leq d^*(H)+1$, and show that for every graph $H$, List $H$-Coloring (i) has a kernel with $\mathcal{O}(k^{c^*(H)})$ vertices, and (ii) admits no kernel of size $\mathcal{O}(k^{d^*(H)-\varepsilon})$ for any $\varepsilon > 0$, unless the polynomial hierarchy collapses. Furthermore, if $c^*(H) > d^*(H)$, then there is a kernel with $\mathcal{O}(k^{c^*(H)-\varepsilon})$ vertices where $\varepsilon \geq 2^{1-c^*(H)}$. Additionally, we show that for some classes of graphs, including powers of cycles and graphs $H$ where $\Delta(H) \leq c^*(H)$ (which in particular includes cliques), the bound $d^*(H)$ is tight, using the polynomial method. We conjecture that this holds in general.
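
As a concrete reference point, a list homomorphism can be verified, and found by brute force on tiny instances, straight from the definition; this sketch of ours is exponential in $|V(G)|$, whereas the paper is about shrinking instances, not solving them.

# Brute-force search for a list homomorphism from (G, L) to H.
# EG: edges of G as pairs; EH: edges of H as frozensets; L: vertex -> allowed targets.

from itertools import product

def find_list_homomorphism(VG, EG, EH, L):
    verts = sorted(VG)
    for images in product(*(sorted(L[v]) for v in verts)):
        phi = dict(zip(verts, images))
        if all(frozenset((phi[u], phi[v])) in EH for u, v in EG):
            return phi
    return None

# List 2-Coloring: H = K2 on {0, 1}, G = the path 1-2-3, vertex 1 forced to 0
EH = {frozenset((0, 1))}
L = {1: {0}, 2: {0, 1}, 3: {0, 1}}
print(find_list_homomorphism({1, 2, 3}, [(1, 2), (2, 3)], EH, L))  # {1: 0, 2: 1, 3: 0}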

Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure

from arXiv: Data Structures and Algorithms

Authors: Michał Dereziński, Aaron Sidford

We provide new high-accuracy randomized algorithms for solving linear systems and regression problems that are well-conditioned except for $k$ large singular values. For solving such a $d \times d$ positive definite system, our algorithms succeed whp. and run in time $\tilde O(d^2 + k^\omega)$. For solving such regression problems in a matrix $\mathbf{A} \in \mathbb{R}^{n \times d}$, our methods succeed whp. and run in time $\tilde O(\mathrm{nnz}(\mathbf{A}) + d^2 + k^\omega)$, where $\omega$ is the matrix multiplication exponent and $\mathrm{nnz}(\mathbf{A})$ is the number of non-zeros in $\mathbf{A}$. Our methods nearly match a natural complexity limit for dense inputs to these problems and improve upon a trade-off in prior approaches that obtain running times of either $\tilde O(d^{2.065}+k^\omega)$ or $\tilde O(d^2 + dk^{\omega-1})$ for $d\times d$ systems. Moreover, we show how to obtain these running times even under the weaker assumption that all but $k$ of the singular values have a suitably bounded generalized mean. Consequently, we give the first nearly-linear time algorithm for computing a multiplicative approximation to the nuclear norm of an arbitrary dense matrix. Our algorithms are built on three general recursive preconditioning frameworks, where matrix sketching and low-rank update formulas are carefully tailored to the problems' structure.
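
The low-rank update formulas mentioned at the end can be illustrated by the classical Woodbury identity, which is the basic mechanism that makes an additive $k^\omega$-type term plausible; the sketch below is ours, for the toy system $(I + UU^\top)x = b$ rather than the paper's recursive preconditioners.

# Solve (I + U U^T) x = b via the Woodbury identity
#   (I + U U^T)^{-1} = I - U (I_k + U^T U)^{-1} U^T,
# avoiding any d x d factorization: only a k x k system is solved.

import numpy as np

def solve_identity_plus_lowrank(U, b):
    k = U.shape[1]
    small = np.eye(k) + U.T @ U               # k x k Gram system
    return b - U @ np.linalg.solve(small, U.T @ b)

rng = np.random.default_rng(0)
d, k = 2000, 5
U, b = rng.standard_normal((d, k)), rng.standard_normal(d)
x = solve_identity_plus_lowrank(U, b)
print(np.allclose(x + U @ (U.T @ x), b))      # residual check: True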

Finite Pinwheel Scheduling: the k-Visits Problem

from arXiv: Data Structures and Algorithms

Authors: Sotiris Kanellopoulos, Christos Pergaminelis, Maria Kokkou, Euripides Markou, Aris Pagourtzis

Pinwheel Scheduling is a fundamental scheduling problem, in which each task $i$ is associated with a positive integer $d_i$, and the objective is to schedule one task per time slot, ensuring each task perpetually appears at least once in every $d_i$ time slots. Although conjectured to be PSPACE-complete, it remains open whether Pinwheel Scheduling is NP-hard (unless a compact input encoding is used) or even contained in NP. We introduce k-Visits, a finite version of Pinwheel Scheduling, where given $n$ deadlines, the goal is to schedule each task exactly $k$ times. While we observe that the 1-Visit problem is trivial, we prove that 2-Visits is strongly NP-complete through a surprising reduction from Numerical 3-Dimensional Matching (N3DM). As intermediate steps in the reduction, we define NP-complete variants of N3DM which may be of independent interest. We further extend our strong NP-hardness result to a generalization of k-Visits for $k\geq 2$ in which the deadline of each task may vary throughout the schedule, as well as to a similar generalization of Pinwheel Scheduling, thus making progress towards settling the complexity of Pinwheel Scheduling. Additionally, we prove that 2-Visits can be solved in linear time if all deadlines are distinct, rendering it one of the rare natural problems which exhibit the interesting dichotomy of being in P if their input is a set and NP-complete if the input is a multiset. We achieve this through a Turing reduction from 2-Visits to a variation of N3DM, which we call Position Matching. Based on this reduction, we also show an FPT algorithm for 2-Visits parameterized by a value related to how close the input deadlines are to each other, as well as a linear-time algorithm for instances with up to two distinct deadlines.
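
Under one natural reading of the constraint (every task must reappear within $d_i$ slots of its previous occurrence, counting from the start; the paper's boundary conventions may differ), validity of a k-Visits schedule is easy to check:

# Check a k-Visits schedule: each task i in 0..n-1 occurs exactly k times and
# never waits more than d[i] slots before its first/next occurrence.

def is_valid_k_visits(schedule, d, k):
    occ = {i: [] for i in range(len(d))}
    for t, task in enumerate(schedule):
        occ[task].append(t)
    for i, ts in occ.items():
        if len(ts) != k:
            return False
        gaps = [ts[0] + 1] + [b - a for a, b in zip(ts, ts[1:])]
        if any(g > d[i] for g in gaps):
            return False
    return True

print(is_valid_k_visits([0, 1, 0, 1], d=[2, 2], k=2))  # True
print(is_valid_k_visits([0, 0, 1, 1], d=[2, 2], k=2))  # False: task 1 waits 3 slots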

Wednesday, July 16

TR25-097 | On the Limits of Computationally Sound IPPs in the Isolated Model | Hadar Strauss

from ECCC Papers

Interactive proofs of proximity (IPPs) are a relaxation of interactive proofs, analogous to property testing, in which soundness is required to hold only for inputs that are far from the property being verified. In such proof systems, the verifier has oracle access to the input, and it engages in two types of activities before making its decision: querying the input oracle and communicating with the prover. The main objective is to achieve protocols where both the query and communication complexities are extremely low. Of particular interest are IPPs in which the querying and the interacting activities are performed independently, with no information flow from one activity to the other. Such IPPs were systematically studied by Goldreich, Rothblum, and Skverer (ITCS 2023), who introduced two variants: the pre-coordinated model, where the querying and interacting activities may use a common source of randomness, and the isolated model, where the two activities are fully independent, each operating with a separate source of randomness. We focus on what is possible under these models when soundness is relaxed to computational soundness. Our previous work (ECCC, TR24-131) showed that the pre-coordinated model becomes significantly more powerful under this relaxation. In this work, we consider the isolated model under the same relaxation and show a separation between the two models. We consider a property that, by our previous work, has a computationally sound IPP in the pre-coordinated model with poly-logarithmic complexities (assuming the existence of collision-resistant hashing functions), and show that any computationally sound IPP in the isolated model for this property must have either query complexity or communication complexity that is $n^{\Omega(1)}$, where $n$ is the length of the input.

Turing, Wagner, Ruth

from Computational Complexity

Douglas Hofstadter first published Gödel, Escher, Bach: an Eternal Golden Braid in 1979 and my then high school self tried, and failed, to read through the entire book. It focused on the contradictions, with Kurt Gödel's incompleteness theorems, M. C. Escher's Drawing Hands and Johann Sebastian Bach's Canon a 2 per tonos, a piece that keeps rising until it ends a whole tone higher than it started.

I'd prefer to focus less on the paradoxes and self-reference to the true beauty and complexity of computation. So now having had a long career in the field, who would I call on to capture the power and beauty of computing?

It has to start with Alan Turing. His seminal paper On Computable Numbers, with an Application to the Entscheidungsproblem gives a clean model for computation and still the best argument (Section 9) for why this simple model captures everything computable. The unsolvability of the Entscheidungsproblem, the fact that you can't mechanize all of mathematics, comes as a consequence, not as a goal, of the paper. In a much later paper, The Chemical Basis of Morphogenesis, he shows how the beauty of nature can emerge naturally from computation, which of course we now understand much better as arising from discrete DNA sequences.

For music, instead of Bach's abstract works, I prefer to focus on the emotional power of music that still arises from a musical score that is not unlike a computer program in the way it lays out the composition. Take for example Richard Wagner's Prelude and Liebestod (the beginning and the end of his opera Tristan and Isolde). It captures the tension of the two lovers from the very first notes and keeps that tension going until it resolves at the very end.

While soccer and basketball have mostly continuous play, I prefer that great American game of baseball, which after each pitch has a very discrete state space that stadiums would capture with a few light bulbs (strikes, balls, outs), and yet could keep the excitement and tension on every one of those pitches. No one dominated the game more in his time than George Herman "Babe" Ruth, whom, admittedly, I might also have chosen to keep the same syllable cadence as Hofstadter.

So let's thank Turing, Wagner, Ruth, and the many others who showed how incredible complexity and beauty can arise from the simplicity of computation. So many stories to tell.

By Lance Fortnow

TR25-096 | Searching for Falsified Clause in Random $(\log{n})$-CNFs is Hard for Randomized Communication | Artur Riazanov, Anastasia Sofronova, Dmitry Sokolov, Weiqiang Yuan

from ECCC Papers

We show that for a randomly sampled unsatisfiable $O(\log n)$-CNF over $n$ variables the randomized two-party communication cost of finding a clause falsified by the given variable assignment is linear in $n$.
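
For context, the search problem is trivial when one party holds the whole assignment; the paper's contribution is a lower bound on its two-party communication cost once the assignment is split. A centralized reference routine, with DIMACS-style signed literals:

# Return some clause of `cnf` falsified by `assignment`, or None if satisfied.
# A clause is a list of nonzero ints: literal v means x_v = True, -v means False.

def falsified_clause(cnf, assignment):
    for clause in cnf:
        if not any((lit > 0) == assignment[abs(lit)] for lit in clause):
            return clause
    return None

cnf = [[1, 2], [-1, 2], [1, -2], [-1, -2]]      # unsatisfiable 2-CNF
print(falsified_clause(cnf, {1: True, 2: False}))  # [-1, 2]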

On the Complexity of the Optimal Correlated Equilibria in Extensive-Form Games

from arXiv: Computational Complexity

Authors: Vincent Cheval, Florian Horn, Soumyajit Paul, Mahsa Shirmohammadi

A major open question in algorithmic game theory is whether normal-form correlated equilibria (NFCE) can be computed efficiently in succinct games such as extensive-form games [DFF+25,6PR24,FP23,HvS08,VSF08,PR08]. Motivated by this question, we study the associated Threshold problem: deciding whether there exists a correlated equilibrium whose value exceeds a given threshold. We prove that this problem is PSPACE-hard for NFCE in multiplayer extensive-form games with perfect recall, even for fixed thresholds. To contextualize this result, we also establish the complexity of the Threshold problem for Nash equilibria in this setting, showing it is $\exists\mathbb{R}$-complete. These results uncover a surprising complexity reversal: while optimal correlated equilibria are computationally simpler than optimal Nash in normal-form games, the opposite holds in extensive-form games, where computing optimal correlated equilibria is provably harder. Building on this line of inquiry, we also address a related question by [VSF08], who introduced the notions of extensive-form correlated equilibrium (EFCE) and agent-form correlated equilibrium (AFCE). They asked how difficult the Threshold problem is for AFCE; we answer this question by proving that it is NP-hard, even in two-player games without chance nodes. Complementing our hardness results, we establish tight complexity classifications for the Threshold problem across several correlated equilibrium concepts - including EFCE, AFCE, normal-form coarse, extensive-form coarse, and agent-form coarse correlated equilibria. For each of these solution concepts in multiplayer stochastic extensive-form games with perfect recall, we prove NP-completeness by providing matching NP upper bounds to the previously known hardness results. Together, our results provide the most complete landscape to date for the complexity of optimal equilibrium computation in extensive-form games.
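
For orientation, the base notion is easiest to see in a bimatrix game: a normal-form correlated equilibrium is a distribution over action profiles under which no player benefits from deviating from a recommended action. The checker below is ours and concerns this normal-form notion only, not the extensive-form variants classified in the paper.

# Check the NFCE conditions in a bimatrix game: conditioned on any recommended
# action, no unilateral deviation improves a player's expected payoff.

def is_nfce(A, B, mu, tol=1e-9):
    n, m = len(A), len(A[0])
    for rec in range(n):                      # row player told to play `rec`
        for dev in range(n):
            if sum(mu[rec][j] * (A[dev][j] - A[rec][j]) for j in range(m)) > tol:
                return False
    for rec in range(m):                      # column player told to play `rec`
        for dev in range(m):
            if sum(mu[i][rec] * (B[i][dev] - B[i][rec]) for i in range(n)) > tol:
                return False
    return True

# game of chicken: uniform play over the two pure Nash profiles is an NFCE
A = [[0, 7], [2, 6]]; B = [[0, 2], [7, 6]]
print(is_nfce(A, B, [[0, 0.5], [0.5, 0]]))    # True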

On the Complexity of the Skolem Problem at Low Orders

from arXiv: Computational Complexity

Authors: Piotr Bacik, Joël Ouaknine, James Worrell

The Skolem Problem asks to determine whether a given linear recurrence sequence (LRS) $\langle u_n \rangle_{n=0}^\infty$ over the integers has a zero term, that is, whether there exists $n$ such that $u_n = 0$. Decidability of the problem is open in general, with the most notable positive result being a decision procedure for LRS of order at most 4. In this paper we consider a bounded version of the Skolem Problem, in which the input consists of an LRS $\langle u_n \rangle_{n=0}^\infty$ and a bound $N \in \mathbb N$ (with all integers written in binary), and the task is to determine whether there exists $n\in\{0,\ldots,N\}$ such that $u_n=0$. We give a randomised algorithm for this problem that, for all $d\in \mathbb N$, runs in polynomial time on the class of LRS of order at most $d$. As a corollary we show that the (unrestricted) Skolem Problem for LRS of order at most 4 lies in $\mathsf{coRP}$, improving the best previous upper bound of $\mathsf{NP}^{\mathsf{RP}}$. The running time of our algorithm is exponential in the order of the LRS -- a dependence that appears necessary in view of the $\mathsf{NP}$-hardness of the Bounded Skolem Problem. However, even for LRS of a fixed order, the problem involves detecting zeros within an exponentially large range. For this, our algorithm relies on results from $p$-adic analysis to isolate polynomially many candidate zeros and then test in randomised polynomial time whether each candidate is an actual zero by reduction to arithmetic-circuit identity testing.
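
The bounded problem always has an obvious baseline, iterating the recurrence term by term, but since $N$ is written in binary this takes time exponential in the input size; the paper's algorithm avoids exactly this. A naive sketch of ours for orientation:

# Naive bounded Skolem: iterate u_n = a_1 u_{n-1} + ... + a_d u_{n-d} and
# report the first zero among u_0, ..., u_N. Takes Theta(N) arithmetic steps,
# i.e. exponential in the binary length of N; the paper does much better.

def bounded_skolem_naive(coeffs, init, N):
    d = len(coeffs)
    seq = list(init)                          # u_0 .. u_{d-1}
    for n in range(N + 1):
        if n >= len(seq):                     # u_n = sum_i a_i * u_{n-i}
            seq.append(sum(a * u for a, u in zip(coeffs, seq[-1:-d-1:-1])))
        if seq[n] == 0:
            return n
    return None

# u_n = u_{n-1} - u_{n-2} with u_0 = u_1 = 1 has u_2 = 0
print(bounded_skolem_naive([1, -1], [1, 1], 100))  # 2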

Equality is Far Weaker than Constant-Cost Communication

from arXiv: Computational Complexity

Authors: Mika Göös, Nathaniel Harms, Artur Riazanov

We exhibit an $n$-bit communication problem with a constant-cost randomized protocol but which requires $n^{\Omega(1)}$ deterministic (or even non-deterministic) queries to an Equality oracle. Therefore, even constant-cost randomized protocols cannot be efficiently "derandomized" using Equality oracles. This improves on several recent results and answers a question from the survey of Hatami and Hatami (SIGACT News 2024). It also gives a significantly simpler and quantitatively superior proof of the main result of Fang, Göös, Harms, and Hatami (STOC 2025), that constant-cost communication does not reduce to the $k$-Hamming Distance hierarchy.
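
For context, the constant-cost randomized protocol for Equality itself is the textbook public-randomness one sketched below (ours); the result above says that query access to such an oracle still cannot efficiently simulate every constant-cost randomized protocol.

# Randomized Equality with shared randomness: Alice sends t random-parity bits
# of x; Bob compares with the same parities of y. One-sided error <= 2**-t.
# x and y are assumed to be equal-length bit strings.

import random

def equality_protocol(x, y, t=20, seed=42):
    rng = random.Random(seed)                 # shared public randomness
    for _ in range(t):
        r = [rng.randrange(2) for _ in x]
        a = sum(rb * int(xb) for rb, xb in zip(r, x)) % 2   # Alice's bit
        b = sum(rb * int(yb) for rb, yb in zip(r, y)) % 2   # Bob's check
        if a != b:
            return False                      # definitely unequal
    return True                               # equal, or unlucky w.p. <= 2^-t

print(equality_protocol("10110", "10110"), equality_protocol("10110", "10010"))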

Eigenvalue Bounds for Symmetric Markov Chains on Multislices With Applications

from arXiv: Computational Complexity

Authors: Prashanth Amireddy, Amik Raj Behera, Srikanth Srinivasan, Madhu Sudan

We consider random walks on ``balanced multislices'' of any ``grid'' that respects the ``symmetries'' of the grid, and show that a broad class of such walks are good spectral expanders. (A grid is a set of points of the form $\mathcal{S}^n$ for finite $\mathcal{S}$, and a balanced multi-slice is the subset that contains an equal number of coordinates taking every value in $\mathcal{S}$. A walk respects symmetries if the probability of going from $u = (u_1,\ldots,u_n)$ to $v = (v_1,\ldots,v_n)$ is invariant under simultaneous permutations of the coordinates of $u$ and $v$.) Our main theorem shows that, under some technical conditions, every such walk where a single step leads to an almost $\mathcal{O}(1)$-wise independent distribution on the next state, conditioned on the previous state, satisfies a non-trivially small singular value bound. We give two applications of our theorem to error-correcting codes: (1) We give an analog of the Ore-DeMillo-Lipton-Schwartz-Zippel lemma for polynomials, and junta-sums, over balanced multislices. (2) We also give a local list-correction algorithm for $d$-junta-sums mapping an arbitrary grid $\mathcal{S}^n$ to an Abelian group, correcting from a near-optimal $(\frac{1}{|\mathcal{S}|^{d}} - \varepsilon)$ fraction of errors for every $\varepsilon > 0$, where a $d$-junta-sum is a sum of (arbitrarily many) $d$-juntas (and a $d$-junta is a function that depends on only $d$ of the $n$ variables). Our proofs are obtained by exploring the representation theory of the symmetric group and merging it with some careful spectral analysis.
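
The simplest walk of the kind covered by such statements (our illustration, not the paper's general setting) is the random-transposition walk: swapping two uniformly random coordinates keeps the state on the balanced multislice and makes the transition probabilities invariant under simultaneous coordinate permutations.

# One step of the random-transposition walk on a balanced multislice.
# Swapping coordinates preserves the multiset of symbols, so the walk never
# leaves the multislice; i == j is allowed and acts as a lazy step.

import random

def transposition_step(state, rng=random):
    i, j = rng.randrange(len(state)), rng.randrange(len(state))
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

# balanced multislice of {0,1,2}^6: each symbol appears exactly twice
state = (0, 0, 1, 1, 2, 2)
for _ in range(3):
    state = transposition_step(state)
    print(state)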

Tileable Surfaces

from arXiv: Computational Geometry

Authors: David Brander, Jens Gravesen

We define a class of $C^k$-regular surfaces, $k \geq 1$, \emph{tileable surfaces}, that admit geometric tilings by a finite number of congruence classes of tiles. We show how to construct many examples, and examine the relationship with the well known tilings of the plane and sphere, as well as monohedral polyhedral surfaces.

FPT Parameterisations of Fractional and Generalised Hypertree Width

from arXiv: Data Structures and Algorithms

Authors: Matthias Lanzinger, Igor Razgon, Daniel Unterberger

We present the first fixed-parameter tractable (fpt) algorithms for precisely determining several central hypergraph decomposition parameters, including generalized hypertree width, fractional hypertree width, and adaptive width. Despite the recognized importance of these measures in complexity theory, databases, and constraint satisfaction, no exact fpt algorithms for any of them had previously been known. Our results are obtained for hypergraph classes of bounded rank and bounded degree. Our approach extends a recent algorithm for treewidth (Bojańczyk & Pilipczuk, LMCS 2022) utilizing monadic second-order (MSO) transductions. Leveraging this framework, we overcome the significant technical hurdles presented by hypergraphs, whose structural decompositions are technically much more intricate than their graph counterparts.

A Fast Coloring Oracle for Average Case Hypergraphs

from arXiv: Data Structures and Algorithms

Authors: Cassandra Marcussen, Edward Pyne, Ronitt Rubinfeld, Asaf Shapira, Shlomo Tauber

Hypergraph $2$-colorability is one of the classical NP-hard problems. Person and Schacht [SODA'09] designed a deterministic algorithm whose expected running time is polynomial over a uniformly chosen $2$-colorable $3$-uniform hypergraph. Lee, Molla, and Nagle recently extended this to $k$-uniform hypergraphs for all $k\geq 3$. Both papers relied heavily on the regularity lemma, hence their analysis was involved and their running time hid tower-type constants. Our first result in this paper is a new simple and elementary deterministic $2$-coloring algorithm that reproves the theorems of Person-Schacht and Lee-Molla-Nagle while avoiding the use of the regularity lemma. We also show how to turn our new algorithm into a randomized one with average expected running time of only $O(n)$. Our second and main result gives what we consider to be the ultimate evidence of just how easy it is to find a $2$-coloring of an average $2$-colorable hypergraph. We define a coloring oracle to be an algorithm which, given vertex $v$, assigns color red/blue to $v$ while inspecting as few edges as possible, so that the answers to any sequence of queries to the oracle are consistent with a single legal $2$-coloring of the input. Surprisingly, we show that there is a coloring oracle that, on average, can answer every vertex query in time $O(1)$.
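
As a baseline for what the oracle's answers must be consistent with, properness of a red/blue coloring is simply the absence of monochromatic hyperedges; checking this naively reads every edge, which is precisely what an $O(1)$-per-query oracle avoids. A reference check of ours:

# A red/blue coloring of a hypergraph is proper iff no hyperedge is
# monochromatic, i.e. every edge sees at least two distinct colors.

def is_proper_2_coloring(edges, color):
    return all(len({color[v] for v in e}) >= 2 for e in edges)

edges = [(0, 1, 2), (2, 3, 4), (0, 3, 4)]
color = {0: "r", 1: "r", 2: "b", 3: "r", 4: "b"}
print(is_proper_2_coloring(edges, color))     # True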

Compressed data structures for Heegaard splittings

from arXiv: Data Structures and Algorithms

Authors: Henrique Ennes, Clément Maria

Heegaard splittings provide a natural representation of closed 3-manifolds by gluing handlebodies along a common surface. These splittings can be equivalently given by two finite sets of meridians lying in the surface, which define a Heegaard diagram. We present a data structure that effectively represents Heegaard diagrams as normal curves with respect to a triangulation of the surface, with complexity measured by the space required to express the normal coordinate vectors in binary. This structure can be significantly more compressed than triangulations of 3-manifolds, yielding exponential gains for some families. Even with this succinct definition of complexity, we establish polynomial-time algorithms for comparing and manipulating diagrams, performing stabilizations, detecting trivial stabilizations and reductions, and computing topological invariants of the underlying manifolds, such as their fundamental and first homology groups. We also compare early implementations of our techniques with standard software programs for 3-manifolds, achieving better precision and faster algorithms in the average case and exponential gains in speed for some particular presentations of the inputs.

On Tight Robust Coresets for $k$-Medians Clustering

from arXiv: Data Structures and Algorithms

Authors: Lingxiao Huang, Zhenyu Jiang, Yi Li, Xuan Wu

This paper considers coresets for the robust $k$-medians problem with $m$ outliers, and new constructions in various metric spaces are obtained. Specifically, for metric spaces with a bounded VC or doubling dimension $d$, the coreset size is $O(m) + \tilde{O}(kd\varepsilon^{-2})$, which is optimal up to logarithmic factors. For Euclidean spaces, the coreset size is $O(m\varepsilon^{-1}) + \tilde{O}(\min\{k^{4/3}\varepsilon^{-2},k\varepsilon^{-3}\})$, improving upon a recent result by Jiang and Lou (ICALP 2025). These results also extend to robust $(k,z)$-clustering, yielding, for VC and doubling dimension, a coreset size of $O(m) + \tilde{O}(kd\varepsilon^{-2z})$ with the optimal linear dependence on $m$. This extended result improves upon the earlier work of Huang et al. (SODA 2025). The techniques introduce novel dataset decompositions, enabling chaining arguments to be applied jointly across multiple components.
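
For concreteness, here is the objective itself, written for points on a line (our sketch; a coreset replaces the point set with a small weighted set on which this cost is approximately preserved for every choice of centers):

# Robust k-medians cost with m outliers: sum the distances of all points to
# their nearest center, excluding the m largest distances.

def robust_kmedians_cost(points, centers, m):
    dists = sorted(min(abs(p - c) for c in centers) for p in points)
    return sum(dists[:len(dists) - m])        # drop the m farthest points

pts = [0.0, 0.1, 0.2, 5.0, 5.1, 100.0]        # 100.0 acts as an outlier
print(robust_kmedians_cost(pts, centers=[0.1, 5.05], m=1))  # 0.3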

Scheduling on Identical Machines with Setup Time and Unknown Execution Time

from arXiv: Data Structures and Algorithms

Authors: Yasushi Kawase, Kazuhisa Makino, Vinh Long Phan, Hanna Sumita

In this study, we investigate a scheduling problem on identical machines in which jobs require initial setup before execution. We assume that an algorithm can dynamically form a batch (i.e., a collection of jobs to be processed together) from the remaining jobs. The setup time is modeled as a known monotone function of the set of jobs within a batch, while the execution time of each job remains unknown until completion. This uncertainty poses significant challenges for minimizing the makespan. We address these challenges by considering two scenarios: each job batch must be assigned to a single machine, or a batch may be distributed across multiple machines. For both scenarios, we analyze settings with and without preemption. Across these four settings, we design online algorithms that achieve asymptotically optimal competitive ratios with respect to both the number of jobs and the number of machines.
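
To fix the cost model, here is a toy single-machine, non-preemptive reading with a hypothetical monotone setup function; in the paper's online setting the execution times are unknown at the moment batches are formed.

# Completion time of one machine processing batches sequentially: each batch
# pays its setup time plus the execution times of its jobs. The setup function
# below is hypothetical, chosen only to be monotone in the batch.

def machine_completion_time(batches, setup, exec_time):
    return sum(setup(b) + sum(exec_time[j] for j in b) for b in batches)

setup = lambda b: 1 + 0.5 * len(b)            # monotone in the batch contents
exec_time = {"a": 2.0, "b": 1.0, "c": 3.0}
print(machine_completion_time([("a", "b"), ("c",)], setup, exec_time))  # 9.5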

Permutation patterns in streams

from arXiv: Data Structures and Algorithms

Authors: Benjamin Aram Berendsohn

Permutation patterns and pattern avoidance are central, well-studied concepts in combinatorics and computer science. Given two permutations $\tau$ and $\pi$, the pattern matching problem (PPM) asks whether $\tau$ contains $\pi$. This problem arises in various contexts in computer science and statistics and has been studied extensively in exact-, parameterized-, approximate-, property-testing- and other formulations. In this paper, we study pattern matching in a \emph{streaming setting}, when the input $\tau$ is revealed sequentially, one element at a time. There is extensive work on the space complexity of various statistics in streams of integers. The novelty of our setting is that the input stream is \emph{a permutation}, which allows inferring some information about future inputs. Our algorithms crucially take advantage of this fact, while existing lower bound techniques become difficult to apply. We show that the complexity of the problem changes dramatically depending on the pattern~$\pi$. The space requirement is: $\Theta(k\log{n})$ for the monotone patterns $\pi = 12\dots k$, or $\pi = k\dots21$, $O(\sqrt{n\log{n}})$ for $\pi \in \{312,132\}$, $O(\sqrt{n} \log n)$ for $\pi \in \{231,213\}$, and $\widetilde{\Theta}_{\pi}(n)$ for all other $\pi$. If $\tau$ is an arbitrary sequence of integers (not necessarily a permutation), we show that the complexity is $\widetilde{\Theta}_{\pi}(n)$ in all except the first (monotone) cases.
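
For the monotone case, the $\Theta(k\log n)$ bound matches what the classical patience-sorting tails array achieves (that this mirrors the paper's algorithm is our assumption): keep, for each length $j \leq k$, the smallest value ending an increasing subsequence of length $j$ seen so far, i.e., at most $k$ numbers of $\log n$ bits each.

# Streaming detection of the increasing pattern 12...k in O(k) stored values.
# tails[j] = smallest value ending an increasing subsequence of length j+1.

import bisect

def contains_increasing_k(stream, k):
    tails = []
    for x in stream:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
            if len(tails) == k:
                return True                   # found pattern 12...k
        else:
            tails[i] = x                      # improve the best tail of length i+1
    return False

print(contains_increasing_k([5, 1, 6, 2, 7, 3, 8], 3))  # True: e.g. 5, 6, 7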

Deterministic Lower Bounds for $k$-Edge Connectivity in the Distributed Sketching Model

from arXiv: Data Structures and Algorithms

Authors: Peter Robinson, Ming Ming Tan

We study the $k$-edge connectivity problem on undirected graphs in the distributed sketching model, where we have $n$ nodes and a referee. Each node sends a single message to the referee based on its 1-hop neighborhood in the graph, and the referee must decide whether the graph is $k$-edge connected by taking into account the received messages. We present the first lower bound for deciding a graph connectivity problem in this model with a deterministic algorithm. Concretely, we show that the worst case message length is $\Omega( k )$ bits for $k$-edge connectivity, for any super-constant $k = O(\sqrt{n})$. Previously, only a lower bound of $\Omega( \log^3 n )$ bits was known for ($1$-edge) connectivity, due to Yu (SODA 2021). In fact, our result is the first super-polylogarithmic lower bound for a connectivity decision problem in the distributed graph sketching model. To obtain our result, we introduce a new lower bound graph construction, as well as a new 3-party communication complexity problem that we call UniqueOverlap. As this problem does not appear to be amenable to reductions to existing hard problems such as set disjointness or indexing due to correlations between the inputs of the three players, we leverage results from cross-intersecting set families to prove the hardness of UniqueOverlap for deterministic algorithms. Finally, we obtain the sought lower bound for deciding $k$-edge connectivity via a novel simulation argument that, in contrast to previous works, does not introduce any probability of error and thus works for deterministic algorithms.

Improved sampling algorithms and Poincaré inequalities for non-log-concave distributions

from arXiv: Data Structures and Algorithms

Authors: Yuchen He, Zhehan Lei, Jianan Shao, Chihao Zhang

We study the problem of sampling from a distribution $\mu$ with density $\propto e^{-V}$ for some potential function $V:\mathbb R^d\to \mathbb R$ with query access to $V$ and $\nabla V$. We start with the following standard assumptions: (1) The potential function $V$ is $L$-smooth. (2) The second moment $\mathbf{E}_{X\sim \mu}[\|X\|^2]\leq M$. Recently, He and Zhang (COLT'25) showed that the query complexity of sampling from such distributions is at least $\left(\frac{LM}{d\epsilon}\right)^{\Omega(d)}$ where $\epsilon$ is the desired accuracy in total variation distance, and the Poincaré constant can be arbitrarily large. Meanwhile, another common assumption in the study of diffusion-based samplers (see e.g., the work of Chen, Chewi, Li, Li, Salim and Zhang (ICLR'23)) strengthens the smoothness condition (1) to the following: (1*) The potential function of *every* distribution along the Ornstein-Uhlenbeck process starting from $\mu$ is $L$-smooth. We show that under the assumptions (1*) and (2), the query complexity of sampling from $\mu$ can be $\mathrm{poly}(L,d)\cdot \left(\frac{Ld+M}{\epsilon^2}\right)^{\mathcal{O}(L+1)}$, which is polynomial in $d$ and $\frac{1}{\epsilon}$ when $L=\mathcal{O}(1)$ and $M=\mathrm{poly}(d)$. This improves the algorithm with quasi-polynomial query complexity developed by Huang et al. (COLT'24). Our results imply that the seemingly moderate strengthening of the smoothness condition (1) to (1*) can lead to an exponential gap in the query complexity of sampling algorithms. Moreover, we show that together with the assumption (1*) and the stronger moment assumption that $\|X\|$ is $\lambda$-sub-Gaussian for $X\sim\mu$, the Poincaré constant of $\mu$ is at most $\mathcal{O}(\lambda)^{2(L+1)}$. As an application of our technique, we obtain improved estimates of the Poincaré constant for mixtures of Gaussians with the same covariance.
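
A minimal example of the kind of gradient-query sampler analyzed in this line of work is the unadjusted Langevin algorithm; the paper's algorithm is more elaborate, and this sketch of ours is only illustrative.

# Unadjusted Langevin algorithm: x <- x - step * grad V(x) + sqrt(2*step) * noise.
# Each iteration makes one query to grad V; the iterates approximately sample
# from exp(-V) for small step sizes.

import numpy as np

def ula(grad_V, x0, step, n_steps, rng):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_V(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

# V(x) = ||x||^2 / 2, i.e. a standard Gaussian target, for illustration
rng = np.random.default_rng(1)
samples = [ula(lambda x: x, np.zeros(2), 0.05, 500, rng) for _ in range(200)]
print(np.mean([s @ s for s in samples]))      # ~ 2 = E||X||^2, up to bias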
