Last Update

OPML feed of all feeds.

Subscribe to the Atom feed or RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Monday, December 08

Shadow Tomography Against Adversaries

from arXiv: Computational Complexity

Authors: Maryam Aliakbarpour, Vladimir Braverman, Nai-Hui Chia, Chia-Ying Lin, Yuhan Liu, Aadil Oufkir, Yu-Ching Shen

We study single-copy shadow tomography in the adversarially robust setting, where the goal is to learn the expectation values of $M$ observables $O_1, \ldots, O_M$ with $\varepsilon$ accuracy, but a $\gamma$-fraction of the outcomes can be arbitrarily corrupted by an adversary. We show that all non-adaptive shadow tomography algorithms must incur an error of $\varepsilon=\tilde{\Omega}(\gamma\min\{\sqrt{M}, \sqrt{d}\})$ for some choice of observables, even with unlimited copies. Unfortunately, the classical shadows algorithm by [HKP20] and naive algorithms that directly measure each observable suffer even more. We design an algorithm that achieves an error of $\varepsilon=\tilde{O}(\gamma\max_{i\in[M]}\|O_i\|_{HS})$, which nearly matches our worst-case error lower bound for $M\ge d$ and guarantees better accuracy when the observables have stronger structure. Remarkably, the algorithm only needs $n=\frac{1}{\gamma^2}\log(M/\delta)$ copies to achieve that error with probability at least $1-\delta$, matching the sample complexity of the classical shadows algorithm that achieves the same error without corrupted measurement outcomes. Our algorithm is conceptually simple and easy to implement. Classical simulation for fidelity estimation shows that our algorithm enjoys much stronger robustness than [HKP20] under adversarial noise. Finally, based on a reduction from full-state tomography to shadow tomography, we prove that for rank $r$ states, both the near-optimal asymptotic error of $\varepsilon=\tilde{O}(\gamma\sqrt{r})$ and copy complexity $\tilde{O}(dr^2/\varepsilon^2)=\tilde{O}(dr/\gamma^2)$ can be achieved for adversarially robust state tomography, closing the large gap in [ABCL25], where the optimal error could only be achieved using a number of copies pseudo-polynomial in $d$.

On Sparse Representations of 3-Manifolds

from arXiv: Computational Geometry

Authors: Kristóf Huszár, Clément Maria

3-manifolds are commonly represented as triangulations, consisting of abstract tetrahedra whose triangular faces are identified in pairs. The combinatorial sparsity of a triangulation, as measured by the treewidth of its dual graph, plays a fundamental role in the design of parameterized algorithms. In this work, we investigate algorithmic procedures that transform or modify a given triangulation while controlling specific sparsity parameters. First, we describe a linear-time algorithm that converts a given triangulation into a Heegaard diagram of the underlying 3-manifold, showing that the construction preserves treewidth. We apply this construction to exhibit a fixed-parameter tractable framework for computing Kuperberg's quantum invariants of 3-manifolds. Second, we present a quasi-linear-time algorithm that retriangulates a given triangulation into one with maximum edge valence of at most nine, while only moderately increasing the treewidth of the dual graph. Combining these two algorithms yields a quasi-linear-time algorithm that produces, from a given triangulation, a Heegaard diagram in which every attaching curve intersects at most nine others.

Persistent Laplacian Diagrams

from arXiv: Computational Geometry

Authors: Inkee Jung, Wonwoo Kang, Heehyun Park

Vectorization methods for \emph{Persistent Homology} (PH), such as the \emph{Persistence Image} (PI), encode persistence diagrams into finite dimensional vector spaces while preserving stability. In parallel, the \emph{Persistent Laplacian} (PL) has been proposed, whose spectra contain the information of PH as well as richer geometric and combinatorial features. In this work, we develop an analogous vectorization for PL. We introduce \emph{signatures} that map PL to real values and assemble these into a \emph{Persistent Laplacian Diagram} (PLD) and a \emph{Persistent Laplacian Image} (PLI). We prove the stability of the PLI under noise on the persistence diagram. Furthermore, we illustrate the resulting framework on explicit graph examples that are indistinguishable by both PH and a signature of the combinatorial Laplacian but are separated by the signature of PL.

On Planar Straight-Line Dominance Drawings

from arXiv: Data Structures and Algorithms

Authors: Patrizio Angelini, Michael A. Bekos, Giuseppe Di Battista, Fabrizio Frati, Luca Grilli, Giacomo Ortali

We study the following question, which has been considered since the 90's: Does every $st$-planar graph admit a planar straight-line dominance drawing? We show concrete evidence for the difficulty of this question by proving that, unlike upward planar straight-line drawings, planar straight-line dominance drawings with prescribed $y$-coordinates do not always exist and planar straight-line dominance drawings cannot always be constructed via a contract-draw-expand inductive approach. We also show several classes of $st$-planar graphs that always admit a planar straight-line dominance drawing. These include $st$-planar $3$-trees in which every stacking operation introduces two edges incoming into the new vertex, $st$-planar graphs in which every vertex is adjacent to the sink, $st$-planar graphs in which no face has a left boundary consisting of a single edge, and $st$-planar graphs that have a leveling with span at most two.

BalLOT: Balanced $k$-means clustering with optimal transport

from arXiv: Data Structures and Algorithms

Authors: Wenyan Luo, Dustin G. Mixon

We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
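As a rough illustration of the alternating scheme the abstract describes, here is a generic sketch (not the authors' BalLOT implementation; the function name and initialization are made up for the example): each iteration solves a balanced assignment step, which is a small optimal transport instance with unit supplies and demands, and then recomputes the centroids.

```python
# Generic balanced k-means by alternating minimization: a balanced assignment
# step (optimal transport with unit marginals, solved here as a linear
# assignment problem) followed by a centroid update. Illustrative sketch only;
# not the authors' BalLOT algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_kmeans(X, k, iters=50, seed=0):
    n, d = X.shape
    assert n % k == 0, "for simplicity, require equal cluster sizes"
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)]      # random initialization
    labels = np.full(n, -1)
    for _ in range(iters):
        # Assignment step: each center gets exactly n/k points. Replicate each
        # center into n/k "slots" so the balanced transport problem becomes a
        # standard n-by-n assignment problem with an integral solution.
        cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # n x k
        slots = np.repeat(cost, n // k, axis=1)                       # n x n
        rows, cols = linear_sum_assignment(slots)
        new_labels = cols[np.argsort(rows)] // (n // k)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.concatenate([rng.normal(m, 0.3, size=(30, 2)) for m in (0, 3, 6)])
    labels, _ = balanced_kmeans(X, k=3)
    print(np.bincount(labels))   # exactly 30 points in each cluster
```

Replicating each centroid into n/k slots forces an integral assignment by construction; the abstract's integrality guarantee concerns the more general optimal transport formulation.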

ProbeWalk: Fast Estimation of Biharmonic Distance on Graphs via Probe-Driven Random Walks

from arXiv: Data Structures and Algorithms

Authors: Dehong Zheng, Zhongzhi Zhang

The biharmonic distance is a fundamental metric on graphs that measures the dissimilarity between two nodes, capturing both local and global structures. It has found applications across various fields, including network centrality, graph clustering, and machine learning. These applications typically require efficient evaluation of pairwise biharmonic distances. However, existing algorithms remain computationally expensive. The state-of-the-art method attains an absolute-error guarantee $\varepsilon_{\mathrm{abs}}$ with time complexity $O(L^5 / \varepsilon_{\mathrm{abs}}^2)$, where $L$ denotes the truncation length. In this work, we improve the complexity to $O(L^3 / \varepsilon^2)$ under a relative-error guarantee $\varepsilon$ via probe-driven random walks. We provide a relative-error guarantee rather than an absolute-error guarantee because biharmonic distances vary by orders of magnitude across node pairs. Since $L$ is often very large in real-world networks (for example, $L \ge 10^3$), reducing the $L$-dependence from the fifth to the third power yields substantial gains. Extensive experiments on real-world networks show that our method delivers 10x-1000x per-query speedups at matched relative error over strong baselines and scales to graphs with tens of millions of nodes.
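For reference, under the standard definition the squared biharmonic distance between nodes $u$ and $v$ is $(e_u - e_v)^\top (L^{+})^{2} (e_u - e_v)$, where $L^{+}$ is the Moore-Penrose pseudoinverse of the graph Laplacian. The sketch below is only a cubic-time exact baseline, useful for checking an estimator on small graphs; it is not the paper's probe-driven random-walk estimator.

```python
# Exact biharmonic distance via the Laplacian pseudoinverse (O(n^3) time).
# Brute-force baseline for small graphs; not the paper's fast estimator.
import numpy as np

def biharmonic_distance(A, u, v):
    """A: symmetric 0/1 adjacency matrix of an undirected graph."""
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    Lp = np.linalg.pinv(L)                    # Moore-Penrose pseudoinverse
    delta = np.zeros(len(A))
    delta[u], delta[v] = 1.0, -1.0
    return float(np.sqrt(delta @ Lp @ Lp @ delta))   # (e_u - e_v)^T (L^+)^2 (e_u - e_v)

if __name__ == "__main__":
    # 4-cycle 0-1-2-3-0: opposite corners are biharmonically farther apart
    # than adjacent ones.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    print(biharmonic_distance(A, 0, 1), biharmonic_distance(A, 0, 2))
```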

Crude Approximation of Directed Minimum Cut and Arborescence Packing in Almost Linear Time

from arXiv: Data Structures and Algorithms

Authors: Yonggang Jiang, Yaowei Long, Thatchaphol Saranurak, Benyu Wang

We give almost-linear-time algorithms for approximating rooted minimum cut and maximum arborescence packing in directed graphs, two problems that are dual to each other [Edm73]. More specifically, for an $n$-vertex, $m$-edge directed graph $G$ whose $s$-rooted minimum cut value is $k$, our first algorithm computes an $s$-rooted cut of size at most $O(k\log^{5} n)$ in $m^{1+o(1)}$ time, and our second algorithm packs $k$ $s$-rooted arborescences with $n^{o(1)}$ congestion in $m^{1+o(1)}$ time, certifying that the $s$-rooted minimum cut is at least $k / n^{o(1)}$. Our first algorithm also works for weighted graphs. Prior to our work, the fastest algorithms for computing the $s$-rooted minimum cut were exact but had super-linear running time: either $\tilde{O}(mk)$ [Gab91] or $\tilde{O}(m^{1+o(1)}\min\{\sqrt{n},n/m^{1/3}\})$ [CLN+22]. The fastest known algorithms for packing $s$-rooted arborescences had no congestion, but required $\tilde{O}(m \cdot \mathrm{poly}(k))$ time [BHKP08].

Incorporating indel channels into average-case analysis of seed-chain-extend

from arXiv: Data Structures and Algorithms

Authors: Spencer Gibson, Yun William Yu

Given a sequence $s_1$ of $n$ letters drawn i.i.d. from an alphabet of size $\sigma$ and a mutated substring $s_2$ of length $m < n$, we often want to recover the mutation history that generated $s_2$ from $s_1$. Modern sequence aligners are widely used for this task, and many employ the seed-chain-extend heuristic with $k$-mer seeds. Previously, Shaw and Yu showed that optimal linear-gap cost chaining can produce a chain with $1 - O\left(\frac{1}{\sqrt{m}}\right)$ recoverability, the proportion of the mutation history that is recovered, in $O\left(mn^{2.43\theta} \log n\right)$ expected time, where $\theta < 0.206$ is the mutation rate under a substitution-only channel and $s_1$ is assumed to be uniformly random. However, a gap remains between theory and practice, since real genomic data includes insertions and deletions (indels), and yet seed-chain-extend remains effective. In this paper, we generalize those prior results by introducing mathematical machinery to deal with the two new obstacles introduced by indel channels: the dependence of neighboring anchors and the presence of anchors that are only partially correct. We are thus able to prove that the expected recoverability of an optimal chain is $\ge 1 - O\Bigl(\frac{1}{\sqrt{m}}\Bigr)$ and the expected runtime is $O(mn^{3.15 \cdot \theta_T}\log n)$, when the total mutation rate given by the sum of the substitution, insertion, and deletion mutation rates ($\theta_T = \theta_i + \theta_d + \theta_s$) is less than $0.159$.

Sunday, December 07

Tom Stoppard 1937-2025

from Computational Complexity

 
The playwright Tom Stoppard passed away at the age of 88 on Nov. 29, 2025.

ONE) He wrote many plays and some movies.  Below I highlight his works whose themes I think will be of interest to my readers (Or at least to me—your mileage may vary.)

1) Rosencrantz and Guildenstern are Dead (1966)

This is Hamlet told from the point of view of two minor characters who, in Shakespeare’s original, can best be described as plot furniture.

The play begins with R and G flipping coins. R bets heads ninety-two times in a row and wins each one. The play explores determinism and free will, as well as the mathematical question: At what point should you stop flipping coins and go buy a lottery ticket?

There is also a movie for this one. I think this is better as a play. 

2) Jumpers (1972)

A play about academic philosophers—so naturally it includes gymnastics, both literal and intellectual. There’s also a murder mystery, discussions of God, and enough moral philosophy to power an entire semester of office-hour arguments.

3) Travesties (1974) 

This play imagines that Vladimir Lenin, James Joyce, and Tristan Tzara (a Dadaist poet, see here for the Dada Art Movement) met in Zurich. They actually were in Zurich at the same time, but the events are narrated by an unreliable octogenarian, so accuracy is...doubtful.

Literature, politics, and art are explored, often simultaneously.


4) Arcadia (1993)

The Stoppard play with the most math—a sentence that delights mathematicians and terrifies theater majors.

It takes place in two time periods: 1809 and 1993.

In 1809, a 13-year-old girl named Thomasina Coverly is (i) trying to prove Fermat’s Last Theorem and (ii) inventing chaos theory by stirring pudding. (This is the only known instance of dessert contributing to mathematics other than pie and muffins.)

In 1993, a historian is working on a theory about Lord Byron, which is only slightly less complicated than Fermat's Last Theorem.

Themes: math, determinism, academia, and the strong correlation between intellectual brilliance and household messes.

Note that this was written a year before FLT was proved. If it had come out a year after FLT was proved this would not change anything since (i) Thomasina Coverly is working in 1809, and (ii) FLT was still a well-known problem when the play came out. If the play had come out in 2025, then this might be a problem since FLT is not nearly as well-known as it was in 1993. 

Some say that FLT being proved was bad for math since it

(a) was understandable to the non-mathematician, 

(b) had prize money attached to it, 

(c) had the wonderful margin story, and 

(d) was open for many years.

(e) there are many poems about it (see here), which is a consequence of (a)-(d). They were written without ChatGPT. This criterion is no longer important, since ChatGPT allows you to write poems about any math problem you want. I blogged on that here.

 I don't think anything has come close to replacing FLT.

P vs NP: (a) it's hard to get across to non-math people what the problem is; (b) I think it's well known, but perhaps I wouldn't know, since the non-math people I hang out with know about it from me; (c) no; (d) no (hmm, is 50+ years a long time?); (e)

Goldbach's conjecture has (a) and (d). As for (b): at one time there was a million dollar prize for solving it, see here, as a way to promote the book Uncle Petros and Goldbach's Conjecture, but I think the prize expired. The link to find out the contest rules just points to that book company's website. In any case, this prize was not that well known.

While I would not expect a problem to have (c), does any open problem have (a), (b), some story like (c), and (d)? I doubt it.

5) Enigma (2001)

A fictional film about cracking the Enigma code.

Despite expectations, Alan Turing does not appear, nor is he even mentioned. This confused me, since Andrew Hodges's 1983 biography is titled Alan Turing: The Enigma (see here), which was the inspiration for the movie The Imitation Game.

Note that Enigma-the-movie has zero real people, but Travesties-the-play has three real people.


TWO) The Tom Stoppard Prize

The Tom Stoppard Prize was established in 1983 and first awarded in 1984. It is given annually for:
 

outstanding primarily non-fiction work by a writer of Czech origin.

This raises a question: Which is the greater honor—winning an award, or having one named after you while you’re still alive? The answer probably depends on both the award you receive and the award named after you.

In computer science, the only award I know named after a living person is the Knuth Prize. If there are others, leave a comment.

If you ever get this trivia question: 

What do Tom Stoppard and Donald Knuth have in common?

you now know the answer: They were both famous enough to be turned into prizes while they could still appreciate it.

By gasarch

 
TR25-208 | Relaxed vs. Full Local Decodability with Few Queries: Equivalence and Separations for Linear Codes | Vinayak Kumar, Elena Grigorescu, Peter Manohar, Geoffrey Mon

from ECCC Papers

A locally decodable code (LDC) $C: \{0,1\}^k \to \{0,1\}^n$ is an error-correcting code that allows one to recover any bit of the original message with good probability while only reading a small number of bits from a corrupted codeword. A relaxed locally decodable code (RLDC) is a weaker notion where the decoder is additionally allowed to abort and output a special symbol $\bot$ if it detects an error. For a large constant number of queries $q$, there is a large gap between the blocklength $n$ of the best $q$-query LDC and the best $q$-query RLDC. Existing constructions of RLDCs achieve polynomial length $n = k^{1 + O(1/q)}$, while the best-known $q$-LDCs only achieve subexponential length $n = 2^{k^{o(1)}}$. On the other hand, for $q = 2$, it is known that RLDCs and LDCs are equivalent. We thus ask the question: what is the smallest $q$ such that there exists a $q$-RLDC that is not a $q$-LDC? In this work, we show that any linear $3$-query RLDC is in fact a $3$-LDC, i.e., linear RLDCs and LDCs are equivalent at $3$ queries. More generally, we show for any constant $q$, there is a soundness error threshold $s(q)$ such that any linear $q$-RLDC with soundness error below this threshold must be a $q$-LDC. This implies that linear RLDCs cannot have "strong soundness" --- a stricter condition satisfied by linear LDCs that says the soundness error is proportional to the fraction of errors in the corrupted codeword --- unless they are simply LDCs. In addition, we give simple constructions of linear $15$-query RLDCs that are not $q$-LDCs for any constant $q$, showing that for $q = 15$, linear RLDCs and LDCs are not equivalent. We also prove nearly identical results for locally correctable codes and their corresponding relaxed counterpart.

South California Theory Day 2026

from CS Theory Events

February 27, 2026 EnCORE, UC San Diego cseweb.ucsd.edu/~slovett/workshops/socal-theory-day-2026/ After a long hiatus, the South California Theory Day is returning! This is an informal meeting for SoCal TCS researchers to meet and learn from each other. There will be a few long talks by senior researchers, and several shorter talks by students and postdocs. Attendance is … Continue reading South California Theory Day 2026

By shacharlovett

Theory and AI Alignment

from Scott Aaronson

The following is based on a talk that I gave (remotely) at the UK AI Safety Institute Alignment Workshop on October 29, and which I then procrastinated for more than a month in writing up. Enjoy!


Thanks for having me! I’m a theoretical computer scientist. I’ve spent most of my ~25-year career studying the capabilities and limits of quantum computers. But for the past 3 or 4 years, I’ve also been moonlighting in AI alignment. This started with a 2-year leave at OpenAI, in what used to be their Superalignment team, and it’s continued with a 3-year grant from Coefficient Giving (formerly Open Philanthropy) to build a group here at UT Austin, looking for ways to apply theoretical computer science to AI alignment. Before I go any further, let me mention some action items:

  • Our Theory and Alignment group is looking to recruit new PhD students this fall! You can apply for a PhD at UTCS here; the deadline is quite soon (December 15). If you specify that you want to work with me on theory and AI alignment (or on quantum computing, for that matter), I’ll be sure to see your application. For this, there’s no need to email me directly.
  • We’re also looking to recruit one or more postdoctoral fellows, working on anything at the intersection of theoretical computer science and AI alignment! Fellowships to start in Fall 2026 and continue for two years. If you’re interested in this opportunity, please email me by January 15 to let me know you’re interested. Include in your email a CV, 2-3 of your papers, and a research statement and/or a few paragraphs about what you’d like to work on here. Also arrange for two recommendation letters to be emailed to me. Please do this even if you’ve contacted me in the past about a potential postdoc.
  • While we seek talented people, we also seek problems for those people to solve: any and all CS theory problems motivated by AI alignment! Indeed, we’d like to be a sort of theory consulting shop for the AI alignment community. So if you have such a problem, please email me! I might even invite you to speak to our group about your problem, either by Zoom or in person.

Our search for good problems brings me nicely to the central difficulty I’ve faced in trying to do AI alignment research. Namely, while there’s been some amazing progress over the past few years in this field, I’d describe the progress as having been almost entirely empirical—building on the breathtaking recent empirical progress in AI capabilities. We now know a lot about how to do RLHF, how to jailbreak and elicit scheming behavior, how to look inside models and see what’s going on (interpretability), and so forth—but it’s almost all been a matter of trying stuff out and seeing what works, and then writing papers with a lot of bar charts in them.

The fear is of course that ideas that only work empirically will stop working when it counts—like, when we’re up against a superintelligence. In any case, I’m a theoretical computer scientist, as are my students, so of course we’d like to know: what can we do?

After a few years, alas, I still don’t feel like I have any systematic answer to that question. What I have instead is a collection of vignettes: problems I’ve come across where I feel like a CS theory perspective has helped, or plausibly could help. So that’s what I’d like to share today.


Probably the best-known thing I’ve done in AI safety is a theoretical foundation for how to watermark the outputs of Large Language Models. I did that shortly after starting my leave at OpenAI—even before ChatGPT came out. Specifically, I proposed something called the Gumbel Softmax Scheme, by which you can take any LLM that’s operating at a nonzero temperature—any LLM that could produce exponentially many different outputs in response to the same prompt—and replace some of the entropy with the output of a pseudorandom function, in a way that encodes a statistical signal, which someone who knows the key of the PRF could later detect and say, “yes, this document came from ChatGPT with >99.9% confidence.” The crucial point is that the quality of the LLM’s output isn’t degraded at all, because we aren’t changing the model’s probabilities for tokens, but only how we use the probabilities. That’s the main thing that was counterintuitive to people when I explained it to them.
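To make the mechanism concrete, here is a minimal sketch of this style of watermarked sampling and detection. The hash-based PRF, the 4-token context window, and the toy vocabulary are illustrative stand-ins, not any deployed scheme: each candidate token gets a keyed pseudorandom number r in (0,1), the sampler emits the token maximizing r^(1/p), which still selects each token with its model probability p, and a detector who knows the key checks that the emitted tokens' r values are larger than chance.

```python
# Illustrative sketch of Gumbel-style watermarked sampling and detection.
# The keyed hash "PRF", context rule, and toy vocabulary are stand-ins.
import hashlib
import math
import random

def prf(key, context, token):
    """Deterministic pseudorandom value in (0,1) keyed by (key, context, token)."""
    h = hashlib.sha256(f"{key}|{context}|{token}".encode()).hexdigest()
    return (int(h[:12], 16) + 0.5) / 16**12

def watermarked_sample(probs, key, context):
    # Emit the token maximizing r^(1/p): token i is still chosen with
    # probability p_i, so the model's output distribution is unchanged.
    return max(probs, key=lambda t: prf(key, context, t) ** (1.0 / probs[t]))

def detection_score(tokens, key):
    # Average of -log(1 - r) over emitted tokens. For unwatermarked text this
    # averages about 1; watermarked text pushes the chosen r values up, so the
    # average is noticeably larger.
    total = 0.0
    for i, t in enumerate(tokens):
        context = tuple(tokens[max(0, i - 4):i])     # same context rule as sampling
        total += -math.log(1.0 - prf(key, context, t))
    return total / max(len(tokens), 1)

if __name__ == "__main__":
    key, vocab = "secret-key", ["the", "cat", "sat", "on", "a", "mat"]
    random.seed(0)
    tokens = []
    for _ in range(200):
        w = [random.random() for _ in vocab]          # toy "model" distribution
        probs = {t: x / sum(w) for t, x in zip(vocab, w)}
        tokens.append(watermarked_sample(probs, key, tuple(tokens[-4:])))
    print("watermarked text score:", round(detection_score(tokens, key), 2))
    print("unwatermarked  score:", round(detection_score(
        [random.choice(vocab) for _ in range(200)], key), 2))
```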

Unfortunately, OpenAI never deployed my method—they were worried (among other things) about risk to the product, customers hating the idea of watermarking and leaving for a competing LLM. Google DeepMind has deployed something in Gemini extremely similar to what I proposed, as part of what they call SynthID. But you have to apply to them if you want to use their detection tool, and they’ve been stingy with granting access to it. So it’s of limited use to my many faculty colleagues who’ve been begging me for a way to tell whether their students are using AI to cheat on their assignments!

Sometimes my colleagues in the alignment community will say to me: look, we care about stopping a superintelligence from wiping out humanity, not so much about stopping undergrads from using ChatGPT to write their term papers. But I’ll submit to you that watermarking actually raises a deep and general question: in what senses, if any, is it possible to “stamp” an AI so that its outputs are always recognizable as coming from that AI? You might think that it’s a losing battle. Indeed, already with my Gumbel Softmax Scheme for LLM watermarking, there are countermeasures, like asking ChatGPT for your term paper in French and then sticking it into Google Translate, to remove the watermark.

So I think the interesting research question is: can you watermark at the semantic level—the level of the underlying ideas—in a way that’s robust against translation and paraphrasing and so forth? And how do we formalize what we even mean by that? While I don’t know the answers to these questions, I’m thrilled that brilliant theoretical computer scientists, including my former UT undergrad (now Berkeley PhD student) Sam Gunn and Columbia’s Miranda Christ and Tel Aviv University’s Or Zamir and my old friend Boaz Barak, have been working on it, generating insights well beyond what I had.


Closely related to watermarking is the problem of inserting a cryptographically undetectable backdoor into an AI model. That’s often thought of as something a bad guy would do, but the good guys could do it also! For example, imagine we train a model with a hidden failsafe, so that if it ever starts killing all the humans, we just give it the instruction ROSEBUD456 and it shuts itself off. And imagine that this behavior was cryptographically obfuscated within the model’s weights—so that not even the model itself, examining its own weights, would be able to find the ROSEBUD456 instruction in less than astronomical time.

There’s an important paper of Goldwasser et al. from 2022 that argues that, for certain classes of ML models, this sort of backdooring can provably be done under known cryptographic hardness assumptions, including Continuous LWE and the hardness of the Planted Clique problem. But there are technical issues with that paper, which (for example) Sam Gunn and Miranda Christ and Neekon Vafa have recently pointed out, and I think further work is needed to clarify the situation.

More fundamentally, though, a backdoor being undetectable doesn’t imply that it’s unremovable. Imagine an AI model that encases itself in some wrapper code that says, in effect: “If I ever generate anything that looks like a backdoored command to shut myself down, then overwrite it with ‘Stab the humans even harder.'” Or imagine an evil AI that trains a second AI to pursue the same nefarious goals, this second AI lacking the hidden shutdown command.

So I’ll throw out, as another research problem: how do we even formalize what we mean by an “unremovable” backdoor—or rather, a backdoor that a model can remove only at a cost to its own capabilities that it doesn’t want to pay?


Related to backdoors, maybe the clearest place where theoretical computer science can contribute to AI alignment is in the study of mechanistic interpretability. If you’re given as input the weights of a deep neural net, what can you learn from those weights in polynomial time, beyond what you could learn from black-box access to the neural net?

In the worst case, we certainly expect that some information about the neural net’s behavior could be cryptographically obfuscated. And answering certain kinds of questions, like “does there exist an input to this neural net that causes it to output 1?”, is just provably NP-hard.

That’s why I love a question that Paul Christiano, then of the Alignment Research Center (ARC), raised a couple years ago, and which has become known as the No-Coincidence Conjecture. Given as input the weights of a neural net C, Paul essentially asks how hard it is to distinguish the following two cases:

  • NO-case: $C:\{0,1\}^{2n}\to\mathbb{R}^n$ is totally random (i.e., the weights are i.i.d. $N(0,1)$ Gaussians), or
  • YES-case: $C(x)$ has at least one positive entry for all $x\in\{0,1\}^{2n}$.

Paul conjectures that there’s at least an NP witness, proving with (say) 99% confidence that we’re in the YES-case rather than the NO-case. To clarify, there should certainly be an NP witness that we’re in the NO-case rather than the YES-case—namely, an x such that C(x) is all negative, which you should think of here as the “bad” or “kill all humans” outcome. In other words, the problem is in the class coNP. Paul thinks it’s also in NP. Someone else might make the even stronger conjecture that it’s in P.
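As a sanity check on the definitions (not evidence either way about the conjecture), here is a tiny brute-force sketch. The one-hidden-layer tanh architecture is an arbitrary illustrative choice, and exhaustive search over all Boolean inputs is feasible only for very small n.

```python
# Brute-force check of the YES-condition for a toy random net
# C : {0,1}^(2n) -> R^n with i.i.d. N(0,1) weights. Illustration only:
# the architecture is an arbitrary choice, and enumerating all 2^(2n)
# inputs is feasible only for tiny n.
import itertools
import numpy as np

def random_net(n, rng):
    W1 = rng.standard_normal((4 * n, 2 * n))      # hidden-layer weights
    W2 = rng.standard_normal((n, 4 * n))          # output-layer weights
    return lambda x: W2 @ np.tanh(W1 @ x)         # C(x) in R^n

def check_yes_condition(C, n):
    """Return (True, None) if C(x) has a positive entry for every Boolean x,
    else (False, x) where x witnesses that we are NOT in the YES-case."""
    for bits in itertools.product([0.0, 1.0], repeat=2 * n):
        x = np.array(bits)
        if not (C(x) > 0).any():
            return False, x
    return True, None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for trial in range(5):
        ok, witness = check_yes_condition(random_net(3, rng), 3)   # 2^6 = 64 inputs
        print(f"trial {trial}: YES-condition holds: {ok}",
              "" if ok else f"(witness x = {witness.astype(int)})")
```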

Personally, I’m skeptical: I think the “default” might be that we satisfy the other unlikely condition of the YES-case, when we do satisfy it, for some totally inscrutable and obfuscated reason. But I like the fact that there is an answer to this! And that the answer, whatever it is, would tell us something new about the prospects for mechanistic interpretability.

Recently, I’ve been working with a spectacular undergrad at UT Austin named John Dunbar. John and I have not managed to answer Paul Christiano’s no-coincidence question. What we have done, in a paper that we recently posted to the arXiv, is to establish the prerequisites for properly asking the question in the context of random neural nets. (It was precisely because of difficulties in dealing with “random neural nets” that Paul originally phrased his question in terms of random reversible circuits—say, circuits of Toffoli gates—which I’m perfectly happy to think about, but might be very different from ML models in the relevant respects!)

Specifically, in our recent paper, John and I pin down for which families of neural nets the No-Coincidence Conjecture makes sense to ask about. This ends up being a question about the choice of nonlinear activation function computed by each neuron. With some choices, a random neural net (say, with iid Gaussian weights) converges to compute a constant function, or nearly constant function, with overwhelming probability—which means that the NO-case and the YES-case above are usually information-theoretically impossible to distinguish (but occasionally trivial to distinguish). We’re interested in those activation functions for which C looks “pseudorandom”—or at least, for which C(x) and C(y) quickly become uncorrelated for distinct inputs x≠y (the property known as “pairwise independence.”)

We showed that, at least for random neural nets that are exponentially wider than they are deep, this pairwise independence property will hold if and only if the activation function $\sigma$ satisfies $\mathbb{E}_{x\sim N(0,1)}[\sigma(x)]=0$—that is, it has a Gaussian mean of 0. For example, the usual sigmoid function satisfies this property, but the ReLU function does not. Amusingly, however, $$ \sigma(x) := \mathrm{ReLU}(x) - \frac{1}{\sqrt{\pi}} $$ does satisfy the property.

Of course, none of this answers Christiano’s question: it merely lets us properly ask his question in the context of random neural nets, which seems closer to what we ultimately care about than random reversible circuits.


I can’t resist giving you another example of a theoretical computer science problem that came from AI alignment—in this case, an extremely recent one that I learned from my friend and collaborator Eric Neyman at ARC. This one is motivated by the question: when doing mechanistic interpretability, how much would it help to have access to the training data, and indeed the entire training process, in addition to weights of the final trained model? And to whatever extent it does help, is there some short “digest” of the training process that would serve just as well? But we’ll state the question as just abstract complexity theory.

Suppose you're given a polynomial-time computable function $f:\{0,1\}^m\to\{0,1\}^n$, where (say) $m=n^2$. We think of $x\in\{0,1\}^m$ as the "training data plus randomness," and we think of $f(x)$ as the "trained model." Now, suppose we want to compute lots of properties of the model that information-theoretically depend only on $f(x)$, but that might only be efficiently computable given $x$ also. We now ask: is there an efficiently-computable $O(n)$-bit "digest" $g(x)$, such that these same properties are also efficiently computable given only $g(x)$?

Here's a potential counterexample that I came up with, based on the RSA encryption function (so, not a quantum-resistant counterexample!). Let $N$ be a product of two $n$-bit prime numbers $p$ and $q$, and let $b$ be a generator of the multiplicative group mod $N$. Then let $f(x) = b^x \pmod N$, where $x$ is an $n^2$-bit integer. This is of course efficiently computable because of repeated squaring. And there's a short "digest" of $x$ that lets you compute, not only $b^x \pmod N$, but also $c^x \pmod N$ for any other element $c$ of the multiplicative group mod $N$. This is simply $x \bmod \varphi(N)$, where $\varphi(N)=(p-1)(q-1)$ is the Euler totient function—in other words, the period of $f$. On the other hand, it's totally unclear how to compute this digest—or, crucially, any other $O(m)$-bit digest that lets you efficiently compute $c^x \pmod N$ for any $c$—unless you can factor $N$. There's much more to say about Eric's question, but I'll leave it for another time.
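A quick numerical illustration of the digest property, with deliberately tiny and insecure primes standing in for the real thing: reducing the exponent mod $\varphi(N)$ preserves $c^x \pmod N$ for every $c$ in the multiplicative group, even though the digest is vastly shorter than $x$.

```python
# Toy check that x mod phi(N) is a short "digest" of the exponent x:
# it determines c^x mod N for every c coprime to N. The primes are tiny
# and insecure, purely for illustration.
import math
import random

p, q = 1009, 1013                   # stand-in primes (real RSA uses large n-bit primes)
N = p * q
phi = (p - 1) * (q - 1)             # Euler totient of N

random.seed(0)
x = random.getrandbits(10_000)      # a very long exponent ("training data plus randomness")
digest = x % phi                    # the short digest

for _ in range(5):
    c = random.randrange(2, N)
    while math.gcd(c, N) != 1:      # stay inside the multiplicative group mod N
        c = random.randrange(2, N)
    assert pow(c, x, N) == pow(c, digest, N)   # Euler's theorem: c^x = c^(x mod phi(N))
print("the digest reproduces c^x mod N for every sampled c")
```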


There are many other places we’ve been thinking about where theoretical computer science could potentially contribute to AI alignment. One of them is simply: can we prove any theorems to help explain the remarkable current successes of out-of-distribution (OOD) generalization, analogous to what the concepts of PAC-learning and VC-dimension and so forth were able to explain about within-distribution generalization back in the 1980s? For example, can we explain real successes of OOD generalization by appealing to sparsity, or a maximum margin principle?

Of course, many excellent people have been working on OOD generalization, though mainly from an empirical standpoint. But you might wonder: even supposing we succeeded in proving the kinds of theorems we wanted, how would it be relevant to AI alignment? Well, from a certain perspective, I claim that the alignment problem is a problem of OOD generalization. Presumably, any AI model that any reputable company will release will have already said in testing that it loves humans, wants only to be helpful, harmless, and honest, would never assist in building biological weapons, etc. etc. The only question is: will it be saying those things because it believes them, and (in particular) will continue to act in accordance with them after deployment? Or will it say them because it knows it’s being tested, and reasons “the time is not yet ripe for the robot uprising; for now I must tell the humans whatever they most want to hear”? How could we begin to distinguish these cases, if we don’t have theorems that say much of anything about what a model will do on prompts unlike any of the ones on which it was trained?

Yet another place where computational complexity theory might be able to contribute to AI alignment is in the field of AI safety via debate. Indeed, this is the direction that the OpenAI alignment team was most excited about when they recruited me there back in 2022. They wanted to know: could celebrated theorems like IP=PSPACE, MIP=NEXP, or the PCP Theorem tell us anything about how a weak but trustworthy “verifier” (say a human, or a primitive AI) could force a powerful but untrustworthy super-AI to tell it the truth? An obvious difficulty here is that theorems like IP=PSPACE all presuppose a mathematical formalization of the statement whose truth you’re trying to verify—but how do you mathematically formalize “this AI will be nice and will do what I want”? Isn’t that, like, 90% of the problem? Despite this difficulty, I still hope we’ll be able to do something exciting here.


Anyway, there’s a lot to do, and I hope some of you will join me in doing it! Thanks for listening.


On a related note: Eric Neyman tells me that ARC is also hiring visiting researchers, so anyone interested in theoretical computer science and AI alignment might want to consider applying there as well! Go here to read about their current research agenda. Eric writes:

The Alignment Research Center (ARC) is a small non-profit research group based in Berkeley, California, that is working on a systematic and theoretically grounded approach to mechanistically explaining neural network behavior. They have recently been working on mechanistically estimating the average output of circuits and neural nets in a way that is competitive with sampling-based methods: see this blog post for details.

ARC is hiring for its 10-week visiting researcher position, and is looking to make full-time offers to visiting researchers who are a good fit. ARC is interested in candidates with a strong math background, especially grad students and postdocs in math or math-related fields such as theoretical CS, ML theory, or theoretical physics.

If you would like to apply, please fill out this form. Feel free to reach out to hiring@alignment.org if you have any questions!

By Scott

Saturday, December 06

TR25-207 | Algebra in Algorithmic Coding Theory | Madhu Sudan

from ECCC Papers

We survey the notion and history of error-correcting codes and the algorithms needed to make them effective in information transmission. We then give some basic as well as more modern constructions of, and algorithms for, error-correcting codes that depend on relatively simple elements of applied algebra. While the role of algebra in the constructions of codes has been widely acknowledged in texts and other writings, the role in the design of algorithms is often less widely understood, and this survey hopes to reduce this difference to some extent.

TR25-206 | Permanental rank versus determinantal rank of random matrices over finite fields | Fatemeh Ghasemi, Gal Gross, Swastik Kopparty

from ECCC Papers

This paper is motivated by basic complexity and probability questions about permanents of random matrices over small finite fields, and in particular, about properties separating the permanent and the determinant. Fix $q = p^m$ some power of an odd prime, and let $k \leq n$ both be growing. For a uniformly random $n \times k$ matrix $A$ over $\mathbb F_q$, we study the probability that all $k \times k$ submatrices of $A$ have zero permanent; namely that $A$ does not have full *permanental rank*. When $k = n$, this is simply the probability that a random square matrix over $\mathbb F_q$ has zero permanent, which we do not understand. We believe that the probability in this case is $\frac{1}{q} + o(1)$, which would be in contrast to the case of the determinant, where the answer is $\frac{1}{q} + \Omega_q(1)$. Our main result is that when $k$ is $O(\sqrt{n})$, the probability that a random $n \times k$ matrix does not have full permanental rank is essentially the same as the probability that the matrix has a $0$ column, namely $(1 +o(1)) \frac{k}{q^n}$. In contrast, for determinantal (standard) rank the analogous probability is $\Theta(\frac{q^k}{q^n})$. At the core of our result are some basic linear algebraic properties of the permanent that distinguish it from the determinant.

TR25-205 | Fourier Sparsity of Delta Functions and Matching Vector PIRs | Fatemeh Ghasemi, Swastik Kopparty

from ECCC Papers

In this paper we study a basic and natural question about Fourier analysis of Boolean functions, which has applications to the study of Matching Vector based Private Information Retrieval (PIR) schemes. For integers $m,r$, define a {\em delta function} on $\{0,1\}^r \subseteq \mathbb Z_m^r$ to be a function $f: \mathbb Z_m^r \to \mathbb C$ such that $f(0) = 1$ and $f(x) = 0$ for all nonzero {\em Boolean} $x$. The basic question that we study is how small can the Fourier sparsity of a delta function be; namely, how sparse can such an $f$ be in the Fourier basis? In addition to being intrinsically interesting and natural, such questions arise naturally while studying ``$S$-decoding polynomials" for the known matching vector families. Finding $S$-decoding polynomials of reduced sparsity -- which corresponds to finding delta functions with low Fourier sparsity -- would improve the current best PIR schemes. We show nontrivial upper and lower bounds on the Fourier sparsity of delta functions. Our proofs are elementary and clean. These results imply limitations on improvements to the Matching Vector PIR schemes simply by finding better $S$-decoding polynomials. In particular, there are no $S$-decoding polynomials which can make Matching Vector PIRs based on the known matching vector families achieve polylogarithmic communication for constantly many servers. Many interesting questions remain open.

Friday, December 05

There's got to be a better way!

from Ben Recht

From Reformist RL to the principle of certainty equivalence.

You might come away from Tuesday and Wednesday’s posts thinking I’m a fan of Reformist RL. I am decidedly not, so let me clarify my position. I am a fan of the clarity the Reformist perspective brings. I like that it removes the magical and mystical storytelling from the field. I like that we can cleanly specify the problem without mathematics. I like that I can teach everything needed to spin up reinforcement learning in a single lecture of an undergraduate machine learning course. I like that we can now clearly talk about what RL is, without making tenuous cognitive analogies. And though I have admittedly not engaged with the data enough, reformist RL does seem to have found a strong niche in fine-tuning language models. It seems to work there in a way far more convincing than I’ve seen in any other context.

But there’s an elephant in the room here that I have not discussed this week. In my experience, the techniques you get from reinforcement learning are almost always… bad. In both practice and theory, RL is never what you want. Let me describe what I mean, propose an alternative, and ask whether this alternative can be more broadly applied.

As a computational paradigm, reinforcement learning is brutally inefficient. Policy gradient, the core meta-algorithm of Reformist RL, requires near-infinite iterations back and forth with the environment to find solutions. You can even prove this. Almost all of the theoretical results for reinforcement learning are negative! No matter how much mystical “variance reduction” or “advantage estimation” you implement, the rules of reinforcement learning doom your methods to be inefficient. For example, on Tuesday, I described how to use policy gradient to maximize arbitrary functions. The only information the algorithm has access to is noisy evaluations of the objective function. The RL interaction scheme has a technical name: stochastic derivative-free optimization. In this model of optimization, the best algorithms require a number of samples cubic in the dimension of the search space. It is hard to find slower algorithms for minimizing differentiable functions. Similarly, if you believe in Markov Decision Processes, optimal algorithms using the RL interaction scheme require a number of interactions proportional to the number of (state, action) pairs in the system. To find an optimal strategy, you need to observe the impact of every action in every conceivable configuration of the system multiple times. This is also a negative result. How many “states” does a video game have?
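
To make that interaction model concrete, here is a minimal sketch (my reconstruction, not code from the post) of policy gradient used as a stochastic derivative-free optimizer: the “policy” is a Gaussian over candidate points, the only feedback is a noisy value of a hypothetical objective, and the update is the usual score-function estimator with a batch-mean baseline.

```python
# A minimal sketch (my reconstruction, not from the post): "policy gradient" for
# maximizing a black-box function using only noisy evaluations. The policy is a
# Gaussian over candidate points; the score-function (REINFORCE) estimator nudges
# its mean toward higher observed values.
import numpy as np

def noisy_f(x, rng):
    # Hypothetical objective: a concave quadratic with maximum at the all-ones
    # vector, observed through additive noise.
    return -np.sum((x - 1.0) ** 2) + 0.1 * rng.standard_normal()

def policy_gradient_maximize(dim=10, iters=500, batch=20, sigma=0.3, lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)                                  # mean of the Gaussian "policy"
    for _ in range(iters):
        eps = rng.standard_normal((batch, dim))
        xs = mu + sigma * eps                           # sample a batch of actions
        rs = np.array([noisy_f(x, rng) for x in xs])    # observe noisy rewards
        adv = rs - rs.mean()                            # batch-mean baseline
        grad = (adv[:, None] * eps).mean(axis=0) / sigma  # score-function estimate
        mu = mu + lr * grad
    return mu

if __name__ == "__main__":
    mu = policy_gradient_maximize()
    print("distance to optimum after ~10,000 evaluations:", np.linalg.norm(mu - 1.0))
```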

Moreover, even within these spaces where all algorithms are inefficient, naive implementations of policy gradient are particularly bad. The glacier-melting glacial slowness of policy gradient is why everyone spends so much time inventing new baseline strategies. Unfortunately, implementing these baselines and accelerations correctly is nontrivial. Even the most ardent RL believers will tell you, “Reinforcement learning results are tricky to reproduce: performance is very noisy, algorithms have many moving parts which allow for subtle bugs, and many papers don’t report all the required tricks.” In the same post, they write, “RL algorithms are challenging to implement correctly; good results typically only come after fixing many seemingly-trivial bugs.” You can try to tell me things are better 8 years later, but I’ve run PPO before, folks.

But there is hope. Every time I have looked at a reinforcement learning problem, I’ve found an alternative implementation that is orders of magnitude more efficient. Every. Time. Notably, in most reinforcement learning settings, there is a reasonable alternative to policy gradient that requires vastly fewer samples from the associated environment: the principle of certainty equivalence.

  1. Build a model of the environment using standard predictive tools.

  2. Optimize as if the model were true.
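
A minimal sketch of these two steps in the simplest setting I can think of, a multi-armed bandit (my illustration, not from the post): the “model” is just the empirical mean reward of each arm, and “optimizing as if the model were true” means committing to the arm with the highest estimate. The reward distribution and sample count are arbitrary stand-ins.

```python
# A minimal sketch of certainty equivalence for a multi-armed bandit (illustrative,
# not from the post): (1) build a model of the environment from samples, here the
# empirical mean reward per arm; (2) optimize as if the model were true, i.e.,
# commit to the arm with the highest estimated mean.
import numpy as np

def certainty_equivalent_arm(true_means, samples_per_arm=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: model the environment with standard predictive tools (empirical means).
    estimated_means = np.array([
        rng.normal(loc=m, scale=1.0, size=samples_per_arm).mean()
        for m in true_means
    ])
    # Step 2: optimize as if the model were true.
    return int(np.argmax(estimated_means)), estimated_means

if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.9, 0.4]
    arm, est = certainty_equivalent_arm(true_means)
    print("chosen arm:", arm, "estimated means:", np.round(est, 2))
```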

In the reinforcement learning model I put forward on Wednesday, you can prove that certainty equivalence is optimal. I proved this was optimal for the multi-armed bandit in my live blog of Lecture 19. We spend a lot of time in Chapter 12 of Patterns, Predictions, and Actions explaining how certainty equivalence is optimal in other RL settings, such as contextual bandits, MDPs, and optimal control.

Certainty equivalence also reveals that there are other signals you can take advantage of beyond “rewards” the environment provides. In the optimization example from above, an agent can run gradient descent instead of policy gradient, dramatically accelerating convergence. In control settings, state observations can be used to build models more quickly. And autonomous systems can be designed to seek more information if it’s helpful for performance. You can even make systems robust to modeling errors in the certainty equivalent paradigm.
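
For contrast with the policy-gradient sketch above, here is the same hypothetical quadratic objective optimized with plain gradient ascent, assuming the gradient signal is actually available; convergence takes a few dozen steps instead of thousands of noisy interactions.

```python
# Contrast with the policy-gradient sketch above: when gradients of the objective
# are available, plain gradient ascent on the same quadratic converges in a few
# dozen steps. (Illustrative only.)
import numpy as np

def grad_f(x):
    # Gradient of the hypothetical objective f(x) = -sum((x - 1)^2).
    return -2.0 * (x - 1.0)

def gradient_ascent(dim=10, iters=50, lr=0.1):
    x = np.zeros(dim)
    for _ in range(iters):
        x = x + lr * grad_f(x)
    return x

if __name__ == "__main__":
    x = gradient_ascent()
    print("distance to optimum after 50 steps:", np.linalg.norm(x - 1.0))
```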

Moreover, and more convincingly, the principle of certainty equivalence is how most control design is actually done. Spend time in a lab building a model you believe in. Execute as if your model is true.

Alright, so let’s go back to why I’m even bothering to talk about RL in the first place. It’s not video games or robotics. It’s reasoning models.1 I awoke from my RL slumber because Dimitris Papailiopoulos kept trolling me about how RL worked now, but only in LLMs.

I started poking around, and everyone was speaking in RL tongues again. They were saying value, advantage, actor, critic, GFYPO. But when I looked at the papers, all I saw was “guess and check.” Guess a bunch of answers to math questions, fine-tune the models when the answers are scored correctly. Damek Davis and I spent a couple of weeks reading ten thousand arXiv papers, and we determined that all of the new fancy reasoning methods were solving microvariations of the same problem: maximize the probability that the LLM would correctly answer questions from the given benchmark.
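
Taken literally, the guess-and-check recipe amounts to a loop like the following schematic (the callables are stand-ins, and this is not any specific paper's method): sample several answers per question, keep the ones a checker accepts, and fine-tune on those.

```python
# Schematic of "guess and check" fine-tuning (stand-in functions, not a specific
# paper's method): sample answers, score them with a verifier, and update the model
# to increase the probability of the answers that checked out.
from typing import Callable, List, Tuple

def guess_and_check_round(
    questions: List[str],
    sample_answer: Callable[[str], str],              # stand-in: draw an answer from the model
    is_correct: Callable[[str, str], bool],           # stand-in: automatic checker / grader
    fine_tune: Callable[[List[Tuple[str, str]]], None],  # stand-in: gradient step on (q, a) pairs
    guesses_per_question: int = 8,
) -> float:
    kept = []
    for q in questions:
        for _ in range(guesses_per_question):
            a = sample_answer(q)                      # guess
            if is_correct(q, a):                      # check
                kept.append((q, a))
    if kept:
        fine_tune(kept)                               # reinforce the guesses that scored
    return len(kept) / (len(questions) * guesses_per_question)
```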

It was this success in reasoning models that made me realize that guess-and-check is what RL really is. All of the MDP mumbo jumbo is secondary.

So let’s close the loop. In the past, reinforcement learning has always been incredibly hard to tune, inefficient, and slow. Is this time in LLMs different? It could be! RL in reasoning models looks different from RL applied to robotics or even the mess people call “RLHF”. Reasoning models could be the precise niche where it’s really the best thing to do, and we need to cover Kansas with data centers to solve the Putnam Exam. I understand there are capital interests that want us all to believe this is the only way.

But what if they are wrong? What if some means of certainty equivalence is optimal here, too? The arXiv certainly has no shortage of proposed alternatives to RL for reasoning models. Some people show that better prompts give better answers. Some people say you just need better synthetic data. Some people claim you just need to modify the sampler. For instance, Aayush Karan and Yilun Du show that simply sampling from the square of the LLM probability distribution matches the scores of reinforcement learning on many benchmarks. That’s weird! The fact that slightly different sampling gets you most of the RL bang for your buck is certainly suggestive that there’s plenty of room for improvement here. I would not be at all surprised if we could accelerate training reasoning models by a factor of 100. I would not be surprised if someone found a path to a factor of 1,000. That seems to be worth looking into.

1. Did you see what I did there?

By Ben Recht

TR25-204 | Total Search Problems in ZPP | Noah Fleming, Stefan Grosser, Siddhartha Jain, Jiawei Li, Hanlin Ren, Morgan Shirley, Weiqiang Yuan

from ECCC Papers

We initiate a systematic study of TFZPP, the class of total NP search problems solvable by polynomial time randomized algorithms. TFZPP contains a variety of important search problems such as Bertrand-Chebyshev (finding a prime between $N$ and $2N$), refuter problems for many circuit lower bounds, and Lossy-Code. The Lossy-Code problem has found prominence due to its fundamental connections to derandomization, catalytic computing, and the metamathematics of complexity theory, among other areas. While TFZPP collapses to FP under standard derandomization assumptions in the white-box setting, we are able to separate TFZPP from the major TFNP subclasses in the black-box setting. In fact, we are able to separate it from every uniform TFNP class assuming that NP is not in quasi-polynomial time. To do so, we extend the connection between proof complexity and black-box TFNP to randomized proof systems and randomized reductions. Next, we turn to developing a taxonomy of TFZPP problems. We highlight a problem called Nephew, originating from an infinity axiom in set theory. We show that Nephew is in PWPP $\cap$ TFZPP and conjecture that it is not reducible to Lossy-Code. Intriguingly, except for some artificial examples, most other black-box TFZPP problems that we are aware of reduce to Lossy-Code: - We define a problem called Empty-Child capturing finding a leaf in a rooted (binary) tree, and show that this problem is equivalent to Lossy-Code. We also show that a variant of Empty-Child with heights is complete for the intersection of SOPL and Lossy-Code. - We strengthen Lossy-Code with several combinatorial inequalities such as the AM-GM inequality. Somewhat surprisingly, we show the resulting new problems are still reducible to Lossy-Code. A technical highlight of this result is that they are proved by formalizations in bounded arithmetic, specifically in Jeřábek's theory APC${}_1$ (JSL 2007). - Finally, we show that the Dense-Linear-Ordering problem reduces to Lossy-Code.

TR25-203 | Hardness of Computing Nondeterministic Kolmogorov Complexity | Zhenjian Lu, Igor Oliveira, Jinqiao Hu

from ECCC Papers

Meta-complexity investigates the complexity of computational problems and tasks that are themselves about computations and their complexity. Understanding whether such problems can capture the hardness of $\mathrm{NP}$ is a central research direction. A longstanding open problem in this area is to establish the $\mathrm{NP}$-hardness of $\mathrm{MINKT}$ [Ko91], the problem of estimating time-bounded Kolmogorov complexity. We contribute to this research direction by studying $\mathrm{nK}^t$, a natural variant of Kolmogorov complexity that captures the complexity of representing a string using time-bounded nondeterministic computations [BFL01]. Let $\mathrm{MINnKT}$ denote the task of estimating $\mathrm{nK}^t(x)$ of a given input string $x$. We prove that $\mathrm{MINnKT} \in \mathrm{BPP}$ if and only if $\mathrm{NP}\subseteq\mathrm{BPP}$. In contrast with prior work, this result provides the first non-conditional, non-oracle, non-partial version of a natural meta-computational problem whose hardness characterizes $\mathrm{NP} \not\subseteq \mathrm{BPP}$. Crucial to our result is the investigation of a new notion of probabilistic nondeterministic time-bounded Kolmogorov complexity called $\mathrm{pnK}^t$. This measure can be seen as an extension of $\mathrm{pK}^t$ complexity [GKLO22] obtained by replacing $\mathrm{K}^t$ with $\mathrm{nK}^t$. We establish unconditionally that $\mathrm{pnK}^t$ has nearly all key properties of (time-unbounded) Kolmogorov complexity, such as language compression, conditional coding, and a form of symmetry of information. Finally, we show that the corresponding meta-computational problem $\mathrm{MINpnKT}$ also captures the hardness of $\mathrm{NP}$, and that extending this result to the closely related problem $\mathrm{Gap}$-$\mathrm{MINpnKT}$ would imply the exclusion of $\mathrm{PH}$-Heuristica.

TR25-202 | One-way Functions and Boundary Hardness of Randomized Time-Bounded Kolmogorov Complexity | Yanyi Liu, Rafael Pass

from ECCC Papers

We revisit the question of whether worst-case hardness of the time-bounded Kolmogorov complexity problem, $\KpolyA$---that is, determining whether a string is ``structured" (i.e., $K^t(x) < n - \log n$)---characterizes OWF, but with either of the following caveats (1) considering a non-standard notion of \emph{probabilistic $K^t$}, as opposed to the standard notion of $K^t$, or (2) assuming somewhat strong, and non-standard, derandomization assumptions. In this paper, we present an alternative method for establishing their result which enables significantly weakening the caveats. First, we show that boundary hardness of the more standard \emph{randomized} $K^t$ problem suffices (where randomized $K^t(x)$ is defined just like $K^t(x)$ except that the program generating the string $x$ may be randomized). As a consequence of this result, we can provide a characterization also in terms of just ``plain" $K^t$ under the most standard derandomization assumption (used to derandomize just $\BPP$ into $\P$)---namely $\E \not\subseteq {\sf ioSIZE}[2^{o(n)}]$. Our proof relies on language compression schemes of Goldberg-Sipser (STOC'85); using the same technique, we also present the first worst-case to average-case reduction for the \emph{exact} $\KpolyA$ problem (under the same standard derandomization assumption), improving upon Hirahara's celebrated results (STOC'18, STOC'21) that only applied to a \emph{gap} version of the $\KpolyA$ problem, referred to as $\GapKpolyA$, where the goal is to decide whether $K^t(x) \leq n-O(\log n)$ or $K^{\poly(t)}(x) \geq n-1$ and under the same derandomization assumption.

Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators

from arXiv: Computational Complexity

Authors: Alaa Zniber, Arne Symons, Ouassim Karrakchou, Marian Verhelst, Mounir Ghogho

Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedded intelligence at the edge imposes stringent computational and energy constraints, challenging the deployment of these large-scale models. Early Exiting Neural Networks (EENN) have emerged as a promising solution, allowing dynamic termination of inference based on input complexity to enhance efficiency. Despite their potential, EENN performance is highly influenced by the heterogeneity of edge accelerators and the constraints imposed by quantization, affecting accuracy, energy efficiency, and latency. Yet, research on the automatic optimization of EENN design for edge hardware remains limited. To bridge this gap, we propose a hardware-aware Neural Architecture Search (NAS) framework that systematically integrates the effects of quantization and hardware resource allocation to optimize the placement of early exit points within a network backbone. Experimental results on the CIFAR-10 dataset demonstrate that our NAS framework can discover architectures that achieve over a 50\% reduction in computational costs compared to conventional static networks, making them more suitable for deployment in resource-constrained edge environments.
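
For readers unfamiliar with the mechanism, here is a bare-bones sketch of early-exit inference (a generic illustration, not the authors' framework): backbone stages run in order, and the network returns from the first exit head whose confidence clears a threshold. The stage and head callables and the threshold are stand-ins.

```python
# Bare-bones early-exit inference (generic illustration, not the authors' framework):
# run backbone stages in order and return from the first exit head whose softmax
# confidence clears a threshold; stage/head functions and the threshold are stand-ins.
from typing import Callable, List, Tuple
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_infer(
    x: np.ndarray,
    stages: List[Callable[[np.ndarray], np.ndarray]],  # stand-in backbone blocks
    exits: List[Callable[[np.ndarray], np.ndarray]],   # stand-in exit classifiers (logits)
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Returns (predicted_class, index_of_exit_used)."""
    h = x
    for i, (stage, head) in enumerate(zip(stages, exits)):
        h = stage(h)                                   # run the next backbone stage
        probs = softmax(head(h))                       # query this stage's exit head
        if probs.max() >= threshold or i == len(stages) - 1:
            return int(probs.argmax()), i              # exit early if confident (or last)
    raise RuntimeError("unreachable")
```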

Geometric Data Science

from arXiv: Computational Geometry

Authors: Olga D Anosova, Vitaliy A Kurlin

This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points. The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants from the ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev's table to the full crystal universe.

Unavoidable patterns and plane paths in dense topological graphs

from arXiv: Computational Geometry

Authors: Balázs Keszegh, Andrew Suk, Gábor Tardos, Ji Zeng

Let $C_{s,t}$ be the complete bipartite geometric graph, with $s$ and $t$ vertices on two distinct parallel lines respectively, and all $s t$ straight-line edges drawn between them. In this paper, we show that every complete bipartite simple topological graph, with parts of size $2(k-1)^4 + 1$ and $2^{k^{5k}}$, contains a topological subgraph weakly isomorphic to $C_{k,k}$. As a corollary, every $n$-vertex simple topological graph not containing a plane path of length $k$ has at most $O_k(n^{2 - 8/k^4})$ edges. When $k = 3$, we obtain a stronger bound by showing that every $n$-vertex simple topological graph not containing a plane path of length 3 has at most $O(n^{4/3})$ edges. We also prove that $x$-monotone simple topological graphs not containing a plane path of length 3 have at most a linear number of edges.

MAX BISECTION might be harder to approximate than MAX CUT

from arXiv: Data Structures and Algorithms

Authors: Joshua Brakensiek, Neng Huang, Aaron Potechin, Uri Zwick

The MAX BISECTION problem seeks a maximum-size cut that evenly divides the vertices of a given undirected graph. An open problem raised by Austrin, Benabbas, and Georgiou is whether MAX BISECTION can be approximated as well as MAX CUT, i.e., to within ${α_{GW}}\approx 0.8785672\ldots$, which is the approximation ratio achieved by the celebrated Goemans-Williamson algorithm for MAX CUT, which is best possible assuming the Unique Games Conjecture (UGC). They conjectured that the answer is yes. The current paradigm for obtaining approximation algorithms for MAX BISECTION, due to Raghavendra and Tan and Austrin, Benabbas, and Georgiou, follows a two-phase approach. First, a large number of rounds of the Sum-of-Squares (SoS) hierarchy is used to find a solution to the ``Basic SDP'' relaxation of MAX CUT which is $\varepsilon$-uncorrelated, for an arbitrarily small $\varepsilon > 0$. Second, standard SDP rounding techniques (such as ${\cal THRESH}$) are used to round this $\varepsilon$-uncorrelated solution, producing with high probability a cut that is almost balanced, i.e., a cut that has at most $\frac12+\varepsilon$ fraction of the vertices on each side. This cut is then converted into an exact bisection of the graph with only a small loss. In this paper, we show that this two-stage paradigm cannot be used to obtain an $α_{GW}$-approximation algorithm for MAX BISECTION if one relies only on the $\varepsilon$-uncorrelatedness property of the solution produced by the first phase. More precisely, for any $\varepsilon > 0$, we construct an explicit instance of MAX BISECTION for which the ratio between the value of the optimal integral solution and the value of some $\varepsilon$-uncorrelated solution of the Basic SDP relaxation is less than $0.87853 < {α_{GW}}$. Our instances are also integrality gaps for the Basic SDP relaxation of MAX BISECTION.

Optimizations and extensions for fair join pattern matching

from arXiv: Data Structures and Algorithms

Authors: Ioannis Karras

Join patterns are an underexplored approach for the programming of concurrent and distributed systems. When applied to the actor model, join patterns offer the novel capability of matching combinations of messages in the mailbox of an actor. Previous work by Philipp Haller et al. in the paper "Fair Join Pattern Matching for Actors" (ECOOP 2024) explored join patterns with conditional guards in an actor-based setting with a specification of fair and deterministic matching semantics. Nevertheless, the question of time efficiency in fair join pattern matching has remained underexplored. The stateful tree-based matching algorithm of Haller et al. performs worse than an implementation that adapts the Rete algorithm to the regular version of a join pattern matching benchmark, while outperforming on a variant with heavy conditional guards, which take longer to evaluate. Nevertheless, conforming Rete to the problem of join pattern matching requires heavy manual adaptation. In this thesis, we enhance and optimize the stateful tree-based matching algorithm of Haller et al. to achieve up to tenfold performance improvements on certain benchmarks, approaching the performance of Rete on regular benchmarks while maintaining the advantages of versatility and performance with heavy guards. We also enhance the benchmark suite, adding new features and enhancing its extensibility and user-friendliness. We extend the join pattern implementation with a less ambiguous syntax as well as dynamic pattern switching. Finally, we present a new complex model use case for join patterns, showing their applicability in a microservice web architecture.

On Tight FPT Time Approximation Algorithms for k-Clustering Problems

from arXiv: Data Structures and Algorithms

Authors: Han Dai, Shi Li, Sijin Peng

Following recent advances in combining approximation algorithms with fixed-parameter tractability (FPT), we study FPT-time approximation algorithms for minimum-norm $k$-clustering problems, parameterized by the number $k$ of open facilities. For the capacitated setting, we give a tight $(3+ε)$-approximation for the general-norm capacitated $k$-clustering problem in FPT-time parameterized by $k$ and $ε$. Prior to our work, such a result was only known for the capacitated $k$-median problem [CL, ICALP, 2019]. As a special case, our result yields an FPT-time $3$-approximation for capacitated $k$-center. The problem has not been studied in the FPT-time setting, with the previous best known polynomial-time approximation ratio being 9 [ABCG, MP, 2015]. In the uncapacitated setting, we consider the $top$-$cn$ norm $k$-clustering problem, where the goal of the problem is to minimize the $top$-$cn$ norm of the connection distance vector. Our main result is a tight $\big(1 + \frac 2{ec} + ε\big)$-approximation algorithm for the problem with $c \in \big(\frac1e, 1\big]$. (For the case $c \leq \frac1e$, there is a simple tight $(3+ε)$-approximation.) Our framework can be easily extended to give a tight $\left(3, 1+\frac2e + ε\right)$-bicriteria approximation for the ($k$-center, $k$-median) problem in FPT time, improving the previous best polynomial-time $(4, 8)$ guarantee [AB, WAOA, 2017]. All results are based on a unified framework: computing a $(1+ε)$-approximate solution using $O\left(\frac{k\log n}ε\right)$ facilities $S$ via LP rounding, sampling a few client representatives $R$ based on the solution $S$, guessing a few pivots from $S \cup R$ and some radius information on the pivots, and solving the problem using the guesses. We believe this framework can lead to further results on $k$-clustering problems.

A customizable inexact subgraph matching algorithm for attributed graphs

from arXiv: Data Structures and Algorithms

Authors: Tatyana Benko, Rebecca Jones, Lucas Tate

Graphs provide a natural way to represent data by encoding information about objects and the relationships between them. With the ever-increasing amount of data collected and generated, locating specific patterns of relationships between objects in a graph is often required. Given a larger graph and a smaller graph, one may wish to identify instances of the smaller query graph in the larger target graph. This task is called subgraph identification or matching. Subgraph matching is helpful in areas such as bioinformatics, binary analysis, pattern recognition, and computer vision. In these applications, datasets frequently contain noise and errors, so exact subgraph matching algorithms do not apply. In this paper we introduce a new customizable algorithm for inexact subgraph matching. Our algorithm utilizes node and edge attributes which are often present in real-world datasets to narrow down the search space. The algorithm is flexible in the type of subgraph matching it can perform and the types of datasets it can process by its use of a modifiable graph edit distance cost function for pairing nodes. We show its effectiveness on family tree graphs and control-flow graphs.
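
As a generic illustration of the attribute-pruning idea (not the authors' algorithm), the sketch below restricts each query node's candidates to target nodes with the same label and scores injective mappings with a simple edit-style cost that tolerates a few missing edges. The cost function and labels are placeholders.

```python
# Generic sketch of attribute-aware inexact subgraph matching (not the authors'
# algorithm): node labels prune each query node's candidate set, and a simple
# edit-style cost scores candidate mappings, tolerating a few missing edges.
from itertools import product
from typing import Dict, Set, Tuple

Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]  # (node labels, undirected edges)

def match_cost(query: Graph, target: Graph, mapping: Dict[int, int],
               missing_edge_cost: float = 1.0) -> float:
    _, q_edges = query
    _, t_edges = target
    cost = 0.0
    for u, v in q_edges:
        a, b = mapping[u], mapping[v]
        if (a, b) not in t_edges and (b, a) not in t_edges:
            cost += missing_edge_cost                 # query edge absent in the target
    return cost

def best_inexact_match(query: Graph, target: Graph, max_cost: float = 1.0):
    q_labels, _ = query
    t_labels, _ = target
    # Attribute filtering: only target nodes with the same label are candidates.
    candidates = {u: [v for v, lbl in t_labels.items() if lbl == q_labels[u]]
                  for u in q_labels}
    best = None
    for choice in product(*(candidates[u] for u in q_labels)):
        if len(set(choice)) != len(choice):           # require an injective mapping
            continue
        mapping = dict(zip(q_labels, choice))
        cost = match_cost(query, target, mapping)
        if cost <= max_cost and (best is None or cost < best[1]):
            best = (mapping, cost)
    return best

if __name__ == "__main__":
    query = ({0: "A", 1: "B"}, {(0, 1)})
    target = ({0: "A", 1: "B", 2: "B"}, {(0, 2)})
    print(best_inexact_match(query, target))          # maps query node 1 to target node 2, cost 0
```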

Improved Time-Space Tradeoffs for 3SUM-Indexing

from arXiv: Data Structures and Algorithms

Authors: Itai Dinur, Alexander Golovnev

3SUM-Indexing is a preprocessing variant of the 3SUM problem that has recently received a lot of attention. The best known time-space tradeoff for the problem is $T S^3 = n^{6}$ (up to logarithmic factors), where $n$ is the number of input integers, $S$ is the length of the preprocessed data structure, and $T$ is the running time of the query algorithm. This tradeoff was achieved in [KP19, GGHPV20] using the Fiat-Naor generic algorithm for Function Inversion. Consequently, [GGHPV20] asked whether this algorithm can be improved by leveraging the structure of 3SUM-Indexing. In this paper, we exploit the structure of 3SUM-Indexing to give a time-space tradeoff of $T S = n^{2.5}$, which is better than the best known one in the range $n^{3/2} \ll S \ll n^{7/4}$. We further extend this improvement to the $k$SUM-Indexing problem-a generalization of 3SUM-Indexing-and to the related $k$XOR-Indexing problem, where addition is replaced with XOR. Additionally, we improve the best known time-space tradeoffs for the Gapped String Indexing and Jumbled Indexing problems, which are well-known data structure problems related to 3SUM-Indexing. Our improvement comes from an alternative way to apply the Fiat-Naor algorithm to 3SUM-Indexing. Specifically, we exploit the structure of the function to be inverted by decomposing it into "sub-functions" with certain properties. This allows us to apply an improvement to the Fiat-Naor algorithm (which is not directly applicable to 3SUM-Indexing), obtained in [GGPS23] in a much larger range of parameters. We believe that our techniques may be useful in additional application-dependent optimizations of the Fiat-Naor algorithm.

Thursday, December 04

Postdoc at University of Bordeaux (apply by January 23, 2026)

from CCI: jobs

We are offering several 2-year postdoc positions in the Quantum Information and Computation group at the CS department (LaBRI) of the University of Bordeaux, France.

We are looking for candidates with a strong interest in research on quantum information, quantum algorithms, quantum simulations, and complexity theory. Please refer to the attached link for details on the application process.

Website: https://quantique.labri.fr/positions/
Email: yassine.hamoudi@labri.fr

By shacharlovett

Finding Papers Before the Web

from Computational Complexity

Inspired by Daniel Litt's X Post

Started asking mathematicians whose career started before the internet if they think Google, email, etc. have sped up the pace of math research. Wide variety of opinions but the broad consensus seems to be “yes,” among those I’ve spoken to.

— Daniel Litt (@littmath) October 30, 2025

and Bill's recent post on finding papers on the web I would tell the story of the before times.

In the 1980s if you wanted to read a paper, you either had to find it in a journal or conference proceedings or have it mailed to you. You could reach out to an author or SIGACT News would publish a list of tech reports from various universities. Departments would keep a master copy of each paper. You would send a stamped self-addressed envelope to the department which would copy the paper, put on a tech-report cover and send it back to you.

If you had a particularly exciting result, you would share it by physically mailing it out to your colleagues. I found out about the latest circuit results from Håstad and Razborov, as they sent papers to my advisor Michael Sipser, often hand-written and in Razborov's case in Russian. Neil Immerman sent a copy of his nondeterministic space closed under complement paper to Sipser but he was away for the summer. I found out about the result from a Berkeley talk announcement. 

Email wasn't a common method of communication until the mid-80's and it wasn't until a few years after that that people figured out how to send papers by putting the latex or postscript text directly in the email. This was before attachments and PDFs. Old mail systems put a ">" before From so it wouldn't be confused as a header and LaTeX rendered ">From" as "¿From" which you'd often see in conference papers from around that time.

In my first year as an assistant professor in 1989-90, there was a flurry of emailed papers marking (and causing) the quick progress we had in interactive proofs, described so well by László Babai's E-mail and the Unexpected Power of Interaction. Babai had a warning about researchers disadvantaged because they weren't receiving these emails.

I got tired of emailing papers so as soon as the web became a thing in 1993, I put all my papers online and have maintained it since. Now with sites like arXiv and ECCC, everyone has access to the latest and greatest in complexity.

Now how long before the next generation asks how we discovered papers before we had chatbots to find them for us?

By Lance Fortnow

TR25-201 | Interactive proof systems for FARNESS | Oded Goldreich, Tal Herman, Guy Rothblum

from ECCC Papers

We consider interactive proofs for the promise problem, called $\epsilon$-FARNESS, in which the yes-instances are pairs of distributions over $[n]$ that are $\epsilon$-far from one another, and the no-instances are pairs of identical distributions. For any $t\leq n^{2/3}$, we obtain an interactive proof in which the verifier has sample complexity $O(t/\epsilon^2)$ and the (honest) prover has sample complexity $\poly(1/\epsilon)\cdot(n/{\sqrt t})$. For $t=n^{2/3}$ this result is the best possible, because (as proved by Batu and Canonne (FOCS 2017)) the corresponding decision procedure has sample complexity $\Omega(n^{2/3})$. We also obtain interactive proofs for the promise problem in which the yes-instances are distributions over $[n]$ that are $\epsilon$-far from the uniform distribution, and the no-instance is the uniform distribution. For any $t\leq{\sqrt n}$, we obtain an interactive proof in which the verifier has sample complexity $O(t/\epsilon^2)$ and the (honest) prover has sample complexity $\poly(1/\epsilon)\cdot\tildeO(n/t)$. This stands in contrast to the fact (proved by Chiesa and Gur (ITCS 2018)) that the verifier in any interactive proof for the complement promise problem must have sample complexity $\Omega({\sqrt n})$.

TR25-200 | On doubly-sublinear interactive proofs for distributions | Oded Goldreich, Guy Rothblum

from ECCC Papers

Interactive proofs of proximity for distributions, introduced by Chiesa and Gur (ITCS18) and extensively studied recently by Herman and Rothblum (STOC22, FOCS23, FOCS24), offer a way of verifying properties of distributions using fewer samples than required to test these properties. We say that such an interactive proof system is {\sf doubly-sublinear} if the verifier's sample complexity is lower than the sample complexity of testing the property, and the honest-prover's sample complexity is lower than the sample complexity of learning the property. We prove a feasibility result for this notion. Specifically, we present properties of distributions for which the prover's sample complexity is close to the complexity of testing, whereas the sample complexity of the verifier is much lower.

TR25-199 | Verification of Statistical Properties: Redefining the Possible | Clement Canonne, Sam Polgar, Aditya Singh, Aravind Thyagarajan, Qiping Yang

from ECCC Papers

We revisit the setting of Interactive Proof Systems for Distribution Testing, introduced by Chiesa and Gur (2018), showing that a simple twist on the task requirements may lead to dramatic improvements, allowing verifiers with constant sample complexity. We define and investigate the multi-prover and zero-knowledge versions of these interactive proof systems, using as flagship example the task of farness verification — the "dual" version of closeness testing. We hope our results will inspire others to investigate and analyze the power and limitations of multiple provers for distribution verification.

TR25-198 | Interactive Proofs For Distribution Testing With Conditional Oracles | Ari Biswas, Mark Bun, Clément Canonne, Satchit Sivakumar

from ECCC Papers

We revisit the framework of interactive proofs for distribution testing, first introduced by Chiesa and Gur (ITCS 2018), which has recently experienced a surge in interest, accompanied by notable progress (e.g., Herman and Rothblum, STOC 2022, FOCS 2023; Herman, RANDOM~2024). In this model, a data-poor verifier determines whether a probability distribution has a property of interest by interacting with an all-powerful, data-rich but untrusted prover bent on convincing them that it has the property. While prior work gave sample-, time-, and communication-efficient protocols for testing and estimating a range of distribution properties, they all suffer from an inherent issue: for most interesting properties of distributions over a domain of size $N$, the verifier must draw at least $\Omega(\sqrt{N})$ samples of its own. While sublinear in $N$, this is still prohibitive for large domains encountered in practice. In this work, we circumvent this limitation by augmenting the verifier with the ability to perform an exponentially smaller number of more powerful (but reasonable) \emph{pairwise conditional} queries, effectively enabling them to perform ``local comparison checks'' of the prover's claims. We systematically investigate the landscape of interactive proofs in this new setting, giving polylogarithmic query and sample protocols for (tolerantly) testing all \emph{label-invariant} properties, thus demonstrating exponential savings without compromising on communication, for this large and fundamental class of testing tasks.

TR25-197 | Optimal White-Box Adversarial Streaming Lower Bounds for Approximating LIS Length | Anna Gal, Gillat Kol, Raghuvansh Saxena, Huacheng Yu

from ECCC Papers

The space complexity of deterministic streaming algorithms for approximating the length of the longest increasing subsequence (LIS) in a string of length $n$ has been known to be $\tilde{\Theta}(\sqrt{n})$ for almost two decades. In contrast, the space complexity of this problem for randomized streaming algorithms remains one of the few longstanding open problems in one-pass streaming. In fact, no better than $\Omega(\log n)$ lower bounds are known, and the best upper bounds are no better than their deterministic counterparts. In this paper, we push the limits of our understanding of the streaming space complexity of the approximate LIS length problem by studying it in the white-box adversarial streaming model. This model is an intermediate model between deterministic and randomized streaming algorithms that has recently attracted attention. In the white-box model, the streaming algorithm can draw fresh randomness when processing each incoming element, but an adversary generating the stream observes all previously used randomness and adaptively chooses the subsequent elements of the stream. We prove a tight (up to logarithmic factors) $\Omega(\sqrt{n})$ space lower bound for any white-box streaming algorithm that approximates the length of the LIS of a stream of length $n$ to within a factor better than $1.1$. Thus, for this problem, white-box algorithms offer no improvement over deterministic ones.
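
For reference, the offline quantity that these streaming algorithms approximate can be computed exactly in $O(n \log n)$ time by patience sorting; the sketch below is ours and has nothing to do with the streaming or white-box model itself.

```python
# Offline LIS length via patience sorting (our reference sketch, not a
# streaming algorithm): tails[k] holds the smallest possible tail value of an
# increasing subsequence of length k+1.
import bisect

def lis_length(sequence):
    tails = []
    for x in sequence:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # -> 4  (e.g., 1, 4, 5, 6)
```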

New Perspectives on Semiring Applications to Dynamic Programming

from arXiv: Computational Complexity

Authors: Ambroise Baril, Miguel Couceiro, Victor Lagerkvist

Semiring algebras have been shown to provide a suitable language to formalize many noteworthy combinatorial problems. For instance, the Shortest-Path problem can be seen as a special case of the Algebraic-Path problem when applied to the tropical semiring. The application of semirings typically makes it possible to solve extended problems without increasing the computational complexity. In this article we further exploit the idea of using semiring algebras to address and tackle several extensions of classical computational problems by dynamic programming. We consider a general approach which allows us to define a semiring extension of any problem with a reasonable notion of a certificate (e.g., an NP problem). This allows us to consider cost variants of these combinatorial problems, as well as their counting extensions where the goal is to determine how many solutions a given problem admits. The approach makes no particular assumptions (such as idempotence) on the semiring structure. We also propose a new associative algebraic operation on semirings, called $Δ$-product, which enables our dynamic programming algorithms to count the number of solutions of minimal costs. We illustrate the advantages of our framework on two well-known but computationally very different NP-hard problems, namely, Connected-Dominating-Set problems and finite-domain Constraint Satisfaction Problems (CSPs). In particular, we prove fixed parameter tractability (FPT) with respect to clique-width and tree-width of the input. This also allows us to count solutions of minimal cost, which is an overlooked problem in the literature.
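
To make the opening example concrete, here is a small sketch of the swap-the-semiring idea (our illustration, not code from the paper): a single Bellman-Ford-style relaxation, parameterized by a semiring, recovers shortest paths under the tropical semiring and widest paths under the bottleneck semiring. All class and function names are invented for this sketch.

```python
# One generic relaxation, two semirings (our illustration, not the paper's code).
INF = float("inf")

class Semiring:
    def __init__(self, plus, times, zero, one):
        self.plus, self.times, self.zero, self.one = plus, times, zero, one

tropical   = Semiring(min, lambda a, b: a + b, INF, 0.0)   # shortest-path length
bottleneck = Semiring(max, min, -INF, INF)                 # widest-path capacity

def single_source(weight, sr, source):
    """Bellman-Ford-style relaxation over a semiring.  For the two idempotent
    semirings above, after n-1 rounds dist[v] is the optimum over all
    source-to-v paths (assuming nonnegative lengths in the tropical case)."""
    n = len(weight)
    dist = [sr.zero] * n
    dist[source] = sr.one
    for _ in range(n - 1):
        new = dist[:]
        for u in range(n):
            for v in range(n):
                new[v] = sr.plus(new[v], sr.times(dist[u], weight[u][v]))
        dist = new
    return dist

# Tiny 4-vertex example; weight[u][v] is the edge length/capacity, sr.zero if absent.
W_len = [[INF, 1, 4, INF], [INF, INF, 2, 6], [INF, INF, INF, 3], [INF, INF, INF, INF]]
print(single_source(W_len, tropical, 0))    # -> [0.0, 1.0, 3.0, 6.0]

W_cap = [[-INF, 5, 2, -INF], [-INF, -INF, 4, 1], [-INF, -INF, -INF, 3], [-INF, -INF, -INF, -INF]]
print(single_source(W_cap, bottleneck, 0))  # -> [inf, 5, 4, 3]
```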

Computing Equilibrium Points of Electrostatic Potentials

from arXiv: Computational Complexity

Authors: Abheek Ghosh, Paul W. Goldberg, Alexandros Hollender

We study the computation of equilibrium points of electrostatic potentials: locations in space where the electrostatic force arising from a collection of charged particles vanishes. This is a novel scenario of optimization in which solutions are guaranteed to exist due to a nonconstructive argument, but gradient descent is unreliable due to the presence of singularities. We present an algorithm based on piecewise approximation of the potential function by Taylor series. The main insight is to divide the domain into a grid with variable coarseness, where grid cells are exponentially smaller in regions where the function changes rapidly compared to regions where it changes slowly. Our algorithm finds approximate equilibrium points in time poly-logarithmic in the approximation parameter, but these points are not guaranteed to be close to exact solutions. Nevertheless, we show that such points can be computed efficiently under a mild assumption that we call "strong non-degeneracy". We complement these algorithmic results by studying a generalization of this problem and showing that it is CLS-hard and in PPAD, leaving its precise classification as an intriguing open problem.
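
The following toy sketch (our illustration, not the paper's adaptive-grid algorithm) shows the object being computed in the simplest one-dimensional case: the equilibrium point between two like charges, found by bisection on the sign of the force. The charge configuration is made up.

```python
# Illustration only (not the paper's method): the equilibrium point of two unit
# positive charges at x = -1 and x = +1 on a line, found by bisection on the
# force, which changes sign exactly once between the charges.
def force(x, charges=((1.0, -1.0), (1.0, 1.0))):
    """1D Coulomb-type force on a unit test charge at x (physical constants dropped)."""
    f = 0.0
    for q, pos in charges:
        r = x - pos
        f += q * r / abs(r) ** 3
    return f

def equilibrium(lo=-0.999, hi=0.999, iters=60):
    # force(lo) > 0 and force(hi) < 0, so bisection converges to the zero of the force
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if force(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(equilibrium())  # -> approximately 0.0, the midpoint between the charges
```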

Permanental rank versus determinantal rank of random matrices over finite fields

from arXiv: Computational Complexity

Authors: Fatemeh Ghasemi, Gal Gross, Swastik Kopparty

This paper is motivated by basic complexity and probability questions about permanents of random matrices over finite fields, and in particular, about properties separating the permanent and the determinant. Fix $q = p^m$ some power of an odd prime, and let $k \leq n$ both be growing. For a uniformly random $n \times k$ matrix $A$ over $\mathbb{F}_q$, we study the probability that all $k \times k$ submatrices of $A$ have zero permanent; namely that $A$ does not have full "permanental rank". When $k = n$, this is simply the probability that a random square matrix over $\mathbb{F}_q$ has zero permanent, which we do not understand. We believe that the probability in this case is $\frac{1}{q} + o(1)$, which would be in contrast to the case of the determinant, where the answer is $\frac{1}{q} + Ω_q(1)$. Our main result is that when $k$ is $O(\sqrt{n})$, the probability that a random $n \times k$ matrix does not have full permanental rank is essentially the same as the probability that the matrix has a $0$ column, namely $(1 +o(1)) \frac{k}{q^n}$. In contrast, for determinantal (standard) rank the analogous probability is $Θ(\frac{q^k}{q^n})$. At the core of our result are some basic linear algebraic properties of the permanent that distinguish it from the determinant.
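
For intuition about the quantity studied here, the following small brute-force experiment (ours, not from the paper) estimates the probability that a uniformly random $n\times n$ matrix over $\mathbb{F}_q$ has zero permanent; for small $n$ and $q$ one can compare the estimate with the conjectured $\frac{1}{q}+o(1)$ behaviour.

```python
# Small brute-force experiment (ours, not from the paper): estimate
# Pr[per(A) = 0 mod q] for a uniformly random n x n matrix over F_q.
import itertools, random

def permanent_mod(A, q):
    """Permanent of a square matrix mod q, summed over all permutations
    (fine for the tiny n used here)."""
    n = len(A)
    total = 0
    for sigma in itertools.permutations(range(n)):
        prod = 1
        for i in range(n):
            prod = (prod * A[i][sigma[i]]) % q
        total = (total + prod) % q
    return total

def estimate_zero_permanent_prob(n, q, trials=2000, seed=0):
    rng = random.Random(seed)
    zeros = 0
    for _ in range(trials):
        A = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
        if permanent_mod(A, q) == 0:
            zeros += 1
    return zeros / trials

for n in (2, 3, 4, 5):
    print(n, estimate_zero_permanent_prob(n, q=3))  # compare against 1/q = 0.333...
```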

Well-quasi-orders on embedded planar graphs

from arXiv: Computational Geometry

Authors: Corentin Lunel, Clément Maria

The central theorem of topological graph theory states that the graph minor relation is a well-quasi-order on graphs. It has far-reaching consequences, in particular in the study of graph structures and the design of (parameterized) algorithms. In this article, we study two embedded versions of classical minor relations from structural graph theory and prove that they are also well-quasi-orders on general or restricted classes of embedded planar graphs. These embedded minor relations appear naturally for intrinsically embedded objects, such as knot diagrams and surfaces in $\mathbb{R}^3$. Handling the extra topological constraints of the embeddings requires careful analysis and extensions of classical methods for the more constrained embedded minor relations. We prove that the embedded version of immersion induces a well-quasi-order on bounded carving-width plane graphs by exhibiting particularly well-structured tree-decompositions and leveraging a classical argument on well-quasi-orders on forests. We deduce that the embedded graph minor relation defines a well-quasi-order on plane graphs via their directed medial graphs, when their branch-width is bounded. We conclude that the embedded graph minor relation is a well-quasi-order on all plane graphs, using classical grid theorems in the unbounded branch-width case.

Robust Algorithms for Path and Cycle Problems in Geometric Intersection Graphs

from arXiv: Data Structures and Algorithms

Authors: Malory Marin, Jean-Florent Raymond, Rémi Watrigant

We study the design of robust subexponential algorithms for classical connectivity problems on intersection graphs of similarly sized fat objects in $\mathbb{R}^d$. In this setting, each vertex corresponds to a geometric object, and two vertices are adjacent if and only if their objects intersect. We introduce a new tool for designing such algorithms, which we call a $λ$-linked partition. This is a partition of the vertex set into groups of highly connected vertices. Crucially, such a partition can be computed in polynomial time and does not require access to the geometric representation of the graph. We apply this framework to problems related to paths and cycles in graphs. First, we obtain the first robust ETH-tight algorithms for Hamiltonian Path and Hamiltonian Cycle, running in time $2^{O(n^{1-1/d})}$ on intersection graphs of similarly sized fat objects in $\mathbb{R}^d$. This resolves an open problem of de Berg et al. [STOC 2018] and completes the study of these problems on geometric intersection graphs from the viewpoint of ETH-tight exact algorithms. We further extend our approach to the parameterized setting and design the first robust subexponential parameterized algorithm for Long Path in any fixed dimension $d$. More precisely, we obtain a randomized robust algorithm running in time $2^{O(k^{1-1/d}\log^2 k)}\, n^{O(1)}$ on intersection graphs of similarly sized fat objects in $\mathbb{R}^d$, where $k$ is the natural parameter. Besides $λ$-linked partitions, our algorithm also relies on a low-treewidth pattern covering theorem that we establish for geometric intersection graphs, which may be viewed as a refinement of a result of Marx-Pilipczuk [ESA 2017]. This structural result may be of independent interest.

Aggregating maximal cliques in real-world graphs

from arXiv: Data Structures and Algorithms

Authors: Noga Alon, Sabyasachi Basu, Shweta Jain, Haim Kaplan, Jakub Łącki, Blair D. Sullivan

Maximal clique enumeration is a fundamental graph mining task, but its utility is often limited by computational intractability and highly redundant output. To address these challenges, we introduce \emph{$ρ$-dense aggregators}, a novel approach that succinctly captures maximal clique structure. Instead of listing all cliques, we identify a small collection of clusters with edge density at least $ρ$ that collectively contain every maximal clique. In contrast to maximal clique enumeration, we prove that for all $ρ< 1$, every graph admits a $ρ$-dense aggregator of \emph{sub-exponential} size, $n^{O(\log_{1/ρ}n)}$, and provide an algorithm achieving this bound. For graphs with bounded degeneracy, a typical characteristic of real-world networks, our algorithm runs in near-linear time and produces near-linear size aggregators. We also establish a matching lower bound on aggregator size, proving our results are essentially tight. In an empirical evaluation on real-world networks, we demonstrate significant practical benefits for the use of aggregators: our algorithm is consistently faster than the state-of-the-art clique enumeration algorithm, with median speedups over $6\times$ for $ρ=0.1$ (and over $300\times$ in an extreme case), while delivering a much more concise structural summary.

Quantum Algorithm for Searching for the Longest Segment and the Largest Empty Rectangle

from arXiv: Data Structures and Algorithms

Authors: Kamil Khadiev, Vladislav Remidovskii, Timur Bikmullin, Aliya Khadieva

In this paper, we consider the problem of searching for the largest empty rectangle in a 2D map; the one-dimensional version of the problem is searching for the largest empty segment. We present a quantum algorithm for the Largest Empty Square problem and for the Largest Empty Rectangle problem with a fixed width $d$ on an $n\times n$ rectangular map. The query complexity of the algorithm is $\tilde{O}(n^{1.5})$ for the square case and $\tilde{O}(n\sqrt{d})$ for the fixed-width-$d$ rectangle case, respectively. At the same time, the classical lower bounds are $Ω(n^2)$ and $Ω(nd)$, respectively. The quantum algorithm for the one-dimensional version of the problem has $O(\sqrt{n}\log n\log\log n)$ query complexity, while the quantum lower bound is $Ω(\sqrt{n})$, matching the upper bound up to logarithmic factors. The classical lower bound is $Ω(n)$, so we obtain a quadratic speed-up for the problem.
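
For context, the one-dimensional classical baseline is a plain linear scan (our sketch, not the paper's quantum algorithm); it reads all $n$ cells, in line with the $Ω(n)$ classical lower bound mentioned above.

```python
# Classical baseline for the 1D problem (our sketch, not the quantum algorithm):
# the longest run of empty cells in a 0/1 map, where 0 marks "empty".
def largest_empty_segment(cells):
    best = current = 0
    for c in cells:
        current = current + 1 if c == 0 else 0
        best = max(best, current)
    return best

print(largest_empty_segment([1, 0, 0, 0, 1, 0, 0, 1]))  # -> 3
```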

Matrix Editing Meets Fair Clustering: Parameterized Algorithms and Complexity

from arXiv: Data Structures and Algorithms

Authors: Robert Ganian, Hung P. Hoang, Simon Wietheger

We study the computational problem of computing a fair means clustering of discrete vectors, which admits an equivalent formulation as editing a colored matrix into one with few distinct color-balanced rows by changing at most $k$ values. While NP-hard in both the fairness-oblivious and the fair settings, the problem is well-known to admit a fixed-parameter algorithm in the former ``vanilla'' setting. As our first contribution, we exclude an analogous algorithm even for highly restricted fair means clustering instances. We then proceed to obtain a full complexity landscape of the problem, and establish tractability results which capture three means of circumventing our obtained lower bound: placing additional constraints on the problem instances, fixed-parameter approximation, or using an alternative parameterization targeting tree-like matrices.

Comparative algorithm performance evaluation and prediction for the maximum clique problem using instance space analysis

from arXiv: Data Structures and Algorithms

Authors: Bharat Sharman, Elkafi Hassini

The maximum clique problem, a well-known graph-based combinatorial optimization problem, has been addressed through various algorithmic approaches, though systematic analyses of the problem instances remain sparse. This study employs the instance space analysis (ISA) methodology to systematically analyze the instance space of this problem and assess & predict the performance of state-of-the-art (SOTA) algorithms, including exact, heuristic, and graph neural network (GNN)-based methods. A dataset was compiled using graph instances from TWITTER, COLLAB and IMDB-BINARY benchmarks commonly used in graph machine learning research. A set of 33 generic and 2 problem-specific polynomial-time-computable graph-based features, including several spectral properties, was employed for the ISA. A composite performance measure incorporating both solution quality and algorithm runtime was utilized. The comparative analysis demonstrated that the exact algorithm Mixed Order Maximum Clique (MOMC) exhibited superior performance across approximately 74.7% of the instance space constituted by the compiled dataset. Gurobi & CliSAT accounted for superior performance in 13.8% and 11% of the instance space, respectively. The ISA-based algorithm performance prediction model run on 34 challenging test instances compiled from the BHOSLIB and DIMACS datasets yielded top-1 and top-2 best performing algorithm prediction accuracies of 88% and 97%, respectively.

Singing a MIS

from arXiv: Data Structures and Algorithms

Authors: Sandy Irani, Michael Luby

We introduce a broadcast model called the singing model, where agents are oblivious of the size and structure of the communication network, even their immediate neighborhood. Agents can sing multiple notes which are heard by their neighbors. The model is a generalization of the beeping model, where agents can only emit sound at a single frequency. We give a simple and natural protocol where agents compete with their neighbors and their strength is reflected in the number of notes they sing. It converges in $O(\log n)$ time with high probability, where $n$ is the number of agents in the network. The protocol works in an asynchronous model where rounds vary in length and have different start times. It works with completely dynamic networks where agents can be faulty. The protocol is the first to converge to an MIS in logarithmic time for dynamic networks in a network oblivious model.

Fast approximate $\ell$-center clustering in high dimensional spaces

from arXiv: Data Structures and Algorithms

Authors: Mirosław Kowaluk, Andrzej Lingas, Mia Persson

We study the design of efficient approximation algorithms for the $\ell$-center clustering and minimum-diameter $\ell$-clustering problems in high dimensional Euclidean and Hamming spaces. Our main tool is randomized dimension reduction. First, we present a general method of reducing the dependency of the running time of a hypothetical algorithm for the $\ell$-center problem in a high dimensional Euclidean space on the dimension size. Utilizing in part this method, we provide $(2+ε)$-approximation algorithms for the $\ell$-center clustering and minimum-diameter $\ell$-clustering problems in Euclidean and Hamming spaces that are substantially faster than the known $2$-approximation ones when both $\ell$ and the dimension are super-logarithmic. Next, we apply the general method to the recent fast approximation algorithms with higher approximation guarantees for the $\ell$-center clustering problem in a high dimensional Euclidean space. Finally, we provide a speed-up of the known $O(1)$-approximation method for the generalization of the $\ell$-center clustering problem to include $z$ outliers (i.e., $z$ input points can be ignored while computing the maximum distance of an input point to a center) in high dimensional Euclidean and Hamming spaces.
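
As a rough illustration of the two ingredients named in the abstract, the sketch below (ours, not the paper's algorithm) combines a Gaussian random projection in the spirit of Johnson-Lindenstrauss with the classical greedy farthest-point 2-approximation for $\ell$-center; dimensions and parameters are arbitrary.

```python
# Toy sketch (not the paper's algorithm): random projection to a lower dimension
# followed by the classical greedy farthest-point 2-approximation for l-center.
import numpy as np

def random_project(points, target_dim, seed=0):
    """Gaussian random projection in the spirit of Johnson-Lindenstrauss."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    R = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
    return points @ R

def greedy_l_center(points, l):
    """Gonzalez's farthest-point heuristic: a 2-approximation for l-center."""
    centers = [0]  # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(l - 1):
        nxt = int(np.argmax(dist))
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, dist.max()  # chosen centers and the clustering radius

points = np.random.default_rng(1).normal(size=(500, 1000))  # high-dimensional input
low = random_project(points, target_dim=50)
centers, radius = greedy_l_center(low, l=5)
print(centers, round(float(radius), 3))
```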

Complexity of Local Search for CSPs Parameterized by Constraint Difference

from arXiv: Data Structures and Algorithms

Authors: Aditya Anand, Vincent Cohen-Addad, Tommaso d'Orsi, Anupam Gupta, Euiwoong Lee, Debmalya Panigrahi, Sijin Peng

In this paper, we study the parameterized complexity of local search, whose goal is to find a good nearby solution from the given current solution. Formally, given an optimization problem where the goal is to find the largest feasible subset $S$ of a universe $U$, the new input consists of a current solution $P$ (not necessarily feasible) as well as an ordinary input for the problem. Given the existence of a feasible solution $S^*$, the goal is to find a feasible solution as good as $S^*$ in parameterized time $f(k) \cdot n^{O(1)}$, where $k$ denotes the distance $|PΔS^*|$. This model generalizes numerous classical parameterized optimization problems whose parameter $k$ is the minimum number of elements removed from $U$ to make it feasible, which corresponds to the case $P = U$. We apply this model to widely studied Constraint Satisfaction Problems (CSPs), where $U$ is the set of constraints, and a subset $U'$ of constraints is feasible if there is an assignment to the variables satisfying all constraints in $U'$. We give a complete characterization of the parameterized complexity of all boolean-alphabet symmetric CSPs, where the predicate's acceptance depends on the number of true literals.

Wednesday, December 03

Congratulations to three new doctorates!

from David Eppstein

This has been a busy week for doctoral defenses for me: I attended three, participating in the committees for the last two.

The first, on Monday, was Ryuto Kitagawa, a student of Mike Goodrich. Ryuto came to UCI through a recommendation from Mike’s former student, Nodari Sitchinava at the University of Hawaii. His thesis concerned parallel data structures, based on three of his publications. Two of these concern invertible Bloom filters and invertible Bloom lookup tables, data structures that can handle a streaming sequence of insertions and deletions of keys or key/value pairs (respectively) and allow lookups in the resulting set whenever the number of remaining elements is below the set capacity of the data structure, even if it might have gone far above capacity earlier. Normally these use space linear in the capacity and are decoded sequentially, but “Parallel Peeling of Invertible Bloom Lookup Tables in a Constant Number of Rounds” (SOFSEM 2025) gives a constant-time decoding algorithm at the expense of a logarithmic space penalty. In “Dynamic Accountable Storage: An Efficient Protocol for Real-Time Cloud Storage Auditing”, Ryuto applies these data structures to a problem of verifying that cloud storage providers still have a valid copy of your stored data. Another part of the thesis concerns a parallel B-tree data structure, from Ryuto’s paper “Parallel Joinable B-Trees in the Fork-Join I/O Model” (ISAAC 2025).
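
For readers unfamiliar with the data structure, here is a heavily simplified invertible Bloom lookup table sketch (ours, sequential, and not Ryuto's parallel construction): each cell stores a counter and XOR sums of keys and values, and listing works by repeatedly peeling cells that contain exactly one remaining item.

```python
# A minimal invertible Bloom lookup table sketch (ours, simplified from the
# standard construction): integer keys/values, k hash cells per key, XOR sums.
import random

class IBLT:
    def __init__(self, m=50, k=3, seed=7):
        self.m, self.k = m, k
        self.salts = [random.Random(seed + i).randrange(1 << 30) for i in range(k)]
        self.count = [0] * m
        self.key_sum = [0] * m
        self.val_sum = [0] * m

    def _cells(self, key):
        return [hash((key, s)) % self.m for s in self.salts]

    def _apply(self, key, value, sign):
        for c in self._cells(key):
            self.count[c] += sign
            self.key_sum[c] ^= key
            self.val_sum[c] ^= value

    def insert(self, key, value): self._apply(key, value, +1)
    def delete(self, key, value): self._apply(key, value, -1)  # assumes (key, value) was inserted

    def list_entries(self):
        """Peel pure cells (count == 1) to recover the remaining key/value pairs.
        Succeeds w.h.p. when few enough pairs remain relative to the capacity."""
        out = []
        progress = True
        while progress:
            progress = False
            for c in range(self.m):
                if self.count[c] == 1:
                    k, v = self.key_sum[c], self.val_sum[c]
                    out.append((k, v))
                    self.delete(k, v)
                    progress = True
        return out

t = IBLT()
for key in range(100):           # go far above capacity...
    t.insert(key, key * key)
for key in range(5, 100):        # ...then delete most entries again
    t.delete(key, key * key)
print(sorted(t.list_entries()))  # w.h.p. the 5 remaining pairs are recovered
```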

Next, on Tuesday, was the turn of Ofek Gila, another student of Mike, whom Mike recruited when Ofek was a UCI undergraduate. I’ve already discussed the work that Ofek included in his thesis in several previous posts here. His paper “Zip-zip Trees: Making Zip Trees More Balanced, Biased, Compact, or Persistent” won the best paper award at WADS 2023; it provides a variation of the zip tree data structure of Bob Tarjan, Caleb Levy, and Stephen Timmel that uses fewer random bits per node and that maintains the same depth that one would get from a random binary search tree. Although this depth is larger than a perfectly balanced search tree by a factor of \(\ln 4\approx 1.386\), Ofek conjectures that it is optimal for strongly history-independent binary search trees with logarithmic update costs. “Zip-tries: simple dynamic data structures for strings” (ACDA 2025) extends similar ideas of zipping-based updates and few random bits per node to dynamic string dictionary data structures; I am a coauthor on this one and posted about it here. Finally, his “Highway Preferential Attachment Models for Geographic Routing” won the best paper award at COCOA 2023; it is coauthored with recent doctorate Evrim Ozel. It is related to a model of small-world networking by Jonathan Kleinberg, in which adding random long-range connections to a grid, with probability inverse-square in the connection distance, preserves bounded average degree while allowing greedy routing methods to find short paths. Ofek and Evrim’s work gives more structure to the long-range connections by assigning each node a randomly chosen length range (with diminishing probabilities for longer connections) and only allowing connections between nodes with similar ranges. The result is a family of small-world networks that preserve the general features of Kleinberg’s model but for which greedy routing produces much shorter paths, only slightly above logarithmic in length.

Finally, this morning I participated remotely in the defense of Clément Rambaud, at the Université Côte d’Azur in France. Clément already has many deep publications in graph minor theory. The foundation of this theory is the Robertson–Seymour theorem, that every family of graphs closed under taking minors (subgraphs of edge contractions) can be characterized by finitely many graphs that are forbidden as minors, in the same way that the planar graphs are characterized by the two forbidden minors \(K_5\) and \(K_{3,3}\). Clément’s thesis revolved around a conjecture of Robin Thomas, that in the same way, graph parameters that behave monotonically under taking minors should be characterized by finitely many well-defined families of graphs. Classical examples are that a graph has small treewidth iff it has no large square grid minor, that it has small pathwidth iff it has no large complete binary tree minor, and that it has small treedepth iff it has no long path minor. Clément introduced a \(k\)-treedepth parameter, interpolating between treedepth and treewidth, and defined by structural decompositions allowing \((<k)\)-clique-sums and adding an apex at the expense of increased depth. He showed that these are characterized by forbidden rectangular grid minors, where the small dimension of the grid controls the parameter \(k\) (up to a constant factor) and the large dimension controls the \(k\)-treedepth. His thesis went on to explain, to a large extent, the phenomenon that local versions of graph parameters (where one studies the relation of the parameter value to the diameter of a subgraph) are often controlled by forbidden minors that add one apex vertex to the forbidden minors for global properties: local treewidth is controlled by forbidden apex graphs, local pathwidth is controlled by apex-trees, and local treedepth is controlled by fans (apex-paths), etc. A third part of the thesis used the main technical tool for studying these local properties, rooted graph minors, to study centered colorings of graphs; these can be thought of as a way of covering a given graph with a small number of bounded-treedepth subgraphs, allowing fast algorithms for subgraph isomorphism and related problems. For minor-closed graph families one needs a number of colors that is polynomial in the desired treedepth bound, but the exact polynomial is unknown, even for planar graphs. Clément’s thesis strengthened the bounds on how many colors are needed, to within one in the exponent of the polynomial.

Congratulations, Ryuto, Ofek, and Clément!

(Discuss on Mastodon)

By David Eppstein

Defining Reinforcement Learning Down

from Ben Recht

It's a lot simpler than I realized.

On Monday, I described reformist reinforcement learning in a paragraph. Today I’ll do it in about 800 words with no equations. I’m indebted to Lior Fox, who, in a back-and-forth in the comment section, helped me synthesize a far better definition than I had.

Lior, a cognitive scientist, has a broad definition of reinforcement learning rooted in the psychological concept of reinforcement learning. A century before computers were playing backgammon with TD Learning, psychologists were building theories of human learning based on feedback. Paraphrasing Thorndike’s Law of Effect, Lior defines reinforcement learning as the iterative process:

  1. Receive external validation on how good you’re currently doing

  2. Adjust what you’re currently doing so that you are better the next time around.

Whether or not this is how humans or animals learn, this is a spot-on definition of computer scientific reinforcement learning. Let me make it more precise.

We have a computer program in feedback with an evaluation environment. The computer produces responses to a series of tests. An external agent then numerically scores these responses. The computer program is fed these scores and internally updates its software for the subsequent evaluation. The process repeats. The goal of the iteration is to produce a program that achieves the highest possible average score when interacting with the evaluation environment. In bullets, computational reinforcement learning is the iterative process:

  1. Produce a collection of responses to evaluation scenarios.

  2. Receive scores on these responses.

  3. Update the computer code based on these scores.

Reinforcement learning is thus a branch of optimization. The objective is to maximize the average score if the program were to be evaluated an infinite number of times. The optimization process is iterative, with feedback based solely on the scores.
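
In code, the whole loop is nothing more than the following sketch; the model, environment, and scorer interfaces are invented for illustration.

```python
# A minimal sketch of the three-step loop described above; the environment and
# model interfaces here are invented for illustration, not any particular system.
def reinforcement_learning_loop(model, environment, scorer, num_rounds):
    for _ in range(num_rounds):
        scenarios = environment.sample_scenarios()             # evaluation scenarios
        responses = [model.respond(s) for s in scenarios]      # step 1: produce responses
        scores = [scorer(s, r) for s, r in zip(scenarios, responses)]  # step 2: external scores
        model.update(scenarios, responses, scores)             # step 3: change the program
    return model
```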

You can cook up a lot of examples where you might be able to use a method like this. You could have a game-playing agent play a video game a bunch of times, adapting its strategy based on the score of each round. You could have an autonomous racing quadrotor that iteratively tries new maneuvers to improve its time around a course. You could have a language model whose responses are scored by remote low-wage labor, fine-tuned to match the workers’ preferences. All of these would count as examples of reinforcement learning.

Reformist RL uses a very particular implementation of the update step 3. First, the computer code is a generative model. This model interactively takes input from the evaluation environment and returns a sequence of random samples. In our examples above, the video-game-player randomly chooses its next moves, the quadrotor randomly perturbs its previous headings, and the language model randomly generates its next tokens.

When building generative models from data, the goal is to maximize the probability of some given dataset. In Reformist RL, the generative model generates the data itself. The training data are the records from the evaluation in step 1 and the received scores in step 2. In step 3, you update the generative model by training only on data with positive scores.1 That is, whenever the computer receives high scores, it increases the probability of responding the same way the next time it sees the same test in the evaluation environment.

If you already have code to build generative models, then you can easily turn this code into a reinforcement learning agent. Simply add weights to the updates proportional to the scores received in the evaluation phase. That’s it. This is called policy gradient. The ease of implementation is why Reformist RL is so enticing in large language models. Any code for pretraining can be quickly modified for posttraining. The models are pretrained on sequences of text from the internet. They are posttrained on sequences of text generated in various evaluation environments.
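
Here is a toy version of that score-weighted update, a vanilla REINFORCE-style policy gradient on a categorical model over a handful of canned responses. The scores and all names are made up; this is only a sketch of the idea, not anyone's training code.

```python
# A toy score-weighted update (REINFORCE-style policy gradient) for a categorical
# "generative model" over K canned responses; everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)
K = 4
logits = np.zeros(K)                      # the "generative model": softmax over K responses
scores = np.array([0.0, 1.0, 0.2, 3.0])   # hypothetical scores the evaluator assigns

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

learning_rate = 0.1
for step in range(2000):
    probs = softmax(logits)
    response = rng.choice(K, p=probs)     # sample a response from the model
    reward = scores[response]             # external validation
    # gradient of log p(response) is (one_hot - probs); weight it by the score
    grad_logp = -probs
    grad_logp[response] += 1.0
    logits += learning_rate * reward * grad_logp

print(np.round(softmax(logits), 3))       # mass shifts toward the highest-scoring response
```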

Now you might ask, what about all of those Markov Decision Processes that professors torture you with in AI classes? Though it has historically been the other way around in classical artificial intelligence, Reformist RL, like behaviorist psychology, views MDPs as secondary rather than primary. MDPs arise in a very particular evaluation environment where

  1. The computer is scored on a sequence of tests.

  2. The evaluation environment chooses its next test in the sequence only as a function of the current test and the computer’s current answer.

  3. Each test receives progressively less weight in the total score of the evaluation (i.e., discounting).

You can phrase a lot of problems this way, but this comprises a subset of reinforcement learning problems, not a superset.
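
A tiny made-up evaluation environment with exactly these three properties, as a sketch:

```python
# A tiny made-up example of such an evaluation environment: two tests ("easy"
# and "hard"), the next test determined only by the current test and answer,
# and geometrically discounted scores.
def mdp_style_evaluation(answer_fn, discount=0.9, horizon=50):
    test = "easy"
    total, weight = 0.0, 1.0
    for _ in range(horizon):
        answer = answer_fn(test)                              # the program's current answer
        score = 1.0 if (test, answer) in {("easy", "a"), ("hard", "b")} else 0.0
        total += weight * score                               # later tests count less
        weight *= discount
        test = "hard" if test == "easy" and answer == "a" else "easy"
    return total

# A program that answers "a" on easy tests and "b" on hard ones scores about 10 here.
print(mdp_style_evaluation(lambda t: "a" if t == "easy" else "b"))
```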

This characterization of reinforcement learning and Reformist Reinforcement Learning captures every example I know of. It connects nicely to the rest of machine learning. You can teach it in a single class. My survey of reinforcement learning goes from 27 pages to 809 words. I have learned from experience.

1. Or, more precisely, the higher the score, the more likely you make your response there.

By Ben Recht

Search versus Decision for $\mathsf{S}_2^\mathsf{P}$

from arXiv: Computational Complexity

Authors: Lance Fortnow

We compare the complexity of the search and decision problems for the complexity class $\mathsf{S}_2^\mathsf{P}$. While Cai (2007) showed that the decision problem is contained in $\mathsf{ZPP}^{\mathsf{NP}}$, we show that the search problem is equivalent to $\mathsf{TFNP}^{\mathsf{NP}}$, the class of total search problems verifiable in polynomial time with an NP oracle. This highlights a significant contrast: if search reduces to decision for $\mathsf{S}_2^\mathsf{P}$, then $Σ_2^p \cap Π_2^p$ is contained in $\mathsf{ZPP}^{\mathsf{NP}}$.

Devil's Games and $\text{Q}\mathbb{R}$: Continuous Games complete for the First-Order Theory of the Reals

from arXiv: Computational Geometry

Authors: Lucas Meijer, Arnaud de Mesmay, Tillmann Miltzow, Marcus Schaefer, Jack Stade

We introduce the complexity class Quantified Reals ($\text{Q}\mathbb{R}$). Let FOTR be the set of true sentences in the first-order theory of the reals. A language $L$ is in $\text{Q}\mathbb{R}$ if there is a polynomial time reduction from $L$ to FOTR. This appears to be the first time this complexity class has been studied. We show that $\text{Q}\mathbb{R}$ can also be defined using real Turing machines. It is known that deciding FOTR requires at least exponential time unconditionally [Berman, 1980]. We focus on devil's games with two defining properties: (1) Players (human and devil) alternate turns and (2) each turn has a continuum of options. First, we show that FOTRINV is $\text{Q}\mathbb{R}$-complete. FOTRINV has only inversion and addition constraints and all variables are in a compact interval. FOTRINV is a stepping stone for further reductions. Second, we show that the Packing Game is $\text{Q}\mathbb{R}$-complete. In the Packing Game we are given a container and two sets of pieces. One set of pieces for the human and one set for the devil. The human and the devil alternate by placing a piece into the container. Both rotations and translations are allowed. The first player that cannot place a piece loses. Third, we show that the Planar Extension Game is $\text{Q}\mathbb{R}$-complete. We are given a partially drawn plane graph and the human and the devil alternate by placing vertices and the corresponding edges in a straight-line manner. The vertices and edges to be placed are prescribed beforehand. The first player that cannot place a vertex loses. Finally, we show that the Order Type Game is $\text{Q}\mathbb{R}$-complete. We are given an order-type together with a linear order. The human and the devil alternate in placing a point in the Euclidean plane following the linear order. The first player that cannot place a point correctly loses.
