Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Monday, June 01

Theory Jobs 2026

from Kamathematics

The Theory community has a tradition of a crowdsourced spreadsheet of sharing who has accepted which jobs (see, e.g., 2025). You can report jobs accepted via this form (anonymous), and view the results on this sheet. If you feel a change needs to be made to the results reported in the form, please email me or fill out this … Continue reading "Theory Jobs 2026"

The Theory community has a tradition of a crowdsourced spreadsheet of sharing who has accepted which jobs (see, e.g., 2025).

You can report jobs accepted via this form (anonymous), and view the results on this sheet. If you feel a change needs to be made to the results reported in the form, please email me or fill out this other form (also anonymous, unless you choose to identify yourself).

The rules are as follows:

  • You are welcome to add yourself, or people your department has hired.
  • Hires should be connected to theoretical computer science, broadly defined.
  • Only add jobs that you are absolutely sure have been offered and accepted. This is not the place for speculation and rumors. Please, be particularly careful when adding senior hires (people who already have an academic or industrial job) — end dates of their current positions might be still in the future.

To keep this list focused around theory, mild curation will be applied: individuals should have publications at theory-centric conferences such as STOC, FOCS, SODA, ICALP, STACS, COLT, CRYPTO, CCC, ITCS, etc., or publications of a similar style. If you don’t meet this criterion, but still identify as a theorist/theoretician (at least partially), please email me (ideally, with a sentence or two of explanation) at g@csail.mit.edu and I’ll include you, no further questions asked.

By Gautam

Odd Scenarios about Research Claims and Authorships

from Computational Complexity

 Odd Scenarios about Research Claims

I blogged about OpenAI's achievement of having AI solve a math problem here.
My post had a few comments about authorship of such results.

Lance had a post about co-authorship and AI here

There are times when an author on a paper prefers to not be listed as a co-author.
There are times when an author wants to give credit in odd places.

We give some scenarios.

1) Professors Alice and Bob  prove a theorem. They have their grad student Carol
write up the result. Alice and Bob intentionally leave their name
off of the paper since they want Carol to get a career boost.

2) Alice has a conjecture and does some work on it. Famous Mathematicians
Bob and Carol solve it and offer Alice a co-authorship. Alice declines thinking that
having it known that Bob and Carol cared about her work is more important
(and Alice thinks, rightly or wrongly, that it's more impressive if she is not a co-author)

3) Alice has a good idea and tells it to Bob and Carol. They incorporate the idea
into a paper and make her a co-author. Later she tells them her idea is wrong and
to take her name OFF of the journal version. They say, no, they will keep her name
on the journal version. She does not inquire why they will do this for fear they
will rethink it. She offers to proofread the paper very carefully to earn her keep.
She does that and does a really good job.

4) Alice has a conjecture and does some work on it. Other people solve it
and offer Alice a co-authorship. Alice declines since she doesn't really understand
the final paper.

5) Alice and Bob are working for the NSA. They come up with a breakthrough (either making or
breaking codes). They can't publish it because of national security.

6) A pair of scenarios.

6a) Alice, Bob, and Carol prove a theorem in number theory that leads to a new classical
factoring algorithm. They code up the algorithm and find that it's 1000 times
faster than the best algorithms known. They make it into a package and want to sell it.
 However, they claim it's a quantum computer since they think it will sell better that way.
They do not claim that it's Shor's algorithm.
They just claim that it's 1000 times faster than current algorithms, which is true.

If someone buys it because they want to factor large numbers, do they care
that it's not quantum?

If someone buys it because they want to explore the quantum aspect, they do care.
b) Alice, Bob, and Carol are quantum engineers. They build a quantum computer and
use it to factor large numbers. They find that it factors 1000 times faster than current classical
algorithms. However, by this point there has been so much quantum hype that has been
debunked, that, in order to sell it, they claim it's a classical algorithm achieved
by a breakthrough in number theory. They do not claim it's Shor's algorithm,
just that it's 1000 times faster which is true.

If someone buys it because they want to factor large numbers, do they care
that it's not quantum?

If someone buys it because they want to explore the number-theory aspects, they do care.

7) A pair of scenarios.

7a) Alice, Bob, and Carol work for a math research lab. They use AI to crack an open problem in math.
The AI did all the heavy lifting and produced a very readable proof.  They claim AI didn't do much
since they think this will increase how much grant money they can get.

If someone just wants to read the proof, then do they care that it's AI-generated?

If someone wants to give Alice-Bob-Carol grant money, then they  may dislike the lying, but
Alice-Bob-Carol could probably use AI to get more results. Hence Alice-Bob-Carol were fools
to hide the use of AI.

7b) An AI company hires Alice, Bob, and Carol to solve a math problem. They succeed and
produce a readable proof. The company then claims that AI did it to increase attention and funding.

If someone just wants to read the proof, then do they care that it's human-generated?

If someone wants to give the company venture capital money, then they do care.

8) Alice reads a novel and likes it. She later finds out that it was written by AI.
Now she decides she doesn't like it. Is that rational? What if the cover of the novel
said quite clearly ``This novel was written by AI'' but she missed it (perhaps she
has a used copy where the cover is torn off). Then she can't use the
``I mind that they lied'' excuse for why she doesn't like it.

9) In Scenario 7 you could replace ``novel'' with ``song'' or ``movie''
or ``episode of a TV show'' or whatever you want.

10) If I found out that a novelty song about Ramsey Theory was actually
AI-generated, then would I mind? I would probably search my files and find
that I was the one who asked AI to do it a few years ago and forgot about it.
Do you know anyone else who would use ChatGPT to write songs about Ramsey Theory?

11) Having written point 9 I am morally obligated to give you a pointer to
a Ramsey Song I wrote with the help of ChatGPT. It's about the proof of
van der Waerden's theorem and is titled

I like big blocks (and I cannot lie) here.


By gasarch

 Odd Scenarios about Research Claims

I blogged about OpenAI's achievement of having AI solve a math problem here.
My post had a few comments about authorship of such results.

Lance had a post about co-authorship and AI here

There are times when an author on a paper prefers to not be listed as a co-author.
There are times when an author wants to give credit in odd places.

We give some scenarios.

1) Professors Alice and Bob  prove a theorem. They have their grad student Carol
write up the result. Alice and Bob intentionally leave their name
off of the paper since they want Carol to get a career boost.

2) Alice has a conjecture and does some work on it. Famous Mathematicians
Bob and Carol solve it and offer Alice a co-authorship. Alice declines thinking that
having it known that Bob and Carol cared about her work is more important
(and Alice thinks, rightly or wrongly, that it's more impressive if she is not a co-author)

3) Alice has a good idea and tells it to Bob and Carol. They incorporate the idea
into a paper and make her a co-author. Later she tells them her idea is wrong and
to take her name OFF of the journal version. They say, no, they will keep her name
on the journal version. She does not inquire why they will do this for fear they
will rethink it. She offers to proofread the paper very carefully to earn her keep.
She does that and does a really good job.

4) Alice has a conjecture and does some work on it. Other people solve it
and offer Alice a co-authorship. Alice declines since she doesn't really understand
the final paper.

5) Alice and Bob are working for the NSA. They come up with a breakthrough (either making or
breaking codes). They can't publish it because of national security.

6) A pair of scenarios.

6a) Alice, Bob, and Carol prove a theorem in number theory that leads to a new classical
factoring algorithm. They code up the algorithm and find that it's 1000 times
faster than the best algorithms known. They make it into a package and want to sell it.
 However, they claim it's a quantum computer since they think it will sell better that way.
They do not claim that it's Shor's algorithm.
They just claim that it's 1000 times faster than current algorithms, which is true.

If someone buys it because they want to factor large numbers, do they care
that it's not quantum?

If someone buys it because they want to explore the quantum aspect, they do care.
b) Alice, Bob, and Carol are quantum engineers. They build a quantum computer and
use it to factor large numbers. They find that it factors 1000 times faster than current classical
algorithms. However, by this point there has been so much quantum hype that has been
debunked, that, in order to sell it, they claim it's a classical algorithm achieved
by a breakthrough in number theory. They do not claim it's Shor's algorithm,
just that it's 1000 times faster which is true.

If someone buys it because they want to factor large numbers, do they care
that it's not quantum?

If someone buys it because they want to explore the number-theory aspects, they do care.

7) A pair of scenarios.

7a) Alice, Bob, and Carol work for a math research lab. They use AI to crack an open problem in math.
The AI did all the heavy lifting and produced a very readable proof.  They claim AI didn't do much
since they think this will increase how much grant money they can get.

If someone just wants to read the proof, then do they care that it's AI-generated?

If someone wants to give Alice-Bob-Carol grant money, then they  may dislike the lying, but
Alice-Bob-Carol could probably use AI to get more results. Hence Alice-Bob-Carol were fools
to hide the use of AI.

7b) An AI company hires Alice, Bob, and Carol to solve a math problem. They succeed and
produce a readable proof. The company then claims that AI did it to increase attention and funding.

If someone just wants to read the proof, then do they care that it's human-generated?

If someone wants to give the company venture capital money, then they do care.

8) Alice reads a novel and likes it. She later finds out that it was written by AI.
Now she decides she doesn't like it. Is that rational? What if the cover of the novel
said quite clearly ``This novel was written by AI'' but she missed it (perhaps she
has a used copy where the cover is torn off). Then she can't use the
``I mind that they lied'' excuse for why she doesn't like it.

9) In Scenario 7 you could replace ``novel'' with ``song'' or ``movie''
or ``episode of a TV show'' or whatever you want.

10) If I found out that a novelty song about Ramsey Theory was actually
AI-generated, then would I mind? I would probably search my files and find
that I was the one who asked AI to do it a few years ago and forgot about it.
Do you know anyone else who would use ChatGPT to write songs about Ramsey Theory?

11) Having written point 9 I am morally obligated to give you a pointer to
a Ramsey Song I wrote with the help of ChatGPT. It's about the proof of
van der Waerden's theorem and is titled

I like big blocks (and I cannot lie) here.


By gasarch

Aspects of Coherence in Dependence Logic

from arXiv: Computational Complexity

Authors: Timon Barlag, Nicolas Fröhlich, Miika Hannula, Phokion G. Kolaitis, Juha Kontinen, Arne Meier, Jouko Väänänen

Dependence logic extends first-order logic with dependence atoms asserting that the value of a variable is determined by the values of certain other variables. The semantics of dependence logic has a second-order character and involves sets of assignments, called teams, instead of individual assignments as in the classical Tarski semantics. Since the model-checking problem is known to be NP-complete even for quantifier-free dependence logic (DQF) formulas, researchers have pursued conditions on formulas that make this problem tractable. In 2010, Jarmo Kontinen introduced the notion of k-coherence for dependence logic formulas, where k is a positive integer. This notion asserts that if the formula is satisfied in a structure by all k-element subteams of a given team, then the given team itself satisfies the formula. It has been proved that k-coherent DQF-formulas have a tame model-checking problem, because such formulas admit a first-order rewriting. In this paper, we investigate the structural and algorithmic aspects of coherence. We show that if a DQF-formula is first-order ewritable, then it is k-coherent for some positive integer k. Thus, for DQF-formulas, coherence is equivalent to first-order rewritability. Furthermore, we show that an analogous result holds for universally quantified dependence logic formulas under a stronger notion of coherence. After this, we focus on the complexity of deciding if a given dependence logic formula is k-coherent. We establish that this decision problem is highly undecidable for arbitrary dependence logic formulas, while for DQF-formulas this problem is co-recursively enumerable. Furthermore, we pinpoint the computational complexity of the coherence problem for propositional dependence logic formulas by showing that this problem is complete for the second level of the exponential hierarchy.

Authors: Timon Barlag, Nicolas Fröhlich, Miika Hannula, Phokion G. Kolaitis, Juha Kontinen, Arne Meier, Jouko Väänänen

Dependence logic extends first-order logic with dependence atoms asserting that the value of a variable is determined by the values of certain other variables. The semantics of dependence logic has a second-order character and involves sets of assignments, called teams, instead of individual assignments as in the classical Tarski semantics. Since the model-checking problem is known to be NP-complete even for quantifier-free dependence logic (DQF) formulas, researchers have pursued conditions on formulas that make this problem tractable. In 2010, Jarmo Kontinen introduced the notion of k-coherence for dependence logic formulas, where k is a positive integer. This notion asserts that if the formula is satisfied in a structure by all k-element subteams of a given team, then the given team itself satisfies the formula. It has been proved that k-coherent DQF-formulas have a tame model-checking problem, because such formulas admit a first-order rewriting. In this paper, we investigate the structural and algorithmic aspects of coherence. We show that if a DQF-formula is first-order ewritable, then it is k-coherent for some positive integer k. Thus, for DQF-formulas, coherence is equivalent to first-order rewritability. Furthermore, we show that an analogous result holds for universally quantified dependence logic formulas under a stronger notion of coherence. After this, we focus on the complexity of deciding if a given dependence logic formula is k-coherent. We establish that this decision problem is highly undecidable for arbitrary dependence logic formulas, while for DQF-formulas this problem is co-recursively enumerable. Furthermore, we pinpoint the computational complexity of the coherence problem for propositional dependence logic formulas by showing that this problem is complete for the second level of the exponential hierarchy.

Diffusion-Robust Optimization over Graphs

from arXiv: Computational Complexity

Authors: Liviu Aolaritei, Ricky Huang, Michael I. Jordan, Paul Grigas

We introduce a diffusion-based uncertainty model for robust optimization on directed graphs, in which perturbations of edge weights propagate along adjacent edges and satisfy conservation constraints at nodes. This topology-aware structure is natural in networked systems where uncertainty is induced by flows and local interactions, including transportation, logistics, communication, and energy networks. We analyze how such diffusive uncertainty reshapes the computational landscape of robust graph optimization. For convex network problems, such as minimum-cost flow and maximum flow, the resulting formulations remain convex and admit polynomial-time solution methods across all diffusion regimes considered. For combinatorial problems, the effect is more delicate. We focus on two canonical combinatorial graph problems, shortest path and the traveling salesman problem (TSP), which provide complementary benchmarks: shortest path is polynomial-time solvable in the nominal setting, whereas TSP is already NP-hard. We show that, for shortest path, propagation depth induces a sharp transition between tractable and intractable robust counterparts. For the traveling salesman problem, robustness often adds no computational complexity beyond ordinary TSP, because the structure of Hamiltonian cycles makes the fixed-tour adversarial problem collapse to explicit formulas. Together, these results show that topology-aware uncertainty can fundamentally change robust combinatorial optimization, with tractability governed by the interaction between propagation, budget geometry, and the structure of feasible solutions.

Authors: Liviu Aolaritei, Ricky Huang, Michael I. Jordan, Paul Grigas

We introduce a diffusion-based uncertainty model for robust optimization on directed graphs, in which perturbations of edge weights propagate along adjacent edges and satisfy conservation constraints at nodes. This topology-aware structure is natural in networked systems where uncertainty is induced by flows and local interactions, including transportation, logistics, communication, and energy networks. We analyze how such diffusive uncertainty reshapes the computational landscape of robust graph optimization. For convex network problems, such as minimum-cost flow and maximum flow, the resulting formulations remain convex and admit polynomial-time solution methods across all diffusion regimes considered. For combinatorial problems, the effect is more delicate. We focus on two canonical combinatorial graph problems, shortest path and the traveling salesman problem (TSP), which provide complementary benchmarks: shortest path is polynomial-time solvable in the nominal setting, whereas TSP is already NP-hard. We show that, for shortest path, propagation depth induces a sharp transition between tractable and intractable robust counterparts. For the traveling salesman problem, robustness often adds no computational complexity beyond ordinary TSP, because the structure of Hamiltonian cycles makes the fixed-tour adversarial problem collapse to explicit formulas. Together, these results show that topology-aware uncertainty can fundamentally change robust combinatorial optimization, with tractability governed by the interaction between propagation, budget geometry, and the structure of feasible solutions.

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

from arXiv: Computational Complexity

Authors: Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers -- to whose input filler symbols such as ``...'' are appended -- emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity. Concretely, we prove that polynomially padded $\text{L-uniform}$ constant-precision transformers are equivalent to $\text{L-uniform AC}^0$, while growing-precision ones achieve $\text{L-uniform TC}^0$ regardless of width. Furthermore, looping enables sequential processing analogous to circuits: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, and growing-precision ones reach $\text{FO-uniform TC}^d$. Interestingly, growing width or precision beyond logarithmic does not increase expressivity, and all our results hold for both softmax and average hard attention transformers.

Authors: Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers -- to whose input filler symbols such as ``...'' are appended -- emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity. Concretely, we prove that polynomially padded $\text{L-uniform}$ constant-precision transformers are equivalent to $\text{L-uniform AC}^0$, while growing-precision ones achieve $\text{L-uniform TC}^0$ regardless of width. Furthermore, looping enables sequential processing analogous to circuits: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, and growing-precision ones reach $\text{FO-uniform TC}^d$. Interestingly, growing width or precision beyond logarithmic does not increase expressivity, and all our results hold for both softmax and average hard attention transformers.

How Many Slopes Does Polynomial Area Cost?

from arXiv: Computational Geometry

Authors: Michael A. Bekos, Eleni Katsanou, Philipp Kindermann, Maria Eleni Pavlidi

In this work, we study the interplay between the number of slopes, the number of bends per edge, and the area requirements for planar drawings of bounded-degree graphs. Our motivation stems from the fact that, while numerous algorithms produce planar drawings with few slopes for graphs of relatively small degree in polynomial area, existing approaches for higher-degree graphs often require super-polynomial area. We address this gap in the literature by presenting new constructions that yield polynomial-area drawings with few bends per edge while slightly increasing the required number of slopes, thereby providing the first systematic study of slopes, bends and area trade-offs.

Authors: Michael A. Bekos, Eleni Katsanou, Philipp Kindermann, Maria Eleni Pavlidi

In this work, we study the interplay between the number of slopes, the number of bends per edge, and the area requirements for planar drawings of bounded-degree graphs. Our motivation stems from the fact that, while numerous algorithms produce planar drawings with few slopes for graphs of relatively small degree in polynomial area, existing approaches for higher-degree graphs often require super-polynomial area. We address this gap in the literature by presenting new constructions that yield polynomial-area drawings with few bends per edge while slightly increasing the required number of slopes, thereby providing the first systematic study of slopes, bends and area trade-offs.

Tree Containment Parameterized by Scanwidth

from arXiv: Data Structures and Algorithms

Authors: Leo van Iersel, Mark Jones, Mathias Weller

TREE CONTAINMENT is a central decision problem in mathematical phylogenetics, asking whether a given rooted phylogenetic tree is embeddable in ("displayed by") a given rooted phylogenetic network. While the problem is NP-complete for general networks, many algorithmic advances have relied on structural parameters that capture how "tree-like" a network is. In this paper we investigate TREE CONTAINMENT under the structural parameter scanwidth, a directed width measure generalizing popular parameters measuring tree-likeness of phylogenetic networks. We first present a parameterized algorithm that solves the problem in $O(4^{k + k\log{k}} n + nm^2)$ time, where $n$ and $m$ are the numbers of nodes and arcs in the network and $k$ is the width of a given tree-extension. Complementing this upper bound, we prove a matching lower bound under the Exponential-Time Hypothesis (ETH), showing that there is no algorithm for TREE CONTAINMENT that runs in $2^{o(c\log{c})} n^{O(1)}$ time, even on binary inputs, where $c$ is the directed cutwidth of the input network, which upper-bounds the scanwidth $k$.

Authors: Leo van Iersel, Mark Jones, Mathias Weller

TREE CONTAINMENT is a central decision problem in mathematical phylogenetics, asking whether a given rooted phylogenetic tree is embeddable in ("displayed by") a given rooted phylogenetic network. While the problem is NP-complete for general networks, many algorithmic advances have relied on structural parameters that capture how "tree-like" a network is. In this paper we investigate TREE CONTAINMENT under the structural parameter scanwidth, a directed width measure generalizing popular parameters measuring tree-likeness of phylogenetic networks. We first present a parameterized algorithm that solves the problem in $O(4^{k + k\log{k}} n + nm^2)$ time, where $n$ and $m$ are the numbers of nodes and arcs in the network and $k$ is the width of a given tree-extension. Complementing this upper bound, we prove a matching lower bound under the Exponential-Time Hypothesis (ETH), showing that there is no algorithm for TREE CONTAINMENT that runs in $2^{o(c\log{c})} n^{O(1)}$ time, even on binary inputs, where $c$ is the directed cutwidth of the input network, which upper-bounds the scanwidth $k$.

Neuro-symbolic Syntactic Parsing: Shaping a Neural Network with the CYK Algorithm

from arXiv: Data Structures and Algorithms

Authors: Fabio Massimo Zanzotto, Federico Ranaldi, Giorgio Satta

In this paper, we show the possibility of a direct injection of algorithms into neural network architecture. We focus on a complex algorithm, that is, Cocke-Youger-Kasami (CYK) for parsing context-free grammars in Chomsky Normal Form and we propose CYKNN, a simple recurrent neural network architecture for encoding the CYK algorithm in trainable matrix-vector multiplications.We experimented with a very simple grammar with 4 variations showing that our approach outperforms existing LLMs with more than 20B parameters with an in-context learning setting and smaller LLMs of the Qwen family fine-tuned with LoRA. Our attempt paves the way to a different approach to neuro-symbolic methodologies.

Authors: Fabio Massimo Zanzotto, Federico Ranaldi, Giorgio Satta

In this paper, we show the possibility of a direct injection of algorithms into neural network architecture. We focus on a complex algorithm, that is, Cocke-Youger-Kasami (CYK) for parsing context-free grammars in Chomsky Normal Form and we propose CYKNN, a simple recurrent neural network architecture for encoding the CYK algorithm in trainable matrix-vector multiplications.We experimented with a very simple grammar with 4 variations showing that our approach outperforms existing LLMs with more than 20B parameters with an in-context learning setting and smaller LLMs of the Qwen family fine-tuned with LoRA. Our attempt paves the way to a different approach to neuro-symbolic methodologies.

An Optimal Algorithm for Binary Closest String

from arXiv: Data Structures and Algorithms

Authors: Nick Fischer, Mursalin Habib

We revisit the Binary Closest String problem, which asks, given a set of binary strings $X \subseteq \{0, 1\}^n$, to compute a string minimizing the maximum Hamming distance to $X$. A long line of work has focused on parameterized algorithms with respect to the optimal distance $d$, yielding a sequence of improvements from $O^*(d^d)$ through $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$ to the current best-known running time of $O^*(5^d)$ [Chen, Ma, Wang; Algorithmica '16]. We present a faster randomized algorithm running in time $O^*(4^d)$. Our result matches a recent fine-grained lower bound [Abboud, Fischer, Goldenberg, Karthik C.S., Safier; ESA '23], and is therefore conditionally optimal. As an extra benefit, our algorithm is remarkably simple.

Authors: Nick Fischer, Mursalin Habib

We revisit the Binary Closest String problem, which asks, given a set of binary strings $X \subseteq \{0, 1\}^n$, to compute a string minimizing the maximum Hamming distance to $X$. A long line of work has focused on parameterized algorithms with respect to the optimal distance $d$, yielding a sequence of improvements from $O^*(d^d)$ through $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$ to the current best-known running time of $O^*(5^d)$ [Chen, Ma, Wang; Algorithmica '16]. We present a faster randomized algorithm running in time $O^*(4^d)$. Our result matches a recent fine-grained lower bound [Abboud, Fischer, Goldenberg, Karthik C.S., Safier; ESA '23], and is therefore conditionally optimal. As an extra benefit, our algorithm is remarkably simple.

Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent

from arXiv: Data Structures and Algorithms

Authors: Jason M. Altschuler, Pablo A. Parrilo

Can gradient descent be accelerated by just choosing better stepsizes? Surprisingly, the answer is yes. This short expository article provides an accessible introduction to this phenomenon of stepsize hedging.

Authors: Jason M. Altschuler, Pablo A. Parrilo

Can gradient descent be accelerated by just choosing better stepsizes? Surprisingly, the answer is yes. This short expository article provides an accessible introduction to this phenomenon of stepsize hedging.

Retriever Portfolios: A Principled Approach to Adaptive RAG

from arXiv: Data Structures and Algorithms

Authors: Miltiadis Stouras, Vincent Cohen-Addad, Silvio Lattanzi, Ola Svensson

Retrieval-augmented generation (RAG) systems typically rely on a single retriever and a single set of hyperparameters, despite facing highly heterogeneous queries that range from simple factoid questions to complex multi-hop reasoning. We propose a method that automatically selects a small, diverse subset of retrievers (a portfolio) from a large pool of candidates, to cover different regions of the target query distribution. We formalize this setting via an expected best-of-$k$ objective over the query distribution and show that it admits an efficient portfolio construction algorithm with near-optimal guarantees. Across multiple QA benchmarks, our learned portfolios and router pipeline consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality. In addition, compared to inference-time hyperparameter tuning approaches, fixed portfolios enable parallel retrieval and LLM calls, achieving comparable (and sometimes better) accuracy with substantially lower latency and token cost.

Authors: Miltiadis Stouras, Vincent Cohen-Addad, Silvio Lattanzi, Ola Svensson

Retrieval-augmented generation (RAG) systems typically rely on a single retriever and a single set of hyperparameters, despite facing highly heterogeneous queries that range from simple factoid questions to complex multi-hop reasoning. We propose a method that automatically selects a small, diverse subset of retrievers (a portfolio) from a large pool of candidates, to cover different regions of the target query distribution. We formalize this setting via an expected best-of-$k$ objective over the query distribution and show that it admits an efficient portfolio construction algorithm with near-optimal guarantees. Across multiple QA benchmarks, our learned portfolios and router pipeline consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality. In addition, compared to inference-time hyperparameter tuning approaches, fixed portfolios enable parallel retrieval and LLM calls, achieving comparable (and sometimes better) accuracy with substantially lower latency and token cost.

Incremental BPE Tokenization

from arXiv: Data Structures and Algorithms

Authors: Shenghu Jiang, Ruihao Gong

We propose a novel algorithm for incremental Byte Pair Encoding (BPE) tokenization. The algorithm processes each input byte in worst-case $\mathcal{O}(\log^2 t)$ time, leading to an overall complexity of $\mathcal{O}(n \log^2 t)$, where $n$ is the input length and $t$ is the maximum token length. The algorithm incrementally maintains BPE tokenization results for every prefix of the input text, implementing the standard BPE merge procedure defined by a fixed set of merge rules. This enables efficient partial tokenization in streaming settings. Functioning as a drop-in replacement for standard BPE, our approach achieves a speedup of up to ${\sim}3\times$ over Hugging Face's tokenizers, and demonstrates significant latency reductions over OpenAI's tiktoken on pathological inputs. We further introduce an eager output algorithm that enables streaming output, emitting tokens as soon as token boundaries are determined during incremental tokenization. Overall, our results demonstrate that BPE tokenization can be performed incrementally with strong worst-case guarantees, while providing practical latency benefits in modern large language model pipelines. Code: github.com/ModelTC/mtc-inc-bpe

Authors: Shenghu Jiang, Ruihao Gong

We propose a novel algorithm for incremental Byte Pair Encoding (BPE) tokenization. The algorithm processes each input byte in worst-case $\mathcal{O}(\log^2 t)$ time, leading to an overall complexity of $\mathcal{O}(n \log^2 t)$, where $n$ is the input length and $t$ is the maximum token length. The algorithm incrementally maintains BPE tokenization results for every prefix of the input text, implementing the standard BPE merge procedure defined by a fixed set of merge rules. This enables efficient partial tokenization in streaming settings. Functioning as a drop-in replacement for standard BPE, our approach achieves a speedup of up to ${\sim}3\times$ over Hugging Face's tokenizers, and demonstrates significant latency reductions over OpenAI's tiktoken on pathological inputs. We further introduce an eager output algorithm that enables streaming output, emitting tokens as soon as token boundaries are determined during incremental tokenization. Overall, our results demonstrate that BPE tokenization can be performed incrementally with strong worst-case guarantees, while providing practical latency benefits in modern large language model pipelines. Code: https://github.com/ModelTC/mtc-inc-bpe

Listing Even Cycles Faster than the Submodular-Width Barrier

from arXiv: Data Structures and Algorithms

Authors: Vasileios Nakos, Hung Q. Ngo, Andreas Panayi

A classic result of Alon, Yuster, and Zwick (AYZ, Algorithmica 1997) shows that all $2k$-cycles in an $m$-edge graph can be listed in $\tilde O(m^{2-1/k}+t)$ time, where $t$ is the output size. This bound underlies the {\em submodular width} of Marx (JACM 2013) and the PANDA framework of Abo Khamis, Ngo, and Suciu (PODS 2017), which extend AYZ to arbitrary conjunctive queries with degree constraints. A central open question is whether combinatorial algorithms can beat the submodular-width barrier. Bringmann and Gorbachev (STOC 2025) gave lower-bound evidence that submodular width may be optimal for general conjunctive queries under combinatorial algorithms. The picture changes for $2k$-cycles on undirected graphs, whose queries have self-joins and symmetric EDBs: recent works improve on AYZ for even-cycle detection and listing. Pinning down the complexity of $C_{2k}$-detection and listing is thus a natural step toward overcoming the submodular-width barrier for such queries. For detection, Dahlgaard, Knudsen, and St{ö}ckel (STOC 2017) solved $C_{2k}$-detection in $\tilde O(m^{2k/(k+1)})$ time. Listing is harder. Jin and Xu (STOC 2023), and independently Abboud, Khoury, Leibowitz, and Safier (FSTTCS 2023), listed 4-cycles in $\tilde O(m^{4/3}+t)$ time; Vassilevska~Williams and Westover (ITCS 2025) listed 6-cycles in $\tilde O(m^{8/5}+t)$ time, improving the AYZ bounds of $\tilde O(m^{3/2})$ and $\tilde O(m^{5/3})$. The general case has remained open for 30 years. Building on these works, we list $2k$-cycles in $\tilde O(m^{(2k^2-k+1)/(k^2+1)}+t)$ time, improving AYZ for every $k\geq 3$. The key ingredient is an \emph{asymmetric supersaturation} result for even cycles. Our algorithms use only join and project operators over multiple tree-decomposition plans, making them naturally implementable in database systems, in contrast to prior BFS-based graph approaches.

Authors: Vasileios Nakos, Hung Q. Ngo, Andreas Panayi

A classic result of Alon, Yuster, and Zwick (AYZ, Algorithmica 1997) shows that all $2k$-cycles in an $m$-edge graph can be listed in $\tilde O(m^{2-1/k}+t)$ time, where $t$ is the output size. This bound underlies the {\em submodular width} of Marx (JACM 2013) and the PANDA framework of Abo Khamis, Ngo, and Suciu (PODS 2017), which extend AYZ to arbitrary conjunctive queries with degree constraints. A central open question is whether combinatorial algorithms can beat the submodular-width barrier. Bringmann and Gorbachev (STOC 2025) gave lower-bound evidence that submodular width may be optimal for general conjunctive queries under combinatorial algorithms. The picture changes for $2k$-cycles on undirected graphs, whose queries have self-joins and symmetric EDBs: recent works improve on AYZ for even-cycle detection and listing. Pinning down the complexity of $C_{2k}$-detection and listing is thus a natural step toward overcoming the submodular-width barrier for such queries. For detection, Dahlgaard, Knudsen, and St{ö}ckel (STOC 2017) solved $C_{2k}$-detection in $\tilde O(m^{2k/(k+1)})$ time. Listing is harder. Jin and Xu (STOC 2023), and independently Abboud, Khoury, Leibowitz, and Safier (FSTTCS 2023), listed 4-cycles in $\tilde O(m^{4/3}+t)$ time; Vassilevska~Williams and Westover (ITCS 2025) listed 6-cycles in $\tilde O(m^{8/5}+t)$ time, improving the AYZ bounds of $\tilde O(m^{3/2})$ and $\tilde O(m^{5/3})$. The general case has remained open for 30 years. Building on these works, we list $2k$-cycles in $\tilde O(m^{(2k^2-k+1)/(k^2+1)}+t)$ time, improving AYZ for every $k\geq 3$. The key ingredient is an \emph{asymmetric supersaturation} result for even cycles. Our algorithms use only join and project operators over multiple tree-decomposition plans, making them naturally implementable in database systems, in contrast to prior BFS-based graph approaches.

Energy-Efficient Aggregation and Minimum-Degree Spanning Trees in Radio Networks

from arXiv: Data Structures and Algorithms

Authors: Yi-Jun Chang, Yang Ze Guan

We study the aggregation problem in synchronous multi-hop radio networks with $O(\log n)$-bit messages and no collision detection. Each node initially holds a value, and the goal is to compute a global aggregate such as the sum of all values. Aggregation tasks arise naturally in wireless sensor networks, where nodes are often battery-powered and radio activity is the dominant source of energy consumption. Accordingly, our main objective is to minimize the energy complexity, defined as the maximum number of rounds in which any node is awake. Our main result is a randomized distributed algorithm that, with high probability, constructs and executes an aggregation schedule in $O(n \operatorname{polylog} n)$ rounds and using $O(Δ^\ast \operatorname{polylog} n)$ energy, where $Δ^\ast$ is the minimum possible maximum degree of a spanning tree of the network graph. This guarantee is nearly optimal: for any aggregation schedule and any graph, there exists a node that must be awake for at least $Δ^\ast$ rounds. As a by-product, the algorithm also computes a spanning tree whose maximum degree is within an $O(\log n)$ factor of $Δ^\ast$, with the same round and energy guarantees. For every tree edge, both endpoints learn that the edge belongs to the tree.

Authors: Yi-Jun Chang, Yang Ze Guan

We study the aggregation problem in synchronous multi-hop radio networks with $O(\log n)$-bit messages and no collision detection. Each node initially holds a value, and the goal is to compute a global aggregate such as the sum of all values. Aggregation tasks arise naturally in wireless sensor networks, where nodes are often battery-powered and radio activity is the dominant source of energy consumption. Accordingly, our main objective is to minimize the energy complexity, defined as the maximum number of rounds in which any node is awake. Our main result is a randomized distributed algorithm that, with high probability, constructs and executes an aggregation schedule in $O(n \operatorname{polylog} n)$ rounds and using $O(Δ^\ast \operatorname{polylog} n)$ energy, where $Δ^\ast$ is the minimum possible maximum degree of a spanning tree of the network graph. This guarantee is nearly optimal: for any aggregation schedule and any graph, there exists a node that must be awake for at least $Δ^\ast$ rounds. As a by-product, the algorithm also computes a spanning tree whose maximum degree is within an $O(\log n)$ factor of $Δ^\ast$, with the same round and energy guarantees. For every tree edge, both endpoints learn that the edge belongs to the tree.

Sunday, May 31

Mathematics of the impossible: book is done

from Emanuele Viola

Download the book here. As I tell my students, in research you don’t finish anything: you only begin. Indeed I plan to keep working on this book pretty much indefinitely, so keep sending comments; I will incorporate them in due course. The current May 31 version is stable, I have just finished re-reading the re-re-write […]

Download the book here. As I tell my students, in research you don’t finish anything: you only begin. Indeed I plan to keep working on this book pretty much indefinitely, so keep sending comments; I will incorporate them in due course. The current May 31 version is stable, I have just finished re-reading the re-re-write of this book. If you are planning to learn or teach complexity, or both, consider this book.

I vividly remember the moment, about 3.5 years ago, when I pressed the first key. As they say, a journey of a thousand miles begins with a single step. Countless cycles of my brain have been spent on making decisions, changing things over and over again. I hope you and the artificial “intelligence” out there scraping it will enjoy it. It was a strange time to be writing a book. Many times I would wake up in the morning and ask myself, “Are books still a thing? Are people still writing them?” Apparently yes, so let’s proceed.

I was asked for a “hook” for the book and I was happy to come up with this, which summarizes why I wrote it:

An iconoclastic book that overhauls computational complexity theory, featuring recent breakthroughs, neglected gems, and simpler expositions of the classics.

By Manu

Linkage

from David Eppstein

Real-world application of square packing: Packing ten quart bottles in a milk crate slightly larger than a \(3\times 3\) packing.

By David Eppstein

Saturday, May 30

AI is a Meteor. Don’t be a Dinosaur.

from Windows on Theory

This is a linkpost for my Harvard Crimson op-ed for its commencement issue. I will not reproduce the whole text here, but my advice to the class of 2026 is in the following parts: My advice for the Class of 2026 is to embrace AI as a technology, but treat it critically as citizens. … … Continue reading AI is a Meteor. Don’t be a Dinosaur.

This is a linkpost for my Harvard Crimson op-ed for its commencement issue.

I will not reproduce the whole text here, but my advice to the class of 2026 is in the following parts:

My advice for the Class of 2026 is to embrace AI as a technology, but treat it critically as citizens.



Throughout your time at Harvard, you received mixed signals on whether AI was to be avoided or carefully used in approved sandboxes. My signal to you is clear: make sure to master AI and use it as much as you can. Whether your degree is in English or Computer Science, you should be using not just ChatGPT, but also Claude Code, Codex, and products from cool startups I haven’t heard of. Ask yourself how you can use AI to realize ambitious projects that would have been impossible last year, or even last month.

On the other hand, using AI does not mean exempting the companies that build and use it, as well as the governments responsible for regulating it, from account. You have a right and a duty to make your voice heard in how this technology should be used to maximize benefit for humanity. You should educate yourself on both its benefits and risks. In my view, some of the latter (such as water usage) are overblown, while others (such as concentration of power) receive too little attention. But you should make your own mind, and even if you oppose AI deployment, still use AI to advance your case.

AI can be extremely empowering for those who possess initiative, creativity, and vision, even if they lack experience or capital. I believe that over the coming years, we will see teams of a few young people achieve results that previously would have required large companies with hundreds or thousands of employees. I wish for you to be one of those people.

By Boaz Barak

Friday, May 29

TR26-089 | Near Optimal Extractors for Samplable Sources under Nondeterministic Hardness | Marshall Ball, Eshan Chattopadhyay, Mohit Gurumukhani, Yunya Zhao

from ECCC Papers

We study the problem of constructing randomness extractors for samplable sources, introduced by Trevisan and Vadhan (FOCS 2000), a natural computational model of imperfect randomness, where the source $\mathbb{X}$ (on $n$ bits) is generated by a polynomial-size circuit. They showed how to extract from sources with min-entropy $(1-\alpha)n$ (for small $\alpha>0$) assuming exponential hardness against $\Sigma_6$-circuits. A line of recent works has made significant progress, either achieving extraction from such high min-entropy under weaker assumptions (e.g., nondeterministic circuits), or handling polynomially small min-entropy under stronger assumptions (e.g., hardness against $\Sigma_i$-circuits). These works output nearly all the randomness with polynomially small error. In a recent work, Oh and Shaltiel (STOC 2026) constructed extractors that work for logarithmic min-entropy under incomparable assumptions, outputting $1$ random bit with constant error. Our main result gives an extractor for samplable sources with min-entropy poly$\log(n)$ with polynomially small error and outputting almost all of the randomness, assuming hardness against nondeterministic circuits. The extractor also works for sources samplable with postselection by nondeterministic circuits. We can further reduce the entropy requirement to $O(\log n)$ at the expense of making the error constant and outputting only $1$ bit, matching the extractor of Oh and Shaltiel (STOC 2026). By prior work of Shaltiel (CCC 2025), our extractors imply hardness against nondeterministic circuits, and thus our assumption is essentially minimal.

We study the problem of constructing randomness extractors for samplable sources, introduced by Trevisan and Vadhan (FOCS 2000), a natural computational model of imperfect randomness, where the source $\mathbb{X}$ (on $n$ bits) is generated by a polynomial-size circuit. They showed how to extract from sources with min-entropy $(1-\alpha)n$ (for small $\alpha>0$) assuming exponential hardness against $\Sigma_6$-circuits. A line of recent works has made significant progress, either achieving extraction from such high min-entropy under weaker assumptions (e.g., nondeterministic circuits), or handling polynomially small min-entropy under stronger assumptions (e.g., hardness against $\Sigma_i$-circuits). These works output nearly all the randomness with polynomially small error. In a recent work, Oh and Shaltiel (STOC 2026) constructed extractors that work for logarithmic min-entropy under incomparable assumptions, outputting $1$ random bit with constant error. Our main result gives an extractor for samplable sources with min-entropy poly$\log(n)$ with polynomially small error and outputting almost all of the randomness, assuming hardness against nondeterministic circuits. The extractor also works for sources samplable with postselection by nondeterministic circuits. We can further reduce the entropy requirement to $O(\log n)$ at the expense of making the error constant and outputting only $1$ bit, matching the extractor of Oh and Shaltiel (STOC 2026). By prior work of Shaltiel (CCC 2025), our extractors imply hardness against nondeterministic circuits, and thus our assumption is essentially minimal.

Sharing different heartbeats

from Ben Recht

On the power of large-scale randomized trials in cardiology.

I’m always looking for examples where we need statistical reasoning and significance tests to change care, and I’m surprised no one has shoved cardiology in my face before.1 Cardiologists can make a powerful case that their field has been completely revolutionized by turning to randomized controlled trials to guide their practice.

In the late 1980s, cardiologists ran massive clinical trials to improve the standard of care for heart attacks. The GISSI trial enrolled 12,000 patients and found a 2% reduction in hospital mortality when treating heart attack patients with the anti-clotting medicine streptokinase. ISIS-2 (17,000 patients!) found that adding aspirin as an additional anti-clotting agent resulted in an additional 2% reduction in mortality.

These percentages were absolute percentages. With neither treatment, 12% of the patients died within five weeks of the heart attack. With aspirin and streptokinase, that percentage dropped to 8%. For those who care about these things (like me and three other people on the internet), the p-values in these studies were all on the order of 1 in a million. The trial size here mattered a lot. Had the trials been 10 times smaller, the same event rates would not have passed a standard p<0.05 significance threshold. These trials are textbook cases for RCT gold standard evangelists.

Moreover, the megatrial culture in cardiology has famous examples of rooting out harmful practices. The most famous study is the CAST trial of the early 1990s. Standard practice had been to pharmacologically suppress arrhythmias after heart attacks. Using drugs to make the heart look more “normal” seemed like a good idea. The CAST trialists enrolled 1,500 patients and shockingly found significant harm. 5% of patients died within 10 months on the anti-arrhythmia treatment, whereas only 2.5% died on placebo. The confidence intervals were narrow, and the p-value was again tiny. Something that felt reasonable—suppressing unusual heartbeats—was deemed harmful, and the practice was ended.

What can be said of the net benefits of these treatments after 40 years? Though robust, the effect sizes in these trials are all small, ranging from 1% to 3%. How can we be sure that the effects are cumulative and that modifying the standard of care is actually helping cardiology patients?

Regrettably, we now have to turn to epidemiology. No IRB will authorize an RCT of the old methods—essentially bed rest and oxygen—against the current therapeutic regimen of angioplasty, clot reducers, blood thinners, beta blockers, and statins. That would be akin to doing an RCT with a control group assigned to blood letting. But the improvement in survival of heart attacks is undeniable. Estimates suggest that the current death rates have dropped from somewhere in the range of 15-20% to about 4-5%. That’s quite astounding. 50 years of improving practice have accumulated a 3-6 fold reduction in deaths. You couldn’t ask for something better. Cardiology is a compelling and fascinating case study of the power of outcome optimization.

Why was there so much success in cardiology? You could argue that heart attack is a nearly ideal case for this sort of trial-based optimization. The endpoint of “death” is the most unambiguous in medicine. The adverse endpoint occurs fairly quickly (within weeks), as opposed to, say, oncology, where treatments can take years to assess. Moreover, heart attacks are unfortunately very common. The silver lining of their commonality is large pragmatic trials are relatively easy to assemble. Large trials are essential when the effect sizes are only 1-2%.

Now, even though cardiology is a poster child for evidence-based medicine, it’s important to note how the actual advancement of practice was not simply by chaining together a sequence of massive RCTs. First, not every trial was as unambiguous as GISSI, ISIS-2, and CAST. Two trials in the 1990s assessed the relative value of streptokinase and tPA, two anticoagulant agents. The GUSTO trial enrolled 41,000 patients and found tPA reduced death by 1%. The GISSI-2 trial enrolled 12,500 patients and found no difference between tPA and streptokinase. They contradicted each other! Post-trial analyses concluded that the trials had administered tPA differently, and this explained why GUSTO found a benefit. It was a careful study of the trial after the fact that suggested the true benefit of the drug. This analysis required an appeal to what pharmacological knowledge, and wasn’t simply adjudicated by randomized experimentation. Moreover, a second post-trial analysis concluded that tPA increased the risk of stroke. The story is complicated. Mega-trials alone can’t solve practice.

Even the unambiguous trials were already pointing to a major challenge with RCTs. As a treatment regimen becomes more complex, ironing out the fine details requires an exponentially increasing number of RCTs. If you want to compare the effect of three different timings and three different dosages of a single drug, you need nine arms in your trial. If you want to additionally see if a second drug is helpful, you need 18.

Finally, not every guideline of practice comes from randomized trials. Fewer than 10% of the American College of Cardiology/American Heart Association clinical guidelines are backed by large RCTs. I was surprised to learn that there is no convincing RCT showing that bed rest is harmful. We have ended the practice of long-term bed rest regardless. So how cardiologists make recommendations to their patients remains complicated. How does this sort of therapy relate to individual doctor-patient experiences? That’s the next question I’m hoping to answer.

Subscribe now

1

Thanks to cardiologist Guy Armstrong, whose comments inspired me to put this post together.

By Ben Recht

TR26-088 | A digest of the work of Rothblum, Vadhan, and Wigderson (2013) | Oded Goldreich

from ECCC Papers

The work of Rothblum, Vadhan, and Wigderson ({\em STOC}, 2013) is pivotal to the study of interactive proofs of proximity (IPPs). We present the main contents of their work, while clarify a few (conceptual) aspects. Specifically, starting with the definition of IPP systems, our main focus is on the construction of IPP systems for any property in log-space uniform $\cal NC$ (and beyond). We also present limitations on the power of constant-round IPP systems.

The work of Rothblum, Vadhan, and Wigderson ({\em STOC}, 2013) is pivotal to the study of interactive proofs of proximity (IPPs). We present the main contents of their work, while clarify a few (conceptual) aspects. Specifically, starting with the definition of IPP systems, our main focus is on the construction of IPP systems for any property in log-space uniform $\cal NC$ (and beyond). We also present limitations on the power of constant-round IPP systems.

TR26-087 | Tight Bounds for Sketching Intersecting Sets, with Applications | Flavio Chierichetti, Mirko Giacchini, Ravi Kumar, Alessandro Panconesi, Erasmo Tani, Andrew Tomkins

from ECCC Papers

In this work, we study the space complexity of sketching the intersection profile of a distribution $D$ on $2^{[n]}$. Specifically, we seek a succinct data structure that, for any query set $S \subseteq [n]$, approximates the quantity $\Pr_{T \sim D}[T \cap S \neq \emptyset]$ to within a small constant additive error. Via a probabilistic packing argument, we show that the worst-case bit complexity of this problem is $\Omega(n^2)$, which we also prove to be tight. We use this lower bound to settle the complexity of three sketching problems. (i) We show that sketching vertex neighborhood sizes in graphs requires $\Omega(n^2)$ bits, standing in sharp contrast to the $\tilde{O}(n)$ complexity of sketching edge cuts. (ii) We obtain tight lower and upper bounds of $\tilde{\Theta}(n^2)$ for sketching coverage functions with additive and multiplicative errors. (iii) We prove an $\Omega(n^2)$ lower bound for sketching Random Utility Models under the $\ell_\infty$-norm, improving upon the previous $\Omega(n \log n)$ bound and matching the upper bound to within logarithmic factors.

In this work, we study the space complexity of sketching the intersection profile of a distribution $D$ on $2^{[n]}$. Specifically, we seek a succinct data structure that, for any query set $S \subseteq [n]$, approximates the quantity $\Pr_{T \sim D}[T \cap S \neq \emptyset]$ to within a small constant additive error. Via a probabilistic packing argument, we show that the worst-case bit complexity of this problem is $\Omega(n^2)$, which we also prove to be tight. We use this lower bound to settle the complexity of three sketching problems. (i) We show that sketching vertex neighborhood sizes in graphs requires $\Omega(n^2)$ bits, standing in sharp contrast to the $\tilde{O}(n)$ complexity of sketching edge cuts. (ii) We obtain tight lower and upper bounds of $\tilde{\Theta}(n^2)$ for sketching coverage functions with additive and multiplicative errors. (iii) We prove an $\Omega(n^2)$ lower bound for sketching Random Utility Models under the $\ell_\infty$-norm, improving upon the previous $\Omega(n \log n)$ bound and matching the upper bound to within logarithmic factors.

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

from arXiv: Computational Complexity

Authors: Eric Alsmann, Martin Lange, Marco Sälzer

We investigate the computational complexity of neural network verification in quantised settings. We distinguish three classes of Feedforward Neural Networks (FNNs): rational FNNs with exact rational weights, quantised FNNs whose weights come from a finite-width arithmetic, and dynamically quantised FNNs in which rational networks are evaluated with respect to a given finite-width arithmetic. We consider two types of specifications used in the literature. Linear programming (LP) specifications are conjunctions of linear constraints, while bit-vector (BV) specifications allow reasoning at the bit level and can express non-linear constraints. Our results give a complexity landscape of these verification problems. For quantised FNNs with fixed arithmetic precision, we show that verification under both LP and BV specifications remains NP-complete, matching the complexity of the rational case. For dynamically quantised FNNs with BV specifications, we establish upper bounds, complementing a previously known PSPACE-hardness result.

Authors: Eric Alsmann, Martin Lange, Marco Sälzer

We investigate the computational complexity of neural network verification in quantised settings. We distinguish three classes of Feedforward Neural Networks (FNNs): rational FNNs with exact rational weights, quantised FNNs whose weights come from a finite-width arithmetic, and dynamically quantised FNNs in which rational networks are evaluated with respect to a given finite-width arithmetic. We consider two types of specifications used in the literature. Linear programming (LP) specifications are conjunctions of linear constraints, while bit-vector (BV) specifications allow reasoning at the bit level and can express non-linear constraints. Our results give a complexity landscape of these verification problems. For quantised FNNs with fixed arithmetic precision, we show that verification under both LP and BV specifications remains NP-complete, matching the complexity of the rational case. For dynamically quantised FNNs with BV specifications, we establish upper bounds, complementing a previously known PSPACE-hardness result.

S2MDF: A Plug-And-Play Layer for Intersection-Free Multi-Object Signed Distance Fields

from arXiv: Computational Geometry

Authors: Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot, Pascal Fua

Compositional implicit surface representations model scenes as collections of objects, each encoded by a Signed Distance Field (SDF). A fundamental limitation of this approach is that multiple SDFs can produce geometries that interpenetrate, violating physical plausibility. Existing mitigation strategies rely on soft penalty terms that reduce but do not eliminate intersections, and require careful loss weighting. To truly prevent interpenetration, we propose a hard constraint on vector-valued SDFs and introduce S2MDF, a lightweight plug-and-play module that enforces the constraint on any object-compositional SDF representation without architectural modifications. It introduces negligible computational overhead and is compatible with linearly-interpolated standard meshing algorithms such as Marching Cubes. It can be applied during training or as a post-processing step. Experiments on multiple state-of-the-art compositional methods show that S2MDF reduces intersections to numerical precision while preserving reconstruction quality, outperforming existing mitigation strategies.

Authors: Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot, Pascal Fua

Compositional implicit surface representations model scenes as collections of objects, each encoded by a Signed Distance Field (SDF). A fundamental limitation of this approach is that multiple SDFs can produce geometries that interpenetrate, violating physical plausibility. Existing mitigation strategies rely on soft penalty terms that reduce but do not eliminate intersections, and require careful loss weighting. To truly prevent interpenetration, we propose a hard constraint on vector-valued SDFs and introduce S2MDF, a lightweight plug-and-play module that enforces the constraint on any object-compositional SDF representation without architectural modifications. It introduces negligible computational overhead and is compatible with linearly-interpolated standard meshing algorithms such as Marching Cubes. It can be applied during training or as a post-processing step. Experiments on multiple state-of-the-art compositional methods show that S2MDF reduces intersections to numerical precision while preserving reconstruction quality, outperforming existing mitigation strategies.

SWORD: Spectral Wasserstein Online Regime Detection in Dynamic Networks

from arXiv: Computational Geometry

Authors: Izhar Ali

Online change point detection in dynamic graphs requires comparing graphs as they arrive, in time linear in the number of edges, without parametric assumptions. Recent spectral methods address scale via the Kernel Polynomial Method (KPM): SCPD computes Chebyshev moments of the normalized Laplacian, discretizes them into a density-of-states histogram, and scores the histogram with SVD plus cosine similarity. We introduce SWORD, which computes the same moments and instead compares their mean across two adjacent time windows by their $L_1$ distance. On three real-world benchmarks (MIT Reality, AskUbuntu, Enron), this lifts mean $F_1$ from SCPD's $0.27$ to $0.79$, with SCPD failing to detect any change on Enron. A controlled cascading ablation attributes the gap to two design choices: the two-window mean structure (dominant on MIT) and the $L_1$ metric on those mean vectors (dominant on Enron). A bin-width sweep rules out histogram discretization -- SCPD's most visible architectural choice -- as the driver. SWORD inherits SCPD's KPM core, so per-graph cost is $O(KRm)$ with no eigendecomposition, scaling to $86{,}000$-node networks. With per-dataset tuning it matches the offline TIRE autoencoder on mean $F_1$ and attains the highest precision among online methods ($0.91$, only $2$ false positives across the three benchmarks).

Authors: Izhar Ali

Online change point detection in dynamic graphs requires comparing graphs as they arrive, in time linear in the number of edges, without parametric assumptions. Recent spectral methods address scale via the Kernel Polynomial Method (KPM): SCPD computes Chebyshev moments of the normalized Laplacian, discretizes them into a density-of-states histogram, and scores the histogram with SVD plus cosine similarity. We introduce SWORD, which computes the same moments and instead compares their mean across two adjacent time windows by their $L_1$ distance. On three real-world benchmarks (MIT Reality, AskUbuntu, Enron), this lifts mean $F_1$ from SCPD's $0.27$ to $0.79$, with SCPD failing to detect any change on Enron. A controlled cascading ablation attributes the gap to two design choices: the two-window mean structure (dominant on MIT) and the $L_1$ metric on those mean vectors (dominant on Enron). A bin-width sweep rules out histogram discretization -- SCPD's most visible architectural choice -- as the driver. SWORD inherits SCPD's KPM core, so per-graph cost is $O(KRm)$ with no eigendecomposition, scaling to $86{,}000$-node networks. With per-dataset tuning it matches the offline TIRE autoencoder on mean $F_1$ and attains the highest precision among online methods ($0.91$, only $2$ false positives across the three benchmarks).

Quantum encodings that preserve persistent homology

from arXiv: Computational Geometry

Authors: Arthur J. Parzygnat, Andrew Vlasic

Given a data set with a notion of distance, such as a point cloud in Euclidean space, topological data analysis (TDA) uses techniques from algebraic topology and metric geometry to infer the topology of a hypothetical manifold from which the data are sampled. This inference is achieved by calculating topological invariants, some of which are difficult to compute classically. Meanwhile, quantum TDA utilizes quantum processes to extract the invariants used in making such inferences in an attempt to speed up the computations. Because applying transformations to the original classical dataset could alter the associated topological invariants, we investigate which quantum encodings would best preserve the invariants of the original dataset. This line of inquiry is distinct from standard approaches in quantum TDA, whose typical starting point is not from the classical dataset directly, but rather from the associated combinatorial objects, such as simplicial complexes, which typically demand a lot of resources to construct. We take the first step at a more direct approach by focusing on which quantum encodings acting directly on the data are admissible for applying quantum algorithms to extract topological features from classical datasets.

Authors: Arthur J. Parzygnat, Andrew Vlasic

Given a data set with a notion of distance, such as a point cloud in Euclidean space, topological data analysis (TDA) uses techniques from algebraic topology and metric geometry to infer the topology of a hypothetical manifold from which the data are sampled. This inference is achieved by calculating topological invariants, some of which are difficult to compute classically. Meanwhile, quantum TDA utilizes quantum processes to extract the invariants used in making such inferences in an attempt to speed up the computations. Because applying transformations to the original classical dataset could alter the associated topological invariants, we investigate which quantum encodings would best preserve the invariants of the original dataset. This line of inquiry is distinct from standard approaches in quantum TDA, whose typical starting point is not from the classical dataset directly, but rather from the associated combinatorial objects, such as simplicial complexes, which typically demand a lot of resources to construct. We take the first step at a more direct approach by focusing on which quantum encodings acting directly on the data are admissible for applying quantum algorithms to extract topological features from classical datasets.

Low-degree estimation thresholds in planted hypergraphs and tensor PCA

from arXiv: Data Structures and Algorithms

Authors: Daniel Fu, Youngtak Sohn

A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior. For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work. Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery above the sharp threshold, yielding polynomial-time algorithms succeeding up to this threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.

Authors: Daniel Fu, Youngtak Sohn

A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior. For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work. Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery above the sharp threshold, yielding polynomial-time algorithms succeeding up to this threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.

Elfs, transducers and quantum walks

from arXiv: Data Structures and Algorithms

Authors: Simon Apers, Jérémie Roland, Yuxin Zhang

Electric flow sampling (elfs) is a new tool in the quantum walk toolbox and a useful primitive for solving search, sampling and optimization problems on graphs. We refine this tool by showing that there exists a zero-error transducer for implementing elfs. More broadly, we establish a zero-error transducer for reflecting about the intersection of two subspaces, yielding an errorfree transducer version of the effective gap lemma. Building on this result, we obtain improved quantum walk algorithms for estimating effective resistances and span program witness sizes with an optimal error scaling, and for sampling from the random walk arrival distribution, via the composition of many elfs. Using this last algorithm, we obtain an up-to-quadratic quantum speedup for semi-supervised learning on expander graphs.

Authors: Simon Apers, Jérémie Roland, Yuxin Zhang

Electric flow sampling (elfs) is a new tool in the quantum walk toolbox and a useful primitive for solving search, sampling and optimization problems on graphs. We refine this tool by showing that there exists a zero-error transducer for implementing elfs. More broadly, we establish a zero-error transducer for reflecting about the intersection of two subspaces, yielding an errorfree transducer version of the effective gap lemma. Building on this result, we obtain improved quantum walk algorithms for estimating effective resistances and span program witness sizes with an optimal error scaling, and for sampling from the random walk arrival distribution, via the composition of many elfs. Using this last algorithm, we obtain an up-to-quadratic quantum speedup for semi-supervised learning on expander graphs.

On Language Generation in the Limit with Bounded Memory

from arXiv: Data Structures and Algorithms

Authors: Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows memory constraints dramatically alter learnability; we extend this to language generation. First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions. We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing it to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$. Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection. These results show bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.

Authors: Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows memory constraints dramatically alter learnability; we extend this to language generation. First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions. We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing it to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$. Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection. These results show bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

from arXiv: Data Structures and Algorithms

Authors: Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.

Authors: Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.

A Radius-Sensitive Approximation Algorithm for Connected Submodular Maximization

from arXiv: Data Structures and Algorithms

Authors: Philip Cervenjak, Junhao Gan, Naonori Kakimura, Seeun William Umboh, Anthony Wirth

Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone submodular function $f$ on subsets of the vertex set of $G$, and an integer $k$. The goal is to select a tree in $G$, with $k$ edges, whose vertex set maximizes $f$. We also study the more general Directed and Directed Rooted variants of CSM (DCSM and DRCSM respectively). In both variants, $G$ is directed and the solution must be an out-tree in $G$, with $k$ edges, whose vertex set maximizes $f$; DRCSM further specifies a vertex to be the root of the selected out-tree. For CSM, several previous works have proposed polynomial time approximation algorithms; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{\sqrt{k}})$-approximation. We can also parameterize the approximation factor by the radius of the optimal solution, denoted by $r$; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{r})$-approximation. In this paper, we improve on the state-of-the-art approximation factor for CSM with respect to $r$ as well as $k$, noting that $r \leq k$. We propose a polynomial time framework that, for (Directed) CSM, achieves a $Ω(\frac{\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation for every constant $\varepsilon \in (0, 1]$. For DRCSM, our framework achieves a $Ω(\frac{δ\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation that violates the size constraint by at most a factor of $1 + δ$ for every $δ\in [\frac{1}{k}, 1]$. A key component of our framework is GreedyRadius, which is an algorithm for DRCSM that takes another algorithm with a bicriteria approximation factor in terms of $k$ and outputs a solution with the same bicriteria approximation factor (up to constants) in terms of $r$.

Authors: Philip Cervenjak, Junhao Gan, Naonori Kakimura, Seeun William Umboh, Anthony Wirth

Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone submodular function $f$ on subsets of the vertex set of $G$, and an integer $k$. The goal is to select a tree in $G$, with $k$ edges, whose vertex set maximizes $f$. We also study the more general Directed and Directed Rooted variants of CSM (DCSM and DRCSM respectively). In both variants, $G$ is directed and the solution must be an out-tree in $G$, with $k$ edges, whose vertex set maximizes $f$; DRCSM further specifies a vertex to be the root of the selected out-tree. For CSM, several previous works have proposed polynomial time approximation algorithms; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{\sqrt{k}})$-approximation. We can also parameterize the approximation factor by the radius of the optimal solution, denoted by $r$; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{r})$-approximation. In this paper, we improve on the state-of-the-art approximation factor for CSM with respect to $r$ as well as $k$, noting that $r \leq k$. We propose a polynomial time framework that, for (Directed) CSM, achieves a $Ω(\frac{\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation for every constant $\varepsilon \in (0, 1]$. For DRCSM, our framework achieves a $Ω(\frac{δ\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation that violates the size constraint by at most a factor of $1 + δ$ for every $δ\in [\frac{1}{k}, 1]$. A key component of our framework is GreedyRadius, which is an algorithm for DRCSM that takes another algorithm with a bicriteria approximation factor in terms of $k$ and outputs a solution with the same bicriteria approximation factor (up to constants) in terms of $r$.

Quadratic Sums-of-Powers for Fixed-Parameter Tractable Quantum-Circuit Simulation

from arXiv: Data Structures and Algorithms

Authors: Alexis de Colnet, Floris Geerts, Rihan Hai, Alfons Laarman, Joon Hyung Lee, Guillermo A. Pérez

Strongly simulating a quantum circuit, that is, computing an output amplitude, amounts to summing the circuit's Feynman paths, a weighted count over assignments to the Boolean ``path'' variables. The circuit's gates induce correlations among these variables, forming a graph whose structure determines the hardness of the simulation task. This sum-of-powers viewpoint underlies recent simulators built on knowledge-representation tools from artificial intelligence, namely binary decision diagrams and weighted model counting. We show that the structural quantity most accurately governing the difficulty is the rank-width of the path-variable graph, and we give an algorithm that evaluates the amplitude in time that is exponential only in this rank-width and polynomial in the circuit size. Rank-width can be far smaller than the widths that control competing methods: as corollaries, our algorithm reproduces a recent decision-diagram simulation breakthrough as a special case and matches the Markov--Shi tensor-network contraction bound. To complement this, we exhibit circuit families on which our algorithm provably beats both competing methods. The new method applies to every circuit built from Hadamard and diagonal gates, in particular to circuits over Clifford+T. In practical terms, general-purpose decision-diagram and model-counting tools can serve as the workhorse, with our specialized algorithm dispatched to exploit a small rank-width of the associated graph when it is present.

Authors: Alexis de Colnet, Floris Geerts, Rihan Hai, Alfons Laarman, Joon Hyung Lee, Guillermo A. Pérez

Strongly simulating a quantum circuit, that is, computing an output amplitude, amounts to summing the circuit's Feynman paths, a weighted count over assignments to the Boolean ``path'' variables. The circuit's gates induce correlations among these variables, forming a graph whose structure determines the hardness of the simulation task. This sum-of-powers viewpoint underlies recent simulators built on knowledge-representation tools from artificial intelligence, namely binary decision diagrams and weighted model counting. We show that the structural quantity most accurately governing the difficulty is the rank-width of the path-variable graph, and we give an algorithm that evaluates the amplitude in time that is exponential only in this rank-width and polynomial in the circuit size. Rank-width can be far smaller than the widths that control competing methods: as corollaries, our algorithm reproduces a recent decision-diagram simulation breakthrough as a special case and matches the Markov--Shi tensor-network contraction bound. To complement this, we exhibit circuit families on which our algorithm provably beats both competing methods. The new method applies to every circuit built from Hadamard and diagonal gates, in particular to circuits over Clifford+T. In practical terms, general-purpose decision-diagram and model-counting tools can serve as the workhorse, with our specialized algorithm dispatched to exploit a small rank-width of the associated graph when it is present.

Selection Hyper-heuristics Can Automatically Adjust the Learning Period to Optimally Solve Pseudo-Boolean Problems

from arXiv: Data Structures and Algorithms

Authors: Benjamin Doerr, Pietro S. Oliveto, John Alasdair Warwicker

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a learning period of a certain length $τ$ had to be used, differently from classic hyper-heuristics, which change their behaviour based on the success of only the previous iteration. In this paper, we show how to automatically set this new parameter value, relieving the user from the non-trivial task of controlling this novel algorithm parameter. We prove that the resulting hyper-heuristic selects the optimal neighbourhood size in a $1-o(1)$ fraction of the iterations and, consequently, optimises the LeadingOnes benchmark in the best possible time (apart from lower-order terms) achievable with these neighborhood sizes.

Authors: Benjamin Doerr, Pietro S. Oliveto, John Alasdair Warwicker

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a learning period of a certain length $τ$ had to be used, differently from classic hyper-heuristics, which change their behaviour based on the success of only the previous iteration. In this paper, we show how to automatically set this new parameter value, relieving the user from the non-trivial task of controlling this novel algorithm parameter. We prove that the resulting hyper-heuristic selects the optimal neighbourhood size in a $1-o(1)$ fraction of the iterations and, consequently, optimises the LeadingOnes benchmark in the best possible time (apart from lower-order terms) achievable with these neighborhood sizes.

TC-MIS: Maximal Independent Set on Tensor-cores

from arXiv: Data Structures and Algorithms

Authors: Prajjwal Nijhara, Dip Sankar Banerjee

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently un-structured and challenging for GPU parallelism due to irregular memory access and workload imbalance, specialized GPU algorithms have achieved good performance, processing million-vertex graphs in milliseconds. Modern GPUs are equipped with Tensor Cores (TCs), specialized units for matrix operations with 8-16x higher throughput than CUDA Cores (CCs), which are extensively used for ML, DL, and inference tasks but remain largely unexplored for graph algorithms. In this paper, we present TC-MIS, a TC-accelerated algorithm that reformulates key phases of MIS computation as sparse matrix-vector multiplication (SpMV). TC-MIS tiles the graph adjacency matrix and employs Warp Matrix Multiply-Accumulate (WMMA) operations to transform irregular graph traversal into regular, massively parallel computation. Our evaluation across TC-enabled microarchitectures (Ampere, Ada Lovelace, Hopper, Blackwell) demonstrates that TC-MIS achieves an average speedup of 2.84x on RTX A5000, 4.84x on L40S, 18.80x on H200 GPUs, and 5.20x on RTX 5080 with a maximum speedup of 44.38x on H200 GPU over state-of-the-art methods, while maintaining solution quality comparable to that obtained by established heuristics that produce near-maximum independent sets.

Authors: Prajjwal Nijhara, Dip Sankar Banerjee

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently un-structured and challenging for GPU parallelism due to irregular memory access and workload imbalance, specialized GPU algorithms have achieved good performance, processing million-vertex graphs in milliseconds. Modern GPUs are equipped with Tensor Cores (TCs), specialized units for matrix operations with 8-16x higher throughput than CUDA Cores (CCs), which are extensively used for ML, DL, and inference tasks but remain largely unexplored for graph algorithms. In this paper, we present TC-MIS, a TC-accelerated algorithm that reformulates key phases of MIS computation as sparse matrix-vector multiplication (SpMV). TC-MIS tiles the graph adjacency matrix and employs Warp Matrix Multiply-Accumulate (WMMA) operations to transform irregular graph traversal into regular, massively parallel computation. Our evaluation across TC-enabled microarchitectures (Ampere, Ada Lovelace, Hopper, Blackwell) demonstrates that TC-MIS achieves an average speedup of 2.84x on RTX A5000, 4.84x on L40S, 18.80x on H200 GPUs, and 5.20x on RTX 5080 with a maximum speedup of 44.38x on H200 GPU over state-of-the-art methods, while maintaining solution quality comparable to that obtained by established heuristics that produce near-maximum independent sets.

Sampling Directed Eulerian Tours in $\widetilde O(m^{3/2})$ Time

from arXiv: Data Structures and Algorithms

Authors: Nima Anari

We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and breaks the $mn$-type arborescence-sampling barrier on sparse graphs. The core case is a $2$-in/$2$-out graph. We introduce a new local Markov chain, the flip--repair walk: one step locally splits a tour into two circuits and then chooses uniformly among the local flips that repair the state to one tour. We prove that this walk mixes in nearly linear many steps and implement the walk using a dynamic chord data structure. A pointwise degree-reduction wrapper extends the sampler from this degree-two core to arbitrary degrees while preserving the $\widetilde O(m^{3/2})$ total running time. The high-level algorithmic plan, the switching-network reduction, and the dynamic data-structure argument were devised by the author. The author conjectured the mixing theorem underlying the analysis, and GPT 5.5 Pro Extended produced its linear-algebra proof. Codex assisted with manuscript assembly and typesetting.

Authors: Nima Anari

We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and breaks the $mn$-type arborescence-sampling barrier on sparse graphs. The core case is a $2$-in/$2$-out graph. We introduce a new local Markov chain, the flip--repair walk: one step locally splits a tour into two circuits and then chooses uniformly among the local flips that repair the state to one tour. We prove that this walk mixes in nearly linear many steps and implement the walk using a dynamic chord data structure. A pointwise degree-reduction wrapper extends the sampler from this degree-two core to arbitrary degrees while preserving the $\widetilde O(m^{3/2})$ total running time. The high-level algorithmic plan, the switching-network reduction, and the dynamic data-structure argument were devised by the author. The author conjectured the mixing theorem underlying the analysis, and GPT 5.5 Pro Extended produced its linear-algebra proof. Codex assisted with manuscript assembly and typesetting.

Explaining Rankings with Hidden Group Bonuses

from arXiv: Data Structures and Algorithms

Authors: Alvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore, Priyanka Golia, Diptarka Chakraborty

Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang et al., KDD'23, Wang et al., ICDE'24 (best paper award), Chen and Wong, VLDB'24]. Traditionally, these models assume full visibility into the feature sets used to determine the utility score. However, real-world scenarios often involve sensitive attributes that are hidden or partially observed, yet may influence outcomes through additive bonuses designed to promote fairness, as in [Gale and Marian, ICDE'24]. Motivated by such practical concerns, we study a variant of the ranking explanation problem where sensitive features are unobserved but may influence candidate rankings through group-specific linear boosts. We present a formal framework for modeling this problem and develop an algorithmic solution that leverages constraint satisfaction and automated reasoning techniques to jointly infer the linear scoring parameters and latent group bonuses consistent with the observed rankings. We further show that determining a satisfying linear function with group-specific bonuses is \textsf{NP}-hard in general, but when the feature dimension and the number of groups are constant, the problem admits a polynomial-time solution. Our approach is the first to address this nuanced variant, which captures key real-world challenges in fair ranking and admission systems. We perform extensive experiments on both real-world and synthetic datasets, demonstrating that our method effectively recovers hidden bonus structures and provides faithful explanations of observed ranking outcomes.

Authors: Alvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore, Priyanka Golia, Diptarka Chakraborty

Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang et al., KDD'23, Wang et al., ICDE'24 (best paper award), Chen and Wong, VLDB'24]. Traditionally, these models assume full visibility into the feature sets used to determine the utility score. However, real-world scenarios often involve sensitive attributes that are hidden or partially observed, yet may influence outcomes through additive bonuses designed to promote fairness, as in [Gale and Marian, ICDE'24]. Motivated by such practical concerns, we study a variant of the ranking explanation problem where sensitive features are unobserved but may influence candidate rankings through group-specific linear boosts. We present a formal framework for modeling this problem and develop an algorithmic solution that leverages constraint satisfaction and automated reasoning techniques to jointly infer the linear scoring parameters and latent group bonuses consistent with the observed rankings. We further show that determining a satisfying linear function with group-specific bonuses is \textsf{NP}-hard in general, but when the feature dimension and the number of groups are constant, the problem admits a polynomial-time solution. Our approach is the first to address this nuanced variant, which captures key real-world challenges in fair ranking and admission systems. We perform extensive experiments on both real-world and synthetic datasets, demonstrating that our method effectively recovers hidden bonus structures and provides faithful explanations of observed ranking outcomes.

Distributed Gaussian Mean Testing under Communication Constraints: messages, samples, and coins

from arXiv: Data Structures and Algorithms

Authors: Clément L. Canonne, Nimitt

We revisit the problem of Gaussian mean testing in a distributed, communication constrained setting, where each of $n$ users independently observes samples from an unknown $d$-dimensional spherical Gaussian distribution $\mathcal{G}(μ,\mathbb{I}_d)$, and can communicate up to $\ell$ bits to a central referee. The referee's goal is then to distinguish between cases (i) $\|μ\|_2 = 0$ versus (ii) $\|μ\|_2\ge \varepsilon$. This problem has been considered in the private- and public-coin settings, when each user holds exactly one sample, or more generally when each holds exactly $m$ samples. In this work, we significantly generalize the question in three directions: when the users only share a small number $s$ of random bits, when each user holds a different number of samples $m_k$, and when each user can send a different number of bits $\ell_k$ to the referee.

Authors: Clément L. Canonne, Nimitt

We revisit the problem of Gaussian mean testing in a distributed, communication constrained setting, where each of $n$ users independently observes samples from an unknown $d$-dimensional spherical Gaussian distribution $\mathcal{G}(μ,\mathbb{I}_d)$, and can communicate up to $\ell$ bits to a central referee. The referee's goal is then to distinguish between cases (i) $\|μ\|_2 = 0$ versus (ii) $\|μ\|_2\ge \varepsilon$. This problem has been considered in the private- and public-coin settings, when each user holds exactly one sample, or more generally when each holds exactly $m$ samples. In this work, we significantly generalize the question in three directions: when the users only share a small number $s$ of random bits, when each user holds a different number of samples $m_k$, and when each user can send a different number of bits $\ell_k$ to the referee.

An Improved Greedy Approximation for (Metric) $k$-Means

from arXiv: Data Structures and Algorithms

Authors: Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, Ernest van Wijland

Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a collection of points in a metric space, the goal is to partition them into $k$ clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers. In this paper, we present a polynomial-time $3+2\sqrt{2}+ε<5.83$-approximation algorithm for $k$-Means in general metrics. This substantially improves on the current-best $(9+ε)$-approximation in [Ahmadian, Norouzi-Fard, Svensson, Ward - FOCS'17, SICOMP'20], and even slightly improves on the $5.92$-approximation in [Cohen-Addad, Esfandiari, Mirrokni, Narayanan - STOC'22] for the Euclidean special case. A natural approach for $k$-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem. The previous best results for $k$-Means build upon an adaptation of an LMP $3$-approximation for facility location with metric connection costs in [Jain, Vazirani - J.ACM'01] based on a primal-dual method, rather than on the improved LMP greedy $2$-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J.ACM'03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate triangle inequality). Our main contribution is overcoming this barrier by providing such an adaptation. This new LMP approximation algorithm is then combined with the framework recently introduced in [Cohen-Addad, Grandoni, Lee, Schwiegelshohn, Svensson - STOC'25] for the related (metric) $k$-Median problem.

Authors: Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, Ernest van Wijland

Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a collection of points in a metric space, the goal is to partition them into $k$ clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers. In this paper, we present a polynomial-time $3+2\sqrt{2}+ε<5.83$-approximation algorithm for $k$-Means in general metrics. This substantially improves on the current-best $(9+ε)$-approximation in [Ahmadian, Norouzi-Fard, Svensson, Ward - FOCS'17, SICOMP'20], and even slightly improves on the $5.92$-approximation in [Cohen-Addad, Esfandiari, Mirrokni, Narayanan - STOC'22] for the Euclidean special case. A natural approach for $k$-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem. The previous best results for $k$-Means build upon an adaptation of an LMP $3$-approximation for facility location with metric connection costs in [Jain, Vazirani - J.ACM'01] based on a primal-dual method, rather than on the improved LMP greedy $2$-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J.ACM'03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate triangle inequality). Our main contribution is overcoming this barrier by providing such an adaptation. This new LMP approximation algorithm is then combined with the framework recently introduced in [Cohen-Addad, Grandoni, Lee, Schwiegelshohn, Svensson - STOC'25] for the related (metric) $k$-Median problem.

Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

from arXiv: Data Structures and Algorithms

Authors: Faruk Alpay, Levent Sarioglu

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

Authors: Faruk Alpay, Levent Sarioglu

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

Thursday, May 28

Research Fellow in AI Security and Alignment at MATS Research (apply by June 7, 2026)

from CCI: jobs

MATS Autumn 2026 is a 10-week fully-funded research fellowship for aspiring AI alignment, control, security, and governance researchers. Fellows work directly with mentors from Anthropic, Google DeepMind, OpenAI, Redwood Research and more, with housing and office space provided in Berkeley/London. MATS has accelerated 500+ researchers; of alumni who graduated before 2025, 80% now work on […]

MATS Autumn 2026 is a 10-week fully-funded research fellowship for aspiring AI alignment, control, security, and governance researchers. Fellows work directly with mentors from Anthropic, Google DeepMind, OpenAI, Redwood Research and more, with housing and office space provided in Berkeley/London. MATS has accelerated 500+ researchers; of alumni who graduated before 2025, 80% now work on AI safety

Website: https://www.matsprogram.org/apply?utm_source=cstheory-jobs&utm_medium=job-board&utm_campaign=a26
Email: raj@matsprogram.org

By shacharlovett

Dispatches from the possibly last days of human relevance

from Scott Aaronson

As most readers have presumably heard by now, Paul Erdös’s Unit Distance Problem from 1946—one of the central open problems from the field of discrete geometry—has been solved by an internal OpenAI model. Erdös had conjectured that, given n points in the plane, at most n1+o(1) pairs of them could be unit distance apart. Using […]

As most readers have presumably heard by now, Paul Erdös’s Unit Distance Problem from 1946—one of the central open problems from the field of discrete geometry—has been solved by an internal OpenAI model. Erdös had conjectured that, given n points in the plane, at most n1+o(1) pairs of them could be unit distance apart. Using high-powered results from algebraic number theory, GPT refuted this, constructing a set with n1+ε unit-distance pairs, for ε ~ 10-38. Shortly afterward, Will Sawin, a human (!), improved GPT’s construction to get ~n1.014 pairs. There’s since been a claim to improve this further, to ~n1.034. Meanwhile, the best known upper bound remains n4/3, improving Erdös’s n3/2.

The entire process seems have been one-shot: my former student Lijie Chen simply gave GPT the problem, then GPT thought for a while and output a several-page argument that, on analysis by human experts, turned out to be correct. Of course there’s selection bias here; we’re not hearing as much about the hundreds of other problems GPT was given that it didn’t solve (isn’t that the case with humans too?). Clearly, too, GPT was helped by the facts that human mathematicians had wasted most of their time trying to prove Erdös right rather than looking for a counterexample, and that, even if they did look for a counterexample, they’d need to be experts in algebraic number theory to find this one, which hardly any discrete geometers are. So, maybe that suggests that AI, right now, is “merely” picking various medium-hanging fruits that human mathematicians missed for contingent reasons? With emphasis on the “right now.”

In a companion paper, OpenAI helpfully included commentary from Timothy Gowers, Noga Alon, Will Sawin, Daniel Litt, and many other experts, reflecting on the breakthrough, the path that GPT took to get to it (which can actually be seen by examining its chain-of-thought), and what this might mean for the future of mathematical research.

I heard the news maybe an hour after it broke, when some UT grad students came to my office to tell me. For what it’s worth: these students were morose, musing about how everything might soon be over for young scientists and mathematicians like themselves. I don’t know whether they’re right, but I feel like I should tell the truth about what their reaction was.

[Update: News has been coming faster than I can write about it, but today we learned that another important conjecture of Erdös has been refuted. Erdös and Szemerédi’s strong sumset conjecture over R, from the 1970s, had said that, if A is a finite set of real numbers, then either |A+A| or |A×A| must be at least |A|2-o(1). In this case humans, including the aforementioned Sawin, did almost all of the work of constructing the counterexample, but they were directly inspired by GPT’s earlier refutation of the unit distance conjecture. It remains open whether such a counterexample exists where A is a set of integers.]

Then, a few days later, a team at DeepMind, including my UT Austin colleague Swarat Chaudhuri, announced that they were able to use a system called AlphaProof Nexus to settle nine more (!) Erdös problems, many of them in additive combinatorics, along with miscellaneous other open math problems. Notably, in this case the AI also fully formalized its proofs in Lean.

And then, just today, Jelani Nelson alerted me to a new CS theory paper, which solves a longstanding open problem about electrical flows on graphs using a proof from GPT5.5Pro.

It seems to me that we’re now over the top of this particular rollercoaster, and it will keep accelerating until we reach the bottom, wherever that might be. I don’t know whether to hope or dread that solutions to P versus NP and all our other great problems will be included in the ride—that our role, as human mathematicians, will be reduced to (at most) deciding which questions we find interesting and then understanding AI models’ answers to those questions.

But maybe that won’t happen. Maybe the new AI mathematicians will soon hit a wall, because they lack the uncomputable quantum gravity microtubules of Penrose and Hameroff, or some other magic human ingredient. The fantastical thing is that, one way or the other, we’re going to find out empirically before very long.


Readers may have also seen the news that multiple prizewinning entries in a short fiction contest called the Commonwealth Prize, give overwhelming indications of having been written by AIs. As Kelsey Piper puts it:

There are, let’s say, also some noticeable similarities in the prose style between the winning stories that were flagged for AI use. AI chatbots love metaphors and similes, and they often spit out ones that sound vaguely pleasing but are logically incoherent or ascribe properties to things that don’t make sense.

“The Serpent in the Grove” gave us, “The girl smiled like sunrise over a sink.” “The Bastion’s Shadow” says, “She carried it now in her bag, heavy as a charm.” “Mehendi Nights” describes something as “swaying against plaster like a warning bell.”

The Commonwealth Foundation, whose judges chose these stories, hasn’t exactly covered itself in glory—saying, on the one hand, that it strictly forbids AI use but on the other, that it will continue taking authors at their word that they didn’t use AI, no matter the immensity of evidence to the contrary. As many others have pointed out, judges more versed in AI would’ve ironically been better placed to notice the signs of its use.

If only there were some sort of automated way to detect AI-generated text. Someone should really get on that problem, don’t you think?


But maybe we should just throw in the towel—as some of my colleagues have already done in the context of undergraduate projects? Maybe we should simply say that a good story is a good story, regardless of what manner of entity produced it?

As it happens, just last week I read my very first AI-written story that affected me as a story, to the extent that I wanted to read it more than once. This happened when I gave GPT5.5Pro the following simple prompt:

Write me a story about the most ancient Israelites that’s riveting like the stories of the Bible but that’s also consistent with all of the archeological evidence.

You can read the result here. One of my Facebook friends called it “disturbingly good,” and whatever the problems with the piece, I share that feeling. Of course, I’m well aware that GPT could easily generate a thousand stories like it—sampled from the same probability distribution—and then I could even do statistics on which tropes were the most common. This makes it feel silly to overindex on the first story that happened to be output, and yet somehow I did.


I feel like at this point, both the prophets of AI utopia like Ray Kurzweil, and of AI doom like Eliezer Yudkowsky, could be forgiven for asking: dude, will you listen to us YET? Do you still find it prudent to call this new form of terrestrial intelligence a stochastic parrot, a laughable fraud, or a fad that’s about to go away? Fear it all you want, hate it even, but at least respect it!

Which brings me to the other big AI news from the past week, namely that Pope Leo released his first encyclical, which is entitled “Safeguarding the Human Person in the Time of Artificial Intelligence.” I read it and … well, I certainly agreed with the theme that such a world-changing technology needs to be developed for the common good (as the Pope would have it, like the walls of Jerusalem), rather than for the profit or vanity of any one individual or company (in his analogy, like the Tower of Babel). I had quibbles with some of the other parts. Zvi Mowshowitz, as he often does, had a superb paragraph-by-paragraph analysis. Amusingly, there are indications that parts of the encyclical were written by AI.

To me, though, maybe the most notable part was that Chris Olah, who leads Anthropic’s interpretability team, was standing next to the Pope at the ceremony, and delivered his own remarks. I felt like Chris, who I met even before Anthropic existed, was a non-obvious yet inspired choice here, one of the rare figures in frontier AI whose technical and moral authority are both completely unimpeachable by anyone.

And so, at this momentous era for the human project, and on no less of an authority than that of the Vicar of Christ himself, the Supreme Pontiff and the Successor of Peter, I hereby throw myself on the wisdom and mercy of … uhh, I guess, Chris Olah and his team at Anthropic.

Chris, if I am soon to share the earth with entities that can prove the Riemann Hypothesis and solve quantum gravity after 30 seconds of thought, then may you understand those entities well enough to cause them to be nice.


Endnote: I should have foreseen, but didn’t, that the comments on this post would be dominated by people looking for ways to minimize whichever specific AI accomplishments I blogged about. Thus, it turns out, the ability of AI to solve Erdös problems just demonstrates that Erdös’s problems were never “serious” math in the first place—nothing like algebraic geometry or Grothendieck-style theory-building, which remain untouched. Likewise, the story I shared was obvious AI slop.

I had taken it as obvious that, when assessing AI’s impact on the world, one needs to look at least somewhat into the future: to remember where things were four years ago, compared to where they are today, and at least try to draw a straight line through the data, if not the exponential that seems to fit better.

Does anyone seriously doubt at this point that major open problems in algebraic geometry and other “Grothendieck-friendly” areas of math will fall to future AI models? Or that AI-written stories will improve, not only to win literary awards from AI-naïve judges, but to avoid the features that commenters here are complaining about? And that, whenever that happens, there will be new confident reasons not to care immediately offered up in comment sections like mine?

Apparently people do still doubt—hence the throwaway remark in my post about Penrose and Hameroff and microtubules. If not that or something like it, what exactly do they think the ceiling will be, and why?

Recently (I should have mentioned this before), I came across what I consider one of the greatest social experiments of all time, one that illuminates people’s reactions to every AI advance. A Twitter/X user named SHL0MS displayed the following AI-generated fake “Monet painting,” and asked people to explain what made it worse than real Monet paintings:

If you haven’t seen this yet, I recommend that you try the exercise yourself before reading further.

As it was, numerous art aficionados responded at length, savaging the flat, lifeless, uncreative AI slop, the emotionless composition, the missing spark, the lack of tranquility, the harshness, the lack of depth and symbiosis, and on and on and on.

Only after they had all said their piece did SHL0MS reveal that this is an actual Monet painting.

By Scott

Gauge Geometry of Hodge Zero-Mode Transport in Parameter-Dependent Topological Data Analysis

from arXiv: Computational Geometry

Authors: Satoshi Kanno, Rei Nishimura, Hiroshi Yamauchi, Yoshi-aki Shimada

We propose a practical computational framework for detecting structural changes in parameter-dependent topological data. In many applications, such as time-series data analysis, anomaly detection, and monitoring of systems under changing control parameters, persistence diagrams describe the birth and death of topological features at each parameter value, but they do not fully capture how these features are reorganized over time. To address this limitation, we represent homological features by zero modes of the ordinary combinatorial Hodge Laplacian and track the corresponding feature spaces in a common ambient chain space. This allows us to compute curvature and holonomy as descriptors of local reorganization and accumulated memory in evolving topological structures. Curvature highlights parameter regions where homological features mix or change rapidly, while holonomy summarizes the net effect of such changes after a closed cycle. We also establish stability estimates showing that these descriptors are robust under perturbations of the Hodge Laplacian on regular regions. Numerical experiments on controlled time-dependent point-cloud data show that the proposed method detects tracking instability, distinguishes systems with nearly identical persistence diagrams, and captures cycle-level memory invisible to pointwise feature matching. These results suggest that zero-mode transport geometry can serve as a useful computational tool for analyzing dynamic topological data.

Authors: Satoshi Kanno, Rei Nishimura, Hiroshi Yamauchi, Yoshi-aki Shimada

We propose a practical computational framework for detecting structural changes in parameter-dependent topological data. In many applications, such as time-series data analysis, anomaly detection, and monitoring of systems under changing control parameters, persistence diagrams describe the birth and death of topological features at each parameter value, but they do not fully capture how these features are reorganized over time. To address this limitation, we represent homological features by zero modes of the ordinary combinatorial Hodge Laplacian and track the corresponding feature spaces in a common ambient chain space. This allows us to compute curvature and holonomy as descriptors of local reorganization and accumulated memory in evolving topological structures. Curvature highlights parameter regions where homological features mix or change rapidly, while holonomy summarizes the net effect of such changes after a closed cycle. We also establish stability estimates showing that these descriptors are robust under perturbations of the Hodge Laplacian on regular regions. Numerical experiments on controlled time-dependent point-cloud data show that the proposed method detects tracking instability, distinguishes systems with nearly identical persistence diagrams, and captures cycle-level memory invisible to pointwise feature matching. These results suggest that zero-mode transport geometry can serve as a useful computational tool for analyzing dynamic topological data.

Powers and Limitations of Synchronous Self-Assembly

from arXiv: Computational Geometry

Authors: Florent Becker, Phillip Drake, Matthew J. Patitz, Ryder Smith

In abstract models of algorithmic self-assembly, synchronization between attachments has emerged as a crucial distinction between the classical asynchronous model (aTAM) and a new synchronous model, the syncTAM. This paper presents recent advances in gauging the additional power afforded by the syncTAM. While it is known that the syncTAM and the aTAM are each unable to fully simulate the other, this paper offers evidence that the syncTAM is computationally significantly more powerful than the aTAM, especially in the non-cooperative setting. The additional power of the non-cooperative syncTAM is witnessed by the following constructions, all impossible in the non-cooperative aTAM: a flagpole, a strict self-assembly of a variant of the discrete Sierpinski triangle, and the ability to build the same assemblies (modulo scale factor) as directed aTAM systems. The second topic is that of limited synchronization, wherein, when the number of attachments is smaller than some threshold $l$, they happen synchronously, but attachments in excess of that number must wait. In that context, the precise value of $l$ is crucial, and changes to that value prevent simulation and can change which shapes can be obtained.

Authors: Florent Becker, Phillip Drake, Matthew J. Patitz, Ryder Smith

In abstract models of algorithmic self-assembly, synchronization between attachments has emerged as a crucial distinction between the classical asynchronous model (aTAM) and a new synchronous model, the syncTAM. This paper presents recent advances in gauging the additional power afforded by the syncTAM. While it is known that the syncTAM and the aTAM are each unable to fully simulate the other, this paper offers evidence that the syncTAM is computationally significantly more powerful than the aTAM, especially in the non-cooperative setting. The additional power of the non-cooperative syncTAM is witnessed by the following constructions, all impossible in the non-cooperative aTAM: a flagpole, a strict self-assembly of a variant of the discrete Sierpinski triangle, and the ability to build the same assemblies (modulo scale factor) as directed aTAM systems. The second topic is that of limited synchronization, wherein, when the number of attachments is smaller than some threshold $l$, they happen synchronously, but attachments in excess of that number must wait. In that context, the precise value of $l$ is crucial, and changes to that value prevent simulation and can change which shapes can be obtained.

A Fresh Look at Lamarckian Evolution and the Baldwin Effect

from arXiv: Data Structures and Algorithms

Authors: Inès Benito, Johannes F. Lutzeyer, Benjamin Doerr

Baldwinian and Lamarckian evolution have existed for a long time in evolutionary algorithms (EAs) without ever dominating the academic literature or practical applications. In this work, we use modern empirical and theoretical methods to revisit Lamarckian and Baldwinian evolution and rigorously compare them with the generic Darwinian evolution. On the empirical side, we run a comprehensive suite of experiments on graphs from six different datasets from the recent GraphBench benchmark on Maximum Independent Set and Maximum Cut problems. Our results show that Baldwinian and Lamarckian evolution consistently outperform Darwinian evolution, confirming the great potential of local search augmented evolutionary algorithms. Notably, in the great majority of cases, all EAs outperform recent deep learning baselines and approach the performance of highly specialised heuristic and exact solvers. We furthermore report a high-performing set of generalist parameters for all studied evolution types that we hope will be of use to practitioners in future. On the theoretical side, we extend the existing Deceptive Leading Block benchmark to arbitrary block length and use tools from modern theoretical runtime analysis to prove upper and lower bounds on the expected runtime. For block lengths greater than two, Baldwinian evolution is asymptotically faster than Lamarckian which is asymptotically faster than Darwinian evolution. When accounting for the cost of the local search procedure in fitness evaluations, the ordering depends on the implementation with Baldwinian evolution staying fastest from small block lengths onwards, explaining its strong empirical performance.

Authors: Inès Benito, Johannes F. Lutzeyer, Benjamin Doerr

Baldwinian and Lamarckian evolution have existed for a long time in evolutionary algorithms (EAs) without ever dominating the academic literature or practical applications. In this work, we use modern empirical and theoretical methods to revisit Lamarckian and Baldwinian evolution and rigorously compare them with the generic Darwinian evolution. On the empirical side, we run a comprehensive suite of experiments on graphs from six different datasets from the recent GraphBench benchmark on Maximum Independent Set and Maximum Cut problems. Our results show that Baldwinian and Lamarckian evolution consistently outperform Darwinian evolution, confirming the great potential of local search augmented evolutionary algorithms. Notably, in the great majority of cases, all EAs outperform recent deep learning baselines and approach the performance of highly specialised heuristic and exact solvers. We furthermore report a high-performing set of generalist parameters for all studied evolution types that we hope will be of use to practitioners in future. On the theoretical side, we extend the existing Deceptive Leading Block benchmark to arbitrary block length and use tools from modern theoretical runtime analysis to prove upper and lower bounds on the expected runtime. For block lengths greater than two, Baldwinian evolution is asymptotically faster than Lamarckian which is asymptotically faster than Darwinian evolution. When accounting for the cost of the local search procedure in fitness evaluations, the ordering depends on the implementation with Baldwinian evolution staying fastest from small block lengths onwards, explaining its strong empirical performance.

Disjunctive Sum of Squares

from arXiv: Data Structures and Algorithms

Authors: Amir Ali Ahmadi, Sanjeeb Dash, Yixuan Hua, Bartolomeo Stellato

We introduce the concept of disjunctive sum of squares for certifying nonnegativity of polynomials. Unlike the popular sum of squares approach where nonnegativity is certified by a single algebraic identity, the disjunctive sum of squares approach certifies nonnegativity with multiple algebraic identities which can be found in parallel. Our main result is a disjunctive Positivstellensatz proving that we can keep the degree of each algebraic identity as low as the degree of the polynomial whose nonnegativity is in question. Based on this result, we construct a semidefinite programming based converging hierarchy of lower bounds for the problem of minimizing a polynomial over a compact basic semialgebraic set, where the size of the largest semidefinite constraint is fixed throughout the hierarchy. We further prove a second disjunctive Positivstellensatz which leads to an optimization-free hierarchy for polynomial optimization. We specialize this result to the problem of proving copositivity of matrices. Finally, we describe how the disjunctive sum of squares approach can be combined with a branch-and-bound algorithm and we present numerical experiments on polynomial, copositive, and combinatorial optimization problems.

Authors: Amir Ali Ahmadi, Sanjeeb Dash, Yixuan Hua, Bartolomeo Stellato

We introduce the concept of disjunctive sum of squares for certifying nonnegativity of polynomials. Unlike the popular sum of squares approach where nonnegativity is certified by a single algebraic identity, the disjunctive sum of squares approach certifies nonnegativity with multiple algebraic identities which can be found in parallel. Our main result is a disjunctive Positivstellensatz proving that we can keep the degree of each algebraic identity as low as the degree of the polynomial whose nonnegativity is in question. Based on this result, we construct a semidefinite programming based converging hierarchy of lower bounds for the problem of minimizing a polynomial over a compact basic semialgebraic set, where the size of the largest semidefinite constraint is fixed throughout the hierarchy. We further prove a second disjunctive Positivstellensatz which leads to an optimization-free hierarchy for polynomial optimization. We specialize this result to the problem of proving copositivity of matrices. Finally, we describe how the disjunctive sum of squares approach can be combined with a branch-and-bound algorithm and we present numerical experiments on polynomial, copositive, and combinatorial optimization problems.

High-Quality Multi-Constraint Hypergraph Partitioning via Greedy Rebalancing

from arXiv: Data Structures and Algorithms

Authors: Nikolai Maas

Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-block connectivity of hyperedges is minimized while balancing the vertices with regard to $d$ distinct constraints. A prominent class of applications is data distribution tasks, where this allows to achieve good load balance for $d$ different kinds of resources and simultaneously minimize the communication volume. Although the best approaches for single-constraint partitioning are usually complex (multilevel) algorithms with many components, we show that replacing only one component already leads to high-quality multi-constraint partitions: the rebalancing step, which restores balance for a partition that has (hopefully) small connectivity but violates the constraints. We design a multi-constraint rebalancing algorithm based on greedy local search, proving that balance is always restored for $d=2$ and bounded maximum weight. The key is to ensure monotonically decreasing global imbalance by choosing an imbalance metric where there is always a balance-improving move available. Integrating our algorithm into the state-of-the-art partitioner Mt-KaHyPar, we demonstrate an 11.5\,\% geometric mean connectivity reduction compared to the next best competitor (Metis) and better reliability regarding partition balance, even though the majority of inputs is outside of the theoretical guarantee.

Authors: Nikolai Maas

Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-block connectivity of hyperedges is minimized while balancing the vertices with regard to $d$ distinct constraints. A prominent class of applications is data distribution tasks, where this allows to achieve good load balance for $d$ different kinds of resources and simultaneously minimize the communication volume. Although the best approaches for single-constraint partitioning are usually complex (multilevel) algorithms with many components, we show that replacing only one component already leads to high-quality multi-constraint partitions: the rebalancing step, which restores balance for a partition that has (hopefully) small connectivity but violates the constraints. We design a multi-constraint rebalancing algorithm based on greedy local search, proving that balance is always restored for $d=2$ and bounded maximum weight. The key is to ensure monotonically decreasing global imbalance by choosing an imbalance metric where there is always a balance-improving move available. Integrating our algorithm into the state-of-the-art partitioner Mt-KaHyPar, we demonstrate an 11.5\,\% geometric mean connectivity reduction compared to the next best competitor (Metis) and better reliability regarding partition balance, even though the majority of inputs is outside of the theoretical guarantee.

A Deterministic Separation Lemma

from arXiv: Data Structures and Algorithms

Authors: Abhishek Sahu

The \emph{Separation Lemma} is a simple yet powerful tool, akin to the well-known \emph{Isolation Lemma}, that guarantees the uniqueness of certain set sums. Bandopadhyay et al.\ introduced this lemma to establish lower bounds for the \ALP problem with respect to certain structural parameters, relying on random weight assignments in the process. The lemma's applicability extends well beyond that specific work, especially in proving hardness results. However, while effective, these hardness results inherently rely on probabilistic assumptions. In this work, we give a fully \emph{deterministic} construction for the weight assignment required by the Separation Lemma. We provide formal proofs of correctness, explicit examples, and show how deterministic weights can replace randomized ones, thereby derandomizing existing hardness results for path-packing problems. Our exposition highlights a clear progression from the original randomized foundations to deterministic constructions and their practical implications.

Authors: Abhishek Sahu

The \emph{Separation Lemma} is a simple yet powerful tool, akin to the well-known \emph{Isolation Lemma}, that guarantees the uniqueness of certain set sums. Bandopadhyay et al.\ introduced this lemma to establish lower bounds for the \ALP problem with respect to certain structural parameters, relying on random weight assignments in the process. The lemma's applicability extends well beyond that specific work, especially in proving hardness results. However, while effective, these hardness results inherently rely on probabilistic assumptions. In this work, we give a fully \emph{deterministic} construction for the weight assignment required by the Separation Lemma. We provide formal proofs of correctness, explicit examples, and show how deterministic weights can replace randomized ones, thereby derandomizing existing hardness results for path-packing problems. Our exposition highlights a clear progression from the original randomized foundations to deterministic constructions and their practical implications.

Efficient Algorithms for Interdicting Facilities in Trees and Bounded Treewidth Graphs

from arXiv: Data Structures and Algorithms

Authors: Ali Abbasi, Eli Friedman, Leana Golubchik, Samir Khuller, Marco Paolieri

Given a graph $G$ of $n$ nodes partitioned into facilities and customers, the $r$-edge interdiction covering problem (REIC) is to remove up to $r$ edges so as to maximize the total weight of customers disconnected from all facilities, which is called the covering objective function. While REIC is known to be NP-complete for general graphs, Fröhlich and Ruzika show that the problem can be solved in polynomial time when $G$ is a tree, providing an $O(n^7 r)$-time algorithm. We give an efficient $O(nr^2)$-time dynamic programming algorithm for REIC on trees that is fixed-parameter linear in $n$. Evaluating our solution on a benchmark of randomly generated tree networks with baselines of the Fröhlich and Ruzika algorithm and the Gurobi integer program solver, we demonstrate that in practice, our algorithm is both significantly faster and less sensitive to network topology and size. We extend our algorithm for REIC to graphs of bounded treewidth, a well-studied family of sparse graphs that generalizes trees, and obtain a matching runtime of $O(nr^2)$. We also consider the $r$-facility interdiction covering problem (RFIC), a novel variant of this network interdiction problem where the goal is to remove up to $r$ facilities to maximize the covering objective function over disconnected customers. We show that RFIC is NP-complete by observing it generalizes the small set bipartite vertex expansion problem (SSBVE), also known as the minimum $p$-union problem. We give an $O(nr^2)$-time algorithm for RFIC on trees, which also gives an $O(n^3)$-time algorithm for SSBVE on trees.

Authors: Ali Abbasi, Eli Friedman, Leana Golubchik, Samir Khuller, Marco Paolieri

Given a graph $G$ of $n$ nodes partitioned into facilities and customers, the $r$-edge interdiction covering problem (REIC) is to remove up to $r$ edges so as to maximize the total weight of customers disconnected from all facilities, which is called the covering objective function. While REIC is known to be NP-complete for general graphs, Fröhlich and Ruzika show that the problem can be solved in polynomial time when $G$ is a tree, providing an $O(n^7 r)$-time algorithm. We give an efficient $O(nr^2)$-time dynamic programming algorithm for REIC on trees that is fixed-parameter linear in $n$. Evaluating our solution on a benchmark of randomly generated tree networks with baselines of the Fröhlich and Ruzika algorithm and the Gurobi integer program solver, we demonstrate that in practice, our algorithm is both significantly faster and less sensitive to network topology and size. We extend our algorithm for REIC to graphs of bounded treewidth, a well-studied family of sparse graphs that generalizes trees, and obtain a matching runtime of $O(nr^2)$. We also consider the $r$-facility interdiction covering problem (RFIC), a novel variant of this network interdiction problem where the goal is to remove up to $r$ facilities to maximize the covering objective function over disconnected customers. We show that RFIC is NP-complete by observing it generalizes the small set bipartite vertex expansion problem (SSBVE), also known as the minimum $p$-union problem. We give an $O(nr^2)$-time algorithm for RFIC on trees, which also gives an $O(n^3)$-time algorithm for SSBVE on trees.

Quantum principal component analysis without eigenvector recovery

from arXiv: Data Structures and Algorithms

Authors: Yewei Yuan, Michele Minervini, Mark M. Wilde, Nana Liu

Principal component analysis (PCA) is traditionally implemented through a covariance or kernel matrix, leading-eigenvector extraction, and hard rank-$k$ projection. These steps can be computationally costly in high-dimensional and quantum-data settings, sensitive to small eigengaps, and unnecessary when downstream tasks only require principal-subspace scores. Such score-based objectives are important in applications such as anomaly detection, spectral-energy profiling, and other postselection tasks. To address these needs, we introduce a measurement-based soft PCA framework replacing the hard top-$k$ projector with an entropy-regularized Fermi--Dirac filter. This filter is the unique optimizer of an entropy-regularized variational formulation of PCA and converges to the classical PCA projector in the zero-temperature limit. This filter has a direct interpretation as a quantum measurement, which naturally suggests a quantum approach. For centered covariance operators represented by quantum feature states, a single fixed circuit, together with threshold calibration, accesses all optimal filters for different rank budgets or retained-variance levels without rank-dependent circuit updates or eigenvector recovery. For new inputs, the same calibrated quantum circuit yields soft principal subspace scores, spectral energy profiles, and postselected filtered states. The required centering of both training and test data is performed coherently inside the quantum protocol, which is particularly important for quantum data where no classical feature vectors or centered Gram matrix are directly available. By reframing PCA as a calibrated measurement task, this framework bypasses the need for iterative eigenvector extraction and achieves a dimension-independent sample complexity $O(η^{-2})$ for normalized fractional-rank or retained variance scoring at additive accuracy $η$.

Authors: Yewei Yuan, Michele Minervini, Mark M. Wilde, Nana Liu

Principal component analysis (PCA) is traditionally implemented through a covariance or kernel matrix, leading-eigenvector extraction, and hard rank-$k$ projection. These steps can be computationally costly in high-dimensional and quantum-data settings, sensitive to small eigengaps, and unnecessary when downstream tasks only require principal-subspace scores. Such score-based objectives are important in applications such as anomaly detection, spectral-energy profiling, and other postselection tasks. To address these needs, we introduce a measurement-based soft PCA framework replacing the hard top-$k$ projector with an entropy-regularized Fermi--Dirac filter. This filter is the unique optimizer of an entropy-regularized variational formulation of PCA and converges to the classical PCA projector in the zero-temperature limit. This filter has a direct interpretation as a quantum measurement, which naturally suggests a quantum approach. For centered covariance operators represented by quantum feature states, a single fixed circuit, together with threshold calibration, accesses all optimal filters for different rank budgets or retained-variance levels without rank-dependent circuit updates or eigenvector recovery. For new inputs, the same calibrated quantum circuit yields soft principal subspace scores, spectral energy profiles, and postselected filtered states. The required centering of both training and test data is performed coherently inside the quantum protocol, which is particularly important for quantum data where no classical feature vectors or centered Gram matrix are directly available. By reframing PCA as a calibrated measurement task, this framework bypasses the need for iterative eigenvector extraction and achieves a dimension-independent sample complexity $O(η^{-2})$ for normalized fractional-rank or retained variance scoring at additive accuracy $η$.

Smoothed Score Queries and the Complexity of Sampling

from arXiv: Data Structures and Algorithms

Authors: Jingbo Liu

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic \(\sqrtκ\) dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix \(Λ\), a smoothed-score query at noise level \(τ\) gives access to the resolvent \((Λ+τ^{-1}I)^{-1}\). Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error \(δ_{\rm TV}\), improving the condition-number dependence from \(\sqrtκ\) to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in \(κ\); in particular, for fixed dimension and accuracy, the bit complexity is \(O(\log^2κ)\). To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an \(Ω(\logκ)\) lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

Authors: Jingbo Liu

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic \(\sqrtκ\) dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix \(Λ\), a smoothed-score query at noise level \(τ\) gives access to the resolvent \((Λ+τ^{-1}I)^{-1}\). Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error \(δ_{\rm TV}\), improving the condition-number dependence from \(\sqrtκ\) to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in \(κ\); in particular, for fixed dimension and accuracy, the bit complexity is \(O(\log^2κ)\). To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an \(Ω(\logκ)\) lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

from arXiv: Data Structures and Algorithms

Authors: Sergei Tikhonov, Arsen Vasilyan

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.

Authors: Sergei Tikhonov, Arsen Vasilyan

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.