Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Friday, May 29

TR26-089 | Near Optimal Extractors for Samplable Sources under Nondeterministic Hardness | Marshall Ball, Eshan Chattopadhyay, Mohit Gurumukhani, Yunya Zhao

from ECCC Papers

We study the problem of constructing randomness extractors for samplable sources, introduced by Trevisan and Vadhan (FOCS 2000), a natural computational model of imperfect randomness, where the source $\mathbb{X}$ (on $n$ bits) is generated by a polynomial-size circuit. They showed how to extract from sources with min-entropy $(1-\alpha)n$ (for small $\alpha>0$) assuming exponential hardness against $\Sigma_6$-circuits. A line of recent works has made significant progress, either achieving extraction from such high min-entropy under weaker assumptions (e.g., nondeterministic circuits), or handling polynomially small min-entropy under stronger assumptions (e.g., hardness against $\Sigma_i$-circuits). These works output nearly all the randomness with polynomially small error. In a recent work, Oh and Shaltiel (STOC 2026) constructed extractors that work for logarithmic min-entropy under incomparable assumptions, outputting $1$ random bit with constant error. Our main result gives an extractor for samplable sources with min-entropy poly$\log(n)$ with polynomially small error and outputting almost all of the randomness, assuming hardness against nondeterministic circuits. The extractor also works for sources samplable with postselection by nondeterministic circuits. We can further reduce the entropy requirement to $O(\log n)$ at the expense of making the error constant and outputting only $1$ bit, matching the extractor of Oh and Shaltiel (STOC 2026). By prior work of Shaltiel (CCC 2025), our extractors imply hardness against nondeterministic circuits, and thus our assumption is essentially minimal.

We study the problem of constructing randomness extractors for samplable sources, introduced by Trevisan and Vadhan (FOCS 2000), a natural computational model of imperfect randomness, where the source $\mathbb{X}$ (on $n$ bits) is generated by a polynomial-size circuit. They showed how to extract from sources with min-entropy $(1-\alpha)n$ (for small $\alpha>0$) assuming exponential hardness against $\Sigma_6$-circuits. A line of recent works has made significant progress, either achieving extraction from such high min-entropy under weaker assumptions (e.g., nondeterministic circuits), or handling polynomially small min-entropy under stronger assumptions (e.g., hardness against $\Sigma_i$-circuits). These works output nearly all the randomness with polynomially small error. In a recent work, Oh and Shaltiel (STOC 2026) constructed extractors that work for logarithmic min-entropy under incomparable assumptions, outputting $1$ random bit with constant error. Our main result gives an extractor for samplable sources with min-entropy poly$\log(n)$ with polynomially small error and outputting almost all of the randomness, assuming hardness against nondeterministic circuits. The extractor also works for sources samplable with postselection by nondeterministic circuits. We can further reduce the entropy requirement to $O(\log n)$ at the expense of making the error constant and outputting only $1$ bit, matching the extractor of Oh and Shaltiel (STOC 2026). By prior work of Shaltiel (CCC 2025), our extractors imply hardness against nondeterministic circuits, and thus our assumption is essentially minimal.

Sharing different heartbeats

from Ben Recht

On the power of large-scale randomized trials in cardiology.

I’m always looking for examples where we need statistical reasoning and significance tests to change care, and I’m surprised no one has shoved cardiology in my face before.1 Cardiologists can make a powerful case that their field has been completely revolutionized by turning to randomized controlled trials to guide their practice.

In the late 1980s, cardiologists ran massive clinical trials to improve the standard of care for heart attacks. The GISSI trial enrolled 12,000 patients and found a 2% reduction in hospital mortality when treating heart attack patients with the anti-clotting medicine streptokinase. ISIS-2 (17,000 patients!) found that adding aspirin as an additional anti-clotting agent resulted in an additional 2% reduction in mortality.

These percentages were absolute percentages. With neither treatment, 12% of the patients died within five weeks of the heart attack. With aspirin and streptokinase, that percentage dropped to 8%. For those who care about these things (like me and three other people on the internet), the p-values in these studies were all on the order of 1 in a million. The trial size here mattered a lot. Had the trials been 10 times smaller, the same event rates would not have passed a standard p<0.05 significance threshold. These trials are textbook cases for RCT gold standard evangelists.

Moreover, the megatrial culture in cardiology has famous examples of rooting out harmful practices. The most famous study is the CAST trial of the early 1990s. Standard practice had been to pharmacologically suppress arrhythmias after heart attacks. Using drugs to make the heart look more “normal” seemed like a good idea. The CAST trialists enrolled 1,500 patients and shockingly found significant harm. 5% of patients died within 10 months on the anti-arrhythmia treatment, whereas only 2.5% died on placebo. The confidence intervals were narrow, and the p-value was again tiny. Something that felt reasonable—suppressing unusual heartbeats—was deemed harmful, and the practice was ended.

What can be said of the net benefits of these treatments after 40 years? Though robust, the effect sizes in these trials are all small, ranging from 1% to 3%. How can we be sure that the effects are cumulative and that modifying the standard of care is actually helping cardiology patients?

Regrettably, we now have to turn to epidemiology. No IRB will authorize an RCT of the old methods—essentially bed rest and oxygen—against the current therapeutic regimen of angioplasty, clot reducers, blood thinners, beta blockers, and statins. That would be akin to doing an RCT with a control group assigned to blood letting. But the improvement in survival of heart attacks is undeniable. Estimates suggest that the current death rates have dropped from somewhere in the range of 15-20% to about 4-5%. That’s quite astounding. 50 years of improving practice have accumulated a 3-6 fold reduction in deaths. You couldn’t ask for something better. Cardiology is a compelling and fascinating case study of the power of outcome optimization.

Why was there so much success in cardiology? You could argue that heart attack is a nearly ideal case for this sort of trial-based optimization. The endpoint of “death” is the most unambiguous in medicine. The adverse endpoint occurs fairly quickly (within weeks), as opposed to, say, oncology, where treatments can take years to assess. Moreover, heart attacks are unfortunately very common. The silver lining of their commonality is large pragmatic trials are relatively easy to assemble. Large trials are essential when the effect sizes are only 1-2%.

Now, even though cardiology is a poster child for evidence-based medicine, it’s important to note how the actual advancement of practice was not simply by chaining together a sequence of massive RCTs. First, not every trial was as unambiguous as GISSI, ISIS-2, and CAST. Two trials in the 1990s assessed the relative value of streptokinase and tPA, two anticoagulant agents. The GUSTO trial enrolled 41,000 patients and found tPA reduced death by 1%. The GISSI-2 trial enrolled 12,500 patients and found no difference between tPA and streptokinase. They contradicted each other! Post-trial analyses concluded that the trials had administered tPA differently, and this explained why GUSTO found a benefit. It was a careful study of the trial after the fact that suggested the true benefit of the drug. This analysis required an appeal to what pharmacological knowledge, and wasn’t simply adjudicated by randomized experimentation. Moreover, a second post-trial analysis concluded that tPA increased the risk of stroke. The story is complicated. Mega-trials alone can’t solve practice.

Even the unambiguous trials were already pointing to a major challenge with RCTs. As a treatment regimen becomes more complex, ironing out the fine details requires an exponentially increasing number of RCTs. If you want to compare the effect of three different timings and three different dosages of a single drug, you need nine arms in your trial. If you want to additionally see if a second drug is helpful, you need 18.

Finally, not every guideline of practice comes from randomized trials. Fewer than 10% of the American College of Cardiology/American Heart Association clinical guidelines are backed by large RCTs. I was surprised to learn that there is no convincing RCT showing that bed rest is harmful. We have ended the practice of long-term bed rest regardless. So how cardiologists make recommendations to their patients remains complicated. How does this sort of therapy relate to individual doctor-patient experiences? That’s the next question I’m hoping to answer.

Subscribe now

1

Thanks to cardiologist Guy Armstrong, whose comments inspired me to put this post together.

By Ben Recht

TR26-088 | A digest of the work of Rothblum, Vadhan, and Wigderson (2013) | Oded Goldreich

from ECCC Papers

The work of Rothblum, Vadhan, and Wigderson ({\em STOC}, 2013) is pivotal to the study of interactive proofs of proximity (IPPs). We present the main contents of their work, while clarify a few (conceptual) aspects. Specifically, starting with the definition of IPP systems, our main focus is on the construction of IPP systems for any property in log-space uniform $\cal NC$ (and beyond). We also present limitations on the power of constant-round IPP systems.

The work of Rothblum, Vadhan, and Wigderson ({\em STOC}, 2013) is pivotal to the study of interactive proofs of proximity (IPPs). We present the main contents of their work, while clarify a few (conceptual) aspects. Specifically, starting with the definition of IPP systems, our main focus is on the construction of IPP systems for any property in log-space uniform $\cal NC$ (and beyond). We also present limitations on the power of constant-round IPP systems.

TR26-087 | Tight Bounds for Sketching Intersecting Sets, with Applications | Flavio Chierichetti, Mirko Giacchini, Ravi Kumar, Alessandro Panconesi, Erasmo Tani, Andrew Tomkins

from ECCC Papers

In this work, we study the space complexity of sketching the intersection profile of a distribution $D$ on $2^{[n]}$. Specifically, we seek a succinct data structure that, for any query set $S \subseteq [n]$, approximates the quantity $\Pr_{T \sim D}[T \cap S \neq \emptyset]$ to within a small constant additive error. Via a probabilistic packing argument, we show that the worst-case bit complexity of this problem is $\Omega(n^2)$, which we also prove to be tight. We use this lower bound to settle the complexity of three sketching problems. (i) We show that sketching vertex neighborhood sizes in graphs requires $\Omega(n^2)$ bits, standing in sharp contrast to the $\tilde{O}(n)$ complexity of sketching edge cuts. (ii) We obtain tight lower and upper bounds of $\tilde{\Theta}(n^2)$ for sketching coverage functions with additive and multiplicative errors. (iii) We prove an $\Omega(n^2)$ lower bound for sketching Random Utility Models under the $\ell_\infty$-norm, improving upon the previous $\Omega(n \log n)$ bound and matching the upper bound to within logarithmic factors.

In this work, we study the space complexity of sketching the intersection profile of a distribution $D$ on $2^{[n]}$. Specifically, we seek a succinct data structure that, for any query set $S \subseteq [n]$, approximates the quantity $\Pr_{T \sim D}[T \cap S \neq \emptyset]$ to within a small constant additive error. Via a probabilistic packing argument, we show that the worst-case bit complexity of this problem is $\Omega(n^2)$, which we also prove to be tight. We use this lower bound to settle the complexity of three sketching problems. (i) We show that sketching vertex neighborhood sizes in graphs requires $\Omega(n^2)$ bits, standing in sharp contrast to the $\tilde{O}(n)$ complexity of sketching edge cuts. (ii) We obtain tight lower and upper bounds of $\tilde{\Theta}(n^2)$ for sketching coverage functions with additive and multiplicative errors. (iii) We prove an $\Omega(n^2)$ lower bound for sketching Random Utility Models under the $\ell_\infty$-norm, improving upon the previous $\Omega(n \log n)$ bound and matching the upper bound to within logarithmic factors.

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

from arXiv: Computational Complexity

Authors: Eric Alsmann, Martin Lange, Marco Sälzer

We investigate the computational complexity of neural network verification in quantised settings. We distinguish three classes of Feedforward Neural Networks (FNNs): rational FNNs with exact rational weights, quantised FNNs whose weights come from a finite-width arithmetic, and dynamically quantised FNNs in which rational networks are evaluated with respect to a given finite-width arithmetic. We consider two types of specifications used in the literature. Linear programming (LP) specifications are conjunctions of linear constraints, while bit-vector (BV) specifications allow reasoning at the bit level and can express non-linear constraints. Our results give a complexity landscape of these verification problems. For quantised FNNs with fixed arithmetic precision, we show that verification under both LP and BV specifications remains NP-complete, matching the complexity of the rational case. For dynamically quantised FNNs with BV specifications, we establish upper bounds, complementing a previously known PSPACE-hardness result.

Authors: Eric Alsmann, Martin Lange, Marco Sälzer

We investigate the computational complexity of neural network verification in quantised settings. We distinguish three classes of Feedforward Neural Networks (FNNs): rational FNNs with exact rational weights, quantised FNNs whose weights come from a finite-width arithmetic, and dynamically quantised FNNs in which rational networks are evaluated with respect to a given finite-width arithmetic. We consider two types of specifications used in the literature. Linear programming (LP) specifications are conjunctions of linear constraints, while bit-vector (BV) specifications allow reasoning at the bit level and can express non-linear constraints. Our results give a complexity landscape of these verification problems. For quantised FNNs with fixed arithmetic precision, we show that verification under both LP and BV specifications remains NP-complete, matching the complexity of the rational case. For dynamically quantised FNNs with BV specifications, we establish upper bounds, complementing a previously known PSPACE-hardness result.

S2MDF: A Plug-And-Play Layer for Intersection-Free Multi-Object Signed Distance Fields

from arXiv: Computational Geometry

Authors: Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot, Pascal Fua

Compositional implicit surface representations model scenes as collections of objects, each encoded by a Signed Distance Field (SDF). A fundamental limitation of this approach is that multiple SDFs can produce geometries that interpenetrate, violating physical plausibility. Existing mitigation strategies rely on soft penalty terms that reduce but do not eliminate intersections, and require careful loss weighting. To truly prevent interpenetration, we propose a hard constraint on vector-valued SDFs and introduce S2MDF, a lightweight plug-and-play module that enforces the constraint on any object-compositional SDF representation without architectural modifications. It introduces negligible computational overhead and is compatible with linearly-interpolated standard meshing algorithms such as Marching Cubes. It can be applied during training or as a post-processing step. Experiments on multiple state-of-the-art compositional methods show that S2MDF reduces intersections to numerical precision while preserving reconstruction quality, outperforming existing mitigation strategies.

Authors: Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot, Pascal Fua

Compositional implicit surface representations model scenes as collections of objects, each encoded by a Signed Distance Field (SDF). A fundamental limitation of this approach is that multiple SDFs can produce geometries that interpenetrate, violating physical plausibility. Existing mitigation strategies rely on soft penalty terms that reduce but do not eliminate intersections, and require careful loss weighting. To truly prevent interpenetration, we propose a hard constraint on vector-valued SDFs and introduce S2MDF, a lightweight plug-and-play module that enforces the constraint on any object-compositional SDF representation without architectural modifications. It introduces negligible computational overhead and is compatible with linearly-interpolated standard meshing algorithms such as Marching Cubes. It can be applied during training or as a post-processing step. Experiments on multiple state-of-the-art compositional methods show that S2MDF reduces intersections to numerical precision while preserving reconstruction quality, outperforming existing mitigation strategies.

SWORD: Spectral Wasserstein Online Regime Detection in Dynamic Networks

from arXiv: Computational Geometry

Authors: Izhar Ali

Online change point detection in dynamic graphs requires comparing graphs as they arrive, in time linear in the number of edges, without parametric assumptions. Recent spectral methods address scale via the Kernel Polynomial Method (KPM): SCPD computes Chebyshev moments of the normalized Laplacian, discretizes them into a density-of-states histogram, and scores the histogram with SVD plus cosine similarity. We introduce SWORD, which computes the same moments and instead compares their mean across two adjacent time windows by their $L_1$ distance. On three real-world benchmarks (MIT Reality, AskUbuntu, Enron), this lifts mean $F_1$ from SCPD's $0.27$ to $0.79$, with SCPD failing to detect any change on Enron. A controlled cascading ablation attributes the gap to two design choices: the two-window mean structure (dominant on MIT) and the $L_1$ metric on those mean vectors (dominant on Enron). A bin-width sweep rules out histogram discretization -- SCPD's most visible architectural choice -- as the driver. SWORD inherits SCPD's KPM core, so per-graph cost is $O(KRm)$ with no eigendecomposition, scaling to $86{,}000$-node networks. With per-dataset tuning it matches the offline TIRE autoencoder on mean $F_1$ and attains the highest precision among online methods ($0.91$, only $2$ false positives across the three benchmarks).

Authors: Izhar Ali

Online change point detection in dynamic graphs requires comparing graphs as they arrive, in time linear in the number of edges, without parametric assumptions. Recent spectral methods address scale via the Kernel Polynomial Method (KPM): SCPD computes Chebyshev moments of the normalized Laplacian, discretizes them into a density-of-states histogram, and scores the histogram with SVD plus cosine similarity. We introduce SWORD, which computes the same moments and instead compares their mean across two adjacent time windows by their $L_1$ distance. On three real-world benchmarks (MIT Reality, AskUbuntu, Enron), this lifts mean $F_1$ from SCPD's $0.27$ to $0.79$, with SCPD failing to detect any change on Enron. A controlled cascading ablation attributes the gap to two design choices: the two-window mean structure (dominant on MIT) and the $L_1$ metric on those mean vectors (dominant on Enron). A bin-width sweep rules out histogram discretization -- SCPD's most visible architectural choice -- as the driver. SWORD inherits SCPD's KPM core, so per-graph cost is $O(KRm)$ with no eigendecomposition, scaling to $86{,}000$-node networks. With per-dataset tuning it matches the offline TIRE autoencoder on mean $F_1$ and attains the highest precision among online methods ($0.91$, only $2$ false positives across the three benchmarks).

Quantum encodings that preserve persistent homology

from arXiv: Computational Geometry

Authors: Arthur J. Parzygnat, Andrew Vlasic

Given a data set with a notion of distance, such as a point cloud in Euclidean space, topological data analysis (TDA) uses techniques from algebraic topology and metric geometry to infer the topology of a hypothetical manifold from which the data are sampled. This inference is achieved by calculating topological invariants, some of which are difficult to compute classically. Meanwhile, quantum TDA utilizes quantum processes to extract the invariants used in making such inferences in an attempt to speed up the computations. Because applying transformations to the original classical dataset could alter the associated topological invariants, we investigate which quantum encodings would best preserve the invariants of the original dataset. This line of inquiry is distinct from standard approaches in quantum TDA, whose typical starting point is not from the classical dataset directly, but rather from the associated combinatorial objects, such as simplicial complexes, which typically demand a lot of resources to construct. We take the first step at a more direct approach by focusing on which quantum encodings acting directly on the data are admissible for applying quantum algorithms to extract topological features from classical datasets.

Authors: Arthur J. Parzygnat, Andrew Vlasic

Given a data set with a notion of distance, such as a point cloud in Euclidean space, topological data analysis (TDA) uses techniques from algebraic topology and metric geometry to infer the topology of a hypothetical manifold from which the data are sampled. This inference is achieved by calculating topological invariants, some of which are difficult to compute classically. Meanwhile, quantum TDA utilizes quantum processes to extract the invariants used in making such inferences in an attempt to speed up the computations. Because applying transformations to the original classical dataset could alter the associated topological invariants, we investigate which quantum encodings would best preserve the invariants of the original dataset. This line of inquiry is distinct from standard approaches in quantum TDA, whose typical starting point is not from the classical dataset directly, but rather from the associated combinatorial objects, such as simplicial complexes, which typically demand a lot of resources to construct. We take the first step at a more direct approach by focusing on which quantum encodings acting directly on the data are admissible for applying quantum algorithms to extract topological features from classical datasets.

Low-degree estimation thresholds in planted hypergraphs and tensor PCA

from arXiv: Data Structures and Algorithms

Authors: Daniel Fu, Youngtak Sohn

A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior. For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work. Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery above the sharp threshold, yielding polynomial-time algorithms succeeding up to this threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.

Authors: Daniel Fu, Youngtak Sohn

A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior. For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work. Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery above the sharp threshold, yielding polynomial-time algorithms succeeding up to this threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.

Elfs, transducers and quantum walks

from arXiv: Data Structures and Algorithms

Authors: Simon Apers, Jérémie Roland, Yuxin Zhang

Electric flow sampling (elfs) is a new tool in the quantum walk toolbox and a useful primitive for solving search, sampling and optimization problems on graphs. We refine this tool by showing that there exists a zero-error transducer for implementing elfs. More broadly, we establish a zero-error transducer for reflecting about the intersection of two subspaces, yielding an errorfree transducer version of the effective gap lemma. Building on this result, we obtain improved quantum walk algorithms for estimating effective resistances and span program witness sizes with an optimal error scaling, and for sampling from the random walk arrival distribution, via the composition of many elfs. Using this last algorithm, we obtain an up-to-quadratic quantum speedup for semi-supervised learning on expander graphs.

Authors: Simon Apers, Jérémie Roland, Yuxin Zhang

Electric flow sampling (elfs) is a new tool in the quantum walk toolbox and a useful primitive for solving search, sampling and optimization problems on graphs. We refine this tool by showing that there exists a zero-error transducer for implementing elfs. More broadly, we establish a zero-error transducer for reflecting about the intersection of two subspaces, yielding an errorfree transducer version of the effective gap lemma. Building on this result, we obtain improved quantum walk algorithms for estimating effective resistances and span program witness sizes with an optimal error scaling, and for sampling from the random walk arrival distribution, via the composition of many elfs. Using this last algorithm, we obtain an up-to-quadratic quantum speedup for semi-supervised learning on expander graphs.

On Language Generation in the Limit with Bounded Memory

from arXiv: Data Structures and Algorithms

Authors: Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows memory constraints dramatically alter learnability; we extend this to language generation. First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions. We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing it to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$. Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection. These results show bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.

Authors: Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows memory constraints dramatically alter learnability; we extend this to language generation. First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions. We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing it to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$. Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection. These results show bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

from arXiv: Data Structures and Algorithms

Authors: Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.

Authors: Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.

A Radius-Sensitive Approximation Algorithm for Connected Submodular Maximization

from arXiv: Data Structures and Algorithms

Authors: Philip Cervenjak, Junhao Gan, Naonori Kakimura, Seeun William Umboh, Anthony Wirth

Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone submodular function $f$ on subsets of the vertex set of $G$, and an integer $k$. The goal is to select a tree in $G$, with $k$ edges, whose vertex set maximizes $f$. We also study the more general Directed and Directed Rooted variants of CSM (DCSM and DRCSM respectively). In both variants, $G$ is directed and the solution must be an out-tree in $G$, with $k$ edges, whose vertex set maximizes $f$; DRCSM further specifies a vertex to be the root of the selected out-tree. For CSM, several previous works have proposed polynomial time approximation algorithms; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{\sqrt{k}})$-approximation. We can also parameterize the approximation factor by the radius of the optimal solution, denoted by $r$; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{r})$-approximation. In this paper, we improve on the state-of-the-art approximation factor for CSM with respect to $r$ as well as $k$, noting that $r \leq k$. We propose a polynomial time framework that, for (Directed) CSM, achieves a $Ω(\frac{\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation for every constant $\varepsilon \in (0, 1]$. For DRCSM, our framework achieves a $Ω(\frac{δ\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation that violates the size constraint by at most a factor of $1 + δ$ for every $δ\in [\frac{1}{k}, 1]$. A key component of our framework is GreedyRadius, which is an algorithm for DRCSM that takes another algorithm with a bicriteria approximation factor in terms of $k$ and outputs a solution with the same bicriteria approximation factor (up to constants) in terms of $r$.

Authors: Philip Cervenjak, Junhao Gan, Naonori Kakimura, Seeun William Umboh, Anthony Wirth

Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone submodular function $f$ on subsets of the vertex set of $G$, and an integer $k$. The goal is to select a tree in $G$, with $k$ edges, whose vertex set maximizes $f$. We also study the more general Directed and Directed Rooted variants of CSM (DCSM and DRCSM respectively). In both variants, $G$ is directed and the solution must be an out-tree in $G$, with $k$ edges, whose vertex set maximizes $f$; DRCSM further specifies a vertex to be the root of the selected out-tree. For CSM, several previous works have proposed polynomial time approximation algorithms; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{\sqrt{k}})$-approximation. We can also parameterize the approximation factor by the radius of the optimal solution, denoted by $r$; the state-of-the-art polynomial time algorithm achieves a $Ω(\frac{1}{r})$-approximation. In this paper, we improve on the state-of-the-art approximation factor for CSM with respect to $r$ as well as $k$, noting that $r \leq k$. We propose a polynomial time framework that, for (Directed) CSM, achieves a $Ω(\frac{\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation for every constant $\varepsilon \in (0, 1]$. For DRCSM, our framework achieves a $Ω(\frac{δ\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation that violates the size constraint by at most a factor of $1 + δ$ for every $δ\in [\frac{1}{k}, 1]$. A key component of our framework is GreedyRadius, which is an algorithm for DRCSM that takes another algorithm with a bicriteria approximation factor in terms of $k$ and outputs a solution with the same bicriteria approximation factor (up to constants) in terms of $r$.

Quadratic Sums-of-Powers for Fixed-Parameter Tractable Quantum-Circuit Simulation

from arXiv: Data Structures and Algorithms

Authors: Alexis de Colnet, Floris Geerts, Rihan Hai, Alfons Laarman, Joon Hyung Lee, Guillermo A. Pérez

Strongly simulating a quantum circuit, that is, computing an output amplitude, amounts to summing the circuit's Feynman paths, a weighted count over assignments to the Boolean ``path'' variables. The circuit's gates induce correlations among these variables, forming a graph whose structure determines the hardness of the simulation task. This sum-of-powers viewpoint underlies recent simulators built on knowledge-representation tools from artificial intelligence, namely binary decision diagrams and weighted model counting. We show that the structural quantity most accurately governing the difficulty is the rank-width of the path-variable graph, and we give an algorithm that evaluates the amplitude in time that is exponential only in this rank-width and polynomial in the circuit size. Rank-width can be far smaller than the widths that control competing methods: as corollaries, our algorithm reproduces a recent decision-diagram simulation breakthrough as a special case and matches the Markov--Shi tensor-network contraction bound. To complement this, we exhibit circuit families on which our algorithm provably beats both competing methods. The new method applies to every circuit built from Hadamard and diagonal gates, in particular to circuits over Clifford+T. In practical terms, general-purpose decision-diagram and model-counting tools can serve as the workhorse, with our specialized algorithm dispatched to exploit a small rank-width of the associated graph when it is present.

Authors: Alexis de Colnet, Floris Geerts, Rihan Hai, Alfons Laarman, Joon Hyung Lee, Guillermo A. Pérez

Strongly simulating a quantum circuit, that is, computing an output amplitude, amounts to summing the circuit's Feynman paths, a weighted count over assignments to the Boolean ``path'' variables. The circuit's gates induce correlations among these variables, forming a graph whose structure determines the hardness of the simulation task. This sum-of-powers viewpoint underlies recent simulators built on knowledge-representation tools from artificial intelligence, namely binary decision diagrams and weighted model counting. We show that the structural quantity most accurately governing the difficulty is the rank-width of the path-variable graph, and we give an algorithm that evaluates the amplitude in time that is exponential only in this rank-width and polynomial in the circuit size. Rank-width can be far smaller than the widths that control competing methods: as corollaries, our algorithm reproduces a recent decision-diagram simulation breakthrough as a special case and matches the Markov--Shi tensor-network contraction bound. To complement this, we exhibit circuit families on which our algorithm provably beats both competing methods. The new method applies to every circuit built from Hadamard and diagonal gates, in particular to circuits over Clifford+T. In practical terms, general-purpose decision-diagram and model-counting tools can serve as the workhorse, with our specialized algorithm dispatched to exploit a small rank-width of the associated graph when it is present.

Selection Hyper-heuristics Can Automatically Adjust the Learning Period to Optimally Solve Pseudo-Boolean Problems

from arXiv: Data Structures and Algorithms

Authors: Benjamin Doerr, Pietro S. Oliveto, John Alasdair Warwicker

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a learning period of a certain length $τ$ had to be used, differently from classic hyper-heuristics, which change their behaviour based on the success of only the previous iteration. In this paper, we show how to automatically set this new parameter value, relieving the user from the non-trivial task of controlling this novel algorithm parameter. We prove that the resulting hyper-heuristic selects the optimal neighbourhood size in a $1-o(1)$ fraction of the iterations and, consequently, optimises the LeadingOnes benchmark in the best possible time (apart from lower-order terms) achievable with these neighborhood sizes.

Authors: Benjamin Doerr, Pietro S. Oliveto, John Alasdair Warwicker

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a learning period of a certain length $τ$ had to be used, differently from classic hyper-heuristics, which change their behaviour based on the success of only the previous iteration. In this paper, we show how to automatically set this new parameter value, relieving the user from the non-trivial task of controlling this novel algorithm parameter. We prove that the resulting hyper-heuristic selects the optimal neighbourhood size in a $1-o(1)$ fraction of the iterations and, consequently, optimises the LeadingOnes benchmark in the best possible time (apart from lower-order terms) achievable with these neighborhood sizes.

TC-MIS: Maximal Independent Set on Tensor-cores

from arXiv: Data Structures and Algorithms

Authors: Prajjwal Nijhara, Dip Sankar Banerjee

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently un-structured and challenging for GPU parallelism due to irregular memory access and workload imbalance, specialized GPU algorithms have achieved good performance, processing million-vertex graphs in milliseconds. Modern GPUs are equipped with Tensor Cores (TCs), specialized units for matrix operations with 8-16x higher throughput than CUDA Cores (CCs), which are extensively used for ML, DL, and inference tasks but remain largely unexplored for graph algorithms. In this paper, we present TC-MIS, a TC-accelerated algorithm that reformulates key phases of MIS computation as sparse matrix-vector multiplication (SpMV). TC-MIS tiles the graph adjacency matrix and employs Warp Matrix Multiply-Accumulate (WMMA) operations to transform irregular graph traversal into regular, massively parallel computation. Our evaluation across TC-enabled microarchitectures (Ampere, Ada Lovelace, Hopper, Blackwell) demonstrates that TC-MIS achieves an average speedup of 2.84x on RTX A5000, 4.84x on L40S, 18.80x on H200 GPUs, and 5.20x on RTX 5080 with a maximum speedup of 44.38x on H200 GPU over state-of-the-art methods, while maintaining solution quality comparable to that obtained by established heuristics that produce near-maximum independent sets.

Authors: Prajjwal Nijhara, Dip Sankar Banerjee

Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently un-structured and challenging for GPU parallelism due to irregular memory access and workload imbalance, specialized GPU algorithms have achieved good performance, processing million-vertex graphs in milliseconds. Modern GPUs are equipped with Tensor Cores (TCs), specialized units for matrix operations with 8-16x higher throughput than CUDA Cores (CCs), which are extensively used for ML, DL, and inference tasks but remain largely unexplored for graph algorithms. In this paper, we present TC-MIS, a TC-accelerated algorithm that reformulates key phases of MIS computation as sparse matrix-vector multiplication (SpMV). TC-MIS tiles the graph adjacency matrix and employs Warp Matrix Multiply-Accumulate (WMMA) operations to transform irregular graph traversal into regular, massively parallel computation. Our evaluation across TC-enabled microarchitectures (Ampere, Ada Lovelace, Hopper, Blackwell) demonstrates that TC-MIS achieves an average speedup of 2.84x on RTX A5000, 4.84x on L40S, 18.80x on H200 GPUs, and 5.20x on RTX 5080 with a maximum speedup of 44.38x on H200 GPU over state-of-the-art methods, while maintaining solution quality comparable to that obtained by established heuristics that produce near-maximum independent sets.

Sampling Directed Eulerian Tours in $\widetilde O(m^{3/2})$ Time

from arXiv: Data Structures and Algorithms

Authors: Nima Anari

We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and breaks the $mn$-type arborescence-sampling barrier on sparse graphs. The core case is a $2$-in/$2$-out graph. We introduce a new local Markov chain, the flip--repair walk: one step locally splits a tour into two circuits and then chooses uniformly among the local flips that repair the state to one tour. We prove that this walk mixes in nearly linear many steps and implement the walk using a dynamic chord data structure. A pointwise degree-reduction wrapper extends the sampler from this degree-two core to arbitrary degrees while preserving the $\widetilde O(m^{3/2})$ total running time. The high-level algorithmic plan, the switching-network reduction, and the dynamic data-structure argument were devised by the author. The author conjectured the mixing theorem underlying the analysis, and GPT 5.5 Pro Extended produced its linear-algebra proof. Codex assisted with manuscript assembly and typesetting.

Authors: Nima Anari

We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and breaks the $mn$-type arborescence-sampling barrier on sparse graphs. The core case is a $2$-in/$2$-out graph. We introduce a new local Markov chain, the flip--repair walk: one step locally splits a tour into two circuits and then chooses uniformly among the local flips that repair the state to one tour. We prove that this walk mixes in nearly linear many steps and implement the walk using a dynamic chord data structure. A pointwise degree-reduction wrapper extends the sampler from this degree-two core to arbitrary degrees while preserving the $\widetilde O(m^{3/2})$ total running time. The high-level algorithmic plan, the switching-network reduction, and the dynamic data-structure argument were devised by the author. The author conjectured the mixing theorem underlying the analysis, and GPT 5.5 Pro Extended produced its linear-algebra proof. Codex assisted with manuscript assembly and typesetting.

Explaining Rankings with Hidden Group Bonuses

from arXiv: Data Structures and Algorithms

Authors: Alvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore, Priyanka Golia, Diptarka Chakraborty

Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang et al., KDD'23, Wang et al., ICDE'24 (best paper award), Chen and Wong, VLDB'24]. Traditionally, these models assume full visibility into the feature sets used to determine the utility score. However, real-world scenarios often involve sensitive attributes that are hidden or partially observed, yet may influence outcomes through additive bonuses designed to promote fairness, as in [Gale and Marian, ICDE'24]. Motivated by such practical concerns, we study a variant of the ranking explanation problem where sensitive features are unobserved but may influence candidate rankings through group-specific linear boosts. We present a formal framework for modeling this problem and develop an algorithmic solution that leverages constraint satisfaction and automated reasoning techniques to jointly infer the linear scoring parameters and latent group bonuses consistent with the observed rankings. We further show that determining a satisfying linear function with group-specific bonuses is \textsf{NP}-hard in general, but when the feature dimension and the number of groups are constant, the problem admits a polynomial-time solution. Our approach is the first to address this nuanced variant, which captures key real-world challenges in fair ranking and admission systems. We perform extensive experiments on both real-world and synthetic datasets, demonstrating that our method effectively recovers hidden bonus structures and provides faithful explanations of observed ranking outcomes.

Authors: Alvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore, Priyanka Golia, Diptarka Chakraborty

Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang et al., KDD'23, Wang et al., ICDE'24 (best paper award), Chen and Wong, VLDB'24]. Traditionally, these models assume full visibility into the feature sets used to determine the utility score. However, real-world scenarios often involve sensitive attributes that are hidden or partially observed, yet may influence outcomes through additive bonuses designed to promote fairness, as in [Gale and Marian, ICDE'24]. Motivated by such practical concerns, we study a variant of the ranking explanation problem where sensitive features are unobserved but may influence candidate rankings through group-specific linear boosts. We present a formal framework for modeling this problem and develop an algorithmic solution that leverages constraint satisfaction and automated reasoning techniques to jointly infer the linear scoring parameters and latent group bonuses consistent with the observed rankings. We further show that determining a satisfying linear function with group-specific bonuses is \textsf{NP}-hard in general, but when the feature dimension and the number of groups are constant, the problem admits a polynomial-time solution. Our approach is the first to address this nuanced variant, which captures key real-world challenges in fair ranking and admission systems. We perform extensive experiments on both real-world and synthetic datasets, demonstrating that our method effectively recovers hidden bonus structures and provides faithful explanations of observed ranking outcomes.

Distributed Gaussian Mean Testing under Communication Constraints: messages, samples, and coins

from arXiv: Data Structures and Algorithms

Authors: Clément L. Canonne, Nimitt

We revisit the problem of Gaussian mean testing in a distributed, communication constrained setting, where each of $n$ users independently observes samples from an unknown $d$-dimensional spherical Gaussian distribution $\mathcal{G}(μ,\mathbb{I}_d)$, and can communicate up to $\ell$ bits to a central referee. The referee's goal is then to distinguish between cases (i) $\|μ\|_2 = 0$ versus (ii) $\|μ\|_2\ge \varepsilon$. This problem has been considered in the private- and public-coin settings, when each user holds exactly one sample, or more generally when each holds exactly $m$ samples. In this work, we significantly generalize the question in three directions: when the users only share a small number $s$ of random bits, when each user holds a different number of samples $m_k$, and when each user can send a different number of bits $\ell_k$ to the referee.

Authors: Clément L. Canonne, Nimitt

We revisit the problem of Gaussian mean testing in a distributed, communication constrained setting, where each of $n$ users independently observes samples from an unknown $d$-dimensional spherical Gaussian distribution $\mathcal{G}(μ,\mathbb{I}_d)$, and can communicate up to $\ell$ bits to a central referee. The referee's goal is then to distinguish between cases (i) $\|μ\|_2 = 0$ versus (ii) $\|μ\|_2\ge \varepsilon$. This problem has been considered in the private- and public-coin settings, when each user holds exactly one sample, or more generally when each holds exactly $m$ samples. In this work, we significantly generalize the question in three directions: when the users only share a small number $s$ of random bits, when each user holds a different number of samples $m_k$, and when each user can send a different number of bits $\ell_k$ to the referee.

An Improved Greedy Approximation for (Metric) $k$-Means

from arXiv: Data Structures and Algorithms

Authors: Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, Ernest van Wijland

Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a collection of points in a metric space, the goal is to partition them into $k$ clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers. In this paper, we present a polynomial-time $3+2\sqrt{2}+ε<5.83$-approximation algorithm for $k$-Means in general metrics. This substantially improves on the current-best $(9+ε)$-approximation in [Ahmadian, Norouzi-Fard, Svensson, Ward - FOCS'17, SICOMP'20], and even slightly improves on the $5.92$-approximation in [Cohen-Addad, Esfandiari, Mirrokni, Narayanan - STOC'22] for the Euclidean special case. A natural approach for $k$-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem. The previous best results for $k$-Means build upon an adaptation of an LMP $3$-approximation for facility location with metric connection costs in [Jain, Vazirani - J.ACM'01] based on a primal-dual method, rather than on the improved LMP greedy $2$-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J.ACM'03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate triangle inequality). Our main contribution is overcoming this barrier by providing such an adaptation. This new LMP approximation algorithm is then combined with the framework recently introduced in [Cohen-Addad, Grandoni, Lee, Schwiegelshohn, Svensson - STOC'25] for the related (metric) $k$-Median problem.

Authors: Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, Ernest van Wijland

Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a collection of points in a metric space, the goal is to partition them into $k$ clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers. In this paper, we present a polynomial-time $3+2\sqrt{2}+ε<5.83$-approximation algorithm for $k$-Means in general metrics. This substantially improves on the current-best $(9+ε)$-approximation in [Ahmadian, Norouzi-Fard, Svensson, Ward - FOCS'17, SICOMP'20], and even slightly improves on the $5.92$-approximation in [Cohen-Addad, Esfandiari, Mirrokni, Narayanan - STOC'22] for the Euclidean special case. A natural approach for $k$-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem. The previous best results for $k$-Means build upon an adaptation of an LMP $3$-approximation for facility location with metric connection costs in [Jain, Vazirani - J.ACM'01] based on a primal-dual method, rather than on the improved LMP greedy $2$-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J.ACM'03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate triangle inequality). Our main contribution is overcoming this barrier by providing such an adaptation. This new LMP approximation algorithm is then combined with the framework recently introduced in [Cohen-Addad, Grandoni, Lee, Schwiegelshohn, Svensson - STOC'25] for the related (metric) $k$-Median problem.

Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

from arXiv: Data Structures and Algorithms

Authors: Faruk Alpay, Levent Sarioglu

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

Authors: Faruk Alpay, Levent Sarioglu

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

Optimal Rates for Differentially Private Hypothesis Testing with E-values

from arXiv: Data Structures and Algorithms

Authors: Ben Jacobsen, Tomas Gonzales, Gavin Brown, Kassem Fawaz, Aaditya Ramdas

E-values have attracted considerable interest in recent years as flexible tools for enabling anytime-valid and adaptive data analysis. Hypothesis testing is at the core of many of these applications, which can often involve private or sensitive data. In this work, we answer a simple but important question: given two distributions $\mathbb{P}$ and $\mathbb{Q}$, what is the maximum achievable e-power when testing $X\sim \mathbb{P}^n$ against $X\sim\mathbb{Q}^n$ with e-values that satisfy $\varepsilon$-differential privacy? We characterize the optimal rate for this problem and provide an algorithm which matches it exactly. In the sequential setting, when observations arrive one-by-one and the analyst chooses when to halt, we give matching upper and lower bounds on the stopping times of any private e-process. Numerical experiments confirm the practicality of our algorithms, which require less data than the recently proposed DP-SPRT across a range of sequential testing problems and privacy levels.

Authors: Ben Jacobsen, Tomas Gonzales, Gavin Brown, Kassem Fawaz, Aaditya Ramdas

E-values have attracted considerable interest in recent years as flexible tools for enabling anytime-valid and adaptive data analysis. Hypothesis testing is at the core of many of these applications, which can often involve private or sensitive data. In this work, we answer a simple but important question: given two distributions $\mathbb{P}$ and $\mathbb{Q}$, what is the maximum achievable e-power when testing $X\sim \mathbb{P}^n$ against $X\sim\mathbb{Q}^n$ with e-values that satisfy $\varepsilon$-differential privacy? We characterize the optimal rate for this problem and provide an algorithm which matches it exactly. In the sequential setting, when observations arrive one-by-one and the analyst chooses when to halt, we give matching upper and lower bounds on the stopping times of any private e-process. Numerical experiments confirm the practicality of our algorithms, which require less data than the recently proposed DP-SPRT across a range of sequential testing problems and privacy levels.

Thursday, May 28

Research Fellow in AI Security and Alignment at MATS Research (apply by June 7, 2026)

from CCI: jobs

MATS Autumn 2026 is a 10-week fully-funded research fellowship for aspiring AI alignment, control, security, and governance researchers. Fellows work directly with mentors from Anthropic, Google DeepMind, OpenAI, Redwood Research and more, with housing and office space provided in Berkeley/London. MATS has accelerated 500+ researchers; of alumni who graduated before 2025, 80% now work on […]

MATS Autumn 2026 is a 10-week fully-funded research fellowship for aspiring AI alignment, control, security, and governance researchers. Fellows work directly with mentors from Anthropic, Google DeepMind, OpenAI, Redwood Research and more, with housing and office space provided in Berkeley/London. MATS has accelerated 500+ researchers; of alumni who graduated before 2025, 80% now work on AI safety

Website: https://www.matsprogram.org/apply?utm_source=cstheory-jobs&utm_medium=job-board&utm_campaign=a26
Email: raj@matsprogram.org

By shacharlovett

Dispatches from the possibly last days of human relevance

from Scott Aaronson

As most readers have presumably heard by now, Paul Erdös’s Unit Distance Problem from 1946—one of the central open problems from the field of discrete geometry—has been solved by an internal OpenAI model. Erdös had conjectured that, given n points in the plane, at most n1+o(1) pairs of them could be unit distance apart. Using […]

As most readers have presumably heard by now, Paul Erdös’s Unit Distance Problem from 1946—one of the central open problems from the field of discrete geometry—has been solved by an internal OpenAI model. Erdös had conjectured that, given n points in the plane, at most n1+o(1) pairs of them could be unit distance apart. Using high-powered results from algebraic number theory, GPT refuted this, constructing a set with n1+ε unit-distance pairs, for ε ~ 10-38. Shortly afterward, Will Sawin, a human (!), improved GPT’s construction to get ~n1.014 pairs. There’s since been a claim to improve this further, to ~n1.034. Meanwhile, the best known upper bound remains n4/3, improving Erdös’s n3/2.

The entire process seems have been one-shot: my former student Lijie Chen simply gave GPT the problem, then GPT thought for a while and output a several-page argument that, on analysis by human experts, turned out to be correct. Of course there’s selection bias here; we’re not hearing as much about the hundreds of other problems GPT was given that it didn’t solve (isn’t that the case with humans too?). Clearly, too, GPT was helped by the facts that human mathematicians had wasted most of their time trying to prove Erdös right rather than looking for a counterexample, and that, even if they did look for a counterexample, they’d need to be experts in algebraic number theory to find this one, which hardly any discrete geometers are. So, maybe that suggests that AI, right now, is “merely” picking various medium-hanging fruits that human mathematicians missed for contingent reasons? With emphasis on the “right now.”

In a companion paper, OpenAI helpfully included commentary from Timothy Gowers, Noga Alon, Will Sawin, Daniel Litt, and many other experts, reflecting on the breakthrough, the path that GPT took to get to it (which can actually be seen by examining its chain-of-thought), and what this might mean for the future of mathematical research.

I heard the news maybe an hour after it broke, when some UT grad students came to my office to tell me. For what it’s worth: these students were morose, musing about how everything might soon be over for young scientists and mathematicians like themselves. I don’t know whether they’re right, but I feel like I should tell the truth about what their reaction was.

[Update: News has been coming faster than I can write about it, but today we learned that another important conjecture of Erdös has been refuted. Erdös and Szemerédi’s strong sumset conjecture over R, from the 1970s, had said that, if A is a finite set of real numbers, then either |A+A| or |A×A| must be at least |A|2-o(1). In this case humans, including the aforementioned Sawin, did almost all of the work of constructing the counterexample, but they were directly inspired by GPT’s earlier refutation of the unit distance conjecture. It remains open whether such a counterexample exists where A is a set of integers.]

Then, a few days later, a team at DeepMind, including my UT Austin colleague Swarat Chaudhuri, announced that they were able to use a system called AlphaProof Nexus to settle nine more (!) Erdös problems, many of them in additive combinatorics, along with miscellaneous other open math problems. Notably, in this case the AI also fully formalized its proofs in Lean.

And then, just today, Jelani Nelson alerted me to a new CS theory paper, which solves a longstanding open problem about electrical flows on graphs using a proof from GPT5.5Pro.

It seems to me that we’re now over the top of this particular rollercoaster, and it will keep accelerating until we reach the bottom, wherever that might be. I don’t know whether to hope or dread that solutions to P versus NP and all our other great problems will be included in the ride—that our role, as human mathematicians, will be reduced to (at most) deciding which questions we find interesting and then understanding AI models’ answers to those questions.

But maybe that won’t happen. Maybe the new AI mathematicians will soon hit a wall, because they lack the uncomputable quantum gravity microtubules of Penrose and Hameroff, or some other magic human ingredient. The fantastical thing is that, one way or the other, we’re going to find out empirically before very long.


Readers may have also seen the news that multiple prizewinning entries in a short fiction contest called the Commonwealth Prize, give overwhelming indications of having been written by AIs. As Kelsey Piper puts it:

There are, let’s say, also some noticeable similarities in the prose style between the winning stories that were flagged for AI use. AI chatbots love metaphors and similes, and they often spit out ones that sound vaguely pleasing but are logically incoherent or ascribe properties to things that don’t make sense.

“The Serpent in the Grove” gave us, “The girl smiled like sunrise over a sink.” “The Bastion’s Shadow” says, “She carried it now in her bag, heavy as a charm.” “Mehendi Nights” describes something as “swaying against plaster like a warning bell.”

The Commonwealth Foundation, whose judges chose these stories, hasn’t exactly covered itself in glory—saying, on the one hand, that it strictly forbids AI use but on the other, that it will continue taking authors at their word that they didn’t use AI, no matter the immensity of evidence to the contrary. As many others have pointed out, judges more versed in AI would’ve ironically been better placed to notice the signs of its use.

If only there were some sort of automated way to detect AI-generated text. Someone should really get on that problem, don’t you think?


But maybe we should just throw in the towel—as some of my colleagues have already done in the context of undergraduate projects? Maybe we should simply say that a good story is a good story, regardless of what manner of entity produced it?

As it happens, just last week I read my very first AI-written story that affected me as a story, to the extent that I wanted to read it more than once. This happened when I gave GPT5.5Pro the following simple prompt:

Write me a story about the most ancient Israelites that’s riveting like the stories of the Bible but that’s also consistent with all of the archeological evidence.

You can read the result here. One of my Facebook friends called it “disturbingly good,” and whatever the problems with the piece, I share that feeling. Of course, I’m well aware that GPT could easily generate a thousand stories like it—sampled from the same probability distribution—and then I could even do statistics on which tropes were the most common. This makes it feel silly to overindex on the first story that happened to be output, and yet somehow I did.


I feel like at this point, both the prophets of AI utopia like Ray Kurzweil, and of AI doom like Eliezer Yudkowsky, could be forgiven for asking: dude, will you listen to us YET? Do you still find it prudent to call this new form of terrestrial intelligence a stochastic parrot, a laughable fraud, or a fad that’s about to go away? Fear it all you want, hate it even, but at least respect it!

Which brings me to the other big AI news from the past week, namely that Pope Leo released his first encyclical, which is entitled “Safeguarding the Human Person in the Time of Artificial Intelligence.” I read it and … well, I certainly agreed with the theme that such a world-changing technology needs to be developed for the common good (as the Pope would have it, like the walls of Jerusalem), rather than for the profit or vanity of any one individual or company (in his analogy, like the Tower of Babel). I had quibbles with some of the other parts. Zvi Mowshowitz, as he often does, had a superb paragraph-by-paragraph analysis. Amusingly, there are indications that parts of the encyclical were written by AI.

To me, though, maybe the most notable part was that Chris Olah, who leads Anthropic’s interpretability team, was standing next to the Pope at the ceremony, and delivered his own remarks. I felt like Chris, who I met even before Anthropic existed, was a non-obvious yet inspired choice here, one of the rare figures in frontier AI whose technical and moral authority are both completely unimpeachable by anyone.

And so, at this momentous era for the human project, and on no less of an authority than that of the Vicar of Christ himself, the Supreme Pontiff and the Successor of Peter, I hereby throw myself on the wisdom and mercy of … uhh, I guess, Chris Olah and his team at Anthropic.

Chris, if I am soon to share the earth with entities that can prove the Riemann Hypothesis and solve quantum gravity after 30 seconds of thought, then may you understand those entities well enough to cause them to be nice.


Endnote: I should have foreseen, but didn’t, that the comments on this post would be dominated by people looking for ways to minimize whichever specific AI accomplishments I blogged about. Thus, it turns out, the ability of AI to solve Erdös problems just demonstrates that Erdös’s problems were never “serious” math in the first place—nothing like algebraic geometry or Grothendieck-style theory-building, which remain untouched. Likewise, the story I shared was obvious AI slop.

I had taken it as obvious that, when assessing AI’s impact on the world, one needs to look at least somewhat into the future: to remember where things were four years ago, compared to where they are today, and at least try to draw a straight line through the data, if not the exponential that seems to fit better.

Does anyone seriously doubt at this point that major open problems in algebraic geometry and other “Grothendieck-friendly” areas of math will fall to future AI models? Or that AI-written stories will improve, not only to win literary awards from AI-naïve judges, but to avoid the features that commenters here are complaining about? And that, whenever that happens, there will be new confident reasons not to care immediately offered up in comment sections like mine?

Apparently people do still doubt—hence the throwaway remark in my post about Penrose and Hameroff and microtubules. If not that or something like it, what exactly do they think the ceiling will be, and why?

Recently (I should have mentioned this before), I came across what I consider one of the greatest social experiments of all time, one that illuminates people’s reactions to every AI advance. A Twitter/X user named SHL0MS displayed the following AI-generated fake “Monet painting,” and asked people to explain what made it worse than real Monet paintings:

If you haven’t seen this yet, I recommend that you try the exercise yourself before reading further.

As it was, numerous art aficionados responded at length, savaging the flat, lifeless, uncreative AI slop, the emotionless composition, the missing spark, the lack of tranquility, the harshness, the lack of depth and symbiosis, and on and on and on.

Only after they had all said their piece did SHL0MS reveal that this is an actual Monet painting.

By Scott

Gauge Geometry of Hodge Zero-Mode Transport in Parameter-Dependent Topological Data Analysis

from arXiv: Computational Geometry

Authors: Satoshi Kanno, Rei Nishimura, Hiroshi Yamauchi, Yoshi-aki Shimada

We propose a practical computational framework for detecting structural changes in parameter-dependent topological data. In many applications, such as time-series data analysis, anomaly detection, and monitoring of systems under changing control parameters, persistence diagrams describe the birth and death of topological features at each parameter value, but they do not fully capture how these features are reorganized over time. To address this limitation, we represent homological features by zero modes of the ordinary combinatorial Hodge Laplacian and track the corresponding feature spaces in a common ambient chain space. This allows us to compute curvature and holonomy as descriptors of local reorganization and accumulated memory in evolving topological structures. Curvature highlights parameter regions where homological features mix or change rapidly, while holonomy summarizes the net effect of such changes after a closed cycle. We also establish stability estimates showing that these descriptors are robust under perturbations of the Hodge Laplacian on regular regions. Numerical experiments on controlled time-dependent point-cloud data show that the proposed method detects tracking instability, distinguishes systems with nearly identical persistence diagrams, and captures cycle-level memory invisible to pointwise feature matching. These results suggest that zero-mode transport geometry can serve as a useful computational tool for analyzing dynamic topological data.

Authors: Satoshi Kanno, Rei Nishimura, Hiroshi Yamauchi, Yoshi-aki Shimada

We propose a practical computational framework for detecting structural changes in parameter-dependent topological data. In many applications, such as time-series data analysis, anomaly detection, and monitoring of systems under changing control parameters, persistence diagrams describe the birth and death of topological features at each parameter value, but they do not fully capture how these features are reorganized over time. To address this limitation, we represent homological features by zero modes of the ordinary combinatorial Hodge Laplacian and track the corresponding feature spaces in a common ambient chain space. This allows us to compute curvature and holonomy as descriptors of local reorganization and accumulated memory in evolving topological structures. Curvature highlights parameter regions where homological features mix or change rapidly, while holonomy summarizes the net effect of such changes after a closed cycle. We also establish stability estimates showing that these descriptors are robust under perturbations of the Hodge Laplacian on regular regions. Numerical experiments on controlled time-dependent point-cloud data show that the proposed method detects tracking instability, distinguishes systems with nearly identical persistence diagrams, and captures cycle-level memory invisible to pointwise feature matching. These results suggest that zero-mode transport geometry can serve as a useful computational tool for analyzing dynamic topological data.

Powers and Limitations of Synchronous Self-Assembly

from arXiv: Computational Geometry

Authors: Florent Becker, Phillip Drake, Matthew J. Patitz, Ryder Smith

In abstract models of algorithmic self-assembly, synchronization between attachments has emerged as a crucial distinction between the classical asynchronous model (aTAM) and a new synchronous model, the syncTAM. This paper presents recent advances in gauging the additional power afforded by the syncTAM. While it is known that the syncTAM and the aTAM are each unable to fully simulate the other, this paper offers evidence that the syncTAM is computationally significantly more powerful than the aTAM, especially in the non-cooperative setting. The additional power of the non-cooperative syncTAM is witnessed by the following constructions, all impossible in the non-cooperative aTAM: a flagpole, a strict self-assembly of a variant of the discrete Sierpinski triangle, and the ability to build the same assemblies (modulo scale factor) as directed aTAM systems. The second topic is that of limited synchronization, wherein, when the number of attachments is smaller than some threshold $l$, they happen synchronously, but attachments in excess of that number must wait. In that context, the precise value of $l$ is crucial, and changes to that value prevent simulation and can change which shapes can be obtained.

Authors: Florent Becker, Phillip Drake, Matthew J. Patitz, Ryder Smith

In abstract models of algorithmic self-assembly, synchronization between attachments has emerged as a crucial distinction between the classical asynchronous model (aTAM) and a new synchronous model, the syncTAM. This paper presents recent advances in gauging the additional power afforded by the syncTAM. While it is known that the syncTAM and the aTAM are each unable to fully simulate the other, this paper offers evidence that the syncTAM is computationally significantly more powerful than the aTAM, especially in the non-cooperative setting. The additional power of the non-cooperative syncTAM is witnessed by the following constructions, all impossible in the non-cooperative aTAM: a flagpole, a strict self-assembly of a variant of the discrete Sierpinski triangle, and the ability to build the same assemblies (modulo scale factor) as directed aTAM systems. The second topic is that of limited synchronization, wherein, when the number of attachments is smaller than some threshold $l$, they happen synchronously, but attachments in excess of that number must wait. In that context, the precise value of $l$ is crucial, and changes to that value prevent simulation and can change which shapes can be obtained.

A Fresh Look at Lamarckian Evolution and the Baldwin Effect

from arXiv: Data Structures and Algorithms

Authors: Inès Benito, Johannes F. Lutzeyer, Benjamin Doerr

Baldwinian and Lamarckian evolution have existed for a long time in evolutionary algorithms (EAs) without ever dominating the academic literature or practical applications. In this work, we use modern empirical and theoretical methods to revisit Lamarckian and Baldwinian evolution and rigorously compare them with the generic Darwinian evolution. On the empirical side, we run a comprehensive suite of experiments on graphs from six different datasets from the recent GraphBench benchmark on Maximum Independent Set and Maximum Cut problems. Our results show that Baldwinian and Lamarckian evolution consistently outperform Darwinian evolution, confirming the great potential of local search augmented evolutionary algorithms. Notably, in the great majority of cases, all EAs outperform recent deep learning baselines and approach the performance of highly specialised heuristic and exact solvers. We furthermore report a high-performing set of generalist parameters for all studied evolution types that we hope will be of use to practitioners in future. On the theoretical side, we extend the existing Deceptive Leading Block benchmark to arbitrary block length and use tools from modern theoretical runtime analysis to prove upper and lower bounds on the expected runtime. For block lengths greater than two, Baldwinian evolution is asymptotically faster than Lamarckian which is asymptotically faster than Darwinian evolution. When accounting for the cost of the local search procedure in fitness evaluations, the ordering depends on the implementation with Baldwinian evolution staying fastest from small block lengths onwards, explaining its strong empirical performance.

Authors: Inès Benito, Johannes F. Lutzeyer, Benjamin Doerr

Baldwinian and Lamarckian evolution have existed for a long time in evolutionary algorithms (EAs) without ever dominating the academic literature or practical applications. In this work, we use modern empirical and theoretical methods to revisit Lamarckian and Baldwinian evolution and rigorously compare them with the generic Darwinian evolution. On the empirical side, we run a comprehensive suite of experiments on graphs from six different datasets from the recent GraphBench benchmark on Maximum Independent Set and Maximum Cut problems. Our results show that Baldwinian and Lamarckian evolution consistently outperform Darwinian evolution, confirming the great potential of local search augmented evolutionary algorithms. Notably, in the great majority of cases, all EAs outperform recent deep learning baselines and approach the performance of highly specialised heuristic and exact solvers. We furthermore report a high-performing set of generalist parameters for all studied evolution types that we hope will be of use to practitioners in future. On the theoretical side, we extend the existing Deceptive Leading Block benchmark to arbitrary block length and use tools from modern theoretical runtime analysis to prove upper and lower bounds on the expected runtime. For block lengths greater than two, Baldwinian evolution is asymptotically faster than Lamarckian which is asymptotically faster than Darwinian evolution. When accounting for the cost of the local search procedure in fitness evaluations, the ordering depends on the implementation with Baldwinian evolution staying fastest from small block lengths onwards, explaining its strong empirical performance.

Disjunctive Sum of Squares

from arXiv: Data Structures and Algorithms

Authors: Amir Ali Ahmadi, Sanjeeb Dash, Yixuan Hua, Bartolomeo Stellato

We introduce the concept of disjunctive sum of squares for certifying nonnegativity of polynomials. Unlike the popular sum of squares approach where nonnegativity is certified by a single algebraic identity, the disjunctive sum of squares approach certifies nonnegativity with multiple algebraic identities which can be found in parallel. Our main result is a disjunctive Positivstellensatz proving that we can keep the degree of each algebraic identity as low as the degree of the polynomial whose nonnegativity is in question. Based on this result, we construct a semidefinite programming based converging hierarchy of lower bounds for the problem of minimizing a polynomial over a compact basic semialgebraic set, where the size of the largest semidefinite constraint is fixed throughout the hierarchy. We further prove a second disjunctive Positivstellensatz which leads to an optimization-free hierarchy for polynomial optimization. We specialize this result to the problem of proving copositivity of matrices. Finally, we describe how the disjunctive sum of squares approach can be combined with a branch-and-bound algorithm and we present numerical experiments on polynomial, copositive, and combinatorial optimization problems.

Authors: Amir Ali Ahmadi, Sanjeeb Dash, Yixuan Hua, Bartolomeo Stellato

We introduce the concept of disjunctive sum of squares for certifying nonnegativity of polynomials. Unlike the popular sum of squares approach where nonnegativity is certified by a single algebraic identity, the disjunctive sum of squares approach certifies nonnegativity with multiple algebraic identities which can be found in parallel. Our main result is a disjunctive Positivstellensatz proving that we can keep the degree of each algebraic identity as low as the degree of the polynomial whose nonnegativity is in question. Based on this result, we construct a semidefinite programming based converging hierarchy of lower bounds for the problem of minimizing a polynomial over a compact basic semialgebraic set, where the size of the largest semidefinite constraint is fixed throughout the hierarchy. We further prove a second disjunctive Positivstellensatz which leads to an optimization-free hierarchy for polynomial optimization. We specialize this result to the problem of proving copositivity of matrices. Finally, we describe how the disjunctive sum of squares approach can be combined with a branch-and-bound algorithm and we present numerical experiments on polynomial, copositive, and combinatorial optimization problems.

High-Quality Multi-Constraint Hypergraph Partitioning via Greedy Rebalancing

from arXiv: Data Structures and Algorithms

Authors: Nikolai Maas

Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-block connectivity of hyperedges is minimized while balancing the vertices with regard to $d$ distinct constraints. A prominent class of applications is data distribution tasks, where this allows to achieve good load balance for $d$ different kinds of resources and simultaneously minimize the communication volume. Although the best approaches for single-constraint partitioning are usually complex (multilevel) algorithms with many components, we show that replacing only one component already leads to high-quality multi-constraint partitions: the rebalancing step, which restores balance for a partition that has (hopefully) small connectivity but violates the constraints. We design a multi-constraint rebalancing algorithm based on greedy local search, proving that balance is always restored for $d=2$ and bounded maximum weight. The key is to ensure monotonically decreasing global imbalance by choosing an imbalance metric where there is always a balance-improving move available. Integrating our algorithm into the state-of-the-art partitioner Mt-KaHyPar, we demonstrate an 11.5\,\% geometric mean connectivity reduction compared to the next best competitor (Metis) and better reliability regarding partition balance, even though the majority of inputs is outside of the theoretical guarantee.

Authors: Nikolai Maas

Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-block connectivity of hyperedges is minimized while balancing the vertices with regard to $d$ distinct constraints. A prominent class of applications is data distribution tasks, where this allows to achieve good load balance for $d$ different kinds of resources and simultaneously minimize the communication volume. Although the best approaches for single-constraint partitioning are usually complex (multilevel) algorithms with many components, we show that replacing only one component already leads to high-quality multi-constraint partitions: the rebalancing step, which restores balance for a partition that has (hopefully) small connectivity but violates the constraints. We design a multi-constraint rebalancing algorithm based on greedy local search, proving that balance is always restored for $d=2$ and bounded maximum weight. The key is to ensure monotonically decreasing global imbalance by choosing an imbalance metric where there is always a balance-improving move available. Integrating our algorithm into the state-of-the-art partitioner Mt-KaHyPar, we demonstrate an 11.5\,\% geometric mean connectivity reduction compared to the next best competitor (Metis) and better reliability regarding partition balance, even though the majority of inputs is outside of the theoretical guarantee.

A Deterministic Separation Lemma

from arXiv: Data Structures and Algorithms

Authors: Abhishek Sahu

The \emph{Separation Lemma} is a simple yet powerful tool, akin to the well-known \emph{Isolation Lemma}, that guarantees the uniqueness of certain set sums. Bandopadhyay et al.\ introduced this lemma to establish lower bounds for the \ALP problem with respect to certain structural parameters, relying on random weight assignments in the process. The lemma's applicability extends well beyond that specific work, especially in proving hardness results. However, while effective, these hardness results inherently rely on probabilistic assumptions. In this work, we give a fully \emph{deterministic} construction for the weight assignment required by the Separation Lemma. We provide formal proofs of correctness, explicit examples, and show how deterministic weights can replace randomized ones, thereby derandomizing existing hardness results for path-packing problems. Our exposition highlights a clear progression from the original randomized foundations to deterministic constructions and their practical implications.

Authors: Abhishek Sahu

The \emph{Separation Lemma} is a simple yet powerful tool, akin to the well-known \emph{Isolation Lemma}, that guarantees the uniqueness of certain set sums. Bandopadhyay et al.\ introduced this lemma to establish lower bounds for the \ALP problem with respect to certain structural parameters, relying on random weight assignments in the process. The lemma's applicability extends well beyond that specific work, especially in proving hardness results. However, while effective, these hardness results inherently rely on probabilistic assumptions. In this work, we give a fully \emph{deterministic} construction for the weight assignment required by the Separation Lemma. We provide formal proofs of correctness, explicit examples, and show how deterministic weights can replace randomized ones, thereby derandomizing existing hardness results for path-packing problems. Our exposition highlights a clear progression from the original randomized foundations to deterministic constructions and their practical implications.

Efficient Algorithms for Interdicting Facilities in Trees and Bounded Treewidth Graphs

from arXiv: Data Structures and Algorithms

Authors: Ali Abbasi, Eli Friedman, Leana Golubchik, Samir Khuller, Marco Paolieri

Given a graph $G$ of $n$ nodes partitioned into facilities and customers, the $r$-edge interdiction covering problem (REIC) is to remove up to $r$ edges so as to maximize the total weight of customers disconnected from all facilities, which is called the covering objective function. While REIC is known to be NP-complete for general graphs, Fröhlich and Ruzika show that the problem can be solved in polynomial time when $G$ is a tree, providing an $O(n^7 r)$-time algorithm. We give an efficient $O(nr^2)$-time dynamic programming algorithm for REIC on trees that is fixed-parameter linear in $n$. Evaluating our solution on a benchmark of randomly generated tree networks with baselines of the Fröhlich and Ruzika algorithm and the Gurobi integer program solver, we demonstrate that in practice, our algorithm is both significantly faster and less sensitive to network topology and size. We extend our algorithm for REIC to graphs of bounded treewidth, a well-studied family of sparse graphs that generalizes trees, and obtain a matching runtime of $O(nr^2)$. We also consider the $r$-facility interdiction covering problem (RFIC), a novel variant of this network interdiction problem where the goal is to remove up to $r$ facilities to maximize the covering objective function over disconnected customers. We show that RFIC is NP-complete by observing it generalizes the small set bipartite vertex expansion problem (SSBVE), also known as the minimum $p$-union problem. We give an $O(nr^2)$-time algorithm for RFIC on trees, which also gives an $O(n^3)$-time algorithm for SSBVE on trees.

Authors: Ali Abbasi, Eli Friedman, Leana Golubchik, Samir Khuller, Marco Paolieri

Given a graph $G$ of $n$ nodes partitioned into facilities and customers, the $r$-edge interdiction covering problem (REIC) is to remove up to $r$ edges so as to maximize the total weight of customers disconnected from all facilities, which is called the covering objective function. While REIC is known to be NP-complete for general graphs, Fröhlich and Ruzika show that the problem can be solved in polynomial time when $G$ is a tree, providing an $O(n^7 r)$-time algorithm. We give an efficient $O(nr^2)$-time dynamic programming algorithm for REIC on trees that is fixed-parameter linear in $n$. Evaluating our solution on a benchmark of randomly generated tree networks with baselines of the Fröhlich and Ruzika algorithm and the Gurobi integer program solver, we demonstrate that in practice, our algorithm is both significantly faster and less sensitive to network topology and size. We extend our algorithm for REIC to graphs of bounded treewidth, a well-studied family of sparse graphs that generalizes trees, and obtain a matching runtime of $O(nr^2)$. We also consider the $r$-facility interdiction covering problem (RFIC), a novel variant of this network interdiction problem where the goal is to remove up to $r$ facilities to maximize the covering objective function over disconnected customers. We show that RFIC is NP-complete by observing it generalizes the small set bipartite vertex expansion problem (SSBVE), also known as the minimum $p$-union problem. We give an $O(nr^2)$-time algorithm for RFIC on trees, which also gives an $O(n^3)$-time algorithm for SSBVE on trees.

Quantum principal component analysis without eigenvector recovery

from arXiv: Data Structures and Algorithms

Authors: Yewei Yuan, Michele Minervini, Mark M. Wilde, Nana Liu

Principal component analysis (PCA) is traditionally implemented through a covariance or kernel matrix, leading-eigenvector extraction, and hard rank-$k$ projection. These steps can be computationally costly in high-dimensional and quantum-data settings, sensitive to small eigengaps, and unnecessary when downstream tasks only require principal-subspace scores. Such score-based objectives are important in applications such as anomaly detection, spectral-energy profiling, and other postselection tasks. To address these needs, we introduce a measurement-based soft PCA framework replacing the hard top-$k$ projector with an entropy-regularized Fermi--Dirac filter. This filter is the unique optimizer of an entropy-regularized variational formulation of PCA and converges to the classical PCA projector in the zero-temperature limit. This filter has a direct interpretation as a quantum measurement, which naturally suggests a quantum approach. For centered covariance operators represented by quantum feature states, a single fixed circuit, together with threshold calibration, accesses all optimal filters for different rank budgets or retained-variance levels without rank-dependent circuit updates or eigenvector recovery. For new inputs, the same calibrated quantum circuit yields soft principal subspace scores, spectral energy profiles, and postselected filtered states. The required centering of both training and test data is performed coherently inside the quantum protocol, which is particularly important for quantum data where no classical feature vectors or centered Gram matrix are directly available. By reframing PCA as a calibrated measurement task, this framework bypasses the need for iterative eigenvector extraction and achieves a dimension-independent sample complexity $O(η^{-2})$ for normalized fractional-rank or retained variance scoring at additive accuracy $η$.

Authors: Yewei Yuan, Michele Minervini, Mark M. Wilde, Nana Liu

Principal component analysis (PCA) is traditionally implemented through a covariance or kernel matrix, leading-eigenvector extraction, and hard rank-$k$ projection. These steps can be computationally costly in high-dimensional and quantum-data settings, sensitive to small eigengaps, and unnecessary when downstream tasks only require principal-subspace scores. Such score-based objectives are important in applications such as anomaly detection, spectral-energy profiling, and other postselection tasks. To address these needs, we introduce a measurement-based soft PCA framework replacing the hard top-$k$ projector with an entropy-regularized Fermi--Dirac filter. This filter is the unique optimizer of an entropy-regularized variational formulation of PCA and converges to the classical PCA projector in the zero-temperature limit. This filter has a direct interpretation as a quantum measurement, which naturally suggests a quantum approach. For centered covariance operators represented by quantum feature states, a single fixed circuit, together with threshold calibration, accesses all optimal filters for different rank budgets or retained-variance levels without rank-dependent circuit updates or eigenvector recovery. For new inputs, the same calibrated quantum circuit yields soft principal subspace scores, spectral energy profiles, and postselected filtered states. The required centering of both training and test data is performed coherently inside the quantum protocol, which is particularly important for quantum data where no classical feature vectors or centered Gram matrix are directly available. By reframing PCA as a calibrated measurement task, this framework bypasses the need for iterative eigenvector extraction and achieves a dimension-independent sample complexity $O(η^{-2})$ for normalized fractional-rank or retained variance scoring at additive accuracy $η$.

Privately Estimating Monotone Statistics in Polynomial Time

from arXiv: Data Structures and Algorithms

Authors: Gavin Brown, Ephraim Linder, Mahbod Majid, Vikrant Singhal

We study efficient differentially private algorithms for estimating monotone statistics, i.e., statistics that are monotone under the addition of new observations. The starting point for our investigation is subsample-and-aggregate: a classical paradigm that partitions the dataset into blocks, estimates the statistic on each block, and then privately aggregates the estimates.While practical and generically applicable, this approach is quite data-hungry. We improve upon this framework for the class of monotone statistics -- compared to subsample-and-aggregate, our algorithms save a factor of $t$ in sample complexity and pay a factor of $e^t$ in running time, where $t>0$ is a tunable parameter. We complement our results with a query-complexity lower bound, showing that our algorithms are essentially optimal for this task. As an application, we obtain improved results for private eigenvalue estimation, private loss estimation, and privately estimating a single parameter of a high-dimensional model, e.g., in linear regression.

Authors: Gavin Brown, Ephraim Linder, Mahbod Majid, Vikrant Singhal

We study efficient differentially private algorithms for estimating monotone statistics, i.e., statistics that are monotone under the addition of new observations. The starting point for our investigation is subsample-and-aggregate: a classical paradigm that partitions the dataset into blocks, estimates the statistic on each block, and then privately aggregates the estimates.While practical and generically applicable, this approach is quite data-hungry. We improve upon this framework for the class of monotone statistics -- compared to subsample-and-aggregate, our algorithms save a factor of $t$ in sample complexity and pay a factor of $e^t$ in running time, where $t>0$ is a tunable parameter. We complement our results with a query-complexity lower bound, showing that our algorithms are essentially optimal for this task. As an application, we obtain improved results for private eigenvalue estimation, private loss estimation, and privately estimating a single parameter of a high-dimensional model, e.g., in linear regression.

Smoothed Score Queries and the Complexity of Sampling

from arXiv: Data Structures and Algorithms

Authors: Jingbo Liu

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic \(\sqrtκ\) dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix \(Λ\), a smoothed-score query at noise level \(τ\) gives access to the resolvent \((Λ+τ^{-1}I)^{-1}\). Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error \(δ_{\rm TV}\), improving the condition-number dependence from \(\sqrtκ\) to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in \(κ\); in particular, for fixed dimension and accuracy, the bit complexity is \(O(\log^2κ)\). To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an \(Ω(\logκ)\) lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

Authors: Jingbo Liu

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic \(\sqrtκ\) dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix \(Λ\), a smoothed-score query at noise level \(τ\) gives access to the resolvent \((Λ+τ^{-1}I)^{-1}\). Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error \(δ_{\rm TV}\), improving the condition-number dependence from \(\sqrtκ\) to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in \(κ\); in particular, for fixed dimension and accuracy, the bit complexity is \(O(\log^2κ)\). To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an \(Ω(\logκ)\) lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

from arXiv: Data Structures and Algorithms

Authors: Sergei Tikhonov, Arsen Vasilyan

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.

Authors: Sergei Tikhonov, Arsen Vasilyan

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.

Wednesday, May 27

Making a talk, without and with AI

from Kamathematics

Some of the discussion online has been about how not to use AI in making academic talks (see, e.g., this post by Jessica Hullman). A junior researcher asked my opinion on using AI to help make slides and posters. I shared my thoughts with them, and I thought it was worth writing down in a … Continue reading "Making a talk, without and with AI"

Some of the discussion online has been about how not to use AI in making academic talks (see, e.g., this post by Jessica Hullman). A junior researcher asked my opinion on using AI to help make slides and posters. I shared my thoughts with them, and I thought it was worth writing down in a more organized fashion.

I’ll first give my take on the value of making talks by hand (which I don’t think I’m terrible at). I’ll then comment on how AI could fit into the process well or poorly. I will say that I’m not (yet?) a wizard at using AI to enhance my talks, so don’t come here expecting pro tips (except, spoilers, that the process of writing itself is immensely important). Since AI is a fast-moving topic, note that this post was originally written in May 2026, but I don’t think that the general principles here are very dependent on current AI capabilities.

Making talks by hand

Roughly speaking, the contents of a talk can be decomposed into three levels. From highest to lowest level, they are:

  • The story you’re telling the audience
  • The contents of each slide
  • The words you actually say

Before you even make a single slide, you should already have a good idea of the story of the talk, which is your overarching narrative. This includes things like, what is the motivation for studying this problem, which are the important related works to mention, what technical results to highlight, etc.

Designing the story is the single most valuable part in preparing any talk, both in terms of your intellectual engagement with your own research and making sure you give a coherent and educational talk. Every one of these details requires thought and consideration. Why should people care about this problem? What examples illustrate the problem cleanly and intuitively? Which technical results are the most interesting, and how should they be simplified for the purpose of a talk?

Note that all of these story design choices are highly subjective. There is no “one talk” to give about a particular result, and two co-authors of the same paper may prepare and give totally different talks. That type of variance is fine and by design. A talk is meant to give your own perspective on the research results, and that should come through in the story you choose to tell. This is why you shouldn’t rely too heavily on slides from talk already given by a co-author: it’s their story, it doesn’t have to be yours.

The process of designing your story will also help you understand your own results better. Thinking about how to motivate a problem will help you discover new connections. Thinking about a simple example will help you understand the nature of the phenomenon you’re demonstrating. Thinking about which results to present and how to do it (e.g., what to simplify) will help you distill the core technical ideas in your work. The important part is not only the final content on your slides, but the thinking and your process to get there.

Once you design and internalize your story, it will be useful in other settings as well. For example, if you haven’t written the paper yet, it will make the introduction and technical overview much easier to write. And if someone asks you to explain your result impromptu, you’ll be able to do so more clearly.

Thus far, our discussion has been about designing the story. I often find that, when you have a sufficiently clear story in your head, writing it down is very easy. But there is another disconnect, between the words and pictures on the slides and the actual words you say. This is where practice and rehearsal comes in: you’ll frequently find that the most natural way to present a slide is entirely different than what you put down in the first place, and you have to go back and refine the slide afterwards (and potentially iterate). This is another reason why using someone else’s slides is a common pitfall: the way that is natural for you to present a result is not the same way that is natural for them.

It’s not the main point of this post, but a brief comment on delivery. It should be clear to the audience that you care about what you’re presenting about. After all, if you don’t care, why should they? This is of course challenging for some people, who are not naturally charismatic or are afraid of public speaking. But this is a core part of academic research, so it will benefit you to improve at it.

As a final point here, in case it’s not obvious, why is giving good talks an important thing to do? In some sense, being a researcher is being an influencer. Research is inherently an attentionally-driven field. You can have the most brilliant ideas and results, but if no one knows about or understands them, then the value of the research is a fraction of what it could be. A talk focuses all the attention on you and your results, so make the most of the spotlight when you have it.

How can AI help?

I’ll start by outlining ways that AI can hurt you when preparing talks, but end with some suggestions on how it can help you make better talks than you made before.

Using AI too broadly can be disastrous. Suppose you use it to design your entire talk. Then you are completely skipping the sometimes-painful but essential process of coming up with the story. You won’t understand your research as deeply. You won’t have internalized it as you would have if you sweated through it, and you won’t be able to explain the research as clearly in settings when you don’t have AI at your fingertips.

I’ve talked with people who are very anti-AI for writing technical content, due to the value of thinking through the details, but are OK with using AI for helping make talks, since (in their opinion) it’s not the “real” part of research. I disagree strongly with the latter: good communication is an essential part of research, and (hopefully as I’ve argued above) thinking about how to communicate something well is an essential part of the technical component of research.

Using AI too heavily also discards a lot of you from your talks. People came to the talk not only to hear about the research, but for your perspective on the research. The authenticity is very important (and, I believe, will only become more important over time).

This ties into another point, about delivery, where it is important to show that you care about your research. The easiest way to show you don’t care about your research is to dump slop slides (i.e., only lightly edited AI outputs) at your audience. You expect me to believe this is something you care at all about if you can’t spend just a few hours preparing a decent talk? It also signals that you have no respect for your audience. If I catch you presenting slop slides, you will correspondingly lose a lot of my respect for a long time.

AI can also result in slides that have a big disconnect in the content on the slides versus what you would actually say. It’s effectively like having someone else give you their slides. The “AI’s voice” is likely to be entirely different from yours. You can of course refine and iterate on them, but at that point, you might as well just make the slides yourself.

So with this said, what can AI be useful for? AI can be help by giving a second opinion on things. Maybe you’ve done the thinking yourself already (which, as mentioned before, is the important part), and you want to see if there’s any other perspectives you didn’t consider. It might suggest some other ideas that you already thought of and discarded, or other new ones that you hadn’t considered. But you should really think of it as being a second opinion, your own perspective and preferences should still dominate.

AI can also help make content that you wouldn’t have the time or the skills to make before. For example, AI can help make high-quality visualizations, animations, and interactive demos. These tools can help communicate an underlying technical idea much better than just words, but could previously have been too difficult to actually create. EDIT: After I released this blog post, Neeldhara Misra linked a talk by S Viswanath, which (at a glance, I only skimmed) appears to use AI quite well to improve the quality of the talk.

AI models are not going away. You should use them to make your talks better: communicate and express your underlying message in a way you would not have been able to do before. But you should not use them in a way that makes your thinking or your talks worse: stealing away your valuable thoughts, removing what makes you you from your talks, or embarrassing yourself in front of your professional colleagues.

EDIT: see also Jessica Hullman’s independent follow-up post about AI use in talks.

By Gautam

Congratulations, Dr. Patris!

from David Eppstein

I participated today in the successful Ph.D. defense of Nikolas Patris, a student of my newly-tenured colleague Ioannis Panageas. Nikolas has been working on problems of learning Nash equilibria of games and more generally finding saddle points of smooth functions, a crossover area between theoretical computer science and machine learning. His results are theoretical but his papers on them are in machine learning conferences:

I participated today in the successful Ph.D. defense of Nikolas Patris, a student of my newly-tenured colleague Ioannis Panageas. Nikolas has been working on problems of learning Nash equilibria of games and more generally finding saddle points of smooth functions, a crossover area between theoretical computer science and machine learning. His results are theoretical but his papers on them are in machine learning conferences:

The main theme of his thesis and defense was that games can have a nice structure and still not be efficiently learnable. He started with a positive result from ICLR 2024: if the difference of the two payoff matrices of a two-player game has rank one, then the problem of finding an approximate equilibrium strategy can be reduced to a one-dimensional search over a parameterized family of zero-sum games, and approximately solved in very few iterations by an algorithm that alternates between trying to learn how to play one of these zero-sum games, and adjusting the parameter to find a game in the parameterized family that more accurately approximates the rank-one game the algorithm is really trying to solve. Rank zero is just the case of zero-sum games themselves, and for rank two it is already \(\mathsf{PPAD}\)-complete to find an equilibrium.

Next in the defense, he discussed exponential lower bounds for certain game-learning strategies. He started with cooperative games in which both players have equal payoffs; in these it is trivial to find the right strategy from a game matrix but it is still interesting to understand how quickly one can converge using simple iterative methods that do not know the matrix explicitly. He started with fictitious play in which each player chooses the move that would have the best average payoff against the previous move history. In work from NeurIPS 2023, using a game matrix containing a hidden spiral path of improving payoffs, he showed that this strategy takes factorial time to make each successive step along the path and therefore factorial time (in the number of game options) to converge to an approximately-optimal strategy. He summarized this idea, that averaging the entire past history produces a significant slowdown in convergence time, as “history creates inertia”. The same factorial slowdown also applies to “follow the regularized leader”, a family of learning algorithms with smoother dynamics that involve mixed rather than pure game strategies (ICML 2026).

In \(n\)-player games with two options per player, the slowdown is even worse. Here, the two-dimensional payoff matrix of a symmetric two-player game is replaced by a hypercube of payoffs (with equal payoffs to all players) with a dimension for the two choices of each player. Instead of a spiral path, Nikolas observed that a hidden path of improving payoffs could be realized using a solution to the “snake in the box problem”, asking for the longest induced path in a hypercube. Solutions to this problem, for an \(n\)-dimensional hypercube, have length \(\Omega(2^n)\), although the optimal constant of proportionality remains unknown. Plugging in the factorial slowdown for each step along the path, for both fictitious play and follow the regularized leader, gives a doubly exponential lower bound on the convergence time (ICML 2026). However, here the multidimensional payoff matrix has exponential size, so the slowdown is only singly exponential in the description complexity of the game instance. This motivates an interesting and unsolved variation of the snake-in-the-box problem: is it possible to find an exponentially long induced path in the hypercube so that determining whether a hypercube vertex is on the path, and if so how far along the path it lies, can be computed in polynomial time? If so it would give a concisely-represented multiplayer game with convergence time doubly exponential in the representation size.

Solution to the four-dimensional snake-in-the-box problem: a seven-vertex induced path in a 16-vertex hypercube graph

Even more generally, he considered follow the regularized leader as a method for finding saddle points of smooth functions and showed that, in this general context, it does not even reach stationarity.

Nikolas has still not settled on what to do next; he has a good postdoc offer but also has some conflicting geographic constraints. Regardless, he has produced a strong dissertation, and I was especially impressed at how clearly he explained these concepts that were in many cases not very familiar to me. Congratulations, Nikolas, and also, congratulations, Ioannis!

(Discuss on Mastodon)

By David Eppstein

Authorship in the AI Age

from Computational Complexity

The technical paper for the Erdős Unit Distance Problem lists only "OpenAI" as an author. When Bill posted on Sunday about the Erdős distance problems, he mentioned the names of OpenAI researchers who prompted and checked the proof. Sebastien Bubeck of OpenAI reached out directly and later as a comment saying this was really an OpenAI-wide achievement, and we shouldn't just single out people at the end of the funnel. I agree with Bubeck for this paper, but it brings up some challenging authorship issues for the future.

Most publishers follow the position of the Committee on Publication Ethics that AI tools cannot be listed as an author of a paper.

AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work. As non-legal entities, they cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements.

The paper doesn't list the model as an author, rather the organization (OpenAI) that produced it. The closest analog might be the paper that verified the Higgs boson, which has only "ATLAS Collaboration" on the author line and lists the nearly three thousand authors at the end of the paper. If OpenAI were to publish their paper, they should list everyone who worked on the model with at least a corresponding author to handle the legal issues mentioned in the COPE position.

There is also all the mathematicians whose work was fed into the training data, but I wouldn't list them even if we could. We don't add as authors those whose work we rely on, rather we cite them, and the OpenAI paper does have a healthy references section.

Shortly after the OpenAI announcement, Princeton Professor Will Sawin improved and gave an explicit exponent for the lower bound in a currently singularly authored paper. Often, but not always, when a paper improves a bound of a recently announced result, the papers get merged. I don't expect that to happen here, but if it did I would have no idea how to handle the authorship of the combined paper beyond just listing Sawin and OpenAI as the authors.

OpenAI used an internal model for this work, but one would hope they will eventually release this model to the public. If someone outside of OpenAI uses a released model to answer another open math question, even with a single prompt, do they need to add OpenAI as a co-author? I think not, as the model now becomes a tool when released. The researcher should disclose the role the model played, but the model's creators no longer need to be authors.

How about a thought experiment. Suppose you are working with a student by email on a paper and you put in equal effort so you will both be co-authors. Then you discover the student was mostly just feeding your emails to ChatGPT and sending you back the responses. Now who should be on the author list? The student should come off but it would feel strange writing this a solely authored paper. 

What if someone builds an AI mathematician that doesn't take any prompts? It just reads papers and occasionally writes its own. Who is the author? That "someone". What if they just used a regular AI agent and gave it the prompt "You are a research mathematician. Go read papers, prove theorems and write up your results." Maybe we'll have AI math journals filled with papers written, edited and refereed by AI models. I wonder how we'll COPE with that.

By Lance Fortnow

The technical paper for the Erdős Unit Distance Problem lists only "OpenAI" as an author. When Bill posted on Sunday about the Erdős distance problems, he mentioned the names of OpenAI researchers who prompted and checked the proof. Sebastien Bubeck of OpenAI reached out directly and later as a comment saying this was really an OpenAI-wide achievement, and we shouldn't just single out people at the end of the funnel. I agree with Bubeck for this paper, but it brings up some challenging authorship issues for the future.

Most publishers follow the position of the Committee on Publication Ethics that AI tools cannot be listed as an author of a paper.

AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work. As non-legal entities, they cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements.

The paper doesn't list the model as an author, rather the organization (OpenAI) that produced it. The closest analog might be the paper that verified the Higgs boson, which has only "ATLAS Collaboration" on the author line and lists the nearly three thousand authors at the end of the paper. If OpenAI were to publish their paper, they should list everyone who worked on the model with at least a corresponding author to handle the legal issues mentioned in the COPE position.

There is also all the mathematicians whose work was fed into the training data, but I wouldn't list them even if we could. We don't add as authors those whose work we rely on, rather we cite them, and the OpenAI paper does have a healthy references section.

Shortly after the OpenAI announcement, Princeton Professor Will Sawin improved and gave an explicit exponent for the lower bound in a currently singularly authored paper. Often, but not always, when a paper improves a bound of a recently announced result, the papers get merged. I don't expect that to happen here, but if it did I would have no idea how to handle the authorship of the combined paper beyond just listing Sawin and OpenAI as the authors.

OpenAI used an internal model for this work, but one would hope they will eventually release this model to the public. If someone outside of OpenAI uses a released model to answer another open math question, even with a single prompt, do they need to add OpenAI as a co-author? I think not, as the model now becomes a tool when released. The researcher should disclose the role the model played, but the model's creators no longer need to be authors.

How about a thought experiment. Suppose you are working with a student by email on a paper and you put in equal effort so you will both be co-authors. Then you discover the student was mostly just feeding your emails to ChatGPT and sending you back the responses. Now who should be on the author list? The student should come off but it would feel strange writing this a solely authored paper. 

What if someone builds an AI mathematician that doesn't take any prompts? It just reads papers and occasionally writes its own. Who is the author? That "someone". What if they just used a regular AI agent and gave it the prompt "You are a research mathematician. Go read papers, prove theorems and write up your results." Maybe we'll have AI math journals filled with papers written, edited and refereed by AI models. I wonder how we'll COPE with that.

By Lance Fortnow

2-ASP(Q) programs with weak constraints: Complexity and efficient implementation

from arXiv: Computational Complexity

Authors: Andrea Cuteri, Giuseppe Mazzotta, Francesco Ricca

ASP(Q) extends Answer Set Programming (ASP) with Quantifiers over answer sets. In this paper we focus on the class of ASP(Q) programs with two quantifiers and weak constraints, denoted as 2-ASP(Q)^w. 2-ASP(Q)^w is a practically relevant fragment of ASP(Q) that is expressive enough to capture optimization problems up to the class Delta_3^P. On the theoretical side, we provide a complete complexity characterization of the main computational tasks for 2-ASP(Q)^w programs, including tight completeness results and the analysis of nontrivial cases that have not been addressed in previous works. On the practical side, we introduce novel strategies for computing (optimal) quantified answer sets in the Casper system, that rely on a Counterexample-Guided Abstraction Refinement (CEGAR) technique tailored to ASP(Q). An experimental evaluation on hard benchmarks from different application domains shows that the proposed techniques are effective in practice.

Authors: Andrea Cuteri, Giuseppe Mazzotta, Francesco Ricca

ASP(Q) extends Answer Set Programming (ASP) with Quantifiers over answer sets. In this paper we focus on the class of ASP(Q) programs with two quantifiers and weak constraints, denoted as 2-ASP(Q)^w. 2-ASP(Q)^w is a practically relevant fragment of ASP(Q) that is expressive enough to capture optimization problems up to the class Delta_3^P. On the theoretical side, we provide a complete complexity characterization of the main computational tasks for 2-ASP(Q)^w programs, including tight completeness results and the analysis of nontrivial cases that have not been addressed in previous works. On the practical side, we introduce novel strategies for computing (optimal) quantified answer sets in the Casper system, that rely on a Counterexample-Guided Abstraction Refinement (CEGAR) technique tailored to ASP(Q). An experimental evaluation on hard benchmarks from different application domains shows that the proposed techniques are effective in practice.

Polynomial-time isomorphism test for groups with abelian Sylow subgroups

from arXiv: Computational Complexity

Authors: Saveliy V. Skresanov

The group isomorphism problem in computational complexity asks whether two finite groups given by their Cayley tables are isomorphic or not. Although polynomial-time isomorphism tests exist for many specific types of groups, no general polynomial-time algorithm is known, classes of solvable and nilpotent groups being the main obstacles. In 2012 Babai and Qiao gave a polynomial-time isomorphism test for the class of solvable groups admitting normal series with abelian Sylow factors. We generalize their result and give a polynomial-time isomorphism test for A-groups, i.e. groups with abelian Sylow subgroups. The algorithm heavily relies both on the computational methods developed by Babai and Qiao, and structural properties of A-groups.

Authors: Saveliy V. Skresanov

The group isomorphism problem in computational complexity asks whether two finite groups given by their Cayley tables are isomorphic or not. Although polynomial-time isomorphism tests exist for many specific types of groups, no general polynomial-time algorithm is known, classes of solvable and nilpotent groups being the main obstacles. In 2012 Babai and Qiao gave a polynomial-time isomorphism test for the class of solvable groups admitting normal series with abelian Sylow factors. We generalize their result and give a polynomial-time isomorphism test for A-groups, i.e. groups with abelian Sylow subgroups. The algorithm heavily relies both on the computational methods developed by Babai and Qiao, and structural properties of A-groups.

Euclidean Steiner Shallow-Light Trees in Higher Dimensions

from arXiv: Computational Geometry

Authors: Devin Frost, Kimberly Kokado, Csaba D. Tóth

This paper proves a conjecture by Solomon about Steiner shallow-light trees (SLT) in Euclidean $d$-space: It is shown that for any finite point set $\mathbb{R}^d$, any root, and any $ε>0$, there is a Euclidean Steiner $(1+ε,O(\sqrt{1/ε}))$-SLT without any dependence on dimension. We also revisit the core example, designed by Solomon, in the plane and its generalization to $d$-space.

Authors: Devin Frost, Kimberly Kokado, Csaba D. Tóth

This paper proves a conjecture by Solomon about Steiner shallow-light trees (SLT) in Euclidean $d$-space: It is shown that for any finite point set $\mathbb{R}^d$, any root, and any $ε>0$, there is a Euclidean Steiner $(1+ε,O(\sqrt{1/ε}))$-SLT without any dependence on dimension. We also revisit the core example, designed by Solomon, in the plane and its generalization to $d$-space.

Rotation-Invariant Vectorized Shape Representations

from arXiv: Computational Geometry

Authors: Hamid Shafieasl, Jeff M. Phillips

We introduce a rotation-invariant representation of planar shapes. In particular, this representation encodes shapes as vectors such that the Euclidean distance between them serves as a valid shape distance. For standardized, star-shaped objects, we can deterministically create a sketched vector of dimension $O(1/\varepsilon)$ in $O((1/\varepsilon) \log (1/\varepsilon))$ time that approximates this shape distance to within $\varepsilon$. Moreover, because the representation is a standard Euclidean vector, we can directly and efficiently perform various data analyses, such as nearest neighbor search and clustering, in shape space, inherently invariant to the rotation of the shapes. We demonstrate this through a series of simple experiments. The key technical contribution operates on functions over $\mathbb{S}^1$, which we use to encode standardized objects. The most general rotation-invariant representation of these functions works through a map to an infinite-dimensional function space, parameterized by an offset parameter. By analyzing special discretized cases of these functions, we show that the representation is strictly injective up to the desired rotation and a mirror-flip-type operation we call \emph{reverse of complement} (RoC). While RoC status can be controlled by how the function is defined, it is inherent to the representation and required to be handled in the analysis. Regardless, the vectorized representation is robust to small shape perturbations, and hence discretizing the angles leads to the efficient approximation and algorithm.

Authors: Hamid Shafieasl, Jeff M. Phillips

We introduce a rotation-invariant representation of planar shapes. In particular, this representation encodes shapes as vectors such that the Euclidean distance between them serves as a valid shape distance. For standardized, star-shaped objects, we can deterministically create a sketched vector of dimension $O(1/\varepsilon)$ in $O((1/\varepsilon) \log (1/\varepsilon))$ time that approximates this shape distance to within $\varepsilon$. Moreover, because the representation is a standard Euclidean vector, we can directly and efficiently perform various data analyses, such as nearest neighbor search and clustering, in shape space, inherently invariant to the rotation of the shapes. We demonstrate this through a series of simple experiments. The key technical contribution operates on functions over $\mathbb{S}^1$, which we use to encode standardized objects. The most general rotation-invariant representation of these functions works through a map to an infinite-dimensional function space, parameterized by an offset parameter. By analyzing special discretized cases of these functions, we show that the representation is strictly injective up to the desired rotation and a mirror-flip-type operation we call \emph{reverse of complement} (RoC). While RoC status can be controlled by how the function is defined, it is inherent to the representation and required to be handled in the analysis. Regardless, the vectorized representation is robust to small shape perturbations, and hence discretizing the angles leads to the efficient approximation and algorithm.

Low Soundness Linearity Testing on the Half-Slice

from arXiv: Data Structures and Algorithms

Authors: Haakon Larsen, Tushant Mittal, Silas Richelson, Sourya Roy

Let $f: T\to \{ 0,1 \}$ be a Boolean function on the Boolean half-slice, $T$, \ie elements of $\{0,1\}^n$ with Hamming weight $n/2$. We show that if $f(x)+f(y)=f(x+y)$ holds with probability $\frac{1+δ}{2}$ over a uniform pair $(x,y)$ such that $x,y,x+y\in T$, then $f$ agrees with some linear function on at least $\frac{1+δ}{2}-o(1)$ fraction of the points in $T$. More generally, we show that if $f$ passes the natural $k$-query BLR test with probability $\frac{1+δ}{2}$ for any $k\geq3$, then it must agree with some affine function at $\frac{1+δ^{\frac{1}{k-2}}}{2}-o(1)$ fraction of the points in $T$. The only other known linearity test for the slice in the low soundness regime (i.e., when $δ$ can be arbitrarily small) was given by Kalai, Lifshitz, Minzer, and Ziegler [FOCS'24]. Our result improves upon this result in two significant ways: firstly, it works for $k=3$ queries, instead of requiring $k\geq4$; secondly, our result is sharper, e.g., when $k=4$, we are able to conclude an agreement of $\frac{1+\sqrtδ}{2}-o(1)$ instead of $\frac{1+c\sqrtδ}{2}$ for $c\approx.0035$. In particular, our result matches (up to the $o(1)$ term) the conclusion one obtains over the full hypercube via the classical BLR analysis. Our main technical contribution is a new dense model theorem using bounds on Krawtchouk polynomials. Using these Krawtchouk polynomial bounds, we also obtain a simple $k$-query test ($k\geq 5$) that avoids any use of the dense model machinery. This simplified test naturally extends to the slice over the $q$-ary hypercube, giving the first such result over larger alphabets.

Authors: Haakon Larsen, Tushant Mittal, Silas Richelson, Sourya Roy

Let $f: T\to \{ 0,1 \}$ be a Boolean function on the Boolean half-slice, $T$, \ie elements of $\{0,1\}^n$ with Hamming weight $n/2$. We show that if $f(x)+f(y)=f(x+y)$ holds with probability $\frac{1+δ}{2}$ over a uniform pair $(x,y)$ such that $x,y,x+y\in T$, then $f$ agrees with some linear function on at least $\frac{1+δ}{2}-o(1)$ fraction of the points in $T$. More generally, we show that if $f$ passes the natural $k$-query BLR test with probability $\frac{1+δ}{2}$ for any $k\geq3$, then it must agree with some affine function at $\frac{1+δ^{\frac{1}{k-2}}}{2}-o(1)$ fraction of the points in $T$. The only other known linearity test for the slice in the low soundness regime (i.e., when $δ$ can be arbitrarily small) was given by Kalai, Lifshitz, Minzer, and Ziegler [FOCS'24]. Our result improves upon this result in two significant ways: firstly, it works for $k=3$ queries, instead of requiring $k\geq4$; secondly, our result is sharper, e.g., when $k=4$, we are able to conclude an agreement of $\frac{1+\sqrtδ}{2}-o(1)$ instead of $\frac{1+c\sqrtδ}{2}$ for $c\approx.0035$. In particular, our result matches (up to the $o(1)$ term) the conclusion one obtains over the full hypercube via the classical BLR analysis. Our main technical contribution is a new dense model theorem using bounds on Krawtchouk polynomials. Using these Krawtchouk polynomial bounds, we also obtain a simple $k$-query test ($k\geq 5$) that avoids any use of the dense model machinery. This simplified test naturally extends to the slice over the $q$-ary hypercube, giving the first such result over larger alphabets.

Virtual-Memory Powersort

from arXiv: Data Structures and Algorithms

Authors: Finn Moltmann, Tamio-Vesa Nakajima, Sebastian Wild

We give a more space-efficient implementation of adaptive mergesort: Virtual-Memory Powersort. Using internal buffering techniques, we significantly reduce the memory consumption of the algorithm; specifically, for sorting $n$ objects the required buffer area is reduced from space for $n/2$ objects to $O(\sqrt{n \log n})$ objects. While this space-efficiency can be achieved (indeed reduced to $O(1)$) conceptually very easily with known inplace merging algorithms, using these as a drop-in replacement for the standard merge algorithm incurs a substantial slow-down. Virtual-Memory Powersort, by contrast, uses the same number of moves and comparisons as previous Powersort implementations up to an additive $O(n)$ term. We report on an empirical running-time study comparing our implementation against other Powersort variants and state-of-the-art stable sorting methods, demonstrating that almost in-place stable sorting can be achieved with negligible overhead in many scenarios.

Authors: Finn Moltmann, Tamio-Vesa Nakajima, Sebastian Wild

We give a more space-efficient implementation of adaptive mergesort: Virtual-Memory Powersort. Using internal buffering techniques, we significantly reduce the memory consumption of the algorithm; specifically, for sorting $n$ objects the required buffer area is reduced from space for $n/2$ objects to $O(\sqrt{n \log n})$ objects. While this space-efficiency can be achieved (indeed reduced to $O(1)$) conceptually very easily with known inplace merging algorithms, using these as a drop-in replacement for the standard merge algorithm incurs a substantial slow-down. Virtual-Memory Powersort, by contrast, uses the same number of moves and comparisons as previous Powersort implementations up to an additive $O(n)$ term. We report on an empirical running-time study comparing our implementation against other Powersort variants and state-of-the-art stable sorting methods, demonstrating that almost in-place stable sorting can be achieved with negligible overhead in many scenarios.

Improved Hardness Results for Nash Social Welfare, Budgeted Allocation and GAP via the Unique Games Conjecture

from arXiv: Data Structures and Algorithms

Authors: Vignesh Viswanathan

We consider the problem of dividing a set of indivisible goods among agents with additive valuations. This problem has been studied under various objectives in both the computer science and the operations research literature. Our main contribution is a novel dictator test using this problem, which can separate a dictator from any function sufficiently far from a dictator. We use this test to prove the following hardness results (assuming the unique games conjecture is true): (1) We show that it is NP-hard to approximate the max Nash welfare by a factor better than $\sqrt[3]{\frac{81}{65}} - \varepsilon \approx 1.0761$. This improves on the previous best known inapproximability factor of $\sqrt{\frac87} - \varepsilon \approx 1.069$. (2) We show that it is NP-hard to approximate the maximum budgeted allocation by a factor better than $\frac{243}{227} - \varepsilon \approx 1.07$. This improves on the previous best known inapproximability factor of $\frac{16}{15} - \varepsilon \approx 1.067$. (3) We show that it is NP-hard to approximate the max generalized assignment problem (GAP) by a factor better than $\frac{145}{129} - \varepsilon \approx 1.124$. This improves on the previous best known inapproximability factor of $\frac{11}{10} - \varepsilon \approx 1.10$.

Authors: Vignesh Viswanathan

We consider the problem of dividing a set of indivisible goods among agents with additive valuations. This problem has been studied under various objectives in both the computer science and the operations research literature. Our main contribution is a novel dictator test using this problem, which can separate a dictator from any function sufficiently far from a dictator. We use this test to prove the following hardness results (assuming the unique games conjecture is true): (1) We show that it is NP-hard to approximate the max Nash welfare by a factor better than $\sqrt[3]{\frac{81}{65}} - \varepsilon \approx 1.0761$. This improves on the previous best known inapproximability factor of $\sqrt{\frac87} - \varepsilon \approx 1.069$. (2) We show that it is NP-hard to approximate the maximum budgeted allocation by a factor better than $\frac{243}{227} - \varepsilon \approx 1.07$. This improves on the previous best known inapproximability factor of $\frac{16}{15} - \varepsilon \approx 1.067$. (3) We show that it is NP-hard to approximate the max generalized assignment problem (GAP) by a factor better than $\frac{145}{129} - \varepsilon \approx 1.124$. This improves on the previous best known inapproximability factor of $\frac{11}{10} - \varepsilon \approx 1.10$.

On the Detection of Commutative Factors in Factor Graphs: Necessary and Sufficient Conditions

from arXiv: Data Structures and Algorithms

Authors: Malte Luttermann, Ralf Möller, Marcel Gehrke

Exploiting the indistinguishability of objects in a probabilistic graphical model such as a factor graph is key to lifted probabilistic inference algorithms and allows for tractable probabilistic inference problems with respect to domain sizes. A central building block for the exploitation of indistinguishable objects in factor graphs is the identification of commutative factors, i.e., factors whose output values are invariant under permutations of input values assigned to a subset of their arguments. In this paper, we revisit the theoretical foundations underlying the state-of-the-art algorithm to detect commutative factors. Specifically, we show that in its current form, the state-of-the-art algorithm relies on a central theorem that is mistakenly regarded as a sufficient condition to identify commutative factors, while it actually only implies necessary condition. Consequently, the state of the art might, as we show in this paper, deliver incorrect results. To fix the flaws currently present in the state of the art, we prove a slightly modified version of the aforementioned theorem, which serves as a necessary condition to identify commutative factors. Moreover, we present a corrected version of the state-of-the-art algorithm, which keeps its efficiency while ensuring correctness and introduce a complementary algorithm with tighter worst-case bounds.

Authors: Malte Luttermann, Ralf Möller, Marcel Gehrke

Exploiting the indistinguishability of objects in a probabilistic graphical model such as a factor graph is key to lifted probabilistic inference algorithms and allows for tractable probabilistic inference problems with respect to domain sizes. A central building block for the exploitation of indistinguishable objects in factor graphs is the identification of commutative factors, i.e., factors whose output values are invariant under permutations of input values assigned to a subset of their arguments. In this paper, we revisit the theoretical foundations underlying the state-of-the-art algorithm to detect commutative factors. Specifically, we show that in its current form, the state-of-the-art algorithm relies on a central theorem that is mistakenly regarded as a sufficient condition to identify commutative factors, while it actually only implies necessary condition. Consequently, the state of the art might, as we show in this paper, deliver incorrect results. To fix the flaws currently present in the state of the art, we prove a slightly modified version of the aforementioned theorem, which serves as a necessary condition to identify commutative factors. Moreover, we present a corrected version of the state-of-the-art algorithm, which keeps its efficiency while ensuring correctness and introduce a complementary algorithm with tighter worst-case bounds.

Parsimonious Learning-Augmented Online Metric Matching

from arXiv: Data Structures and Algorithms

Authors: Yongho Shin, Phanu Vajanopath

Learning-augmented algorithms have received significant attention in recent years, particularly in the context of online optimization. Motivated by the high computational cost of generating predictions, a growing line of work studies the tradeoff between performance guarantees and the number of predictions used in learning-augmented algorithms for problems such as caching and metrical task systems. In this paper, we extend this line of research to online metric matching by developing parsimonious learning-augmented algorithms and establishing lower bounds on their performance. Our approach extends the Follow-the-Prediction framework to the parsimonious setting by filling in a virtual prediction in the absence of an actual prediction, using an online metric matching algorithm that maintains good intermediate matchings throughout its execution. We complement our theoretical results with an empirical evaluation, demonstrating the practical effectiveness of our approach.

Authors: Yongho Shin, Phanu Vajanopath

Learning-augmented algorithms have received significant attention in recent years, particularly in the context of online optimization. Motivated by the high computational cost of generating predictions, a growing line of work studies the tradeoff between performance guarantees and the number of predictions used in learning-augmented algorithms for problems such as caching and metrical task systems. In this paper, we extend this line of research to online metric matching by developing parsimonious learning-augmented algorithms and establishing lower bounds on their performance. Our approach extends the Follow-the-Prediction framework to the parsimonious setting by filling in a virtual prediction in the absence of an actual prediction, using an online metric matching algorithm that maintains good intermediate matchings throughout its execution. We complement our theoretical results with an empirical evaluation, demonstrating the practical effectiveness of our approach.

Where to Split and When to Charge: Optimal Route Construction from Customer Permutations in Electric Vehicle Routing

from arXiv: Data Structures and Algorithms

Authors: Leon Stjepan Uroić, Marko Đurasević

Permutation-based metaheuristics are widely used for electric vehicle routing, where candidate solutions are represented as ordered sequences of customers. Such sequences, however, do not directly define feasible vehicle routes: they must be decoded by choosing where to split the permutation into routes and where to insert charging-station visits, subject to cargo capacity and battery constraints. These decisions are inherently interdependent, since each return to the depot both separates consecutive routes and restores the vehicle battery. This paper formalizes the task as the Fixed-Permutation Splitting and Charging Problem and proposes an exact forward labeling algorithm that constructs a minimum-distance feasible decoding of a fixed customer permutation using dynamic programming with dominance pruning. We further derive restricted variants representing increasingly simplified decoding strategies: first separating route splitting from charging-station insertion, and then additionally limiting each inter-customer segment to at most one charging-station visit. Computational experiments on benchmark and randomly generated instances, including comparisons with heuristic decoders from the literature, confirm that the exact decoder remains tractable in practice and reveal a clear hierarchy among decoding strategies. The most restrictive variant achieves runtimes close to those of heuristic decoders while delivering substantially higher decoding success rates and better solution quality. Less restrictive variants further improve quality and robustness at the cost of additional runtime. The exact joint decoder provides the optimal reference for each fixed permutation, clarifying the trade-offs introduced by common decoding simplifications.

Authors: Leon Stjepan Uroić, Marko Đurasević

Permutation-based metaheuristics are widely used for electric vehicle routing, where candidate solutions are represented as ordered sequences of customers. Such sequences, however, do not directly define feasible vehicle routes: they must be decoded by choosing where to split the permutation into routes and where to insert charging-station visits, subject to cargo capacity and battery constraints. These decisions are inherently interdependent, since each return to the depot both separates consecutive routes and restores the vehicle battery. This paper formalizes the task as the Fixed-Permutation Splitting and Charging Problem and proposes an exact forward labeling algorithm that constructs a minimum-distance feasible decoding of a fixed customer permutation using dynamic programming with dominance pruning. We further derive restricted variants representing increasingly simplified decoding strategies: first separating route splitting from charging-station insertion, and then additionally limiting each inter-customer segment to at most one charging-station visit. Computational experiments on benchmark and randomly generated instances, including comparisons with heuristic decoders from the literature, confirm that the exact decoder remains tractable in practice and reveal a clear hierarchy among decoding strategies. The most restrictive variant achieves runtimes close to those of heuristic decoders while delivering substantially higher decoding success rates and better solution quality. Less restrictive variants further improve quality and robustness at the cost of additional runtime. The exact joint decoder provides the optimal reference for each fixed permutation, clarifying the trade-offs introduced by common decoding simplifications.

Tree Search With Predictions

from arXiv: Data Structures and Algorithms

Authors: Michael Dinitz, Bob Dong

``Algorithms with predictions'', or ``learning-augmented algorithms'', has proved to be an extremely useful paradigm for combining machine learning with traditional algorithms. One of the textbook settings for this is searching a sorted array. Without a prediction, classical binary search takes $O(\log n)$ queries, while with a prediction we can use ``doubling binary search'' to find the target key using $O(\log η)$ queries, where $η$ is the error of the prediction measured as the absolute value of the difference between the true location and the predicted location. Since an array is just a path graph, in this paper we ask whether similar bounds can be achieved for search on even slightly more general graphs: trees. We show first that the high-level answer is ``no'': there is no search algorithm that uses $O(\log η)$ queries, where $η$ is now the graph distance between the predicted location and the true location. However, as our main result, we show that such bounds can be achieved on trees which are ``path-like'' in that they have low \emph{pathwidth}. In particular, we prove that there is a search algorithm which uses at most $O(k \log η)$ queries, where $k$ is the pathwidth of the tree. We also prove a lower bound showing that our algorithm has existentially optimal query complexity. Finally, we show experimentally, on real-life inputs, that our algorithm has query complexity which is notably better than the simple non-prediction-based algorithm.

Authors: Michael Dinitz, Bob Dong

``Algorithms with predictions'', or ``learning-augmented algorithms'', has proved to be an extremely useful paradigm for combining machine learning with traditional algorithms. One of the textbook settings for this is searching a sorted array. Without a prediction, classical binary search takes $O(\log n)$ queries, while with a prediction we can use ``doubling binary search'' to find the target key using $O(\log η)$ queries, where $η$ is the error of the prediction measured as the absolute value of the difference between the true location and the predicted location. Since an array is just a path graph, in this paper we ask whether similar bounds can be achieved for search on even slightly more general graphs: trees. We show first that the high-level answer is ``no'': there is no search algorithm that uses $O(\log η)$ queries, where $η$ is now the graph distance between the predicted location and the true location. However, as our main result, we show that such bounds can be achieved on trees which are ``path-like'' in that they have low \emph{pathwidth}. In particular, we prove that there is a search algorithm which uses at most $O(k \log η)$ queries, where $k$ is the pathwidth of the tree. We also prove a lower bound showing that our algorithm has existentially optimal query complexity. Finally, we show experimentally, on real-life inputs, that our algorithm has query complexity which is notably better than the simple non-prediction-based algorithm.

Tuesday, May 26

Only One Company Makes the Game Monopoly

from Ben Recht

Revisiting David Graeber's theories of bureaucracy, violence, and interpretative labor.

In passing, I mentioned a new book by C. Thi Nguyen, The Score, which asks the question: Why are numerical scores fun in video games yet oppressive in social metrics? Or, more succinctly: Why do we love games and hate rules?

The Score is simultaneously a philosophy book, a self-help book, and a gentle introduction to the contemporary academic study of institutions. I applaud Nguyen for recognizing that a philosophy of games and rules needs to engage with ethnographic disciplines to make sense of why metric chasing is core to our current condition. His book makes the tension between individuals and populations more visible for those less willing to immerse themselves in the vast literature of science and technology studies.

Nguyen clearly summarizes the work on bureaucracy, statistics, and rules by scholars like Lorraine Daston, Theodore Porter, and anarchist anthropologist James Scott, whose classic Seeing Like a State is beloved by both the left and the reactionary right. But there’s another anarchist anthropologist who I think has already solved the core dilemma of The Score: David Graeber.

Graeber was not only an academic but a dedicated political activist. He was one of the central figures of the Occupy Movement, credited with helping coin its iconic slogan, “We are the 99%.” Though best known for his books Debt and Bullshit Jobs, my personal favorite is The Utopia of Rules. I like to think the subtitle was microtargeted to me: On Technology, Stupidity, and the Secret Joys of Bureaucracy. The book is a collection of five essays that, though readable separately, together provide a unified answer to Nguyen’s central question.

Graeber frames games and bureaucracy as offering similar utopian fantasies of fairness and equality. In games, everyone plays by the same rules. The outcomes are transparent. We know what it means to win and lose. Since everything is written down, we can all evaluate if we think it’s fair and argue for rule changes if it’s not. But bureaucracy promises the same thing. It has no shortage of written rules! We only have all of those rules to maintain transparency and fairness. We want everyone to be treated equally under the law, don’t we?

Score-based games let us escape in temporary fantasy, and yet, in their complex scoring systems, Graeber writes, they “reinforce the sense we live in a universe where accounting procedures define the very fabric of reality.” Video games and bureaucratic mechanisms are two reflections of the same reality. Ironically, the score-based games that Nguyen celebrates close off the imagination of alternative forms of governance.

Still, we love our board games and video games, and we hate going to the DMV. Why is that? What explains how getting enmeshed in a battle with HR or the IRS is terrifying and emotionally crippling? What explains why our popular imagination paints bureaucracy in surrealist, existential nightmares like in The Trial, Brazil, or Andor?

For Graeber, the main difference between games and bureaucracy is what he calls structural violence—“forms of pervasive social inequality that are ultimately backed up by the threat of physical harm.” Graeber argues that bureaucratic systems of endless, stupid paperwork are the defining example of structural violence in our society.

“Now, I admit that this emphasis on violence might seem odd. We are not used to thinking of nursing homes or banks or even HMOs as violent institutions—except perhaps in the most abstract and metaphorical sense. But the violence I’m referring to here is not abstract. I am not speaking of conceptual violence. I am speaking of violence in the literal sense: the kind that involves, say, one person hitting another over the head with a wooden stick. All of these are institutions involved in the allocation of resources within a system of property rights regulated and guaranteed by governments in a system that ultimately rests on the threat of force. ‘Force’ in turn is just a euphemistic way to refer to violence: that is, the ability to call up people dressed in uniforms, willing to threaten to hit others over the head with wooden sticks.”

It is this threat of force that makes bureaucracies so stupid. To see why, begin with violence itself. It is one of the only forms of human interaction that requires no interpersonal interpretation. We know exactly what happens with the conditional command: “Cross this line and I’ll shoot.” When you issue this edict, you are running a mechanical algorithm of violence that needs no understanding of the person who is coming at you. If they cross the line, you shoot them. Both you and your counterparty have a precise expectation of what happens after you pull the trigger.

This lack of interpretation is afforded only to those who have power and can actualize violence without repercussion. This consequent ability to harm makes those in power lazy. And the structures built to maintain these power structures become lazy with them.

Indeed, bureaucratic rules let those in power remain oblivious to what’s actually happening to everyone else. The stupidity of bureaucracy is a feature for those in power. It removes their need to understand those who are subject to the rules. Graeber’s argument is completely consistent with the views of stout institutionalists who tout the benefits and necessity of bureaucracy. Part of the benefit of bureaucratic systems is how they abstract complexity up a hierarchy, allowing people at each level to act without knowing all the details of what’s happening below them.

However, Graeber foregrounds the people at the bottom who are erased by quantified summaries. The erasure of those individuals into statistics means that those in power have no need to do the interpretive labor of thinking what it must be like to be them. Those who set the rules are privileged to not have to think about those forced to abide by them. By sharp contrast, those who have to deal with the rules are obliged to empathize with the powerful. They must constantly imagine how the powerful might act and react to avoid the persistent threat of violence. This is how normally intelligent people are forced to act like idiots when dealing with bureaucratic procedures.

Summarization and bureaucratization do not have to be stupid. You can read my architecture lecture from a few weeks back to see how well-designed hierarchical rule systems can create amazing outcomes. Graeber, to his credit, doesn’t disagree.

“To put it crudely: it is not so much that bureaucratic procedures are inherently stupid, or even that they tend to produce behavior that they themselves define as stupid—though they do do that—but rather, that they are invariably ways of managing social situations that are already stupid because they are founded on structural violence.”

Bureaucracy is stupid when it is used as a system of deempathization and structural violence. When rule-based systems reduce the interpretative labor of those in power, “such procedures come to partake of the very blindness and foolishness they seek to manage.”

Now, I don’t think you’re going to reach Ezra Klein and Derek Thompson listeners with Graeber’s far-left radicalism. You’re definitely not going to reach the staunch institutionalist liberals of Bluesky who hate David Graeber with every ounce of their being. I don’t mean this cynically: I think Nguyen wants to reach out to both of those audiences, and that’s his prerogative.

However, I think we are best off not forgetting Graeber and the left-wing movements that arose in the wake of the financial crisis. Though the stock market is soaring, it’s hard to argue the world is in a better place today than it was after 2008. The economist Joseph Stiglitz’s so-called 1% has lost some of its power, but only because it has ceded it to the 0.001%. There’s less faith in institutions than ever. The financialization and gamification of everything have put us all in the awkward position where every aspect of our lives is now connected to a risky set of dehumanizing rules. The radical critiques from the 2010s don’t have simple answers for our current polycrisis, but ignoring them walls off imagining better worlds.

Subscribe now

By Ben Recht