Theory of Computing Report

Tuesday, October 28

Postdoc at EPFL (apply by February 28, 2026)

from CCI: jobs

Theory Group at EPFL has multiple postdoc positions available. Application review starts in early January 2026 (with final deadline February 28th).

Topics: probabilistic proofs, combinatorial optimization, approximation algorithms, online algorithms, sublinear algorithms, (quantum) computational complexity (circuits/proofs/communication), quantum cryptography/algorithms/learning.

Website: https://careers.epfl.ch/job/Lausanne-Postdoctoral-position-in-the-Theory-Group/1163629255/
Email: theory-postdoc@epfl.ch

By shacharlovett

Noisy nonlinear information and entropy numbers

from arXiv: Computational Complexity

Authors: David Krieg, Erich Novak, Leszek Plaskota, Mario Ullrich

It is impossible to recover a vector from $\mathbb{R}^m$ with fewer than $m$ linear measurements, even if the measurements are chosen adaptively. Recently, it has been shown that one can recover vectors from $\mathbb{R}^m$ with arbitrary precision using only $O(\log m)$ continuous (even Lipschitz) adaptive measurements, resulting in an exponential speed-up of continuous information compared to linear information for various approximation problems. In this note, we characterize the quality of optimal (dis-)continuous information that is disturbed by deterministic noise in terms of entropy numbers. This shows that in the presence of noise the potential gain of continuous over linear measurements is limited, but significant in some cases.

A Critique of Quigley's "A Polynomial Time Algorithm for 3SAT"

from arXiv: Computational Complexity

Authors: Nicholas DeJesse, Spencer Lyudovyk, Dhruv Pai

In this paper, we examine Quigley's "A Polynomial Time Algorithm for 3SAT" [Qui24]. Quigley claims to construct an algorithm that runs in polynomial time and determines whether a Boolean formula in 3CNF is satisfiable. Such a result would prove that 3SAT $\in \text{P}$ and thus $\text{P} = \text{NP}$. We show Quigley's argument is flawed by providing counterexamples to several lemmas he attempts to use to justify the correctness of his algorithm. We also provide an infinite class of 3CNF formulas that are unsatisfiable but are classified as satisfiable by Quigley's algorithm. In doing so, we prove that Quigley's algorithm fails on certain inputs, and thus his claim that $\text{P} = \text{NP}$ is not established by his paper.

NP-Completeness Proofs of All or Nothing and Water Walk Using the T-Metacell Framework

from arXiv: Computational Complexity

Authors: Pakapim Eua-anant, Papangkorn Apinyanon, Thunyatorn Jirachaisri, Nantapong Ruangsuksriwong, Suthee Ruangwises

All or Nothing and Water Walk are pencil puzzles that involve constructing a continuous loop on a rectangular grid under specific constraints. In this paper, we analyze their computational complexity using the T-metacell framework developed by Tang and the MIT Hardness Group. We establish that both puzzles are NP-complete by providing reductions from the problem of finding a Hamiltonian cycle in a maximum-degree-3 spanning subgraph of a rectangular grid graph.

T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning

from arXiv: Computational Geometry

Authors: Julie Mordacq, David Loiseaux, Vicky Kalogeiton, Steve Oudot

Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data, often by enforcing invariance to input transformations such as rotations or blurring. Recent studies have highlighted two pivotal properties for effective representations: (i) avoiding dimensional collapse, where the learned features occupy only a low-dimensional subspace, and (ii) enhancing uniformity of the induced distribution. In this work, we introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representation. We provide theoretical analysis demonstrating that T-REGS simultaneously mitigates dimensional collapse and promotes distribution uniformity on arbitrary compact Riemannian manifolds. Several experiments on synthetic data and on classical SSL benchmarks validate the effectiveness of our approach at enhancing representation quality.
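
As a concrete illustration of the regularized quantity, here is a minimal Python sketch (assuming numpy/scipy) that computes the total MST length of a batch of representation vectors; how T-REGS actually weights this term in the training loss is not specified here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import distance_matrix

def mst_length(Z):
    """Total edge length of the Euclidean MST over the rows of Z
    (a batch of learned representations). Illustrative only: the sign and
    weighting with which T-REGS adds this quantity to the SSL loss are
    assumptions left unspecified here."""
    D = distance_matrix(Z, Z)               # pairwise Euclidean distances
    return minimum_spanning_tree(D).sum()   # sum of MST edge weights

# usage: a near-collapsed batch has a much shorter MST than a spread-out one
rng = np.random.default_rng(0)
print(mst_length(rng.standard_normal((128, 16)) * 0.01))  # near-collapsed
print(mst_length(rng.standard_normal((128, 16))))         # well spread
```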

Expected Length of the Euclidean Minimum Spanning Tree and 1-norms of Chromatic Persistence Diagrams in the Plane

from arXiv: Computational Geometry

Authors: Ondřej Draganov, Herbert Edelsbrunner, Sophie Rosenmeier, Morteza Saghafian

Let $c$ be the constant such that the expected length of the Euclidean minimum spanning tree of $n$ random points in the unit square is $c \sqrt{n}$ in the limit, when $n$ goes to infinity. We improve the prior best lower bound of $0.6008 \leq c$ by Avram and Bertsimas to $0.6289 \leq c$. The proof is a by-product of studying the persistent homology of randomly $2$-colored point sets. Specifically, we consider the filtration induced by the inclusions of the two mono-chromatic sublevel sets of the Euclidean distance function into the bi-chromatic sublevel set of that function. Assigning colors randomly, and with equal probability, we show that the expected $1$-norm of each chromatic persistence diagram is a constant times $\sqrt{n}$ in the limit, and we determine the constant in terms of $c$ and another constant, $c_L$, which arises for a novel type of Euclidean minimum spanning tree of $2$-colored point sets.
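
For intuition about the constant $c$, a quick Monte Carlo estimate (a sketch assuming numpy/scipy; not the paper's method, which is analytic) is to average the EMST length of random point sets in the unit square and divide by $\sqrt{n}$:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import distance_matrix

def emst_length(points):
    """Length of the Euclidean minimum spanning tree of a point set."""
    return minimum_spanning_tree(distance_matrix(points, points)).sum()

# crude Monte Carlo estimate of c: EMST length of n uniform points in the
# unit square, normalized by sqrt(n) (finite-n and boundary effects make this rough)
rng = np.random.default_rng(0)
n, trials = 2000, 20
estimate = np.mean([emst_length(rng.random((n, 2))) for _ in range(trials)]) / np.sqrt(n)
print(estimate)   # a rough numerical estimate of c; the paper proves 0.6289 <= c
```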

Online Hitting Set for Axis-Aligned Squares

from arXiv: Computational Geometry

Authors: Minati De, Satyam Singh, Csaba D. Tóth

We are given a set $P$ of $n$ points in the plane, and a sequence of axis-aligned squares that arrive in an online fashion. The online hitting set problem consists of maintaining, by adding new points if necessary, a set $H\subseteq P$ that contains at least one point in each input square. We present an $O(\log n)$-competitive deterministic algorithm for this problem. The competitive ratio is the best possible, apart from constant factors. In fact, this is the first $O(\log n)$-competitive algorithm for the online hitting set problem that works for geometric objects of arbitrary sizes (i.e., arbitrary scaling factors) in the plane. We further generalize this result to positive homothets of a polygon with $k\geq 3$ vertices in the plane and provide an $O(k^2\log n)$-competitive algorithm.
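
To make the setting concrete, here is a minimal Python sketch of a naive online strategy (add an arbitrary point of $P$ inside any square that is not yet hit); it only illustrates the model and the square encoding is an assumption. It carries no competitive guarantee, unlike the $O(\log n)$-competitive algorithm of the paper.

```python
def hits(point, square):
    """square = (x, y, r): the axis-aligned square [x-r, x+r] x [y-r, y+r]
    (an assumed encoding for illustration)."""
    (px, py), (x, y, r) = point, square
    return x - r <= px <= x + r and y - r <= py <= y + r

def naive_online_hitting_set(P, squares):
    """Maintain H ⊆ P online: when a square arrives that no chosen point hits,
    add any point of P inside it. Illustrative baseline only -- it can be far
    from the O(log n)-competitive bound achieved in the paper."""
    H = []
    for sq in squares:
        if not any(hits(p, sq) for p in H):
            candidates = [p for p in P if hits(p, sq)]
            if candidates:            # assume the instance is feasible
                H.append(candidates[0])
    return H

# usage
P = [(0, 0), (2, 2), (5, 5)]
print(naive_online_hitting_set(P, [(1, 1, 1.5), (5, 5, 1), (2, 2, 0.5)]))
```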

On the complexity of the free space of a translating square in R^3

from arXiv: Computational Geometry

Authors: Gabriel Nivasch

Consider a polyhedral robot $B$ that can translate (without rotating) amidst a finite set of non-moving polyhedral obstacles in $\mathbb R^3$. The "free space" $\mathcal F$ of $B$ is the set of all positions in which $B$ is disjoint from the interior of every obstacle. Aronov and Sharir (1997) derived an upper bound of $O(n^2\log n)$ for the combinatorial complexity of $\mathcal F$, where $n$ is the total number of vertices of the obstacles, and the complexity of $B$ is assumed constant. Halperin and Yap (1993) showed that, if $B$ is either a "flat" convex polygon or a three-dimensional box, then a tighter bound of $O(n^2\alpha(n))$ holds. Here $\alpha(n)$ is the inverse Ackermann function. In this paper we prove that if $B$ is a square (or a rectangle or a parallelogram), then the complexity of $\mathcal F$ is $O(n^2)$. We conjecture that this bound holds more generally if $B$ is any convex polygon whose edges come in parallel pairs. For such polygons $B$, the only triple contacts whose number we were not able to bound by $O(n^2)$ are those made by three mutually non-parallel edges of $B$. Similarly, for the case where $B$ is a cube (or a box or a parallelepiped), we bound by $O(n^2)$ all triple contacts except those made by three mutually non-parallel edges of $B$.

Coresets for Clustering Under Stochastic Noise

from arXiv: Data Structures and Algorithms

Authors: Lingxiao Huang, Zhize Li, Nisheeth K. Vishnoi, Runkai Yang, Haoyu Zhao

We study the problem of constructing coresets for $(k, z)$-clustering when the input dataset is corrupted by stochastic noise drawn from a known distribution. In this setting, evaluating the quality of a coreset is inherently challenging, as the true underlying dataset is unobserved. To address this, we investigate coreset construction using surrogate error metrics that are tractable and provably related to the true clustering cost. We analyze a traditional metric from prior work and introduce a new error metric that more closely aligns with the true cost. Although our metric is defined independently of the noise distribution, it enables approximation guarantees that scale with the noise level. We design a coreset construction algorithm based on this metric and show that, under mild assumptions on the data and noise, enforcing an $\varepsilon$-bound under our metric yields smaller coresets and tighter guarantees on the true clustering cost than those obtained via classical metrics. In particular, we prove that the coreset size can improve by a factor of up to $\mathrm{poly}(k)$, where $n$ is the dataset size. Experiments on real-world datasets support our theoretical findings and demonstrate the practical advantages of our approach.

Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

from arXiv: Data Structures and Algorithms

Authors: Ved Danait, Srijan Das, Sujoy Bhore

Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with broad applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, a central challenge is to design compact sketches that preserve essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE for a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+\rho-\eta})$ memory by storing only a sublinear ($n^{-\eta}$) fraction of the total inputs, where $\rho$ is a parameter of the LSH family, and $0<\eta<1$. Our method supports sublinear query time, batch queries, and extends to the more general Turnstile model. While earlier works have focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+\epsilon} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $\epsilon$ is the approximation error. This, to the best of our knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.

Multi-Way Co-Ranking: Index-Space Partitioning of Sorted Sequences Without Merge

from arXiv: Data Structures and Algorithms

Authors: Amit Joshi

We present a merge-free algorithm for multi-way co-ranking, the problem of computing cut indices $i_1,\dots,i_m$ that partition each of the $m$ sorted sequences such that all prefix segments together contain exactly $K$ elements. Our method extends two-list co-ranking to arbitrary $m$, maintaining per-sequence bounds that converge to a consistent global frontier without performing any multi-way merge or value-space search. Rather, we apply binary search to \emph{index-space}. The algorithm runs in $O(\log(\sum_t n_t)\,\log m)$ time and $O(m)$ space, independent of $K$. We prove correctness via an exchange argument and discuss applications to distributed fractional knapsack, parallel merge partitioning, and multi-stream joins.

Keywords: co-ranking, partitioning, merge-free algorithms, index-space optimization, selection and merging, data structures.
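
For reference, here is the classical two-sequence co-ranking step that the paper generalizes to $m$ lists: a binary search in index space for the split $(i, j)$ with $i + j = K$ such that the two prefixes hold the $K$ smallest elements. This is a sketch of the standard building block, not the authors' multi-way algorithm.

```python
def corank2(k, a, b):
    """Two-list co-ranking: return (i, j) with i + j == k such that a[:i] and
    b[:j] together contain the k smallest elements of sorted lists a and b.
    Index-space binary search; assumes 0 <= k <= len(a) + len(b)."""
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while True:
        i = (lo + hi) // 2
        j = k - i
        if i > 0 and j < len(b) and a[i - 1] > b[j]:
            hi = i - 1      # a contributes too many elements
        elif j > 0 and i < len(a) and b[j - 1] > a[i]:
            lo = i + 1      # a contributes too few elements
        else:
            return i, j

# usage
print(corank2(4, [1, 3, 5, 7], [2, 4, 6, 8]))   # (2, 2): prefixes hold {1, 2, 3, 4}
```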

Testing forbidden order-pattern properties on hypergrids

from arXiv: Data Structures and Algorithms

Authors: Harish Chandramouleeswaran, Ilan Newman, Tomer Pelleg, Nithin Varma

We study testing $\pi$-freeness of functions $f:[n]^d\to\mathbb{R}$, where $f$ is $\pi$-free if there are no $k$ indices $x_1\prec\cdots\prec x_k\in [n]^d$ such that $f(x_i) < f(x_j)$ whenever $\pi(i) < \pi(j)$. We initiate a systematic study of pattern freeness on higher-dimensional grids. For $d=2$ and all permutations of size $k=3$, we design an adaptive one-sided tester with query complexity $O(n^{4/5+o(1)})$. We also prove general lower bounds for $k=3$: every nonadaptive tester requires $\Omega(n)$ queries, and every adaptive tester requires $\Omega(\sqrt{n})$ queries, yielding the first super-logarithmic lower bounds for $\pi$-freeness. For the monotone patterns $\pi=(1,2,3)$ and $(3,2,1)$, we present a nonadaptive tester with polylogarithmic query complexity, giving an exponential separation between monotone and nonmonotone patterns (unlike the one-dimensional case). A key ingredient in our $\pi$-freeness testers is a new family of erasure-resilient ($\delta$-ER) $\epsilon$-testers for monotonicity over $[n]^d$ with query complexity $O(\log^{O(d)}n/(\epsilon(1-\delta)))$, where $0<\delta<1$ is an upper bound on the fraction of erasures. Prior ER testers worked only for $\delta=O(\epsilon/d)$. Our nonadaptive monotonicity tester is nearly optimal via a matching lower bound due to Pallavoor, Raskhodnikova, and Waingarten (Random Struct. Algorithms, 2022). Finally, we show that current techniques cannot yield sublinear-query testers for patterns of length $4$ even on two-dimensional hypergrids.
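
To unpack the definition, here is a brute-force (exponential, purely illustrative) Python check of whether a function on the grid $[n]^d$ contains a pattern $\pi$, assuming $\prec$ denotes the strict coordinate-wise (dominance) order; the paper's testers, of course, make far fewer queries.

```python
import itertools

def contains_pattern(f, n, d, pi):
    """Brute-force: does f (a dict mapping grid points of [n]^d to reals)
    contain the permutation pattern pi? A chain x_1 ≺ ... ≺ x_k realizes pi
    if the relative order of the f-values matches pi. Assumes ≺ is strict
    coordinate-wise dominance (an assumption about the abstract's order)."""
    k = len(pi)
    pts = list(itertools.product(range(n), repeat=d))
    dominated = lambda x, y: all(a < b for a, b in zip(x, y))
    for chain in itertools.combinations(pts, k):
        if all(dominated(chain[i], chain[i + 1]) for i in range(k - 1)):
            vals = [f[x] for x in chain]
            if all((vals[i] < vals[j]) == (pi[i] < pi[j])
                   for i in range(k) for j in range(i + 1, k)):
                return True
    return False

# usage: a coordinate-wise increasing function on [3]^2 contains the pattern (1, 2, 3)
f = {(x, y): x + y + 0.1 * x for x in range(3) for y in range(3)}
print(contains_pattern(f, 3, 2, (1, 2, 3)))   # True
```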

Hierarchical Exponential Search Via K-Spines

from arXiv: Data Structures and Algorithms

Authors: Bob Dong

We introduce the concept of a k-spine of a tree. A k-spine is essentially a path in the tree whose removal leaves only "less-bushy" components of a smaller pathwidth. Using a k-spine as a central guide, we introduce an $O(k \log \mathrm{dist})$ exponential search algorithm on a tree by searching mainly along the spine to narrow down the target's vicinity and then recursively handling the smaller components.
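
For context, the one-dimensional primitive being generalized is classic exponential (doubling) search, which locates a target in a sorted array using $O(\log \mathrm{dist})$ comparisons, where dist is the distance from the starting index to the target's position. A minimal sketch of that primitive (not the paper's k-spine tree algorithm):

```python
import bisect

def exponential_search(a, target, start=0):
    """Doubling search in a sorted list `a`, beginning at index `start`.
    Uses O(log dist) comparisons, where dist is how far the target's position
    lies beyond `start`. Assumes the target is not located before `start`."""
    bound = 1
    while start + bound < len(a) and a[start + bound] < target:
        bound *= 2                      # gallop until we overshoot
    lo = start + bound // 2
    hi = min(start + bound, len(a) - 1)
    i = bisect.bisect_left(a, target, lo, hi + 1)
    return i if i < len(a) and a[i] == target else -1

# usage
print(exponential_search(list(range(0, 1000, 3)), 300))   # 100
```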

$L_p$ Sampling in Distributed Data Streams with Applications to Adversarial Robustness

from arXiv: Data Structures and Algorithms

Authors: Honghao Lin, Zhao Song, David P. Woodruff, Shenghao Xie, Samson Zhou

In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central coordinator. In such settings, the ability to efficiently collect a random sample from the global stream is a powerful primitive, enabling a wide array of downstream tasks such as estimating frequency moments, detecting heavy hitters, or performing sparse recovery. Of particular interest is the task of producing a perfect $L_p$ sample, which given a frequency vector $f \in \mathbb{R}^n$, outputs an index $i$ with probability $\frac{f_i^p}{\|f\|_p^p}+\frac{1}{\mathrm{poly}(n)}$. In this paper, we resolve the problem of perfect $L_p$ sampling for all $p\ge 1$ in the distributed monitoring model. Specifically, our algorithm runs in $k^{p-1} \cdot \mathrm{polylog}(n)$ bits of communication, which is optimal up to polylogarithmic factors. Utilizing our perfect $L_p$ sampler, we achieve adversarially-robust distributed monitoring protocols for the $F_p$ moment estimation problem, where the goal is to provide a $(1+\varepsilon)$-approximation to $f_1^p+\ldots+f_n^p$. Our algorithm uses $\frac{k^{p-1}}{\varepsilon^2}\cdot\mathrm{polylog}(n)$ bits of communication for all $p\ge 2$ and achieves optimal bounds up to polylogarithmic factors, matching lower bounds by Woodruff and Zhang (STOC 2012) in the non-robust setting. Finally, we apply our framework to achieve near-optimal adversarially robust distributed protocols for central problems such as counting, frequency estimation, heavy-hitters, and distinct element estimation.
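
As a reminder of the primitive itself, here is a minimal Python sketch of drawing an $L_p$ sample when the frequency vector is known explicitly; the point of the paper is achieving this (up to $1/\mathrm{poly}(n)$ error) with low communication when the stream is spread over $k$ servers.

```python
import numpy as np

def lp_sample(f, p, rng=np.random.default_rng()):
    """Return an index i with probability |f_i|^p / ||f||_p^p.
    Centralized, explicit-vector version -- purely illustrative of the
    sampling primitive, not the distributed monitoring protocol."""
    w = np.abs(np.asarray(f, dtype=float)) ** p
    return int(rng.choice(len(w), p=w / w.sum()))

# usage: for p = 2, index 2 (frequency 4) is drawn with probability 16/21
print(lp_sample([1, 2, 4], p=2, rng=np.random.default_rng(0)))
```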

Faster Negative-Weight Shortest Paths and Directed Low-Diameter Decompositions

from arXiv: Data Structures and Algorithms

Authors: Jason Li, Connor Mowry, Satish Rao

We present a faster algorithm for low-diameter decompositions on directed graphs, matching the $O(\log n\log\log n)$ loss factor from Bringmann, Fischer, Haeupler, and Latypov (ICALP 2025) and improving the running time to $O((m+n\log\log n)\log n\log\log n)$ in expectation. We then apply our faster low-diameter decomposition to obtain an algorithm for negative-weight single source shortest paths on integer-weighted graphs in $O((m+n\log\log n)\log(nW)\log n\log\log n)$ time, a nearly log-factor improvement over the algorithm of Bringmann, Cassis, and Fischer (FOCS 2023).

Generating pivot Gray codes for spanning trees of complete graphs in constant amortized time

from arXiv: Data Structures and Algorithms

Authors: Bowie Liu, Dennis Wong, Chan-Tong Lam, Sio-Kei Im

We present the first known pivot Gray code for spanning trees of complete graphs, listing all spanning trees such that consecutive trees differ by pivoting a single edge around a vertex. This pivot Gray code thus addresses an open problem posed by Knuth in The Art of Computer Programming, Volume 4 (Exercise 101, Section 7.2.1.6, [Knuth, 2011]), rated at a difficulty level of 46 out of 50, and imposes stricter conditions than existing revolving-door or edge-exchange Gray codes for spanning trees of complete graphs. Our recursive algorithm generates each spanning tree in constant amortized time using $O(n^2)$ space. In addition, we provide a novel proof of Cayley's formula, $n^{n-2}$, for the number of spanning trees in a complete graph, derived from our recursive approach. We extend the algorithm to generate edge-exchange Gray codes for general graphs with $n$ vertices, achieving $O(n^2)$ time per tree using $O(n^2)$ space. For specific graph classes, the algorithm can be optimized to generate edge-exchange Gray codes for spanning trees in constant amortized time per tree for complete bipartite graphs, $O(n)$-amortized time per tree for fan graphs, and $O(n)$-amortized time per tree for wheel graphs, all using $O(n^2)$ space.
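
Cayley's formula itself is easy to sanity-check numerically via Kirchhoff's matrix-tree theorem (any cofactor of the Laplacian of $K_n$ counts its spanning trees); a small Python sketch, unrelated to the paper's Gray-code construction:

```python
import numpy as np

def spanning_trees_of_Kn(n):
    """Count spanning trees of the complete graph K_n using Kirchhoff's
    matrix-tree theorem: delete one row and column of the Laplacian
    L = n*I - J and take the determinant."""
    L = n * np.eye(n) - np.ones((n, n))   # diagonal n-1, off-diagonal -1
    return round(np.linalg.det(L[1:, 1:]))

# matches Cayley's formula n^(n-2)
for n in range(2, 8):
    assert spanning_trees_of_Kn(n) == n ** (n - 2)
print("verified up to n = 7")
```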

Tree Embedding in High Dimensions: Dynamic and Massively Parallel

from arXiv: Data Structures and Algorithms

Authors: Gramoz Goranci, Shaofeng H. -C. Jiang, Peter Kiss, Qihao Kong, Yi Qian, Eva Szilagyi

Tree embedding has been a fundamental method in algorithm design with wide applications. We focus on the efficiency of building tree embeddings in various computational settings in high-dimensional Euclidean space $\mathbb{R}^d$. We devise a new tree embedding construction framework that operates on an arbitrary metric decomposition with bounded diameter, offering a tradeoff between distortion and the locality of its algorithmic steps. This framework works for general metric spaces and may be of independent interest beyond the Euclidean setting. Using this framework, we obtain a dynamic algorithm that maintains an $O_\epsilon(\log n)$-distortion tree embedding with update time $\tilde O(n^\epsilon + d)$ subject to point insertions/deletions, and a massively parallel algorithm that achieves $O_\epsilon(\log n)$-distortion in $O(1)$ rounds and total space $\tilde O(n^{1 + \epsilon})$ (for constant $\epsilon \in (0, 1)$). These new tree embedding results allow for a wide range of applications. Notably, under a similar performance guarantee as in our tree embedding algorithms, i.e., $\tilde O(n^\epsilon + d)$ update time and $O(1)$ rounds, we obtain $O_\epsilon(\log n)$-approximate dynamic and MPC algorithms for $k$-median and earth-mover distance in $\mathbb{R}^d$.

On Integer Programs That Look Like Paths

from arXiv: Data Structures and Algorithms

Authors: Marcin Briański, Alexandra Lassota, Kristýna Pekárková, Michał Pilipczuk, Janina Reuter

Solving integer programs of the form $\min \{\mathbf{x} \mid A\mathbf{x} = \mathbf{b}, \mathbf{l} \leq \mathbf{x} \leq \mathbf{u}, \mathbf{x} \in \mathbb{Z}^n \}$ is, in general, $\mathsf{NP}$-hard. Hence, great effort has been put into identifying subclasses of integer programs that are solvable in polynomial or $\mathsf{FPT}$ time. A common scheme for many of these integer programs is a star-like structure of the constraint matrix. The arguably simplest form that is not a star is a path. We study integer programs where the constraint matrix $A$ has such a path-like structure: every non-zero coefficient appears in at most two consecutive constraints. We prove that even if all coefficients of $A$ are bounded by 8, deciding the feasibility of such integer programs is $\mathsf{NP}$-hard via a reduction from 3-SAT. Given the existence of efficient algorithms for integer programs with star-like structures and a closely related pattern where the sum of absolute values is column-wise bounded by 2 (hence, there are at most two non-zero entries per column of size at most 2), this hardness result is surprising.
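
To make the structural condition concrete, here is a small Python check of the path-like property as read from the abstract (each variable's non-zero coefficients confined to at most two consecutive constraints); the matrix encoding and the interpretation are assumptions for illustration.

```python
import numpy as np

def is_path_like(A):
    """Check that every column of the constraint matrix A has its non-zero
    entries in at most two *consecutive* rows (constraints) -- the path-like
    structure studied in the paper, as interpreted here."""
    for col in np.asarray(A).T:
        rows = np.flatnonzero(col)
        if len(rows) > 2 or (len(rows) == 2 and rows[1] - rows[0] != 1):
            return False
    return True

# a tiny path-like example: each variable couples two neighboring constraints
A = np.array([[3, 1, 0, 0],
              [0, 1, 2, 0],
              [0, 0, 2, 5]])
print(is_path_like(A))   # True
```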

Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry

from arXiv: Data Structures and Algorithms

Authors: Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

The Johnson-Lindenstrauss (JL) lemma is a cornerstone of dimensionality reduction in Euclidean space, but its applicability to non-Euclidean data has remained limited. This paper extends the JL lemma beyond Euclidean geometry to handle general dissimilarity matrices that are prevalent in real-world applications. We present two complementary approaches: First, we show the JL transform can be applied to vectors in pseudo-Euclidean space with signature $(p,q)$, providing theoretical guarantees that depend on the ratio of the $(p, q)$ norm and Euclidean norm of two vectors, measuring the deviation from Euclidean geometry. Second, we prove that any symmetric hollow dissimilarity matrix can be represented as a matrix of generalized power distances, with an additional parameter representing the uncertainty level within the data. In this representation, applying the JL transform yields multiplicative approximation with a controlled additive error term proportional to the deviation from Euclidean geometry. Our theoretical results provide fine-grained performance analysis based on the degree to which the input data deviates from Euclidean geometry, making practical and meaningful reduction in dimensionality accessible to a wider class of data. We validate our approaches on both synthetic and real-world datasets, demonstrating the effectiveness of extending the JL lemma to non-Euclidean settings.
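
For reference, the Euclidean baseline that the paper extends is a plain Gaussian JL projection; a minimal sketch with an empirical distortion check on synthetic data (not the paper's pseudo-Euclidean or power-distance constructions):

```python
import numpy as np

def jl_project(X, m, rng=np.random.default_rng(0)):
    """Project the rows of X from d to m dimensions with a scaled Gaussian
    matrix -- the classical Euclidean JL transform."""
    d = X.shape[1]
    return X @ (rng.standard_normal((d, m)) / np.sqrt(m))

def pairwise_dists(X):
    sq = (X * X).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))

# empirical distortion on random data
X = np.random.default_rng(1).standard_normal((200, 1000))
Y = jl_project(X, m=300)
iu = np.triu_indices(len(X), k=1)
ratios = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]
print(ratios.min(), ratios.max())   # concentrated near 1 for m = O(log n / eps^2)
```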

(Approximate) Matrix Multiplication via Convolutions

from arXiv: Data Structures and Algorithms

Authors: Kevin Pratt, Yahel Uffenheimer, Omri Weinstein

A longstanding open question in algorithm design is whether "combinatorial" matrix multiplication algorithms -- avoiding Strassen-like divide-and-conquer -- can achieve truly subcubic runtime $n^{3-\delta}$. We present an $O(n^{2.89})$-time exact algorithm, which only sums convolutions in $\mathbb{Z}_m^k$ (multivariate polynomial multiplications) via FFT, building on the work of Cohn, Kleinberg, Szegedy and Umans (CKSU'05). While the algorithm avoids recursion, the asymptotic speedup arises only for impractically large matrices. Motivated by practical applications, we use this baseline to develop a new framework for fast approximate matrix multiplication (AMM), via low-degree approximations of the CKSU polynomials. We show that combining the aforementioned algorithm with black-box linear sketching already breaks the longstanding linear speed-accuracy tradeoff for AMM (Sarlos'06, Clarkson-Woodruff'13, Pagh'11, Cohn-Lewis'00), achieving $\frac{1}{r^{1.1}}\|\mathbf{A}\|_F^2\|\mathbf{B}\|_F^2$ error in $O(rn^2)$-time. Our main result is a low-degree approximation scheme for the CKSU polynomials, based on a Fourier-concentration lemma, yielding substantially smaller error in the distributional setting where $\mathbf{A},\mathbf{B}$ come from an i.i.d. product distribution; for random Gaussian matrices, this practical AMM algorithm attains smaller error than the best rank-$r$ SVD of the output matrix $\mathbf{A}\mathbf{B}$, in time $O(rn^2)$. This is a substantial improvement over iterative Krylov subspace methods for low-rank approximation. Our theoretical and empirical results suggest the possibility of replacing MatMuls with sums of convolutions in LLM training and inference.
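
For contrast, the "linear speed-accuracy tradeoff" the abstract refers to is the classic sampling-based AMM baseline with error on the order of $\|\mathbf{A}\|_F^2\|\mathbf{B}\|_F^2/r$; a minimal sketch of that baseline (not the convolution-based algorithm of the paper):

```python
import numpy as np

def amm_sample(A, B, r, rng=np.random.default_rng(0)):
    """Approximate A @ B by sampling r column/row pairs with probability
    proportional to ||A[:, i]|| * ||B[i, :]|| and rescaling -- the classic
    sampling-based baseline, shown here only for contrast."""
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=r, p=p)
    return (A[:, idx] / (r * p[idx])) @ B[idx, :]

# usage: relative Frobenius error decays roughly like 1/sqrt(r)
rng = np.random.default_rng(1)
A, B = rng.standard_normal((100, 500)), rng.standard_normal((500, 100))
err = np.linalg.norm(amm_sample(A, B, r=200) - A @ B) / np.linalg.norm(A @ B)
print(err)
```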

Quasi-Self-Concordant Optimization with Lewis Weights

from arXiv: Data Structures and Algorithms

Authors: Alina Ene, Ta Duy Nguyen, Adrian Vladu

In this paper, we study the problem $\min_{x\in \mathbb{R}^{d},Nx=v}\sum_{i=1}^{n}f((Ax-b)_{i})$ for a quasi-self-concordant function $f:\mathbb{R}\to\mathbb{R}$, where $A,N$ are $n\times d$ and $m\times d$ matrices, $b,v$ are vectors of length $n$ and $m$ with $n\ge d$. We show an algorithm based on a trust-region method with an oracle that can be implemented using $\widetilde{O}(d^{1/3})$ linear system solves, improving the $\widetilde{O}(n^{1/3})$ oracle by [Adil-Bullins-Sachdeva, NeurIPS 2021]. Our implementation of the oracle relies on solving the overdetermined $\ell_{\infty}$-regression problem $\min_{x\in\mathbb{R}^{d},Nx=v}\|Ax-b\|_{\infty}$. We provide an algorithm that finds a $(1+\epsilon)$-approximate solution to this problem using $O((d^{1/3}/\epsilon+1/\epsilon^{2})\log(n/\epsilon))$ linear system solves. This algorithm leverages $\ell_{\infty}$ Lewis weight overestimates and achieves this iteration complexity via a simple lightweight IRLS approach, inspired by the work of [Ene-Vladu, ICML 2019]. Experimentally, we demonstrate that our algorithm significantly improves the runtime of the standard CVX solver.
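
The oracle subproblem itself is easy to state as a small linear program; here is a sketch using scipy.optimize.linprog, purely to pin down the $\ell_{\infty}$-regression problem being solved. The paper's contribution is solving it with few linear-system solves via Lewis weights and IRLS, not via a generic LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def linf_regression(A, b, N=None, v=None):
    """min_x ||Ax - b||_inf subject to Nx = v (if given), written as an LP
    over variables z = [x, t]: minimize t subject to -t <= (Ax - b)_i <= t."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0                                  # minimize t
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),     #  Ax - t <= b
                      np.hstack([-A, -ones])])   # -Ax - t <= -b
    b_ub = np.concatenate([b, -b])
    A_eq = np.hstack([N, np.zeros((N.shape[0], 1))]) if N is not None else None
    b_eq = v if N is not None else None
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * (d + 1))
    return res.x[:d], res.x[-1]   # minimizer and the optimal infinity norm
```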

An Optimal Density Bound for Discretized Point Patrolling

from arXiv: Data Structures and Algorithms

Authors: Ahan Mishra

The pinwheel problem is a real-time scheduling problem that asks, given $n$ tasks with periods $a_i \in \mathbb{N}$, whether it is possible to infinitely schedule the tasks, one per time unit, such that every task $i$ is scheduled in every interval of $a_i$ units. We study a corresponding version of this packing problem in the covering setting, stylized as the discretized point patrolling problem in the literature. Specifically, given $n$ tasks with periods $a_i$, the problem asks whether it is possible to assign each day to a task such that every task $i$ is scheduled at \textit{most} once every $a_i$ days. The density of an instance in either case is defined as the sum of the inverses of task periods. Recently, the long-standing $5/6$ density bound conjecture in the packing setting was resolved affirmatively. The resolution means any instance with density at least $5/6$ is schedulable. A corresponding conjecture was made in the covering setting and renewed multiple times in more recent work. We resolve this conjecture affirmatively by proving that every discretized point patrolling instance with density at least $\sum_{i = 0}^{\infty} 1/(2^i + 1) \approx 1.264$ is schedulable. This significantly improves upon the current best-known density bound of 1.546 and is, in fact, optimal. We also study the bamboo garden trimming problem, an optimization variant of the pinwheel problem. Specifically, given $n$ growth rates with values $h_i \in \mathbb{N}$, the objective is to minimize the maximum height of a bamboo garden with the corresponding growth rates, where we are allowed to trim one bamboo tree to height zero per time step. We achieve an efficient $9/7$-approximation algorithm for this problem, improving on the current best known approximation factor of $4/3$.
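
The threshold constant and the density of an instance are both easy to compute directly; a tiny Python sketch (just reproducing the quantities named in the abstract, not the scheduling argument):

```python
from fractions import Fraction

# the optimal covering threshold from the abstract: sum_{i >= 0} 1 / (2^i + 1)
threshold = sum(1 / (2**i + 1) for i in range(64))
print(round(threshold, 4))   # 1.2645, the ~1.264 constant from the abstract

def density(periods):
    """Density of a pinwheel / point-patrolling instance: sum of 1/a_i."""
    return sum(Fraction(1, a) for a in periods)

# example: periods (2, 3, 4, 5, 6) have density 1.45 >= threshold, so by the
# paper's result this covering instance is schedulable
print(float(density([2, 3, 4, 5, 6])))
```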

Generalized Top-k Mallows Model for Ranked Choices

from arXiv: Data Structures and Algorithms

Authors: Shahrzad Haddadan, Sara Ahmadian

The classic Mallows model is a foundational tool for modeling user preferences. However, it has limitations in capturing real-world scenarios, where users often focus only on a limited set of preferred items and are indifferent to the rest. To address this, extensions such as the top-k Mallows model have been proposed, aligning better with practical applications. In this paper, we address several challenges related to the generalized top-k Mallows model, with a focus on analyzing buyer choices. Our key contributions are: (1) a novel sampling scheme tailored to generalized top-k Mallows models, (2) an efficient algorithm for computing choice probabilities under this model, and (3) an active learning algorithm for estimating the model parameters from observed choice data. These contributions provide new tools for analysis and prediction in critical decision-making scenarios. We present a rigorous mathematical analysis for the performance of our algorithms. Furthermore, through extensive experiments on synthetic data and real-world data, we demonstrate the scalability and accuracy of our proposed methods, and we compare the predictive power of the Mallows model for top-k lists with that of the simpler Multinomial Logit model.
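
For background, here is the standard repeated-insertion sampler for the classic (full-ranking) Mallows model, which the top-k variants generalize; it is a sketch of the classical primitive only, not the paper's sampling scheme for the generalized top-k model.

```python
import random

def sample_mallows(reference, phi, rng=random.Random(0)):
    """Repeated insertion sampler for the classic Mallows model with
    dispersion phi in (0, 1] (phi = 1 gives the uniform distribution).
    Item i of the reference order is inserted at a position with probability
    proportional to phi^(number of new inversions created)."""
    ranking = []
    for i, item in enumerate(reference, start=1):
        weights = [phi ** (i - 1 - j) for j in range(i)]   # j = 0 is the front
        ranking.insert(rng.choices(range(i), weights=weights)[0], item)
    return ranking

# usage: small phi concentrates mass near the reference ranking
print(sample_mallows(["a", "b", "c", "d"], phi=0.3))
```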

Monday, October 27

The fine art of crate digging

from Ben Recht

Alexeev and Mixon's resolution of Erdős problem 707 and the vastness of the library.

In the comments of Friday’s post, Will P sent me a delightful paper by Boris Alexeev and Dustin Mixon. They resolve a 50-year-old open Erdős problem by finding the solution in an 80-year-old paper. If you’re at all mathematically inclined, Alexeev and Mixon’s paper is super fun to read. The mathematics is accessible, the bemused tone adds suspense, and the long section about vibe-coding in Lean using ChatGPT is hilarious. However, I want to emphasize from the get-go that the AI is a complete sideshow here. Alexeev and Mixon tell a fascinating story about how even experts are incapable of fathoming their own fields.

As I wrote last time, Paul Erdős posed thousands of challenging math problems. He was fond of some more than others, and often offered prize bounties for solutions he most desired. According to Thomas Bloom’s Erdős problems database, 11 of those bounties were for a thousand dollars or more. Problem 707, resolved in Alexeev and Mixon’s paper, is one of these.

The fun part of number theory is that you can often explain its open problems to anyone. Problem 707 asks whether every finite Sidon set can be extended to a finite perfect difference set. A set is a Sidon set if all of its pairwise differences are distinct. An example is {1, 2, 4, 8, 13}. The pairwise differences are {1, 2, 3, 4, 5, 6, 7, 9, 11, 12}. We’re missing 8 and 10. Erdős wondered if you could add integers to this set to get a complete list of differences that doesn’t skip any numbers, at least in modular arithmetic. It’s an esoteric question, but not that much weirder than Fermat’s Last Theorem, and not much harder to state. The other fun part of number theory is that you never know when something simple to state is completely impossible to prove.
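
If you want to check the arithmetic yourself, a few lines of Python (mine, not from the post or the paper) reproduce the ten differences and the two gaps:

    from itertools import combinations

    s = {1, 2, 4, 8, 13}
    diffs = sorted(abs(a - b) for a, b in combinations(s, 2))
    print(diffs)                          # [1, 2, 3, 4, 5, 6, 7, 9, 11, 12]
    print(len(diffs) == len(set(diffs)))  # True: no repeated difference, so it is a Sidon set
    print(sorted(set(range(1, 13)) - set(diffs)))  # [8, 10], the two missing differences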

Problem 707 puzzled mathematicians for half a century. Or so it seemed. When Alexeev and Mixon were working on this problem, they tried applying some ideas from projective planes, a powerful tool in combinatorics developed by mathematician Marshall Hall. In a 1947 paper by Hall, they found the following sentence:

“From this theorem it immediately follows that there are many sets of integers satisfying the conditions of [Erdos’ problem] which cannot be extended to any finite [perfect] difference set. For example the set −8, −6, 0, 1, 4 may not be so extended.”

Hall had derived a counterexample to Erdős's conjecture thirty years before Erdős conjectured it.

Alexeev and Mixon write:

“Clearly, it appears that Erdős was not aware of this result. The authors of this paper were also unaware of this result, even though they performed a reasonably deep literature search prior to starting this project. (In fact, no large language model could find Hall’s result, even with substantial prompting that the result indeed exists.) Instead, the paper was discovered by accident when searching for support for Conjecture 14.”

Now, the more I look into it, the more amazed I am that this Erdős problem remained open. First, I slightly disagree with Alexeev and Mixon that Hall just casually wrote a throwaway sentence about perfect difference extensions. In the previous section of Hall’s paper, he proves that every Sidon set has an infinite extension. He then devotes a long paragraph to the question of whether finite extensions exist. Here are the first and last sentences of that paragraph:

“As an example of the application of the theorem, consider the set -8, -6, 0, 1, 4... There is no corresponding theorem for finite planes such as Theorem 3.1. In the next section it will be shown that there is no finite difference set including the numbers -8, -6, 0, 1, 4.”

It’s clear Hall thought the finite extension problem was an interesting question. He just didn’t flag the counterexample as a theorem or proposition.

Second, it’s not like this Hall paper is particularly esoteric. While the 1947 paper is not as famous as Hall’s 1943 paper on projective planes, it’s still been cited hundreds of times. Hall was a notable mathematician and had been a professor at Ohio State and Caltech. His paper appeared in the reputable Duke Mathematical Journal. And the paper is still regularly cited today. For example, Sarah Peluse cites it in her 2020 work on finite difference sets.

Now, Peluse perhaps wasn’t aware of Erdős’ not-so-open open problem. But many mathematicians were aware of both Hall’s paper and Erdős’s conjecture and failed to see the connection. Alexeev and Mixon note that in a 2004 book called Unsolved Problems in Number Theory, Richard Guy cites Hall’s paper two sentences before stating Erdős’ conjecture! Here’s Guy’s passage:

“Marshall Hall has shown that numerous non-prime powers cannot serve as values of k and Evans & Mann that there is no such k < 1600 that is not a prime power. It is conjectured that no perfect difference set exists unless k is a prime power. Can a given finite sequence, which contains no repeated differences, always be extended to form a perfect difference set?”

Guy actually refers to Hall’s paper twice in his book.

Anyway, this is an amazing story about the vastness of the literature. We have no idea what’s in our libraries, even when we restrict our attention to papers with hundreds of citations. To be fair to all of us, we all cite papers that we have not read in their entirety. This is to be expected. We don’t write academic papers to be read like novels. If the authors don’t explicitly state a claim as THEOREM or LEMMA or CONJECTURE, it’s easy to just skim past it.

Still, it’s surprising that Erdős didn’t know about this. He was going around offering a thousand dollars to solve his simply stated problem. How is it that no one he talked to knew about Hall’s example?

And how is it that ChatGPT, which we know is trained illegally on paywalled papers, couldn’t find Hall’s paper for Alexeev and Mixon? It was probably too busy coming up with bad pull-up programs. Get it together, ChatGPT.

By Ben Recht

TR25-160 | Intersection Theorems: A Potential Approach to Proof Complexity Lower Bounds | Yaroslav Alekseev, Nikita Gaevoy

from ECCC Papers

Recently, Göös et al. (2024) showed that Res $\cap$ uSA = RevRes in the following sense: if a formula $\varphi$ has refutations of size at most $s$ and width/degree at most $w$ in both Res and uSA, then there is a refutation for $\varphi$ of size at most $poly(s \cdot 2^w)$ in RevRes. Their proof relies on the TFNP characterization of the aforementioned proof systems. In our work, we give a direct and simplified proof of this result, simultaneously achieving better bounds: we show that if for a formula $\varphi$ there are refutations of size at most $s$ in both Res and uSA, then there is a refutation of $\varphi$ of size at most $poly(s)$ in RevRes. This potentially allows us to "lift" size lower bounds from RevRes to Res for the formulas for which there are upper bounds in uSA. This kind of lifting was not possible before because of the exponential blow-up in size from the width. Similarly, we improve the bounds in another intersection theorem from Göös et al. (2024) by giving a direct proof of Res $\cap$ uNS = RevResT. Finally, we generalize those intersection theorems to some proof systems for which we currently do not have a TFNP characterization. For example, we show that Res($\oplus$) $\cap$ u-wRes($\oplus$) = RevRes($\oplus$), which effectively allows us to reduce the problem of proving Pigeonhole Principle lower bounds in Res($\oplus$) to proving Pigeonhole Principle lower bounds in RevRes($\oplus$), a potentially weaker proof system.

TR25-159 | Efficiently Batching Unambiguous Interactive Proofs | Matthew M. Hong, Rohan Goyal, Bonnie Berger, Yael Tauman Kalai

from ECCC Papers

We show that if a language $\mathcal{L}$ admits a public-coin unambiguous interactive proof (UIP) with round complexity $\ell$, where $a$ bits are communicated per round, then the \emph{batch language} $\mathcal{L}^{\otimes k}$, i.e. the set of $k$-tuples of statements all belonging to $\mathcal{L}$, has an unambiguous interactive proof with round complexity $\ell\cdot\text{polylog}(k)$, per-round communication of $a\cdot \ell\cdot\text{polylog}(k) + \text{poly}(\ell)$ bits, assuming the verifier in the $\text{UIP}$ has depth bounded by $\text{polylog}(k)$. Prior to this work, the best known batch $\text{UIP}$ for $\mathcal{L}^{\otimes{k}}$ required communication complexity at least $(\text{poly}(a)\cdot k^{\epsilon} + k) \cdot \ell^{1/\epsilon}$ for any arbitrarily small constant $\epsilon>0$ (Reingold-Rothblum-Rothblum, STOC 2016). As a corollary of our result, we obtain a \emph{doubly efficient proof system}, that is, a proof system whose proving overhead is polynomial in the time of the underlying computation, for any language computable in polynomial space and in time at most $n^{O\left(\sqrt{\frac{\log n}{\log\log n}}\right)}$. This expands the state of the art of doubly efficient proof systems: prior to our work, such systems were known for languages computable in polynomial space and in time $n^{({\log n})^\delta}$ for a small $\delta>0$ significantly smaller than $1/2$ (Reingold-Rothblum-Rothblum, STOC 2016).

Bill's Bad Advice

from Computational Complexity

I sometimes give the following advice for research, which I label Bill's Bad Advice. We will later see who it might be good advice for. Spoiler alert: the number of people for whom it is good advice is shrinking, but it might include Lance, especially now (see his post about stepping down from admin, here).

When you come across a possible research topic or problem, or have some idea, and are wondering if you want to pursue it, here is my bad advice:

1) DON'T CARE if anyone else cares. If YOU care, then that is enough to at least get started.

2) DON'T CARE if it has the potential for a published paper. FIRST do the work; then, if you feel like it, look for a good venue. You might not bother if posting to arXiv, an open problems column, or a (guest) blog post is a good enough endpoint. (I run the SIGACT News Open Problems Column; feel free to contact me if you want to submit one.)

3) DON'T CARE if it has practical implications.

4) DON'T CARE if you can get a grant for it. With the current state of the NSF, this advice may soon become irrelevant.

5) DON'T CARE if someone else already did it (though at a later stage you should check on this). Even if you work on it and find someone else did it, you will have LEARNED about the problem through your efforts. You might then want to do a survey for your own benefit to consolidate your knowledge.

Why should you NOT CARE about any of these things? Because they get in the way of actually DOING something.  

Here are two examples of when this approach WORKED and one where it DID NOT WORK, though both classifications might depend on your definition of WORKED. 

WORKED: My work on Muffins. All I wanted was to get some High School and Undergraduate Projects out of it. I ended up with a book on it, which made me twenty dollars last year! More to the point, I learned a lot, as did my co-authors, and I am happy with the book. The co-authors were Undergraduates, so my dept put me up for a mentoring award (I have other credentials as well). I did not win, but I got a nice letter saying they had many qualified applicants. OH, it didn't say if I was one of them.

WORKED: I have had many Ramsey Projects where a High School Student codes stuff up and learns some Ramsey, some coding, and gets the experience of research. Sometimes they do a survey paper or open problems column. We both learn A LOT from this and the student gets a good letter from me. Do they do something NEW? Publishable? No, though some surveys and open problems columns have come out of this.  I DO tell them ahead of time that the work is unlikely to lead to original results (and hence unlikely to be good for a science competition).   

DID NOT WORK: See this blog post here about the math and here about finding out that the problem we were working on was already known and more was known than we thought. I didn't mind this, but one of the authors did. 

Note that WORKED and DID NOT WORK also depend on your goals. 

For whom is this bad advice? Good advice? 

1) It was always bad advice for young assistant professors who need to get papers and grants to get tenure. 

2) Hypothetically, once you get tenure you have job security and hence can change fields or follow my bad advice without consequences. But with grants and salary and teaching load issues, this is less the case. Perhaps I am nostalgic for a time that never was.

3) High School Students were my main audience for bad advice. It's not as important for them to get papers out as for (say) assistant professors. But even this is changing. Colleges are getting more competitive. And HS students may well want a project that can lead to Science competitions. I am not going to say things were better when I was a kid but instead pose non-rhetorical questions:

a) Are high school students getting into research earlier than they used to? I am sure the answer is yes.

b) Are we losing the safe space where a high school student can just learn things and do things and not worry so much about whether it's publishable? Yes, but I am not sure how widespread that is.

c) If we are losing that safe space, is that a bad thing? 

d) Points a, b, c apply to undergraduates who want to go to graduate school more than to high school students who want to go to college.

4) Full Professors may have more freedom to follow my bad advice. Lance is looking for things to do now that he is no longer a dean, and indeed, is back to being a teaching-and-research professor. So he might follow my advice. However, he actually cares if people care about his work. He does not have to follow all of my advice, but he can follow some of it. 



By gasarch

SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism

from arXiv: Computational Complexity

Authors: Reda Marzouk, Shahaf Bassan, Guy Katz

Although Shapley additive explanations (SHAP) can be computed in polynomial time for simple models like decision trees, they unfortunately become NP-hard to compute for more expressive black-box models like neural networks - where generating explanations is often most critical. In this work, we analyze the problem of computing SHAP explanations for *Tensor Networks (TNs)*, a broader and more expressive class of models than those for which current exact SHAP algorithms are known to hold, and which is widely used for neural network abstraction and compression. First, we introduce a general framework for computing provably exact SHAP explanations for general TNs with arbitrary structures. Interestingly, we show that, when TNs are restricted to a *Tensor Train (TT)* structure, SHAP computation can be performed in *poly-logarithmic* time using *parallel* computation. Thanks to the expressiveness power of TTs, this complexity result can be generalized to many other popular ML models such as decision trees, tree ensembles, linear models, and linear RNNs, therefore tightening previously reported complexity results for these families of models. Finally, by leveraging reductions of binarized neural networks to Tensor Network representations, we demonstrate that SHAP computation can become *efficiently tractable* when the network's *width* is fixed, while it remains computationally hard even with constant *depth*. This highlights an important insight: for this class of models, width - rather than depth - emerges as the primary computational bottleneck in SHAP computation.
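
To make concrete what "computing SHAP explanations" means, here is a brute-force computation of Shapley values for a toy three-feature model with a single fixed baseline. It is exponential in the number of features and is only an illustration of the definition, not the paper's tensor-network algorithms.

    # Brute-force baseline-SHAP: phi_i = sum over S of |S|!(n-|S|-1)!/n! * (v(S+{i}) - v(S)),
    # where v(S) evaluates the model with features in S taken from x and the rest from the baseline.
    from itertools import combinations
    from math import factorial

    def shap_values(f, x, baseline):
        n = len(x)
        v = lambda S: f([x[i] if i in S else baseline[i] for i in range(n)])
        phis = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            phi = 0.0
            for k in range(n):
                for S in combinations(others, k):
                    w = factorial(k) * factorial(n - k - 1) / factorial(n)
                    phi += w * (v(set(S) | {i}) - v(set(S)))
            phis.append(phi)
        return phis

    f = lambda z: 2 * z[0] + z[1] * z[2]                    # tiny model with one interaction term
    print(shap_values(f, x=[1, 1, 1], baseline=[0, 0, 0]))  # approximately [2.0, 0.5, 0.5]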

Distributed $(Δ+1)$-Coloring in Graphs of Bounded Neighborhood Independence

from arXiv: Computational Complexity

Authors: Marc Fuchs, Fabian Kuhn

The distributed coloring problem is arguably one of the key problems studied in the area of distributed graph algorithms. The most standard variant of the problem asks for a proper vertex coloring of a graph with $\Delta+1$ colors, where $\Delta$ is the maximum degree of the graph. Despite an immense amount of work on distributed coloring problems in the distributed setting, determining the deterministic complexity of $(\Delta+1)$-coloring in the standard message passing model remains one of the most important open questions of the area. In this paper, we aim to improve our understanding of the deterministic complexity of $(\Delta+1)$-coloring as a function of $\Delta$ in a special family of graphs for which significantly faster algorithms are already known. The neighborhood independence $\theta$ of a graph is the maximum number of pairwise non-adjacent neighbors of some node of the graph. In general, in graphs of neighborhood independence $\theta=O(1)$ (e.g., line graphs), it is known that $(\Delta+1)$-coloring can be solved in $2^{O(\sqrt{\log\Delta})}+O(\log^* n)$ rounds. In the present paper, we significantly improve this result, and we show that in graphs of neighborhood independence $\theta$, a $(\Delta+1)$-coloring can be computed in $(\theta\cdot\log\Delta)^{O(\log\log\Delta / \log\log\log\Delta)}+O(\log^* n)$ rounds and thus in quasipolylogarithmic time in $\Delta$ as long as $\theta$ is at most polylogarithmic in $\Delta$. We also show that the known approach that leads to a polylogarithmic in $\Delta$ algorithm for $(2\Delta-1)$-edge coloring already fails for edge colorings of hypergraphs of rank at least $3$.
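
The parameter $\theta$ has a simple operational reading; the exponential-time sketch below (an illustration of the definition only, not a distributed algorithm) computes the neighborhood independence of a small graph directly.

    # theta(G): the largest set of pairwise non-adjacent vertices inside one vertex's neighborhood.
    from itertools import combinations

    def neighborhood_independence(adj):
        """adj: dict mapping each vertex to the set of its neighbors (undirected)."""
        best = 0
        for v in adj:
            nbrs = list(adj[v])
            for k in range(len(nbrs), best, -1):  # only look for improvements over the current best
                if any(all(u not in adj[w] for u, w in combinations(S, 2))
                       for S in combinations(nbrs, k)):
                    best = k
                    break
        return best

    cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
    print(neighborhood_independence(cycle5))  # 2: the two neighbors of any vertex are non-adjacent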

Additive Models Explained: A Computational Complexity Approach

from arXiv: Computational Complexity

Authors: Shahaf Bassan, Michal Moshkovitz, Guy Katz

Generalized Additive Models (GAMs) are commonly considered *interpretable* within the ML community, as their structure makes the relationship between inputs and outputs relatively understandable. Therefore, it may seem natural to hypothesize that obtaining meaningful explanations for GAMs could be performed efficiently and would not be computationally infeasible. In this work, we challenge this hypothesis by analyzing the *computational complexity* of generating different explanations for various forms of GAMs across multiple contexts. Our analysis reveals a surprisingly diverse landscape of both positive and negative complexity outcomes. Particularly, under standard complexity assumptions such as P!=NP, we establish several key findings: (1) in stark contrast to many other common ML models, the complexity of generating explanations for GAMs is heavily influenced by the structure of the input space; (2) the complexity of explaining GAMs varies significantly with the types of component models used - but interestingly, these differences only emerge under specific input domain settings; (3) significant complexity distinctions appear for obtaining explanations in regression tasks versus classification tasks in GAMs; and (4) expressing complex models like neural networks additively (e.g., as neural additive models) can make them easier to explain, though interestingly, this benefit appears only for certain explanation methods and input domains. Collectively, these results shed light on the feasibility of computing diverse explanations for GAMs, offering a rigorous theoretical picture of the conditions under which such computations are possible or provably hard.

Computational Hardness of Reinforcement Learning with Partial $q^π$-Realizability

from arXiv: Computational Complexity

Authors: Shayan Karimi, Xiaoqi Tan

This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^{\pi}$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal policy with respect to a predefined policy set $\Pi$, under the assumption that all value functions for policies in $\Pi$ are linearly realizable. The assumptions of this framework are weaker than those in $q^{\pi}$-realizability but stronger than those in $q^*$-realizability, providing a practical model where function approximation naturally arises. We prove that learning an $\epsilon$-optimal policy in this setting is computationally hard. Specifically, we establish NP-hardness under a parameterized greedy policy set (argmax) and show that - unless NP = RP - an exponential lower bound (in feature vector dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those in $q^*$-realizability and suggest computational difficulty persists even when $\Pi$ is expanded beyond the optimal policy. To establish this, we reduce from two complexity problems, $\delta$-Max-3SAT and $\delta$-Max-3SAT(b), to instances of GLinear-$\kappa$-RL (greedy policy) and SLinear-$\kappa$-RL (softmax policy). Our findings indicate that positive computational results are generally unattainable in partial $q^{\pi}$-realizability, in contrast to $q^{\pi}$-realizability under a generative access model.

Near-Optimal Min-Sum Motion Planning in a Planar Polygonal Environment

from arXiv: Computational Geometry

Authors: Pankaj K. Agarwal, Benjamin Holmgren, Alex Steiger

Let $W \subset \mathbb{R}^2$ be a planar polygonal environment with $n$ vertices, and let $[k] = \{1,\ldots,k\}$ denote $k$ unit-square robots translating in $W$. Given source and target placements $s_1, t_1, \ldots, s_k, t_k \in W$ for each robot, we wish to compute a collision-free motion plan $\mathbf{\pi}$, i.e., a coordinated motion for each robot $i$ along a continuous path from $s_i$ to $t_i$ so that robot $i$ does not leave $W$ or collide with any other $j$. Moreover, we additionally require that $\mathbf{\pi}$ minimizes the sum of the path lengths; this variant is known as \textit{min-sum motion planning}. Even computing a feasible motion plan for $k$ unit-square robots in a polygonal environment is {\textsf PSPACE}-hard. For $r > 0$, let $opt(\mathbf{s},\mathbf{t}, r)$ denote the cost of a min-sum motion plan for $k$ square robots of radius $r$ each from $\mathbf{s}=(s_1,\ldots,s_k)$ to $\mathbf{t}=(t_1,\ldots,t_k)$. Given a parameter $\epsilon > 0$, we present an algorithm for computing a coordinated motion plan for $k$ unit radius square robots of cost at most $(1+\epsilon)opt(\mathbf{s},\mathbf{t}, 1+\epsilon)+\epsilon$, which improves to $(1+\epsilon)opt(\mathbf{s},\mathbf{t}, 1+\epsilon)$ if $opt(\mathbf{s},\mathbf{t}, 1+\epsilon)\geq 1$, that runs in time $f(k,\epsilon)n^{O(k)}$, where $f(k,\epsilon) = (k/\epsilon)^{O(k^2)}$. Our result is the first polynomial-time bicriteria $(1+\epsilon)$-approximation algorithm for any optimal multi-robot motion planning problem amidst obstacles for a constant value of $k > 2$. The algorithm also works even if robots are modeled as $k$ congruent disks.

World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data

from arXiv: Computational Geometry

Authors: Hossein Amiri, Mohammad Hashemi, Andreas Züfle

Recently, Foursquare released a global dataset with more than 100 million points of interest (POIs), each representing a real-world business on its platform. However, many entries lack complete metadata such as addresses or categories, and some correspond to non-existent or fictional locations. In contrast, OpenStreetMap (OSM) offers a rich, user-contributed POI dataset with detailed and frequently updated metadata, though it does not formally verify whether a POI represents an actual business. In this data paper, we present a methodology that integrates the strengths of both datasets: Foursquare as a comprehensive baseline of commercial POIs and OSM as a source of enriched metadata. The combined dataset totals approximately 1 TB. While this full version is not publicly released, we provide filtered releases with adjustable thresholds that reduce storage needs and make the data practical to download and use across domains. We also provide step-by-step instructions to reproduce the full 631 GB build. Record linkage is achieved by computing name similarity scores and spatial distances between Foursquare and OSM POIs. These measures identify and retain high-confidence matches that correspond to real businesses in Foursquare, have representations in OSM, and show strong name similarity. Finally, we use this filtered dataset to construct a graph-based representation of POIs enriched with attributes from both sources, enabling advanced spatial analyses and a range of downstream applications.
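
The record-linkage step can be pictured with a minimal sketch like the one below. The name-similarity measure and both thresholds are placeholders chosen for illustration; they are not the values used to build World-POI.

    # Keep a (Foursquare, OSM) candidate pair when the names are similar and the points are close.
    from difflib import SequenceMatcher
    from math import radians, sin, cos, asin, sqrt

    def haversine_m(lat1, lon1, lat2, lon2):
        """Great-circle distance in meters."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371000 * asin(sqrt(a))

    def is_match(fsq, osm, name_thr=0.8, dist_thr_m=100.0):  # thresholds are illustrative
        sim = SequenceMatcher(None, fsq["name"].lower(), osm["name"].lower()).ratio()
        dist = haversine_m(fsq["lat"], fsq["lon"], osm["lat"], osm["lon"])
        return sim >= name_thr and dist <= dist_thr_m

    fsq = {"name": "Joe's Coffee", "lat": 40.7301, "lon": -73.9950}
    osm = {"name": "Joes Coffee", "lat": 40.7302, "lon": -73.9951}
    print(is_match(fsq, osm))  # True under these placeholder thresholds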

Relative-error unateness testing

from arXiv: Data Structures and Algorithms

Authors: Xi Chen, Diptaksho Palit, Kabir Peshawaria, William Pires, Rocco A. Servedio, Yiding Zhang

The model of relative-error property testing of Boolean functions has been the subject of significant recent research effort [CDH+24][CPPS25a][CPPS25b]. In this paper we consider the problem of relative-error testing an unknown and arbitrary $f: \{0,1\}^n \to \{0,1\}$ for the property of being a unate function, i.e. a function that is either monotone non-increasing or monotone non-decreasing in each of the $n$ input variables. Our first result is a one-sided non-adaptive algorithm for this problem that makes $\tilde{O}(\log(N)/\epsilon)$ samples and queries, where $N=|f^{-1}(1)|$ is the number of satisfying assignments of the function that is being tested and the value of $N$ is given as an input parameter to the algorithm. Building on this algorithm, we next give a one-sided adaptive algorithm for this problem that does not need to be given the value of $N$ and with high probability makes $\tilde{O}(\log(N)/\epsilon)$ samples and queries. We also give lower bounds for both adaptive and non-adaptive two-sided algorithms that are given the value of $N$ up to a constant multiplicative factor. In the non-adaptive case, our lower bounds essentially match the complexity of the algorithm that we provide.
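
For readers unfamiliar with the property being tested, the following exponential-time brute-force check (an illustration of the definition, not a tester) spells out unateness for a function given as a Python callable.

    # f over {0,1}^n is unate iff no variable i has both an "up" and a "down" influence:
    # flipping x_i from 0 to 1 with the other bits fixed never changes f in both directions.
    from itertools import product

    def is_unate(f, n):
        for i in range(n):
            goes_up = goes_down = False
            for x in product([0, 1], repeat=n):
                if x[i] == 0:
                    y = x[:i] + (1,) + x[i + 1:]
                    if f(x) < f(y):
                        goes_up = True
                    elif f(x) > f(y):
                        goes_down = True
            if goes_up and goes_down:
                return False
        return True

    print(is_unate(lambda x: x[0] and not x[1], 2))  # True: increasing in x0, decreasing in x1
    print(is_unate(lambda x: x[0] ^ x[1], 2))        # False: XOR is not unate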

Placing Green Bridges Optimally, with Close-Range Habitats in Sparse Graphs

from arXiv: Data Structures and Algorithms

Authors: Christian Wallisch, Till Fluschnik, Leon Kellerhals

We study a network design problem motivated by the challenge of placing wildlife crossings to reconnect fragmented habitats of animal species, which is among the 17 goals towards sustainable development by the UN: Given a graph whose vertices represent the fragmented habitat areas and whose edges represent possible green bridge locations (with costs), and the habitable vertex set for each species' habitat, the goal is to find the cheapest set of edges such that each species' habitat is sufficiently connected. We focus on the established variant where a habitat is considered sufficiently connected if it has diameter two in the solution and study its complexity in cases justified by our setting, namely small habitat sizes on planar graphs and graphs of small maximum degree $\Delta$. We provide efficient algorithms and NP-hardness results for different values of $\Delta$ and maximum habitat sizes on general and planar graphs.

A Unified Approach to Submodular Maximization Under Noise

from arXiv: Data Structures and Algorithms

Authors: Kshipra Bhawalkar, Yang Cai, Zhe Feng, Christopher Liaw, Tao Lin

We consider the problem of maximizing a submodular function with access to a noisy value oracle for the function instead of an exact value oracle. Similar to prior work, we assume that the noisy oracle is persistent in that multiple calls to the oracle for a specific set always return the same value. In this model, Hassidim and Singer (2017) design a $(1-1/e)$-approximation algorithm for monotone submodular maximization subject to a cardinality constraint, and Huang et al (2022) design a $(1-1/e)/2$-approximation algorithm for monotone submodular maximization subject to any arbitrary matroid constraint. In this paper, we design a meta-algorithm that allows us to take any "robust" algorithm for exact submodular maximization as a black box and transform it into an algorithm for the noisy setting while retaining the approximation guarantee. By using the meta-algorithm with the measured continuous greedy algorithm, we obtain a $(1-1/e)$-approximation (resp. $1/e$-approximation) for monotone (resp. non-monotone) submodular maximization subject to a matroid constraint under noise. Furthermore, by using the meta-algorithm with the double greedy algorithm, we obtain a $1/2$-approximation for unconstrained (non-monotone) submodular maximization under noise.
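
The persistent-noise assumption is the key modeling choice here. The sketch below, with a placeholder coverage function and multiplicative Gaussian noise chosen only for illustration (not the paper's construction), shows what a persistent noisy value oracle looks like: the noisy value of a set is drawn once and then reused on every later query.

    import random

    class PersistentNoisyOracle:
        def __init__(self, f, sigma=0.1, seed=0):
            self.f, self.sigma = f, sigma
            self.rng = random.Random(seed)
            self.cache = {}

        def value(self, S):
            key = frozenset(S)
            if key not in self.cache:  # first query of this set: draw its noise once
                self.cache[key] = self.f(key) * (1.0 + self.rng.gauss(0.0, self.sigma))
            return self.cache[key]     # every later query returns the same noisy value

    coverage = lambda S: len(set().union(*S)) if S else 0  # a monotone submodular coverage function
    oracle = PersistentNoisyOracle(coverage)
    S = [frozenset({1, 2}), frozenset({2, 3})]
    print(oracle.value(S) == oracle.value(S))  # True: the oracle is persistent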

O(1)-Distortion Planar Emulators for String Graphs

from arXiv: Data Structures and Algorithms

Authors: Hsien-Chih Chang, Jonathan Conroy, Zihan Tan, Da Wei Zheng

We show that every unweighted string graph $G$ has an $O(1)$-distortion planar emulator: that is, there exists an (edge-weighted) planar graph $H$ with $V(H) = V(G)$, such that every pair of vertices $(u,v)$ satisfies $\delta_G(u,v) \le \delta_H(u,v) \le O(1) \cdot \delta_G(u,v).$

Beyond Smoothed Analysis: Analyzing the Simplex Method by the Book

from arXiv: Data Structures and Algorithms

Authors: Eleon Bach, Alexander E. Black, Sophie Huiberts, Sean Kafer

Narrowing the gap between theory and practice is a longstanding goal of the algorithm analysis community. To further progress our understanding of how algorithms work in practice, we propose a new algorithm analysis framework that we call by the book analysis. In contrast to earlier frameworks, by the book analysis not only models an algorithm's input data, but also the algorithm itself. Results from by the book analysis are meant to correspond well with established knowledge of an algorithm's practical behavior, as they are meant to be grounded in observations from implementations, input modeling best practices, and measurements on practical benchmark instances. We apply our framework to the simplex method, an algorithm which is beloved for its excellent performance in practice and notorious for its high running time under worst-case analysis. The simplex method similarly showcased the state of the art framework smoothed analysis (Spielman and Teng, STOC'01). We explain how our framework overcomes several weaknesses of smoothed analysis and we prove that under input scaling assumptions, feasibility tolerances and other design principles used by simplex method implementations, the simplex method indeed attains a polynomial running time.

Approximate minimization of interpretations in fuzzy description logics under the Gödel semantics

from arXiv: Data Structures and Algorithms

Authors: Linh Anh Nguyen

The problem of minimizing fuzzy interpretations in fuzzy description logics (FDLs) is important both theoretically and practically. For instance, fuzzy or weighted social networks can be modeled as fuzzy interpretations, where individuals represent actors and roles capture interactions. Minimizing such interpretations yields more compact representations, which can significantly improve the efficiency of reasoning and analysis tasks in knowledge-based systems. We present the first algorithm that minimizes a finite fuzzy interpretation while preserving fuzzy concept assertions in FDLs without the Baaz projection operator and the universal role, under the G\"odel semantics. The considered class of FDLs ranges from the sublogic of $f\!\mathcal{ALC}$ without the union operator and universal restriction to the FDL that extends $f\!\mathcal{ALC}_{reg}$ with inverse roles and nominals. Our algorithm is given in an extended form that supports approximate preservation: it minimizes a finite fuzzy interpretation $\mathcal{I}$ while preserving fuzzy concept assertions up to a degree $\gamma \in (0,1]$. Its time complexity is $O((m\log{l} + n)\log{n})$, where $n$ is the size of the domain of $\mathcal{I}$, $m$ is the number of nonzero instances of atomic roles in $\mathcal{I}$, and $l$ is the number of distinct fuzzy values used in such instances plus 2.

Universal Maximum Likelihood (List) Decoding via Fast Vector-Matrix Multiplication

from arXiv: Data Structures and Algorithms

Authors: Hoang Ly, Emina Soljanin

Maximum-likelihood (ML) decoding for arbitrary block codes remains fundamentally hard, with worst-case time complexity (measured by the total number of multiplications) no better than straightforward exhaustive search, which requires $q^{k} n$ operations for an $[n,k]_q$ code. This paper introduces a simple, code-agnostic framework that reduces the worst-case complexity by a factor of $n$, down to $q^{k}$ operations, a highly desirable reduction in practice. The result holds for both linear and nonlinear block codes over general memoryless channels and under both hard-decision and soft-decision decoding. It naturally extends to intersymbol-interference (ISI) channels and ML list decoding with only a negligible increase in complexity. Our core insight is that, upon receipt of each sequence at the receiver, the conditional probability of that sequence for each codeword in the codebook (i.e., the \emph{likelihood}) can be expressed as the inner product of two carefully constructed vectors -- the first depending on the received sequence, and the second on that codeword itself. As a result, evaluating the likelihoods for all codewords in the codebook reduces to a single vector-matrix multiplication, and ML decoding (MLD) becomes the simple task of picking the maximum entry in the resulting vector. The only non-trivial cost lies in the vector-matrix product. However, our matrix construction allows the use of the Mailman algorithm to reduce this cost. This time reduction is achieved at the cost of high space complexity, requiring $\mathcal{O}(q^{k+1} n)$ space to store the pre-computed codebook matrix.
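
The inner-product identity at the heart of the abstract can be seen in a few lines of numpy. The binary symmetric channel and two-codeword code below are toy choices for illustration only.

    # For a memoryless channel, log P(y|c) = sum_j sum_a 1[y_j = a] * log P(a | c_j),
    # i.e. an inner product between a one-hot encoding of y and a codeword-dependent vector.
    # Stacking those vectors as a matrix turns MLD into one vector-matrix product plus an argmax.
    import numpy as np

    p, q, n = 0.1, 2, 3
    codebook = np.array([[0, 0, 0],
                         [1, 1, 1]])            # q^k = 2 codewords of length n = 3
    logP = np.log(np.array([[1 - p, p],         # logP[c_sym, a] = log P(receive a | sent c_sym)
                            [p, 1 - p]]))

    M = logP[codebook].reshape(len(codebook), n * q)  # precomputed q^k x (n*q) codebook matrix

    def ml_decode(y):
        v = np.eye(q)[y].reshape(n * q)               # one-hot encoding of the received word
        return int(np.argmax(M @ v))                  # all log-likelihoods in one product

    print(ml_decode(np.array([1, 0, 0])))  # 0: closest to 000
    print(ml_decode(np.array([1, 0, 1])))  # 1: closest to 111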

On the Complexity of Distributed Edge Coloring and Orientation Problems

from arXiv: Data Structures and Algorithms

Authors: Sebastian Brandt, Fabian Kuhn, Zahra Parsaeian

Understanding the role of randomness when solving locally checkable labeling (LCL) problems in the LOCAL model has been one of the top priorities in the research on distributed graph algorithms in recent years. For LCL problems in bounded-degree graphs, it is known that randomness cannot help more than polynomially, except in one case: if the deterministic complexity of an LCL problem is in $\Omega(\log n)$ and its randomized complexity is in $o(\log n)$, then the randomized complexity is guaranteed to be $poly(\log \log n)$. However, the fundamental question of \emph{which} problems with a deterministic complexity of $\Omega(\log n)$ can be solved exponentially faster using randomization still remains wide open. We make a step towards answering this question by studying a simple but natural class of LCL problems: so-called degree splitting problems. These problems come in two varieties: coloring problems, where the edges of a graph have to be colored with $2$ colors, and orientation problems, where each edge needs to be oriented. For $\Delta$-regular graphs (where $\Delta=O(1)$), we obtain the following results.

- We give an exact characterization of the randomized complexity of all problems where the edges need to be colored with two colors, say red and blue, and which have a deterministic complexity of $O(\log n)$.

- For edge orientation problems, we give a partial characterization of the problems that have a randomized complexity of $\Omega(\log n)$ and of the problems that have a randomized complexity of $poly(\log\log n)$.

While our results are cleanest to state for the $\Delta$-regular case, all our algorithms naturally generalize to nodes of any degree $d<\Delta$ in general graphs of maximum degree $\Delta$.

Unsplittable Cost Flows from Unweighted Error-Bounded Variants

from arXiv: Data Structures and Algorithms

Authors: Chaitanya Swamy, Vera Traub, Laura Vargas Koch, Rico Zenklusen

A famous conjecture of Goemans on single-source unsplittable flows states that one can turn any fractional flow into an unsplittable one of no higher cost, while increasing the load on any arc by at most the maximum demand. Despite extensive work on the topic, only limited progress has been made. Recently, Morell and Skutella suggested an alternative conjecture, stating that one can turn any fractional flow into an unsplittable one without changing the load on any arc by more than the maximum demand. We show that their conjecture implies Goemans' conjecture (with a violation of twice the maximum demand). To this end, we generalize a technique of Linhares and Swamy, used to obtain a low-cost chain-constrained spanning tree from an algorithm without cost guarantees. Whereas Linhares and Swamy's proof relies on Lagrangian duality, we provide a very simple elementary proof of a generalized version, which we hope to be of independent interest. Moreover, we show how this technique can also be used in the context of the weighted ring loading problem, showing that cost-unaware approximation algorithms can be transformed into approximation algorithms with additional cost guarantees.

Hardness of Approximation for Shortest Path with Vector Costs

from arXiv: Data Structures and Algorithms

Authors: Charlie Carlson, Yury Makarychev, Ron Mosenzon

We obtain hardness of approximation results for the $\ell_p$-Shortest Path problem, a variant of the classic Shortest Path problem with vector costs. For every integer $p \in [2,\infty)$, we show a hardness of $\Omega(p(\log n / \log^2\log n)^{1-1/p})$ for both polynomial- and quasi-polynomial-time approximation algorithms. This nearly matches the approximation factor of $O(p(\log n / \log\log n)^{1-1/p})$ achieved by a quasi-polynomial-time algorithm of Makarychev, Ovsiankin, and Tani (ICALP 2025). No hardness of approximation results were previously known for any $p < \infty$. We also present results for the case where $p$ is a function of $n$. For $p = \infty$, we establish a hardness of $\tilde\Omega(\log^2 n)$, improving upon the previous $\tilde\Omega(\log n)$ hardness result. Our result nearly matches the $O(\log^2 n)$ approximation guarantee of the quasi-polynomial-time algorithm by Li, Xu, and Zhang (ICALP 2025). Finally, we present asymptotic bounds on higher-order Bell numbers, which might be of independent interest.
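
The abstract does not spell out the objective, so the following Python sketch assumes a common formulation of $\ell_p$-Shortest Path: every edge carries a nonnegative cost vector, and the cost of an $s$-$t$ path is the $\ell_p$ norm of the coordinatewise sum of its edge vectors. The tiny digraph and its costs are illustrative only; the brute-force enumeration is just meant to make the objective concrete, not to suggest an algorithm.

```python
import numpy as np

# Toy instance: 4 nodes, 2-dimensional edge costs. Which path is "shortest"
# depends on p: for p = 1 the (5, 0) path wins, for p = infinity the (3, 3) path wins.
edges = {
    (0, 1): np.array([3.0, 0.0]),
    (1, 3): np.array([2.0, 0.0]),
    (0, 2): np.array([2.0, 2.0]),
    (2, 3): np.array([1.0, 1.0]),
}
s, t, p = 0, 3, np.inf

def simple_paths(u, target, visited):
    """Enumerate simple u-target paths in the toy digraph."""
    if u == target:
        yield [u]
        return
    for (a, b) in edges:
        if a == u and b not in visited:
            for rest in simple_paths(b, target, visited | {b}):
                yield [u] + rest

def path_cost(path):
    total = sum(edges[(a, b)] for a, b in zip(path, path[1:]))
    return np.linalg.norm(total, ord=p)   # l_p norm of the summed cost vector

best = min(simple_paths(s, t, {s}), key=path_cost)
print("best path for p =", p, ":", best, "with cost", path_cost(best))
```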

Online Multi-Class Selection with Group Fairness Guarantee

from arXiv: Data Structures and Algorithms

Authors: Faraz Zargari, Hossein Nekouyan, Lyndon Hallett, Bo Sun, Xiaoqi Tan

We study the online multi-class selection problem with group fairness guarantees, where limited resources must be allocated to sequentially arriving agents. Our work addresses two key limitations in the existing literature. First, we introduce a novel lossless rounding scheme that ensures the integral algorithm achieves the same expected performance as any fractional solution. Second, we explicitly address the challenges introduced by agents who belong to multiple classes. To this end, we develop a randomized algorithm based on a relax-and-round framework. The algorithm first computes a fractional solution using a resource reservation approach -- referred to as the set-aside mechanism -- to enforce fairness across classes. The subsequent rounding step preserves these fairness guarantees without degrading performance. Additionally, we propose a learning-augmented variant that incorporates untrusted machine-learned predictions to better balance fairness and efficiency in practical settings.

Sunday, October 26

TR25-158 | Probabilistic Guarantees to Explicit Constructions: Local Properties of Linear Codes | Fernando Jeronimo, Nikhil Shagrithaya

from ECCC Papers

We present a general framework for derandomizing random linear codes with respect to a broad class of permutation-invariant properties, known as local properties, which encompass several standard notions such as distance, list-decoding, list-recovery, and perfect hashing. Our approach extends the classical Alon-Edmonds-Luby (AEL) construction through a modified formalism of local coordinate-wise linear (LCL) properties, introduced by Levi, Mosheiff, and Shagrithaya (2025). The main theorem demonstrates that if random linear codes satisfy the complement of an LCL property $\mathcal{P}$ with high probability, then one can construct explicit codes satisfying the complement of $\mathcal{P}$ as well, with an enlarged yet constant alphabet size. This gives the first explicit constructions for list recovery, as well as special cases (e.g., list recovery with erasures, zero-error list recovery, perfect hash matrices), with parameters matching those of random linear codes. More broadly, our constructions realize the full range of parameters associated with these properties at the same level of optimality as in the random setting, thereby offering a systematic pathway from probabilistic guarantees to explicit codes that attain them. Furthermore, our derandomization of random linear codes also admits efficient (list) decoding via recently developed expander-based decoders.

TR25-157 | Recovery Reductions, Conjectures, and Barriers | Tejas Nareddy, Abhishek Mishra

from ECCC Papers

We introduce and initiate the study of a new model of reductions called the random noise model. In this model, the truth table $T_f$ of the function $f$ is corrupted on a randomly chosen $\delta$-fraction of instances. A randomized algorithm $\mathcal{A}$ is a $\left(t, \delta, 1-\varepsilon\right)$-recovery reduction for $f$ if:

1. With probability $1-\varepsilon$ over the choice of $\delta$-fraction corruptions, given access to the corrupted truth table, the algorithm $\mathcal{A}$ computes $f(\phi)$ correctly with probability at least $2/3$ on every input $\phi$.

2. The algorithm $\mathcal{A}$ runs in time $O(t)$.

This model, a natural relaxation of average-case complexity, has practical motivations and is mathematically interesting. Pointing towards this, we show the existence of robust deterministic polynomial-time recovery reductions with optimal parameters up to polynomial factors (that is, deterministic $\left( poly(n), 0.5 - 1/poly(n), 1-e^{-\Omega(poly(n))} \right)$-recovery reductions) for a large function class SLNP$^S$ containing many of the canonical NP-complete problems - SAT, $k$SAT, $k$CSP, CLIQUE and more. As a corollary, we obtain that the barrier of Bogdanov and Trevisan (2006) for non-adaptive worst-case to average-case reductions does not apply to our mild non-adaptive relaxation. Furthermore, we establish recovery reductions with optimal parameters for Orthogonal Vectors and Parity $k$-Clique problems. These problems exhibit structural similarities to NP-complete problems, with Orthogonal Vectors admitting a $2^{0.5n}$-time reduction from $k$SAT on $n$ variables; and Parity $k$-Clique a subexponential-time reduction from $3$SAT.
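
As a concrete picture of the random noise model (one reading of the definition above, not code from the paper), the Python sketch below corrupts a Boolean truth table on a uniformly random $\delta$-fraction of its entries; a recovery reduction would only ever see the corrupted table.

```python
import random

random.seed(0)
N, delta = 64, 0.25                    # 64 instances; corrupt a quarter of them
truth_table = [random.randint(0, 1) for _ in range(N)]   # stand-in for T_f

corrupted = list(truth_table)
for i in random.sample(range(N), int(delta * N)):
    corrupted[i] ^= 1                  # flip f's value on instance i (Boolean case)

# A (t, delta, 1-eps)-recovery reduction must now compute f(phi) correctly with
# probability >= 2/3 on every input phi, given access only to `corrupted`.
print(sum(a != b for a, b in zip(truth_table, corrupted)), "entries corrupted")
```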

Saturday, October 25

The hyperbolic spiral as a trisectrix

from David Eppstein

It is, of course, impossible to subdivide an arbitrary angle into equal thirds using a compass and straightedge alone. But there are many known constructions that rely on other techniques. One of those techniques is the use of a trisectrix, a curve of a special form that is assumed to be already drawn for you, so that with your compass and straightedge you can find its crossing points with lines and circles. More strongly, some curves can be used as a “sectrix” to subdivide an angle into any given number of equal parts; these include the Archimedean spiral, quadratrix of Hippias, and sine curve.

The hyperbolic spiral is closely related to all three of these sectrix curves. It is what you get when you perform a circle inversion of the Archimedean spiral, centered at the center of the spiral. The sine curve is what you get from a helix (the curve of a spiral staircase) by perpendicular projection onto a plane parallel to the axis of the helix, the quadratrix is what you get by slicing the helix by an inclined plane and then projecting it onto a plane perpendicular to the axis, and the hyperbolic spiral is what you get from a perspective projection of the helix onto a plane perpendicular to the axis. Intuitively, it’s what you see when you look straight up from a point centered at the bottom of a spiral staircase.

Stairs by Czech architect Jan Blažej Santini-Aichel (1677-1723), Church of the Assumption of Our Lady and Saint John the Baptist, Kutná Hora, Czech Republic. CC-BY 2.0 licensed image by Brad Hammonds, March 24, 2013, from https://commons.wikimedia.org/wiki/File:Golden_Spiral_by_Brad_Hammonds.jpg

Because of its close connection to these curves, it should not be a surprise that a hyperbolic spiral can also be a sectrix, but I could not find a reference saying so. Here’s a construction:

Angle trisection using a hyperbolic spiral

The pre-given parts of the diagram are the red hyperbolic spiral, shown with its center point \(O\), its asymptotic line \(AB\) (the lower horizontal line), another horizontal line \(OC\) parallel to the asymptotic line through the center of the spiral, and two perpendicular lines \(AO\) and \(BC\). The horizontal offset between the vertical lines \(AO\) and \(BC\) doesn’t matter for this construction.

Given your angle to be trisected (or \(n\)-sected), place it as \(POS\) at the center of your given hyperbolic spiral. Construct circles with radii \(OP\) and \(OS\) (the inner and outer circles of the figure), and intersect them with line \(OC\) at \(P'\) and \(S'\). Extend the lines \(AP'\) and \(AS'\) to cross line \(BC\) at \(P''\) and \(S''\). Subdivide line segment \(P''S''\) into as many equal-length pieces as desired; here I’ve drawn three pieces, separated at \(Q''\) and \(R''\). Then reverse the construction for \(Q''\) and \(R''\): extend the lines \(AQ''\) and \(AR''\) to cross \(OC\) at \(Q'\) and \(R'\), draw circles with radii \(OQ'\) and \(OR'\), and find the crossing points \(Q\) and \(R\) of these circles with the spiral. The result is that angle \(POS\) is subdivided into three equal angles, \(POQ\), \(QOR\), and \(ROS\).
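
Here is a quick numerical sanity check of the construction in Python, assuming the standard parameterization \(r = a/\theta\) of the hyperbolic spiral (so the asymptote lies at distance \(a\) from the center \(O\)); the values of \(a\), the offset \(d\), and the two endpoints are arbitrary.

```python
# Numerically trace the construction: angles measured by the spiral parameter theta,
# with |OP| = a/theta_P, |AO| = a, and BC at horizontal offset d from AO.
a = 1.0                      # distance from the spiral's center O to its asymptote AB
d = 2.7                      # offset between the vertical lines AO and BC (arbitrary)
theta_P, theta_S = 0.9, 2.4  # spiral parameters of P and S; angle POS = theta_S - theta_P
n_parts = 3                  # trisection

# Circles of radii OP and OS meet line OC at distances OP' = a/theta_P, OS' = a/theta_S
# from O; the line through A and P' meets BC at height d*theta_P above the asymptote.
hP, hS = d * theta_P, d * theta_S

for k in range(1, n_parts):
    h = hP + k * (hS - hP) / n_parts   # k-th division point of segment P''S''
    theta_k = h / d                    # reverse the projection: Q' lies at distance a/theta_k,
                                       # and the circle of that radius meets the spiral at theta_k
    print(f"angle from OP at division {k}: {theta_k - theta_P:.6f} "
          f"(target: {k * (theta_S - theta_P) / n_parts:.6f})")
```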

Why does this work? Let’s go back to the spiral staircase, and the fact that the hyperbolic spiral is its perspective projection. Here’s a view of this taken from Wikipedia, drawn looking down from the top of the spiral staircase rather than up from the bottom:

The hyperbolic spiral as a projection of a helix. CC-BY-SA 4.0 licensed image by Ag2gaeh, 10 August 2019, from https://commons.wikimedia.org/wiki/File:Schraublinie-hyp-spirale.svg

The right side of the trisection diagram can be thought of as your view up the staircase, with points \(PQRS\) marking four equally spaced points along the staircase; from your perspective at the bottom of the staircase’s central axis, these points make equal angles with respect to that axis. Because it’s a spiral staircase, you know that these four points are actually at equally spaced heights with respect to each other, although your view does not let you see these heights. If the staircase winds up the side of a cylindrical tower, the circles of the diagram represent your projected view of circular cross-sections of this tower (such as one of its floors).

The left side of the diagram can be thought of as a cross-section of the staircase as seen from the side. From this view, the staircase itself would appear to be a vertical sine curve, but I left this out of the diagram because it’s not used in the trisection. In this side-on view, point \(A\) is the bottom center point of the staircase (from which you were looking up), and line \(OC\) is a cross-section of the horizontal plane onto which your upward view of the staircase was projected. As seen in a side-on cross-section, the cylindrical tower looks like a rectangle with two vertical sides; vertical line \(BC\) is one of these two sides, and can be thought of as a vertical line on the cylinder itself. Points \(P''\), \(Q''\), \(R''\), and \(S''\) are the points where this vertical line crosses the horizontal cross-sections of the cylinder that appear as circles in your view. The lines from \(A\) through these four points project them onto the viewing plane seen in cross-section as \(OC\). In short, the left side of the diagram depicts the projection of vertical line \(BC\) from the cylinder onto the viewing plane \(OC\) of the hyperbolic spiral. This construction puts three sets of points into correspondence: the (unprimed) points on the staircase, the (single-prime) points in your projected view of the leftmost point of each horizontal cross-section circle, and the (double-prime) points at their actual heights on the cylinder in its side-on cross-section view.

Equal angles, on the right side, correspond to equal heights, on the left side. Therefore, a subdivision of an angle into equal parts, on the right, corresponds to a subdivision of a line segment into equal parts, on the left. And subdivisions of a line segment into equal parts are easy to construct.
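
To spell this correspondence out (assuming the standard parameterization \(r = a/\theta\) of the hyperbolic spiral, so that the asymptote \(AB\) lies at distance \(a\) from the center \(O\), and writing \(d\) for the horizontal offset between \(AO\) and \(BC\)): a point \(P\) at spiral parameter \(\theta_P\) has \(|OP| = a/\theta_P\), hence also \(|OP'| = a/\theta_P\); the line \(AP'\) then rises with slope \(\theta_P\) (rise \(a\) over run \(a/\theta_P\)), so \(P''\) lies at height \(d\,\theta_P\) above the asymptote. Each double-primed point sits at a height proportional to the spiral parameter of its unprimed counterpart, so cutting the segment \(P''S''\) into equal pieces cuts the parameter interval \([\theta_P, \theta_S]\), and with it the angle \(POS\), into equal pieces.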

(Discuss on Mastodon)

By David Eppstein

STOC 2026 Experimental Program Announcement

from Theory Matters

For all authors preparing submissions for this year’s STOC, the conference is running a new experimental program to provide automated pre-submission feedback.

We are offering authors the opportunity to opt-in and receive a review of their paper generated by an advanced LLM-based tool (built on Google’s Gemini model) that is optimized for checking mathematical rigor. The goal is to provide constructive suggestions and help find technical mistakes before the final submission deadline.

Here are the key details:

What it is: An optional, opt-in program to get an automated review of your STOC submission.

Deadline: To participate, you must opt-in and submit your full paper on HotCRP by November 1, 5pm EST.

Confidentiality (from PC): The reviews generated WILL NOT be passed on to the PC. They are only visible to the authors and the program organizers.

Data Privacy (Our Commitment): We commit to “No Logging.” Your paper will not be logged, stored, or used for training.

Please do not publicly share these reviews without contacting the organizing team first.

Please note that the tool is optimized to check a paper’s self-contained mathematical rigor. Because it does not possess external, area-specific knowledge (such as folklore results), it may flag sections that rely on unstated assumptions. We hope you find this feedback useful for improving your paper’s overall clarity and completeness.

This experiment is conducted by PC members David Woodruff (CMU) and Rajesh Jayaram (Google), as well as Vincent Cohen-Addad (Google) and Jon Schneider (Google).

Your participation and feedback will help us assess the value of such tools for future theory conferences.

To opt-in, please check the box on the HotCRP submission form for your paper. Full details on the program are available at the link below.

Please see the STOC 2026 Call for Papers here:

https://acm-stoc.org/stoc2026/stoc2026-cfp.html

and specific details on the experiment here:

https://acm-stoc.org/stoc2026/stoc2026-LLM_feedback.html

By dpwoodru

Friday, October 24

Simons Fall 2026 Program on Pseudorandomness & High-Dimensional Expansion

from CS Theory Events

August 24 – December 18, 2026
Simons Institute, Berkeley CA
https://simons.berkeley.edu/programs/pseudorandomness-high-dimensional-expansion
Submission deadline: November 15, 2025

We are organizing a semester program on Pseudorandomness and High-Dimensional Expansion at the Simons Institute in Fall 2026 (in parallel with a program on Spectral Theory Beyond Graphs). Junior researchers (PhD after Summer 2021) can apply for a Simons …

By shacharlovett

Lore Laundering Machines

from Ben Recht

When insight comes from willful forgetting.

Last weekend, researchers from OpenAI took the hyperbolic overpromotion of their chatbots a bit too far, only to be embarrassed by an ensuing pile-on. They claimed their software had “found” solutions to “open” math problems, only for it later to be revealed that “found” did not mean solved, and “open” did not mean unsolved.

The party started when researcher Mark Sellke announced that he and a collaborator had used ChatGPT to find the solutions to ten open “Erdős” problems. These problems were named after Paul Erdős, a legendary Hungarian mathematician who both closed and opened problems prolifically (likely because he was famously always high on amphetamines). Erdős’ problems were usually closer to tricky math puzzles than long-standing conjectures like the twin prime conjecture or the Riemann Hypothesis. Nonetheless, they gave mathematicians plenty to think about, and working on them pushed the boundaries of number theory, probability, and combinatorics. Erdős often sweetened the pot by offering bounties of varying sizes for those who could find “book proofs” of his conjectures.

Erdős posed so many problems that it was impossible to keep track of them. In 2024, almost thirty years after Erdős’ passing, mathematician Thomas Bloom created a website to catalog Erdős problems and their solutions. Since the original problems had been so numerous and poorly cataloged, putting together such a webpage was an arduous passion project. Sellke used ChatGPT to find papers containing ten solutions unknown to Bloom. All the papers were published, but this literature search helped Bloom update his website. To be clear here: the working definition of “open problem” was “unknown to the maintainer of erdosproblems.com.”

It’s still very neat. ChatGPT seems quite proficient at semantic search of mathematics and can now find somewhat obscure results in an accelerating expanding universe of technical papers. Of course, it’s worth noting that these results aren’t always that obscure. One of Sellke’s found solutions appeared in a paper entitled “A proof of two Erdos conjectures on restricted addition and further results,” published in 2003 in the respectable and well-known Crelle’s Journal.

Now, of course, we can’t just have a neat and fun story. Since ChatGPT was involved, someone would inevitably have to spin this into an argument of how AGI has arrived. Enter OpenAI employee Sebastien Bubeck, who has been breathlessly heralding the Sparks of AGI for three years now. Bubeck took to Twitter to proclaim: “Science Acceleration has officially begun: two researchers found the solution to 10 Erdos problems over the weekend with help from gpt-5.” Kevin Weil, the Vice President of Open Science at OpenAI (god, that is straight out of Orwell), tweeted “GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others. These have all been open for decades.”

A shitstorm ensued. People understandably thought these OpenAI employees were saying that ChatGPT had solved challenging mathematical conjectures. If all you saw were these tweets, you could be forgiven for thinking that their software had gone from pissing off the organizers of the Math Olympiad to solving deep, important math problems in a matter of months.

Fortunately, despite all evidence to the contrary, you can still get piled on for overhyping AI on Twitter, and all of the competing labs pulled out the knives. Turing Award Winner and longtime Facebook employee Yann LeCun proclaimed, “Hoisted by their own GPTards.” Even Nobel Laureate and Alphabet executive Demis Hassabis chimed in to tell Bubeck, “This is embarrassing.”

To be fair, it’s Twitter. Embarrassment is part of the entry fee.

There’s an interesting psychology at play here, one which the protagonist Sebastien Bubeck has been demonstrating for several years. I’m not singling Seb out to further pile on. Instead, I’d like to use him as a cautionary tale of an all-too-common behavior with generative AI: motivated forgetting.

For years now, Bubeck has been telling us that ChatGPT is doing things that are not in the training data. A key example occurred in a “debate” at the Simons Institute last October, where he breathlessly announced that ChatGPT was accelerating his mathematical work. Let me post his story in its entirety so it can speak for itself. I’ve edited the following for clarity, but you can find his exact words at 28:40 in this video.

“There is going to be a phase where before the AI solves a problem on its own, it’s going to collaborate with all of you guys. This is for sure. I don’t think it’s up for debate.

“I just want to share a personal experience that I had myself just in the last few weeks with o1 or o1 plus epsilon on a research problem. I know it is not out there because it’s a paper I have written. It’s a draft in my Dropbox, and I just didn’t publish it yet. So I know that it’s not in the training data. And you will see it’s a very fun question. It’s just about how long the gradient flow of a convex function can be.

“I asked this question to this o1 model. It thought for a long time, and then it made a connection with what’s called self-contracted curves, which are not necessarily directly linked to this. And it explained to me why it thinks that this would be a good idea to connect those two things. It gave me the bounds that exist in the literature, et cetera.

“When I was working on that problem, it took me three days to make that connection. So at the very least, it would have accelerated me by three days. And now you take this very basic current model. Take this next year. I have other anecdotal evidence of people—top people who are maybe sitting in this room, who were stuck on some lemma, asked o1, and o1 made the connection and helped them solve this problem. So this is not in the future. This is not hypothetical. It is happening right now. It’s the same story as with the medical diagnosis. It’s the same story in every field right now. AI is, at the very least, going to be almost an equal with us.”

Now, here’s the thing. Just like Thomas Bloom didn’t know about the solution of problem 339, even though there was a paper that said in its title that it was solving problem 339, there were already many papers that examined the length of gradient flow trajectories using self-contracted curves. For example, the 2021 JMLR paper “Path Length Bounds for Gradient Descent and Flow” by Chirag Gupta and collaborators builds on an extensive body of work in the space. I mention Gupta’s paper in particular because it has a solid related work section surveying the history of using this theory for this application, tracing it back to at least 1991. I also bring up this particular paper because of the following line from the acknowledgments:

“We thank Sebastien Bubeck for pointing us to some references on self-contracted curves.”

Hmm.

LLMs compress and rearticulate an impossible amount of information. No one knows what’s in the training data, not even the employees of OpenAI. And so an LLM can surprise us by revealing a coherent description that looks like a profound discovery, when it’s merely a reference. LLMs let us search all the libraries in the world, but don’t return citations unless we repeatedly prompt for them.

Chatbots’ core capability is laundering the esoteric into insight. This could be through mimicking human dialogue as a companion agent. This could be through pulling a math proof from one of the gazillion math papers out there that you didn’t read, without giving proper attribution. These are the same thing. LLMs are Lore Laundering Machines.

But if we fetishize LLMs as machine gods, then motivated reasoning demands we see them as machine gods, even if this means completely discounting our own intelligence. If you want to believe that AGI is here, it helps to become willfully forgetful of what you already know.

By Ben Recht