Last Update

OPML feed of all feeds.

Subscribe to the Atom feed or RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Friday, December 12

Further Statistical Study of NISQ Experiments

from arXiv: Computational Complexity

Authors: Gil Kalai, Tomer Shoham, Carsten Voelkmann

We revisit and extend some topics that we studied in our previous works (Rinott, Kalai, and Shoham 2022; Kalai, Rinott, and Shoham 2023, 2024) regarding the Google 2019 "quantum supremacy" experiment. We extend our analysis of the prediction based on Google's digital error model (Formula (77)), using more detailed data provided by Google. We also provide some preliminary analysis for a few other NISQ experiments.

From Alternation to FPRAS: Toward a Complexity Classification of Approximate Counting

from arXiv: Computational Complexity

Authors: Markus Hecher, Matthias Lanzinger

Counting problems are fundamental across mathematics and computer science. Among the most subtle are those whose associated decision problem is solvable in polynomial time, yet whose exact counting version appears intractable. For some such problems, however, one can still obtain efficient randomized approximation in the form of a fully polynomial randomized approximation scheme (FPRAS). Existing proofs of FPRAS existence are often highly technical and problem-specific, offering limited insight into a more systematic complexity-theoretic account of approximability. In this work, we propose a machine-based framework for establishing the existence of an FPRAS beyond previous uniform criteria. Our starting point is alternating computation: we introduce a counting model obtained by equipping alternating Turing machines with a transducer-style output mechanism, and we use it to define a corresponding counting class spanALP. We show that every problem in spanALP admits an FPRAS, yielding a reusable sufficient condition that can be applied via reductions to alternating logspace, polynomial-time computation with output. We situate spanALP in the counting complexity landscape as strictly between #L and TotP (assuming RP $\neq$ NP) and observe interesting conceptual and technical gaps in the current machinery of counting complexity. Moreover, as an illustrative application, we obtain an FPRAS for counting answers to Dyck-constrained path queries in edge-labeled graphs, i.e., counting the number of distinct labelings realized by s-t walks whose label sequence is well-formed with respect to a Dyck-like language. To our knowledge, no FPRAS was previously known for this setting. We expect the alternating-transducer characterization to provide a broadly applicable tool for establishing FPRAS existence for further counting problems.
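
To make the illustrative application concrete, the following brute-force sketch counts the distinct labelings realized by $s$-$t$ walks whose label sequence is balanced over a bracket alphabet. It is for intuition only: the exponential enumeration is precisely what the paper's FPRAS avoids, and the graph encoding, alphabet, and length bound are my own illustrative choices.

    def dyck_labelings(adj, s, t, max_len):
        """Count distinct labelings over {'(', ')'} realized by s-t walks of
        length <= max_len whose label sequence is balanced (brute force)."""
        found, seen = set(), set()
        stack = [(s, "", 0)]  # (vertex, labels so far, open-bracket balance)
        while stack:
            v, w, bal = stack.pop()
            if (v, w) in seen:
                continue
            seen.add((v, w))
            if v == t and bal == 0 and w:
                found.add(w)
            if len(w) < max_len:
                for u, lab in adj.get(v, []):
                    nb = bal + (1 if lab == '(' else -1)
                    if nb >= 0:  # prefixes of balanced words never dip below 0
                        stack.append((u, w + lab, nb))
        return len(found)

    # toy edge-labeled graph; edge labels play the role of Dyck brackets
    adj = {0: [(1, '(')], 1: [(1, '('), (2, ')')], 2: [(2, ')')]}
    print(dyck_labelings(adj, 0, 2, max_len=6))  # 3: '()', '(())', '((()))'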

Improved Small Set Expansion in High Dimensional Expanders

from arXiv: Computational Complexity

Authors: Tali Kaufman, David Mass

Small set expansion in high dimensional expanders is of great importance, e.g., towards proving cosystolic expansion, local testability of codes, and constructions of good quantum codes. In this work we improve upon the state-of-the-art results on small set expansion in high dimensional expanders. Our improvement is either on the expansion quality or on the size of sets for which expansion is guaranteed. One line of previous works [KM22, DD24] has obtained weak expansion for small sets, which is sufficient for deducing cosystolic expansion of one dimension below. We improve upon their result by showing strong expansion for small sets. Another line of works [KKL14, EK16, KM21] has shown strong expansion for small sets. However, they obtain it only for very small sets. We get an exponential improvement on the size of sets for which expansion is guaranteed by these prior works. Interestingly, our result is obtained by bridging between these two lines of works. The works of [KM22, DD24] use global averaging operators in order to obtain expansion for larger sets. However, their method could be utilized only on sets that are cocycle-like. We show how to combine these global averaging operators with ideas from the so-called ``fat machinery'' of [KKL14, EK16, KM21] in order to apply them to general sets.

Noisy Quantum Learning Theory

from arXiv: Computational Complexity

Authors: Jordan Cotler, Weiyuan Gong, Ishaan Kannan

We develop a framework for learning from noisy quantum experiments, focusing on fault-tolerant devices accessing uncharacterized systems through noisy couplings. Our starting point is the complexity class $\textsf{NBQP}$ ("noisy BQP"), modeling noisy fault-tolerant quantum computers that cannot, in general, error-correct the oracle systems they query. Using this class, we show that for natural oracle problems, noise can eliminate exponential quantum learning advantages of ideal noiseless learners while preserving a superpolynomial gap between NISQ and fault-tolerant devices. Beyond oracle separations, we study concrete noisy learning tasks. For purity testing, the exponential two-copy advantage collapses under a single application of local depolarizing noise. Nevertheless, we identify a setting motivated by AdS/CFT in which noise-resilient structure restores a quantum learning advantage in a noisy regime. We then analyze noisy Pauli shadow tomography, deriving lower bounds that characterize how instance size, quantum memory, and noise control sample complexity, and design algorithms with parametrically similar scalings. Together, our results show that the Bell-basis and SWAP-test primitives underlying most exponential quantum learning advantages are fundamentally fragile to noise unless the experimental system has latent noise-robust structure. Thus, realizing meaningful quantum advantages in future experiments will require understanding how noise-robust physical properties interface with available algorithmic techniques.
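
For orientation on the two-copy purity test that the abstract says collapses under noise, recall two textbook facts (stated here for context; they are not claims of the paper): the SWAP test estimates purity through the identity

$$\mathrm{Tr}\big[\mathrm{SWAP}\,(\rho \otimes \rho)\big] = \mathrm{Tr}[\rho^2],$$

and depolarizing noise of strength $p$ on a $d$-dimensional system acts as $\mathcal{D}_p(\rho) = (1-p)\,\rho + p\,\mathbb{I}/d$; local depolarizing noise applies this map to each qubit separately.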

Approximating Euclidean Shallow-Light Trees

from arXiv: Computational Geometry

Authors: Hung Le, Shay Solomon, Cuong Than, Csaba D. Tóth, Tianyi Zhang

For a weighted graph $G = (V, E, w)$ and a designated source vertex $s \in V$, a spanning tree that simultaneously approximates a shortest-path tree w.r.t. source $s$ and a minimum spanning tree is called a shallow-light tree (SLT). Specifically, an $(α, β)$-SLT of $G$ w.r.t. $s \in V$ is a spanning tree of $G$ with root-stretch $α$ (preserving all distances between $s$ and the other vertices up to a factor of $α$) and lightness $β$ (its weight is at most $β$ times the weight of a minimum spanning tree of $G$). Despite the large body of work on SLTs, the basic question of whether a better approximation algorithm exists was left untouched to date, and this holds in any graph family. This paper makes a first nontrivial step towards this question by presenting two bicriteria approximation algorithms. For any $ε>0$, a set $P$ of $n$ points in constant-dimensional Euclidean space and a source $s\in P$, our first (respectively, second) algorithm returns, in $O(n \log n \cdot {\rm polylog}(1/ε))$ time, a non-Steiner (resp., Steiner) tree with root-stretch $1+O(ε\log ε^{-1})$ and weight at most $O(\mathrm{opt}_ε\cdot \log^2 ε^{-1})$ (resp., $O(\mathrm{opt}_ε\cdot \log ε^{-1})$), where $\mathrm{opt}_ε$ denotes the minimum weight of a non-Steiner (resp., Steiner) tree with root-stretch $1+ε$.
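
The $(α, β)$-SLT definition is mechanical to verify for a candidate tree. Below is a minimal checker (a sketch in my own encoding, not code from the paper) that computes the root-stretch and lightness of a given spanning tree:

    import heapq

    def dijkstra(adj, s):
        """Shortest-path distances from s; adj maps u -> [(v, w), ...]."""
        dist, pq = {s: 0.0}, [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist

    def slt_parameters(adj, tree_edges, s):
        """Root-stretch alpha and lightness beta of a spanning tree."""
        tadj = {u: [] for u in adj}
        for u, v, w in tree_edges:
            tadj[u].append((v, w))
            tadj[v].append((u, w))
        dg, dt = dijkstra(adj, s), dijkstra(tadj, s)
        alpha = max(dt[v] / dg[v] for v in adj if v != s)
        # minimum spanning tree weight via Prim's algorithm
        seen, pq, mst = {s}, sorted((w, v) for v, w in adj[s]), 0.0
        while pq and len(seen) < len(adj):
            w, v = heapq.heappop(pq)
            if v not in seen:
                seen.add(v)
                mst += w
                for x, wx in adj[v]:
                    if x not in seen:
                        heapq.heappush(pq, (wx, x))
        beta = sum(w for _, _, w in tree_edges) / mst
        return alpha, beta

Running this on the MST itself gives $β = 1$ but root-stretch as large as $Θ(n)$ (e.g., on a cycle), which is exactly the tension SLTs balance.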

Quantifying displacement: a gentrification's consequence via persistent homology

from arXiv: Computational Geometry

Authors: Rita Rodríguez Vázquez, Manuel Cuerno

Gentrification is the process by which wealthier individuals move into a previously lower-income neighbourhood. Among the effects of this multi-faceted phenomenon are rising living costs; cultural and social changes, where local traditions, businesses, and community networks are replaced or diluted by new, more affluent lifestyles; and population displacement, where long-term, lower-income residents are priced out by rising rents and property taxes. Despite its relevance, quantifying displacement presents difficulties stemming from a lack of information on motives for relocation and from the fact that a long time span must be analysed: displacement is a gradual process (leases end or conditions change at different times), impossible to capture in one data snapshot. We introduce a novel tool to overcome these difficulties. Using only publicly available address change data, we construct four cubical complexes which simultaneously incorporate geographical and temporal information of people moving, and then analyse them building on Topological Data Analysis tools. Finally, we demonstrate the potential of this method through a 20-year case study of Madrid, Spain. The results reveal its ability to capture population displacement and to identify the specific neighbourhoods and years affected: patterns that cannot be inferred from raw address change data.
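
The abstract does not spell out the four complexes, but the general pipeline (spatio-temporal counts, a filtered cubical complex, persistence) can be sketched with the gudhi library; the binning scheme below is my own illustrative assumption, not the authors' construction.

    import numpy as np
    import gudhi  # pip install gudhi

    # hypothetical address-change records: one (x, y, year) row per move
    moves = np.array([[0.10, 0.20, 2004], [0.12, 0.25, 2005], [0.60, 0.70, 2012]])

    # bin the moves into a 3D grid over (x, y, time); cell value = move count
    H, _ = np.histogramdd(moves, bins=(10, 10, 20))

    # negate so that regions with many moves enter the filtration early
    cc = gudhi.CubicalComplex(top_dimensional_cells=-H)
    print(cc.persistence()[:5])  # (dimension, (birth, death)) pairs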

A gradient descent algorithm for computing circle patterns

from arXiv: Computational Geometry

Authors: Te Ba, Ze Zhou

This paper presents a new algorithm for generating planar circle patterns. The algorithm employs gradient descent and the conjugate gradient method to compute circle radii and centers separately. Compared with existing algorithms, the proposed method is more efficient in computing the centers of circles and is applicable to realizing circle patterns with possibly obtuse overlap angles.
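
For comparison, the radius step in classical tangency packings is often done by a Thurston-style iteration; the toy below fits the central radius of a single "flower" (one circle surrounded by tangent petals). It is only a stand-in with my own update rule and step size, not the authors' algorithm, which also handles obtuse overlap angles.

    from math import acos, exp, pi

    def angle(ri, rj, rk):
        """Angle at the radius-ri circle in the tangent triple (ri, rj, rk),
        via the law of cosines on the triangle of centers."""
        a, b, c = ri + rj, ri + rk, rj + rk
        return acos((a*a + b*b - c*c) / (2*a*b))

    def fit_center_radius(petals, r=0.3, step=0.5, iters=200):
        """Adjust the central radius until the petal angles sum to 2*pi."""
        for _ in range(iters):
            theta = sum(angle(r, petals[i], petals[(i + 1) % len(petals)])
                        for i in range(len(petals)))
            # angles at the center shrink as r grows, so grow r when the
            # angle sum overshoots 2*pi and shrink it otherwise
            r *= exp(step * (theta - 2 * pi))
        return r

    print(fit_center_radius([1.0] * 6))  # six unit petals -> converges to 1.0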

Equivalent Instances for Scheduling and Packing Problems

from arXiv: Data Structures and Algorithms

Authors: Klaus Jansen, Kai Kahler, Corinna Wambsganz

Two instances $(I,k)$ and $(I',k')$ of a parameterized problem $P$ are equivalent if they have the same set of solutions (static equivalent) or if the set of solutions of $(I,k)$ can be constructed from the set of solutions for $(I',k')$ and some computable pre-solutions. If the algorithm constructing such a (static) equivalent instance, whose size is polynomially bounded, runs in fixed-parameter tractable (FPT) time, we say that there exists a (static) equivalent instance for this problem. In this paper we present (static) equivalent instances for Scheduling and Knapsack problems. We improve the bound for the $\ell_1$-norm of an equivalent vector given by Eisenbrand, Hunkenschröder, Klein, Koutecký, Levin, and Onn and show how this yields equivalent instances for integer linear programs (ILPs) and related problems. We obtain an $O(MN^2\log(NU))$ static equivalent instance for feasibility ILPs, where $M$ is the number of constraints, $N$ is the number of variables and $U$ is an upper bound for the $\ell_\infty$-norm of the smallest feasible solution. With this, we get an $O(n^2\log(n))$ static equivalent instance for Knapsack, where $n$ is the number of items. Moreover, we give an $O(M^2N\log(NMΔ))$ kernel for feasibility ILPs, where $Δ$ is an upper bound for the $\ell_\infty$-norm of the given constraint matrix. Using balancing results by Knop et al., the ConfILP, and a proximity result by Eisenbrand and Weismantel, we give an $O(d^2\log(p_{\max}))$ equivalent instance for LoadBalancing, a generalization of scheduling problems. Here $d$ is the number of different processing times and $p_{\max}$ is the largest processing time.

Optimal learning of quantum channels in diamond distance

from arXiv: Data Structures and Algorithms

Authors: Antonio Anna Mele, Lennart Bittel

Quantum process tomography, the task of estimating an unknown quantum channel, is a central problem in quantum information theory and a key primitive for characterising noisy quantum devices. A long-standing open question is to determine the optimal number of uses of an unknown channel required to learn it in diamond distance, the standard measure of worst-case distinguishability between quantum processes. Here we show that a quantum channel acting on a $d$-dimensional system can be estimated to accuracy $\varepsilon$ in diamond distance using $O(d^4/\varepsilon^2)$ channel uses. This scaling is essentially optimal, as it matches lower bounds up to logarithmic factors. Our analysis extends to channels with input and output dimensions $d_{\mathrm{in}}$ and $d_{\mathrm{out}}$ and Kraus rank at most $k$, for which $O(d_{\mathrm{in}} d_{\mathrm{out}} k/\varepsilon^2)$ channel uses suffice, interpolating between unitary and fully generic channels. As by-products, we obtain, to the best of our knowledge, the first essentially optimal strategies for operator-norm learning of binary POVMs and isometries, and we recover optimal trace-distance tomography for fixed-rank states. Our approach consists of using the channel only non-adaptively to prepare copies of the Choi state, purify them in parallel, perform sample-optimal pure-state tomography on the purifications, and analyse the resulting estimator directly in diamond distance via its semidefinite-program characterisation. While the sample complexity of state tomography in trace distance is by now well understood, our results finally settle the corresponding problem for quantum channels in diamond distance.
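
For reference, the diamond distance used throughout is the stabilized worst-case trace distance between channels (a standard definition, not specific to this paper): for channels $\mathcal{E}, \mathcal{F}$ on a $d$-dimensional system,

$$d_\diamond(\mathcal{E},\mathcal{F}) = \tfrac{1}{2}\,\|\mathcal{E}-\mathcal{F}\|_\diamond = \tfrac{1}{2}\max_{\rho}\,\big\|\big((\mathcal{E}-\mathcal{F}) \otimes \mathrm{id}_d\big)(\rho)\big\|_1,$$

where the maximum runs over (possibly entangled) states $\rho$ on the system together with a $d$-dimensional reference.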

The Localization Method for High-Dimensional Inequalities

from arXiv: Data Structures and Algorithms

Authors: Yunbum Kook, Santosh S. Vempala

We survey the localization method for proving inequalities in high dimension, pioneered by Lovász and Simonovits (1993), and its stochastic extension developed by Eldan (2012). The method has found applications in a surprisingly wide variety of settings, ranging from its original motivation in isoperimetric inequalities to optimization, concentration of measure, and bounding the mixing rate of Markov chains. At heart, the method converts a given instance of an inequality (for a set or distribution in high dimension) into a highly structured instance, often just one-dimensional.

Efficient Hypergraph Pattern Matching via Match-and-Filter and Intersection Constraint

from arXiv: Data Structures and Algorithms

Authors: Siwoo Song, Wonseok Shin, Kunsoo Park, Giuseppe F. Italiano, Zhengyi Yang, Wenjie Zhang

A hypergraph is a generalization of a graph, in which a hyperedge can connect multiple vertices, modeling complex relationships involving multiple vertices simultaneously. Hypergraph pattern matching, which is to find all isomorphic embeddings of a query hypergraph in a data hypergraph, is one of the fundamental problems. In this paper, we present a novel algorithm for hypergraph pattern matching by introducing (1) the intersection constraint, a necessary and sufficient condition for valid embeddings, which significantly speeds up the verification process, (2) the candidate hyperedge space, a data structure that stores potential mappings between hyperedges in the query hypergraph and the data hypergraph, and (3) the Match-and-Filter framework, which interleaves matching and filtering operations to maintain only compatible candidates in the candidate hyperedge space during backtracking. Experimental results on real-world datasets demonstrate that our algorithm significantly outperforms the state-of-the-art algorithms, by up to orders of magnitude in terms of query processing time.
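
A minimal backtracking matcher conveys the flavor of the search being accelerated. The pairwise size check below is a simplified, necessary-only stand-in for the paper's intersection constraint (a full embedding also needs a consistent vertex mapping), and all names are my own.

    def hyper_match(query, data):
        """Yield injective maps from query hyperedges to data hyperedges that
        preserve hyperedge sizes and pairwise intersection sizes."""
        q = [frozenset(e) for e in query]
        d = [frozenset(e) for e in data]

        def extend(i, used, phi):
            if i == len(q):
                yield dict(enumerate(phi))
                return
            for j, de in enumerate(d):
                if j in used or len(de) != len(q[i]):
                    continue
                # intersection check against already-matched hyperedges
                if all(len(q[i] & q[k]) == len(de & d[phi[k]]) for k in range(i)):
                    yield from extend(i + 1, used | {j}, phi + [j])

        yield from extend(0, frozenset(), [])

    query = [{1, 2, 3}, {3, 4}]
    data = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}]
    print(list(hyper_match(query, data)))  # [{0: 0, 1: 1}]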

Semi-Robust Communication Complexity of Maximum Matching

from arXiv: Data Structures and Algorithms

Authors: Gabriel Cipriani Huete, Adithya Diddapur, Pavel Dvořák, Christian Konrad

We study the one-way two-party communication complexity of Maximum Matching in the semi-robust setting where the edges of a maximum matching are randomly partitioned between Alice and Bob, but all remaining edges of the input graph are adversarially partitioned between the two parties. We show that the simple protocol where Alice solely communicates a lexicographically-first maximum matching of their edges to Bob is surprisingly powerful: We prove that it yields a $3/4$-approximation in expectation and that our analysis is tight. The semi-robust setting is at least as hard as the fully robust setting. In this setting, all edges of the input graph are randomly partitioned between Alice and Bob, and the state-of-the-art result is a fairly involved $5/6$-approximation protocol that is based on the computation of edge-degree constrained subgraphs [Azarmehr, Behnezhad, ICALP'23]. Our protocol also immediately yields a $3/4$-approximation in the fully robust setting. One may wonder whether an improved analysis of our protocol in the fully robust setting is possible: While we cannot rule this out, we give an instance where our protocol only achieves a $0.832 < 5/6 = 0.8\overline{3}$-approximation. Hence, while our simple protocol performs surprisingly well, it cannot be used to improve over the state-of-the-art in the fully robust setting.
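
The protocol itself is two lines of logic, so it is easy to simulate; the harness below (my own, using networkx, with an arbitrary maximum matching standing in for the lexicographically-first one, and with a uniformly random partition of all edges, i.e., the fully robust setting) estimates the achieved ratio:

    import random
    import networkx as nx

    def one_way_matching_ratio(G, trials=200):
        """Alice gets each edge with prob. 1/2 and sends a maximum matching of
        her edges; Bob outputs a maximum matching of the message plus his edges."""
        opt = len(nx.max_weight_matching(G, maxcardinality=True))
        total = 0.0
        for _ in range(trials):
            alice = [e for e in G.edges if random.random() < 0.5]
            bob = [e for e in G.edges if e not in alice]
            msg = nx.max_weight_matching(nx.Graph(alice), maxcardinality=True)
            out = nx.max_weight_matching(nx.Graph(list(msg) + bob),
                                         maxcardinality=True)
            total += len(out) / opt
        return total / trials

    print(one_way_matching_ratio(nx.cycle_graph(10)))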

Efficient Defective Clique Enumeration and Search with Worst-Case Optimal Search Space

from arXiv: Data Structures and Algorithms

Authors: Jihoon Jang, Yehyun Nam, Kunsoo Park, Hyunjoon Kim

A $k$-defective clique is a relaxation of the traditional clique definition, allowing up to $k$ missing edges. This relaxation is crucial in various real-world applications such as link prediction, community detection, and social network analysis. Although the problems of enumerating maximal $k$-defective cliques and searching a maximum $k$-defective clique have been extensively studied, existing algorithms suffer from limitations such as the combinatorial explosion of small partial solutions and sub-optimal search spaces. To address these limitations, we propose a novel clique-first branch-and-bound framework that first generates cliques and then adds missing edges. Furthermore, we introduce a new pivoting technique that achieves a search space size of $\mathcal{O}(3^{\frac{n}{3}} \cdot n^k)$, where $n$ is the number of vertices in the input graph. We prove that the worst-case number of maximal $k$-defective cliques is $Ω(3^{\frac{n}{3}} \cdot n^k)$ when $k$ is a constant, establishing that our algorithm's search space is worst-case optimal. Leveraging the diameter-two property of defective cliques, we further reduce the search space size to $\mathcal{O}(n \cdot 3^{\frac{δ}{3}} \cdot (δΔ)^k)$, where $δ$ is the degeneracy and $Δ$ is the maximum degree of the input graph. We also propose an efficient framework for maximum $k$-defective clique search based on our branch-and-bound, together with practical techniques to reduce the search space. Experiments on real-world benchmark datasets with more than 1 million edges demonstrate that each of our proposed algorithms for maximal $k$-defective clique enumeration and maximum $k$-defective clique search outperforms the respective state-of-the-art algorithms by up to four orders of magnitude in terms of processing time.
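
The definition is easy to operationalize; the brute-force enumerator below (exponential, for intuition only; the paper's contribution is precisely avoiding this blow-up) lists the maximal $k$-defective cliques of a small graph:

    from itertools import combinations

    def k_defective_cliques(n, edges, k):
        """All maximal vertex sets with at most k missing internal edges."""
        E = {frozenset(e) for e in edges}
        def defect(S):
            return sum(1 for p in combinations(S, 2) if frozenset(p) not in E)
        ok = [frozenset(S) for r in range(1, n + 1)
              for S in combinations(range(n), r)
              if defect(S) <= k]
        return [S for S in ok if not any(S < T for T in ok)]

    # 4-cycle with k=1: each vertex triple misses exactly one edge, so the
    # maximal 1-defective cliques are the four triples
    print(k_defective_cliques(4, [(0, 1), (1, 2), (2, 3), (3, 0)], k=1))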

Approximate Counting in Local Lemma Regimes

from arXiv: Data Structures and Algorithms

Authors: Ryan L. Mann, Gabriel Waite

We establish efficient approximate counting algorithms for several natural problems in local lemma regimes. In particular, we consider the probability of intersection of events and the dimension of intersection of subspaces. Our approach is based on the cluster expansion method. We obtain fully polynomial-time approximation schemes for both the probability of intersection and the dimension of intersection for commuting projectors. For general projectors, we provide two algorithms: a fully polynomial-time approximation scheme under a global inclusion-exclusion stability condition, and an efficient affine approximation under a spectral gap assumption. As corollaries of our results, we obtain efficient algorithms for approximating the number of satisfying assignments of conjunctive normal form formulae and the dimension of satisfying subspaces of quantum satisfiability formulae.
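
For orientation, the "local lemma regime" here is the standard one: if each event $A_i$ has $\Pr[A_i] \le p$, depends on at most $d$ other events, and $e\,p\,(d+1) \le 1$, then the symmetric Lovász local lemma gives (a textbook bound, not a result of this paper)

$$\Pr\Big[\bigcap_{i=1}^{n} \overline{A_i}\Big] \ \ge\ \Big(1 - \frac{1}{d+1}\Big)^{n} > 0,$$

and the paper's question is how to efficiently approximate this intersection probability rather than merely certify that it is positive.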

Universal Hirschberg for Width Bounded Dynamic Programs

from arXiv: Data Structures and Algorithms

Authors: Logan Nye

Hirschberg's algorithm (1975) reduces the space complexity for the longest common subsequence problem from $O(N^2)$ to $O(N)$ via recursive midpoint bisection on a grid dynamic program (DP). We show that the underlying idea generalizes to a broad class of dynamic programs with local dependencies on directed acyclic graphs (DP DAGs). Modeling a DP as deterministic time evolution over a topologically ordered DAG with frontier width $ω$ and bounded in-degree, and assuming a max-type semiring with deterministic tie breaking, we prove that in a standard offline random-access model any such DP admits deterministic traceback in space $O(ω\log T + (\log T)^{O(1)})$ cells over a fixed finite alphabet, where $T$ is the number of states. Our construction replaces backward dynamic programs by forward-only recomputation and organizes the time order into a height-compressed recursion tree whose nodes expose small ``middle frontiers'' across which every optimal path must pass. The framework yields near-optimal traceback bounds for asymmetric and banded sequence alignment, one-dimensional recurrences, and dynamic-programming formulations on graphs of bounded pathwidth. We also show that an $Ω(ω)$ space term (in bits) is unavoidable in forward single-pass models and discuss conjectured $\sqrt{T}$-type barriers in streaming settings, supporting the view that space-efficient traceback is a structural property of width-bounded DP DAGs rather than a peculiarity of grid-based algorithms.
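
For readers who have not seen the grid case being generalized, here is the classic Hirschberg recursion for LCS in linear space (a standard textbook sketch, not the paper's DAG construction):

    def lcs_row(a, b):
        """Last row of the LCS length table for a vs b, in O(|b|) space."""
        prev = [0] * (len(b) + 1)
        for x in a:
            cur = [0]
            for j, y in enumerate(b, 1):
                cur.append(prev[j - 1] + 1 if x == y
                           else max(prev[j], cur[j - 1]))
            prev = cur
        return prev

    def hirschberg(a, b):
        """An LCS of a and b via recursive midpoint bisection."""
        if not a or not b:
            return ""
        if len(a) == 1:
            return a if a in b else ""
        mid = len(a) // 2
        left = lcs_row(a[:mid], b)               # forward half-scores
        right = lcs_row(a[mid:][::-1], b[::-1])  # backward half-scores
        # split b where forward + backward scores are maximized
        k = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
        return hirschberg(a[:mid], b[:k]) + hirschberg(a[mid:], b[k:])

    print(hirschberg("ABCBDAB", "BDCABA"))  # a length-4 LCS, e.g. 'BCBA'

The "middle frontier" the abstract generalizes is exactly the split index $k$ here: every optimal path through the DP grid crosses the middle row at some column, and recursing on the two halves keeps only $O(|b|)$ cells live at a time.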

Thursday, December 11

TR25-213 | Improved Small Set Expansion in High Dimensional Expanders | David Mass, Tali Kaufman

from ECCC Papers

Small set expansion in high dimensional expanders is of great importance, e.g., towards proving cosystolic expansion, local testability of codes, and constructions of good quantum codes. In this work we improve upon the state-of-the-art results on small set expansion in high dimensional expanders. Our improvement is either on the expansion quality or on the size of sets for which expansion is guaranteed. One line of previous works [KM22, DD24] has obtained weak expansion for small sets, which is sufficient for deducing cosystolic expansion of one dimension below. We improve upon their result by showing strong expansion for small sets. Another line of works [KKL14, EK16, KM21] has shown strong expansion for small sets. However, they obtain it only for very small sets. We get an exponential improvement on the size of sets for which expansion is guaranteed by these prior works. Interestingly, our result is obtained by bridging between these two lines of works. The works of [KM22, DD24] use global averaging operators in order to obtain expansion for larger sets. However, their method could be utilized only on sets that are cocycle-like. We show how to combine these global averaging operators with ideas from the so-called ``fat machinery'' of [KKL14, EK16, KM21] in order to apply them to general sets.

Understanding vs. impact: the paradox of how to spend my time

from Scott Aaronson

Not long ago William MacAskill, the founder of the Effective Altruist movement, visited Austin, where I got to talk with him in person for the first time. I was a fan of his book What We Owe the Future, and found him as thoughtful and eloquent face-to-face as I did on the page. Talking to Will inspired me to write the following short reflection on how I should spend my time, which I’m now sharing in case it’s of interest to anyone else.

By inclination and temperament, I simply seek the clearest possible understanding of reality.  This has led me to spend time on (for example) the Busy Beaver function and the P versus NP problem and quantum computation and the foundations of quantum mechanics and the black hole information puzzle, and on explaining whatever I’ve understood to others.  It’s why I became a professor.

But the understanding I’ve gained also tells me that I should try to do things that will have huge positive impact, in what looks like a pivotal and even terrifying time for civilization.  It tells me that seeking understanding of the universe, like I’ve been doing, is probably nowhere close to optimizing any values that I could defend.  It’s self-indulgent, a few steps above spending my life learning to solve Rubik’s Cube as quickly as possible, but only a few.  Basically, it’s the most fun way I could make a good living and have a prestigious career, so it’s what I ended up doing.  I should be skeptical that such a course would coincidentally also maximize the good I can do for humanity.

Instead I should plausibly be figuring out how to make billions of dollars, in cryptocurrency or startups or whatever, and then spending it in a way that saves human civilization, for example by making AGI go well.  Or I should be convincing whatever billionaires I know to do the same.  Or executing some other galaxy-brained plan.  Even if I were purely selfish, as I hope I’m not, still there are things other than theoretical computer science research that would bring more hedonistic pleasure.  I’ve basically just followed a path of least resistance.

On the other hand, I don’t know how to make billions of dollars.  I don’t know how to make AGI go well.  I don’t know how to influence Elon Musk or Sam Altman or Peter Thiel or Sergey Brin or Mark Zuckerberg or Marc Andreessen to do good things rather than bad things, even when I have gotten to talk to some of them.  Past attempts in this direction by extremely smart and motivated people—for example, those of Eliezer Yudkowsky and Sam Bankman-Fried—have had, err, uneven results, to put it mildly.  I don’t know why I would succeed where they failed.

Of course, if I had a better understanding of reality, I might know how better to achieve prosocial goals for humanity.  Or I might learn why they were actually the wrong goals, and replace them with better goals.  But then I’m back to the original goal of understanding reality as clearly as possible, with the corresponding danger that I spend my time learning to solve Rubik’s Cube faster.

By Scott

Learning the Mathematical Process

from Computational Complexity

Watching Mathematicians at Work (AI generated)

The Smithsonian Natural History Museum has a FossiLab where visitors can peek through windows watching scientists prepare fossils for conservation. Maybe we should have a similar exhibit at math museums or universities. How else can we learn what mathematicians do? 

In 2025, artificial intelligence has achieved gold medal status at the International Mathematical Olympiad but so far has only contributed modestly in finding new theorems. Of course, finding and proving new theorems requires a different set of skills than competition problems but it goes further than that.

The Internet has considerable text and video on how to solve math competition problems that machine learning systems can train on. On the other hand, mathematical research papers usually have little more than theorems and proofs. Maybe some intuition. Rarely do papers go into the thinking process and the false steps that one takes until one finds the proof. For some problems I've spent weeks proving a theorem but only the last day's work gets written up.

Now I doubt many mathematicians would give up their privacy and time to train AI systems to take over their jobs, but just suppose we wanted to do so. We could equip every mathematician with a camera recording every mathematical conversation and everything they write, especially the ideas that don't pan out. We can transcribe it all and feed it into an ML system. But it probably won't be enough.

Trouble is most mathematical breakthroughs just happen inside of people's heads. If you ask a mathematician how they came up with the clever idea that led to a major new result, they can rarely truly explain the process behind it. Not unlike neural nets. 

If machines can't learn to prove theorems by watching mathematicians, perhaps they'll have to take the route mathematicians take: a grad school slog towards PhD research and learning from endless failure.

By Lance Fortnow

Prompts for Open Problems

from Ben Recht

A few research problems the class inspired me to think about

This is a live blog of the final lecture of the 2025 edition of my graduate machine learning class “Patterns, Predictions, and Actions.” A full Table of Contents is here. I tried to summarize my semester reflections in class on Thursday, but found my thoughts haven’t quite settled yet. I’m hoping a week of posting will help me sort through it.

I don’t think I need to write a post arguing for more machine learning research. We definitely have more than we need. Rather than asking for more research, I’m proposing perhaps different research. I got myself interested in a bunch of problems while teaching the class, so let me take a post to selfishly nudge you in the directions that interested me. There are always questions and more experiments to be done.

Design-based Machine Learning

Abandoning the myth of data-generating distributions is more than just semantic. Nuances emerge when you treat all randomness as generated by the engineer rather than by nature. In statistics, this is the contrast between the model-based (natural randomness) and design-based (intentional randomness) views of statistical inference.

I remain skeptical of statistical inference, but I think there is a promising way to extend the online-learning regret view of decision making to develop a more compelling version of Neyman-Pearson decision theory.

I was surprised to see how much of machine learning can be cast in a design-based frame. I just had to gently rearrange some of the definitions and verify that the probability bounds held for without-replacement sampling (they usually do). This design-based perspective opened the door to novel and interesting analyses. For example, the adaptive experiment design I covered in Lecture 19 shows how to formulate decision theory at the population level and results in a simple adaptive algorithm for sequential experiments without power calculations. It also indicates that all current studies are woefully underpowered if we actually cared about population-level outcomes. (80% power only suffices when your population has five people).

Design-based bounds for more complex models might provide a better understanding of trade-offs in algorithmic decision systems that make predictions from data. A design-based perspective might change how we use machine learning to design such complex algorithmic decision aids.

A theory of competitive testing

I still think the most frustrating thing about machine learning theory is that we have no consensus explanation for plots like this one:

[figure from Lecture 14: benchmark scatter plot with a robust linear trend]

It is a strikingly robust phenomenon. It's been reproduced in hundreds of benchmarks. It drives much of the "large models are all you need" discourse. And yet, all of our consensus theories predict that these plots should look different.

Alexandre Passos suggested in the comments that machine learning theory had moved from treating things like math to treating things like physics. We find some robust linear trends when we throw scatter plots on the same axes. But plots alone are not sufficient to transform observational science into physics. There’s a group of theoretical physicists, the phenomenologists, who cook up models to explain these trends. They aren’t always successful, but they love to put out theories. And sometimes interesting new physics comes from this spitballing.

I realize this 10-year-old problem is now boring. It's sexier to pose theories about world models or LLM skill acquisition or the optimal noise schedules for diffusion models. But competitive testing is the core of our field, and it's embarrassing that we don't have any reasonable explanation for how it works.

Beyond average case evaluation

A recurring theme in the class was metrical determinism. Once you decide that you will evaluate your performance by the average on some population, you’re stuck with statistical methods, and probably can’t beat competitive testing on train-test benchmarks. I always wonder whether this is really the only way to think about evaluation. Why can’t we escape bean counting? This question is likely more sociological than mathematical, and I may need a whole blog post to make it well-posed. I’ll add it to my to-do list.

Certainty equivalence in LLM reasoning

Cleaning up LLM reasoning may be the lowest-hanging fruit on this list. Right now, “reasoning” in LLMs means applying policy gradient to increase the probability that LLMs will answer test questions correctly. I’m not convinced that I want to run a bunch of experiments to get LLMs to do well on math tests, but all of my experience is screaming at me that policy gradient is leaving a ton of efficiency on the table. These optimization methods are just trying to take a model that gets 20s on tests to a model that gets 90s on tests. That is, we need optimizers for the endgame, not for the cold start of the optimization algorithm. In optimization, the endgame is almost always the fast and easy part. I’m sure there are straightforward innovations to achieve the same level of performance as the XPO methods in a fraction of the time. Every time I have looked at a policy gradient method, this has been the case! I’ve seen no evidence that this time is different. If you are an intrepid optimization researcher who wants to run experiments on this problem and wants a grumpy advisor who doesn’t even want to be a coauthor, please send me a note.

Open source, Open corpus language models

Perhaps the way you could get me to care more about RL in LLMs is that they might help in the quest to build high-performing, open-source, open-corpus large language models. This topic didn’t come up in the class at all, but I’m plugging it here as I still think it’s the most important “open problem” in applied machine learning. Many teams have been making progress on this front, be they at Allen AI, Pleais in France, or pockets of Andrej Karpathy’s free time. I think that now we know what we “want” from LLMs, better algorithmic innovations can get us there for considerably less computing resources. Endless research and commercial possibilities open up once you can train a model on a single workstation. Moreover, breaking the current narrative that the “bitter lesson” means covering the earth with GPUs would be better for our geopolitics.

I think this is doable. There just needs to be a will to do it.

By Ben Recht

Dichotomy results for classes of countable graphs

from arXiv: Computational Complexity

Authors: Vittorio Cipriani, Ekaterina Fokina, Matthew Harrison-Trainor, Liling Ko, Dino Rossegger

We study classes of countable graphs in which no member contains a given finite graph as an induced subgraph -- denoted by $\mathsf{Free}(\mathcal{G})$ for a given finite graph $\mathcal{G}$. Our main results establish a structural dichotomy for such classes: If $\mathcal{G}$ is not an induced subgraph of $\mathcal{P}_4$, then $\mathsf{Free}(\mathcal{G})$ is on top under effective bi-interpretability, implying that the members of $\mathsf{Free}(\mathcal{G})$ exhibit the full range of structural and computational behaviors. In contrast, if $\mathcal{G}$ is an induced subgraph of $\mathcal{P}_4$, then $\mathsf{Free}(\mathcal{G})$ is structurally simple, as witnessed by the fact that every member satisfies the computable embeddability condition. This dichotomy is mirrored in the finite setting when one considers combinatorial and complexity-theoretic properties. Specifically, it is known that $\mathsf{Free}(\mathcal{G})^{fin}$ is complete for graph isomorphism and not a well-quasi-order under embeddability whenever $\mathcal{G}$ is not an induced subgraph of $\mathcal{P}_4$, while in all other cases $\mathsf{Free}(\mathcal{G})^{fin}$ forms a well-quasi-order and the isomorphism problem for $\mathsf{Free}(\mathcal{G})^{fin}$ is solvable in polynomial time.

Derandomizing Isolation In Catalytic Logspace

from arXiv: Computational Complexity

Authors: V. Arvind, Srijan Chakraborty, Samir Datta

A language is said to be in catalytic logspace if we can test membership using a deterministic logspace machine that has an additional read/write tape filled with arbitrary data whose contents have to be restored to their original value at the end of the computation. The model of catalytic computation was introduced by Buhrman et al. [STOC 2014]. As our first result, we obtain a catalytic logspace algorithm for computing a minimum-weight witness to a search problem, with small weights, provided the algorithm is given oracle access to the corresponding weighted decision problem. In particular, our reduction yields CL algorithms for the search versions of the following three problems: planar perfect matching, planar exact perfect matching, and weighted arborescences in weighted digraphs. Our second set of results concerns the significantly larger class CL^{NP}_{2-round}. We show that CL^{NP}_{2-round} contains SearchSAT and the complexity classes BPP, MA and ZPP^{NP[1]}. While SearchSAT is shown to be in CL^{NP}_{2-round} using the isolation lemma, the other three containments, while based on the compress-or-random technique, use the Nisan-Wigderson [JCSS 1994] pseudorandom generator. These containments show that CL^{NP}_{2-round} resembles ZPP^NP more than P^{NP}, providing some weak evidence that CL is more like ZPP than P. For our third set of results we turn to isolation well inside catalytic classes. We consider the unambiguous catalytic class CUTISP[poly(n), log n, log^2 n] and show that it contains reachability and therefore NL. This is a catalytic version of the result of van Melkebeek & Prakriya [SIAM J. Comput. 2019]. Building on their result, we also show a tradeoff between workspace and catalytic space. Finally, we extend these catalytic upper bounds to LogCFL.

Near-Linear and Parameterized Approximations for Maximum Cliques in Disk Graphs

from arXiv: Computational Geometry

Authors: Jie Gao, Pawel Gawrychowski, Panos Giannopoulos, Wolfgang Mulzer, Satyam Singh, Frank Staals, Meirav Zehavi

A \emph{disk graph} is the intersection graph of (closed) disks in the plane. We consider the classic problem of finding a maximum clique in a disk graph. For general disk graphs, the complexity of this problem is still open, but for unit disk graphs, it is well known to be in P. The currently fastest algorithm runs in time $O(n^{7/3+ o(1)})$, where $n$ denotes the number of disks~\cite{EspenantKM23, keil_et_al:LIPIcs.SoCG.2025.63}. Moreover, for the case of disk graphs with $t$ distinct radii, the problem has also recently been shown to be in XP. More specifically, it is solvable in time $O^*(n^{2t})$~\cite{keil_et_al:LIPIcs.SoCG.2025.63}. In this paper, we present algorithms with improved running times by allowing for approximate solutions and by using randomization: (i) for unit disk graphs, we give an algorithm that, with constant success probability, computes a $(1-\varepsilon)$-approximate maximum clique in expected time $\tilde{O}(n/\varepsilon^2)$; and (ii) for disk graphs with $t$ distinct radii, we give a parameterized approximation scheme that, with a constant success probability, computes a $(1-\varepsilon)$-approximate maximum clique in expected time $\tilde{O}(f(t)\cdot (1/\varepsilon)^{O(t)} \cdot n)$.

Coloring Geometric Hypergraphs: A Survey

from arXiv: Computational Geometry

Authors: Gábor Damásdi, Balázs Keszegh, János Pach, Dömötör Pálvölgyi, Géza Tóth

The \emph{chromatic number} of a hypergraph is the smallest number of colors needed to color the vertices such that no edge of at least two vertices is monochromatic. Given a family of geometric objects $\mathcal{F}$ that covers a subset $S$ of the Euclidean space, we can associate it with a hypergraph whose vertex set is $\mathcal F$ and whose edges are those subsets ${\mathcal{F}'}\subset \mathcal F$ for which there exists a point $p\in S$ such that ${\mathcal F}'$ consists of precisely those elements of $\mathcal{F}$ that contain $p$. The question whether $\mathcal F$ can be split into 2 coverings is equivalent to asking whether the chromatic number of the hypergraph is equal to 2. There are a number of competing notions of the chromatic number that lead to deep combinatorial questions already for abstract hypergraphs. In this paper, we concentrate on \emph{geometrically defined} (in short, \emph{geometric}) hypergraphs, and survey many recent coloring results related to them. In particular, we study and survey the following problem, dual to the above covering question. Given a set of points $S$ in the Euclidean space and a family $\mathcal{F}$ of geometric objects of a fixed type, define a hypergraph ${\mathcal H}_m$ on the point set $S$, whose edges are the subsets of $S$ that can be obtained as the intersection of $S$ with a member of $\mathcal F$ and have at least $m$ elements. Is it true that if $m$ is large enough, then the chromatic number of ${\mathcal H}_m$ is equal to 2?

On Mobile Ad Hoc Networks for Coverage of Partially Observable Worlds

from arXiv: Computational Geometry

Authors: Edwin Meriaux, Shuo Wen, Louis-Roy Langevin, Doina Precup, Antonio Loría, Gregory Dudek

This paper addresses the movement and placement of mobile agents to establish a communication network in initially unknown environments. We cast the problem in a computational-geometric framework by relating the coverage problem and line-of-sight constraints to the Cooperative Guard Art Gallery Problem, and introduce its partially observable variant, the Partially Observable Cooperative Guard Art Gallery Problem (POCGAGP). We then present two algorithms that solve POCGAGP: CADENCE, a centralized planner that incrementally selects 270-degree corners at which to deploy agents, and DADENCE, a decentralized scheme that coordinates agents using local information and lightweight messaging. Both approaches operate under partial observability and target simultaneous coverage and connectivity. We evaluate the methods in simulation across 1,500 test cases of varied size and structure, demonstrating consistent success in forming connected networks while covering and exploring unknown space. These results highlight the value of geometric abstractions for communication-driven exploration and show that decentralized policies are competitive with centralized performance while retaining scalability.

Magic Gems: A Polyhedral Framework for Magic Squares

from arXiv: Computational Geometry

Authors: Kyle Elliott Mathewson

We introduce Magic Gems, a geometric representation of magic squares as three-dimensional polyhedra. By mapping an n x n magic square onto a centered coordinate grid with cell values as vertical displacements, we construct a point cloud whose convex hull defines the Magic Gem. This reveals a connection between magic square constraints and statistical structure: we prove that magic squares have vanishing covariances between position and value. We introduce a covariance energy functional -- the sum of squared covariances with row, column, and diagonal indicator variables -- and prove for n=3 (via exhaustive enumeration) that its zeros are precisely the magic squares. Large-scale sampling for n=4,5 (460+ million arrangements) provides strong numerical evidence that this characterization extends to larger orders. Perturbation analysis demonstrates that magic squares are isolated local minima. The representation is invariant under dihedral symmetry D_4, yielding canonical geometric objects for equivalence classes.
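
The vanishing position-value covariances are easy to check directly. Below is a small numpy sketch using the classical 3x3 Lo Shu square, taking each covariance over all nine cells; the precise indicator variables in the paper's energy functional may differ in detail:

    import numpy as np

    square = np.array([[2, 7, 6],
                       [9, 5, 1],
                       [4, 3, 8]])  # the Lo Shu magic square

    rows, cols = np.indices(square.shape)
    vals = square.astype(float)

    def cov(a, b):
        # Covariance taken over all n*n cells.
        return np.mean(a * b) - np.mean(a) * np.mean(b)

    on_diag = (rows == cols).astype(float)  # main-diagonal indicator

    # All three position-value covariances are ~0 (up to roundoff).
    print(cov(rows, vals), cov(cols, vals), cov(on_diag, vals))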

Colouring Graphs Without a Subdivided H-Graph: A Full Complexity Classification

from arXiv: Data Structures and Algorithms

Authors: Tala Eagling-Vose, Jorik Jooken, Felicia Lucke, Barnaby Martin, Daniël Paulusma

We consider Colouring on graphs that are $H$-subgraph-free for some fixed graph $H$, i.e., graphs that do not contain $H$ as a subgraph. It is known that even $3$-Colouring is NP-complete for $H$-subgraph-free graphs whenever $H$ has a cycle; or a vertex of degree at least $5$; or a component with two vertices of degree $4$, while Colouring is polynomial-time solvable for $H$-subgraph-free graphs if $H$ is a forest of maximum degree at most $3$, in which each component has at most one vertex of degree $3$. For connected graphs $H$, this means that it remains to consider when $H$ is a tree of maximum degree $4$ with exactly one vertex of degree $4$, or a tree of maximum degree $3$ with at least two vertices of degree $3$. We let $H$ be a so-called subdivided "H"-graph, which is either a subdivided $\mathbb{H}_0$: a tree of maximum degree $4$ with exactly one vertex of degree $4$ and no vertices of degree $3$, or a subdivided $\mathbb{H}_1$: a tree of maximum degree $3$ with exactly two vertices of degree $3$. In the literature, only a limited number of polynomial-time and NP-completeness results for these cases are known. We develop new polynomial-time techniques that allow us to determine the complexity of Colouring on $H$-subgraph-free graphs for all the remaining subdivided "H"-graphs, so we fully classify both cases. As a consequence, the complexity of Colouring on $H$-subgraph-free graphs has now been settled for all connected graphs $H$ except when $H$ is a tree of maximum degree $4$ with exactly one vertex of degree $4$ and at least one vertex of degree $3$; or a tree of maximum degree $3$ with at least three vertices of degree $3$. We also employ our new techniques to obtain the same new polynomial-time results for another classic graph problem, namely Stable Cut.

Optimal certification of constant-local Hamiltonians

from arXiv: Data Structures and Algorithms

Authors: Junseo Lee, Myeongjin Shin

We study the problem of certifying local Hamiltonians from real-time access to their dynamics. Given oracle access to $e^{-itH}$ for an unknown $k$-local Hamiltonian $H$ and a fully specified target Hamiltonian $H_0$, the goal is to decide whether $H$ is exactly equal to $H_0$ or differs from $H_0$ by at least $\varepsilon$ in normalized Frobenius norm, while minimizing the total evolution time. We introduce the first intolerant Hamiltonian certification protocol that achieves optimal performance for all constant-locality Hamiltonians. For general $n$-qubit, $k$-local, traceless Hamiltonians, our procedure uses $O(c^k/\varepsilon)$ total evolution time for a universal constant $c$, and succeeds with high probability. In particular, for $O(1)$-local Hamiltonians, the total evolution time becomes $Θ(1/\varepsilon)$, matching the known $Ω(1/\varepsilon)$ lower bounds and achieving the gold-standard Heisenberg-limit scaling. Prior certification methods either relied on implementing inverse evolution of $H$, required controlled access to $e^{-itH}$, or achieved near-optimal guarantees only in restricted settings such as the Ising case ($k=2$). In contrast, our algorithm requires neither inverse evolution nor controlled operations: it uses only forward real-time dynamics and achieves optimal intolerant certification for all constant-locality Hamiltonians.

A 0.8395-approximation algorithm for the EPR problem

from arXiv: Data Structures and Algorithms

Authors: Anuj Apte, Eunou Lee, Kunal Marwaha, Ojas Parekh, Lennart Sinjorgo, James Sud

We give an efficient 0.8395-approximation algorithm for the EPR Hamiltonian. Our improvement comes from a new nonlinear monogamy-of-entanglement bound on star graphs and a refined parameterization of a shallow quantum circuit from previous works. We also prove limitations showing that current methods cannot achieve substantially better approximation ratios, indicating that further progress will require fundamentally new techniques.

Provably Learning from Modern Language Models via Low Logit Rank

from arXiv: Data Structures and Algorithms

Authors: Noah Golowich, Allen Liu, Abhishek Shetty

While modern language models and their inner workings are incredibly complex, recent work (Golowich, Liu & Shetty; 2025) has proposed a simple and potentially tractable abstraction for them through the observation that empirically, these language models all seem to have approximately low logit rank. Roughly, this means that a matrix formed by the model's log probabilities of various tokens conditioned on certain sequences of tokens is well approximated by a low rank matrix. In this paper, our focus is on understanding how this structure can be exploited algorithmically for obtaining provable learning guarantees. Since low logit rank models can encode hard-to-learn distributions such as noisy parities, we study a query learning model with logit queries that reflects the access model for common APIs. Our main result is an efficient algorithm for learning any approximately low logit rank model from queries. We emphasize that our structural assumption closely reflects the behavior that is empirically observed in modern language models. Thus, our result gives what we believe is the first end-to-end learning guarantee for a generative model that plausibly captures modern language models.
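
The structural assumption is concrete enough to probe numerically. Here is a toy sketch of what "approximately low logit rank" means operationally, with a synthetic rank-$r$ matrix standing in for a real model's context-by-token log-probability matrix; this is only an illustration, not the authors' experiment:

    import numpy as np

    rng = np.random.default_rng(1)
    contexts, tokens, r = 200, 50, 5

    # Stand-in for the logit matrix: log-probabilities of each token
    # conditioned on each context, constructed to have rank exactly r.
    logits = rng.standard_normal((contexts, r)) @ rng.standard_normal((r, tokens))

    # Numerical rank: count the singular values carrying the mass.
    s = np.linalg.svd(logits, compute_uv=False)
    print(int(np.sum(s > 1e-8 * s[0])))  # prints 5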

Dynamic Graph Coloring: Sequential, Parallel, and Distributed

from arXiv: Data Structures and Algorithms

Authors: Mohsen Ghaffari, Jaehyun Koo

We present a simple randomized algorithm that can efficiently maintain a $(Δ+1)$ coloring as the graph undergoes edge insertion and deletion updates, where $Δ$ denotes an upper bound on the maximum degree. A key advantage is the algorithm's ability to process many updates simultaneously, which makes it naturally adaptable to the parallel and distributed models. Concretely, it gives a unified framework across the models, leading to the following results: - In the sequential setting, the algorithm processes each update in $O(1)$ expected time, worst-case. This matches and strengthens the results of Henzinger and Peng [TALG 2022] and Bhattacharya et al. [TALG 2022], who achieved an $O(1)$ bound but amortized (in expectation and with high probability, respectively), whose work was an improvement of the $O(\log Δ)$ expected amortized bound of Bhattacharya et al. [SODA'18]. - In the parallel setting, the algorithm processes each (arbitrary size) batch of updates using $O(1)$ work per update in the batch in expectation, and in $\text{poly}(\log n)$ depth with high probability. This is, in a sense, an ideal parallelization of the above results. - In the distributed setting, the algorithm can maintain a coloring of the network graph as (potentially many) edges are added or deleted. The maintained coloring is always proper; it may become partial upon updates, i.e., some nodes may temporarily lose their colors, but quickly converges to a full, proper coloring. Concretely, each insertion and deletion causes at most $O(1)$ nodes to become uncolored, but this is resolved within $O(\log n)$ rounds with high probability (e.g., in the absence of further updates nearby--the precise guarantee is stronger, but technical). Importantly, the algorithm incurs only $O(1)$ expected message complexity and computation per update.
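
For orientation, the primitive underneath this line of work (going back to Bhattacharya et al.) is blind random recoloring from a palette of Δ+1 colors, under which some color is always free. A bare-bones sequential sketch of that primitive, not the authors' algorithm:

    import random

    def recolor(v, color, adj, delta):
        # With delta+1 colors and at most delta neighbors, at least one
        # color is always free at v; sample one uniformly at random.
        used = {color[u] for u in adj[v]}
        free = [c for c in range(delta + 1) if c not in used]
        color[v] = random.choice(free)

    def insert_edge(u, v, color, adj, delta):
        adj[u].add(v)
        adj[v].add(u)
        # Only a conflict triggers work; the hard part of the dynamic
        # algorithms is bounding the expected work by O(1) per update.
        if color[u] == color[v]:
            recolor(v, color, adj, delta)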

Almost-Optimal Approximation Algorithms for Global Minimum Cut in Directed Graphs

from arXiv: Data Structures and Algorithms

Authors: Ron Mosenzon

We develop new $(1+ε)$-approximation algorithms for finding the global minimum edge-cut in a directed edge-weighted graph, and for finding the global minimum vertex-cut in a directed vertex-weighted graph. Our algorithms are randomized, and have a running time of $O\left(m^{1+o(1)}/ε\right)$ on any $m$-edge $n$-vertex input graph, assuming all edge/vertex weights are polynomially-bounded. In particular, for any constant $ε>0$, our algorithms have an almost-optimal running time of $O\left(m^{1+o(1)}\right)$. The fastest previously-known running time for this setting, due to (Cen et al., FOCS 2021), is $\tilde{O}\left(\min\left\{n^2/ε^2,m^{1+o(1)}\sqrt{n}\right\}\right)$ for Minimum Edge-Cut, and $\tilde{O}\left(n^2/ε^2\right)$ for Minimum Vertex-Cut. Our results further extend to the rooted variants of the Minimum Edge-Cut and Minimum Vertex-Cut problems, where the algorithm is additionally given a root vertex $r$, and the goal is to find a minimum-weight cut separating any vertex from the root $r$. In terms of techniques, we build upon and extend a framework that was recently introduced by (Chuzhoy et al., SODA 2026) for solving the Minimum Vertex-Cut problem in unweighted directed graphs. Additionally, in order to obtain our result for the Global Minimum Vertex-Cut problem, we develop a novel black-box reduction from this problem to its rooted variant. Prior to our work, such reductions were only known for more restricted settings, such as when all vertex-weights are unit.

Wednesday, December 10

Benchmark Studies

from Ben Recht

It is impossible to disentangle technical innovation from technical debt

This is a live blog of the final lecture of the 2025 edition of my graduate machine learning class “Patterns, Predictions, and Actions.” A full Table of Contents is here. I tried to summarize my semester reflections in class on Thursday, but found my thoughts haven’t quite settled yet. I’m hoping a week of posting will help me sort through it.

My GSIs (one in particular) gave me a lot of guff about how the homework was too easy this semester. My response is that the theoretical foundations of the course are still too weak for me to pose hard math questions in good faith. Yes, there are difficult mathematical problems in finding the optimal constants for certain learning-theoretic bounds. But finding constants for theories that give bad advice in practice is a waste of everyone’s time. In machine learning, we learn more from social analysis than functional analysis, and it’s hard to write problem sets on sociology.

The theory in machine learning is frustratingly less mathematical than that of other fields of information engineering. For example, consider mathematical optimization, which I taught last fall. There, you begin with a mathematical modeling language. If you believe that you have a problem that can be written in this language, optimization provides algorithms to find the solution. If you can’t write your problem in that language, you’ll need to try something else.

You might think machine learning works that way too. If we believe we have a population in which a certain variable is well approximated by a parametric function of the other variables, then sure, we can look into algorithms to estimate that function via random sampling from the population. However, we rarely, if ever, believe this parametric assumption in applied machine learning. This is where we become untethered from the mathematics. In almost every problem that people care about, we simply don’t know the functional relationship connecting these quantities. If we did, we wouldn’t be using machine learning in the first place.

I like to illustrate this with two extreme cases:

  • I believe that there is a computer program that can tell whether a string of bits has an even or odd number of ones. I can write this program in a single line of Python (see the one-liner after this list). I don’t use machine learning.

  • I believe there is a computer program that can identify animals in images. I have no idea how to write that computer program. I turn to machine learning.
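
Since the contrast hangs on that one-line program, here it is. A minimal sketch, assuming the bit string arrives as a Python str of '0' and '1' characters:

    # The promised one-liner: parity of the number of ones in a bit string.
    parity = lambda bits: bits.count("1") % 2

    assert parity("1011") == 1 and parity("1001") == 0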

In machine learning, the price of admission is a belief that pattern recognition is possible and the conviction that it’s too hard to write a function describing these patterns from first principles. Generalization is an axiom, not a theorem. It would be nice if we could say one algorithm is better than another at finding these prediction functions, but we don’t have theory for that. Instead, we have to look to engineering “best practice” for what to do next.

Am I saying that every machine learning course has to have a few weeks on science studies? Yes I am.

Of course, this is frustrating as all hell to computer science students who are taught that you just type incantations into Jupyter notebooks and conjure up pure logical positivism. Machine learning has a recipe book, but, unlike when I’m teaching an undergraduate course in algorithms, I can’t justify much of it at all.

I know that I can approximate arbitrary nonlinear patterns as a composition of simple, componentwise nonlinear functions and linear maps (what we call neural networks). I know that I can arrange data streams into arrays that respect certain local consistency properties, thinking of text as sequences of tokens and images as spatial arrays of patches. I can compose these basic primitives to construct a bunch of different candidate prediction functions. I can use numerical search to find the linear maps in the resulting functional expression.
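
To make that recipe concrete, here is the composition written out in its smallest form. A sketch in plain numpy; the layer sizes and the choice of ReLU are arbitrary illustrations, not a recommendation:

    import numpy as np

    rng = np.random.default_rng(0)

    # The linear maps: these are the parameters that numerical search tunes.
    W1 = rng.standard_normal((16, 8))
    W2 = rng.standard_normal((1, 16))

    def relu(z):
        # A simple componentwise nonlinear function.
        return np.maximum(z, 0.0)

    def predict(x):
        # A candidate prediction function: linear map, componentwise
        # nonlinearity, linear map. Deeper networks repeat the pattern.
        return W2 @ relu(W1 @ x)

    x = rng.standard_normal(8)
    print(predict(x))  # one scalar prediction for one 8-dimensional input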

This set of weakly specified building blocks begets a zoo of methods. But we can only tell stories for why we’d prefer one over another. Random Forests, CNNs, RNNs, and Transformers are all potentially useful to get your number to go up, but they aren’t fundamental in any sense. There are some architectures that people get excited about for a while, and they yell about how awesome they are. Then some new architecture becomes exciting, and they yell about that. People build the next cool thing on the last cool thing. They tend not to emphasize how data is fundamental, and how it’s essential to set up your ETL pipeline in precisely the right way. But those pesky details are easy enough to find if you look through the various open-source machine learning repositories. And so machine learning continues, one pull request at a time.

This organic process is fine! But I don’t think you can explain anything about it with large deviation inequalities or functional analysis. How can I know which method is best? I check my answer on some leaderboard.

I’ve been trying to figure out how best to teach this in context. Our machine learning practices make it impossible to disentangle technical innovation from technical debt. I don’t want to prove theorems about some widget that is currently widely deployed because maybe it won’t be used next week. Some components are vestigial structures left behind after a frenzied series of paper deadlines and investor calls. Which structures? I can’t tell you.

On the other hand, machine learning has some shockingly robust practices that other fields should emulate. The train-test paradigm is fascinating. Despite statistical arguments declaring it fundamentally flawed, the culture of competitive testing on benchmarks has driven and still drives the engine of what the field defines as progress. We still can’t explain much about why it works as well as it does. We don’t have compelling theories for why benchmarks don’t go stale as they saturate, or why we see the patterns we see in empirical performance on these benchmarks. The social patterns here are fascinating, and they should be taught more explicitly in machine learning courses.

Although some argue that we need to move beyond the benchmarking paradigm, I would counter that the benchmarking paradigm defines the field. Believe that pattern recognition is possible. Specify your metric at the population level. Gather two samples representative of this population and use one for play and one for benchmarking, trying to maximize your metric. Once you get bored with the benchmark, make a new one. That’s machine learning in a nutshell. In practice, machine learning sociology is all we need.
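
The whole protocol fits in a few lines of code. A schematic sketch, where candidates is a hypothetical dictionary mapping method names to training procedures, and the population-level metric is plain accuracy:

    import numpy as np

    def benchmark(candidates, X, y, seed=0):
        # Gather two samples representative of the population:
        # one for play (train), one for benchmarking (test).
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        train, test = idx[: len(X) // 2], idx[len(X) // 2 :]

        # Train each candidate on the play set, score it on the benchmark
        # set, and let the leaderboard declare the winner.
        scores = {}
        for name, fit in candidates.items():
            model = fit(X[train], y[train])  # fit returns a predict function
            scores[name] = np.mean(model(X[test]) == y[test])
        return max(scores, key=scores.get), scores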

By Ben Recht

Direct Product Theorems for Randomized Query Complexity

from arXiv: Computational Complexity

Authors: Shalev Ben-David, Eric Blais

We establish two new direct product theorems for the randomized query complexity of Boolean functions. The first shows that computing $n$ copies of a function $f$, even with a small success probability of $γ^n$, requires $Θ(n)$ times the "maximum distributional" query complexity of $f$ with success parameter $γ$. This result holds for all success parameters $γ$, even when $γ$ is very close to $1/2$ or to $1$. As a result, it unifies and generalizes Drucker's direct product theorem (2012) for $γ$ bounded away from $\frac12$ and $1$ as well as the strong direct sum theorem of Blais and Brody (2019) for $γ\approx 1-1/n$. The second establishes a general "list decoding" direct product theorem that captures many different variants of partial computation tasks related to the function $f^n$ consisting of $n$ copies of $f$. Notably, our list decoding direct product theorem yields a new threshold direct product theorem and other new variants such as the labelled-threshold direct product theorem. Both of these direct product theorems are obtained by taking a new approach. Instead of directly analyzing the query complexity of algorithms, we introduce a new measure of complexity of functions that we call "discounted score". We show that this measure satisfies a number of useful structural properties, including tensorization, that make it particularly suitable for the study of direct product questions.

Structure Theorems (and Fast Algorithms) for List Recovery of Subspace-Design Codes

from arXiv: Computational Complexity

Authors: Rohan Goyal, Venkatesan Guruswami

List recovery of error-correcting codes has emerged as a fundamental notion with broad applications across coding theory and theoretical computer science. Folded Reed-Solomon (FRS) and univariate multiplicity codes are explicit constructions which can be efficiently list-recovered up to capacity, namely a fraction of errors approaching $1-R$ where $R$ is the code rate. Chen and Zhang and related works showed that folded Reed-Solomon codes and linear codes must have list sizes exponential in $1/ε$ for list-recovering from an error-fraction $1-R-ε$. These results suggest that one cannot list-recover FRS codes in time that is also polynomial in $1/ε$. In contrast to such limitations, we show, extending algorithmic advances of Ashvinkumar, Habib, and Srivastava for list decoding, that even if the lists in the case of list-recovery are large, they are highly structured. In particular, we can output a compact description of a set of size only $\ell^{O((\log \ell)/ε)}$ which contains the relevant list, while running in time only polynomial in $1/ε$ (the previously known compact description due to Guruswami and Wang had size $\approx n^{\ell/ε}$). We also improve on the state-of-the-art algorithmic results for the task of list-recovery.

New Constructions of SSPDs and their Applications

from arXiv: Computational Geometry

Authors: Mohammad A. Abam, Sariel Har-Peled

We present a new optimal construction of a semi-separated pair decomposition (SSPD) for a set of $n$ points in $\mathbb{R}^d$. In the new construction each point participates in a few pairs, and it extends easily to spaces with low doubling dimension. This is the first optimal construction with these properties. As an application of the new construction, for a fixed $t>1$, we present a new construction of a $t$-spanner with $O(n)$ edges and maximum degree $O(\log^2 n)$ that has a separator of size $O(n^{1-1/d})$.

Connectivity-Preserving Cortical Surface Tetrahedralization

from arXiv: Computational Geometry

Authors: Besm Osman, Ruben Vink, Andrei Jalba, Maxime Chamberland

A prerequisite for many biomechanical simulation techniques is discretizing a bounded volume into a tetrahedral mesh. In certain contexts, such as cortical surface simulations, preserving input surface connectivity is critical. However, automated surface extraction often yields meshes containing self-intersections, small holes, and faulty geometry, which prevents existing constrained and unconstrained meshers from preserving this connectivity. We address this issue by developing a novel tetrahedralization method that maintains input surface connectivity in the presence of such defects. We also present a metric to quantify the preservation of surface connectivity and demonstrate that our method correctly maintains connectivity compared to existing solutions.

Reeb Graph of Sample Thickenings

from arXiv: Computational Geometry

Authors: Håvard Bakke Bjerkevik, Nello Blaser, Lars M. Salbu

We consider the Reeb graph of a thickening of points sampled from an unknown space. Our main contribution is a framework to transfer reconstruction results similar to the well-known work of Niyogi, Smale, and Weinberger to the setting of Reeb graphs. To this end, we first generalize and study the interleaving distances for Reeb graphs. We find that many of the results previously established for constructible spaces also hold for general topological spaces. We use this to show that under certain conditions for topological spaces with real-valued Lipschitz maps, the Reeb graph of a sample thickening approximates the Reeb graph of the underlying space. Finally, we provide an algorithm for computing the Reeb graph of a sample thickening.

Parallel Batch Dynamic Vertex Coloring in $O(\log Δ)$ Amortized Update Time

from arXiv: Data Structures and Algorithms

Authors: Chase Hutton, Adam Melrod

We present the first parallel batch-dynamic algorithm for maintaining a proper $(Δ+ 1)$-vertex coloring. Our approach builds on a new sequential dynamic algorithm inspired by the work of Bhattacharya et al. (SODA'18). The resulting randomized algorithm achieves $O(\log Δ)$ expected amortized update time and, for any batch of $b$ updates, has parallel span $O(\operatorname{polylog} b + \operatorname{polylog} n)$ with high probability.

Fast exact algorithms via the Matrix Tree Theorem

from arXiv: Data Structures and Algorithms

Authors: V. Arvind, Srijan Chakraborty, Samir Datta, Asif Khan

Fast exact algorithms are known for Hamiltonian paths in undirected and directed bipartite graphs through elegant though involved algorithms that are quite different from each other. We devise algorithms that are simple and similar to each other while having the same upper bounds. The common features of these algorithms are the use of the Matrix-Tree theorem and sieving using roots of unity. Next, we use the framework to provide alternative algorithms to count perfect matchings in bipartite graphs on $n$ vertices, i.e., computing the $\{0,1\}$-permanent of a square $n/2 \times n/2$ matrix, which runs in time similar to Ryser's algorithm. We demonstrate the flexibility of our method by counting the number of ways to partition the vertices of the graph into $k$-stars (a $k$-star consists of a tree with a root having $k-1$ children that are all leaves). Interestingly, our running time improves to $O^*((1+ε_k)^n)$ with $ε_k \rightarrow 0$ as $k \rightarrow \infty$. As an aside, making use of Björklund's algorithm for exactly counting perfect matchings in general graphs, we show that the count of maximum matchings can be computed in time $O^*(2^ν)$ where $ν$ is the size of a maximum matching. The crucial ingredient here is the famous Gallai-Edmonds decomposition theorem. All our algorithms run in polynomial space.
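
The first common ingredient is a single line of linear algebra. By the Matrix-Tree theorem, the number of spanning trees of an undirected graph equals any cofactor of its Laplacian; a minimal numpy sketch:

    import numpy as np

    def spanning_tree_count(adj):
        # adj: symmetric 0/1 adjacency matrix of an undirected graph.
        A = np.asarray(adj, dtype=float)
        L = np.diag(A.sum(axis=1)) - A  # graph Laplacian
        # Matrix-Tree theorem: delete any one row and the matching
        # column, then take the determinant.
        return round(np.linalg.det(L[1:, 1:]))

    K4 = 1 - np.eye(4)  # complete graph on 4 vertices
    print(spanning_tree_count(K4))  # 16, matching Cayley's 4^(4-2)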

Weighted $k$-Path and Other Problems in Almost $O^*(2^k)$ Deterministic Time via Dynamic Representative Sets

from arXiv: Data Structures and Algorithms

Authors: Jesper Nederlof

We present a data structure that we call a Dynamic Representative Set. In its most basic form, it is given two parameters $0< k < n$ and allows us to maintain a representation of a family $\mathcal{F}$ of subsets of $\{1,\ldots,n\}$. It supports basic update operations (unioning of two families, element convolution) and a query operation that determines for a set $B \subseteq \{1,\ldots,n\}$ whether there is a set $A \in \mathcal{F}$ of size at most $k-|B|$ such that $A$ and $B$ are disjoint. After $2^{k+O(\sqrt{k}\log^2k)}n \log n$ preprocessing time, all operations use $2^{k+O(\sqrt{k}\log^2k)}\log n$ time. Our data structure has many algorithmic consequences that improve over previous works. One application is a deterministic algorithm for the Weighted Directed $k$-Path problem, one of the central problems in parameterized complexity. Our algorithm takes as input an $n$-vertex directed graph $G=(V,E)$ with edge lengths and an integer $k$, and it outputs the minimum edge length of a path on $k$ vertices in $2^{k+O(\sqrt{k}\log^2k)}(n+m)\log n$ time (in the word RAM model where weights fit into a single word). Modulo the lower order term $2^{O(\sqrt{k}\log^2k)}$, this answers a question that has been repeatedly posed as a major open problem in the field.
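
For orientation, the classical route to $O^*(2^k)$ here is randomized color coding: randomly $k$-color the vertices, then run a subset dynamic program for a minimum-weight "colorful" path. Below is a compact sketch of one trial of that baseline (the inputs edges, colors, and weight are hypothetical); the paper's contribution is a deterministic, data-structural route to essentially the same bound:

    import math

    def min_colorful_k_path(n, k, edges, colors, weight):
        # dp[S][v] = min weight of a path ending at v whose vertices use
        # each color in the bitmask S exactly once; 2^k * m work overall.
        FULL = 1 << k
        dp = [[math.inf] * n for _ in range(FULL)]
        for v in range(n):
            dp[1 << colors[v]][v] = 0.0
        for S in range(FULL):
            for v in range(n):
                if dp[S][v] < math.inf:
                    for u in edges[v]:  # directed edges v -> u
                        if not (S >> colors[u]) & 1:
                            T = S | (1 << colors[u])
                            dp[T][u] = min(dp[T][u], dp[S][v] + weight[(v, u)])
        return min(dp[FULL - 1])  # math.inf if no colorful k-path exists

A single random coloring leaves a fixed k-path colorful with probability k!/k^k, so the randomized algorithm repeats independent trials to boost the success probability.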

A Distribution Testing Approach to Clustering Distributions

from arXiv: Data Structures and Algorithms

Authors: Gunjan Kumar, Yash Pote, Jonathan Scarlett

We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters are $\varepsilon$-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when one of the cluster's distributions is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size $n$, number of distributions $k$, size $r$ of one of the clusters, and distance $\varepsilon$. In particular, we achieve tightness with respect to $(n,k,r,\varepsilon)$ (up to an $O(\log k)$ factor) for all regimes.
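
As a point of reference only (not the paper's sample-optimal tester), a naive plug-in baseline in Python: estimate each distribution empirically and cluster by thresholding the total variation distance to the first distribution at $\varepsilon/2$.

    import numpy as np

    def tv(p, q):
        """Total variation distance between two discrete distributions."""
        return 0.5 * np.abs(p - q).sum()

    def cluster_by_tv(samples, n, eps):
        """Naive baseline: empirical distributions, then threshold the
        TV distance to the first distribution at eps / 2."""
        emp = [np.bincount(s, minlength=n) / len(s) for s in samples]
        return [0 if tv(e, emp[0]) < eps / 2 else 1 for e in emp]

    rng = np.random.default_rng(0)
    p, q = np.array([0.9, 0.1]), np.array([0.1, 0.9])   # TV(p, q) = 0.8
    samples = [rng.choice(2, size=500, p=d) for d in (p, p, q)]
    print(cluster_by_tv(samples, n=2, eps=0.8))  # [0, 0, 1] w.h.p.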

A tight example for approximation ratio 5 for covering small cuts by the primal-dual method

from arXiv: Data Structures and Algorithms

Authors: Zeev Nutov

In the Small Cuts Cover problem we seek to cover by a min-cost edge-set the set family of cuts of size/capacity $< k$. We present an example showing that the approximation ratio $5$ achieved for this problem by the primal-dual method is tight.

The Bichromatic Two-Center Problem on Graphs

from arXiv: Data Structures and Algorithms

Authors: Qi Sun, Jingru Zhang

In this paper, we study the (weighted) bichromatic two-center problem on graphs. The input consists of a graph $G$ of $n$ (weighted) vertices and $m$ edges, and a set $\mathcal{P}$ of pairs of distinct vertices, where no vertex appears in more than one pair. The problem aims to find two points (i.e., centers) on $G$ by assigning the vertices of each pair to different centers so as to minimize the maximum (weighted) distance of vertices to their assigned centers (so that the graph can be bi-colored with this goal). To the best of our knowledge, this problem has not been studied on graphs, including tree graphs. We propose an $O(m^2n\log n\log (mn))$ algorithm for solving the problem on an undirected graph provided with the distance matrix, an $O(n\log n)$-time algorithm for the problem on trees, and a linear-time approach for the unweighted tree version.
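
For intuition about the objective, here is a brute-force Python reference that we restrict to vertex centers (the problem as stated allows centers anywhere on $G$, so this only checks the cost function and is far from the paper's algorithms), assuming the distance matrix is given.

    from itertools import product

    def bichromatic_two_center_vertices(D, pairs, w=None):
        """Brute force over vertex pairs (c1, c2) as centers; each pair
        (u, v) must be split between the two centers. D is the n x n
        distance matrix, w optional vertex weights. O(n^2 * |pairs|)."""
        n = len(D)
        w = w or [1] * n
        best = float("inf")
        for c1, c2 in product(range(n), repeat=2):
            cost = 0
            for u, v in pairs:
                # two ways to split the pair; take the cheaper one
                a = max(w[u] * D[u][c1], w[v] * D[v][c2])
                b = max(w[u] * D[u][c2], w[v] * D[v][c1])
                cost = max(cost, min(a, b))
            best = min(best, cost)
        return best

    # path 0 - 1 - 2 with unit edges; the pair (0, 2) must be split
    D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(bichromatic_two_center_vertices(D, [(0, 2)]))  # 0: centers 0 and 2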

Tuesday, December 09

Ph.D. Positions in Theoretical Computer Science at Boston College (apply by January 2, 2026)

from CCI: jobs

Boston College invites applications for the newly established Ph.D. program. The doctoral program will specialize in two broad research areas: theory of computation and artificial intelligence/machine learning. For more information, contact the program director, Sergio Alvarez.

Website: https://www.bc.edu/content/bc-web/schools/morrissey/departments/computer-science/academics/phd.html
Email: sergio.alvarez@bc.edu

By shacharlovett

There is no data-generating distribution

from Ben Recht

Reflecting on teaching machine learning again. Again.

This is a live blog of the final lecture of the 2025 edition of my graduate machine learning class “Patterns, Predictions, and Actions.” A full Table of Contents is here. I tried to summarize my semester reflections in class on Thursday, but found my thoughts haven’t quite settled yet. I’m hoping a week of posting will help me sort through it.

This semester was the first time I taught machine learning and never felt like I was lying. Over the past decade or so, I’ve been working to strip from the curriculum field-making myths like overfitting and bias-variance tradeoffs. As I dug into the roots of the misunderstandings and misinterpretations of machine learning, I kept running up against the unassailable belief in the data-generating distribution. There is no data-generating distribution. This semester, I managed to remove that too.

Machine learning is all about recognizing patterns in collected data and transforming these patterns into predictions and actions. This transformation requires a model for how the data is linked to the outcomes we aim to forecast or act upon. We have a sample that we believe is representative of a population. The model declares, implicitly or explicitly, what “representative” means. The most ubiquitous model in machine learning posits that datasets consist of samples from a probability distribution. This is the data-generating distribution. It doesn’t exist.

Just think about it. What is the stochastic process that creates the radiology images in a cancer detection dataset? What is the stochastic process that generates click-through data for a machine learning engineer building a recommendation system? What is the stochastic process that results in the millions of books literally torn to pieces by Anthropic for the pretraining of their language model?

There is no data-generating distribution. And yet, machine learning theorists and practitioners love to talk about it. Machine learning theorists, at least, seem uneasy about it. They love to make arguments that are “distribution-free,” but then they always have a data-generating distribution hiding in the background. This semester, I managed to stitch together a truly distribution-free story. Randomness can be created and applied algorithmically, but nature need not be modeled as god playing dice. All of the randomness is made or imagined by the machine learning engineer.

To set the stage, we first need to think about populations, the imagined examples you want to make predictions about or take actions upon. The central tenet of this class is that our model of the population dictates what we see in our samples. Once your conception of the population is set, what you do with samples is determined. Most of the mucking around with data targets action at the population level, so think there first.

Indeed, the first step in any forecasting or decision-making problem is to discuss how predictions will be scored at the population level. Once you conceive of a scoring system, you can compare the value of different prediction methods. The scoring determines the best predictions.

As an example of how this population-level thinking works, say that you have to decide whether to give an entire population of people a drug or not. You know in advance how everyone responds upon getting the drug. Your central planner tells you the required quality-adjusted life years needed to keep public opinion high. You plug in the two options and decide to administer the drug based on which score is higher in your epidemiological metric.

This is more or less how decision theory works. The optimal prediction and decision are completely determined by the score function you choose. I call this metrical determinism. Different cost-benefit weightings of overdiagnosis will yield different policy recommendations for cancer screening. Different philosophies about the demographics of an incoming class will yield different admission rules. Different risk tolerances determine different investment strategies.
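
To make the determinism concrete, here’s a toy Python version of the drug example with entirely made-up numbers: change the score and the optimal action flips.

    import numpy as np

    rng = np.random.default_rng(1)
    # hypothetical QALY outcomes for the whole population, known in advance
    qaly_drug = rng.normal(2.0, 4.0, size=10_000)
    qaly_no_drug = rng.normal(1.0, 0.5, size=10_000)

    # Score 1: average QALYs across the population -> give the drug
    print(qaly_drug.mean() > qaly_no_drug.mean())                        # True

    # Score 2: 5th-percentile QALYs (protect the worst-off) -> withhold it
    print(np.percentile(qaly_drug, 5) > np.percentile(qaly_no_drug, 5))  # False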

The population model determines the consequences of optimization-based decision making. When your metric is an average, your optimal prediction is necessarily statistical. Maximizing average benefit requires statistical rules. Trade-offs between error types and Neyman-Pearson decision theory all derive from the population properties. All of the results concerning the impossibility, incompatibility, and incoherence of fair machine learning are population-level arguments. None of these deductions require a data-generating distribution.

Being clear about what we take the population to be is critical. We are going to make decisions at the population level based on sample-level evidence, but sample-level decisions try to approximate ideal population-level decisions given the statistical facts at hand.

To make such sample-level approximations, we need to make assumptions about how the sample is linked to the population. There are many models in machine learning, each defined by specific evaluation criteria.

Batch Learning: Someone hands you a sample of data. You can do whatever you want with it. You are evaluated by how well your model does on the remainder of the population. To justify sample-level conclusions, we typically choose a model so that the law of large numbers holds. This could be that the data are iid samples from the population. This could be that the data are a subset of the population that we actively sampled at random (these two are very different assumptions). The model could also be deterministic, with assumptions about how observations and predictions are linked; people like to assume linearity. Some combination of modeling assumptions gives us confidence that data from the past will be representative of observations in the future.

Online Learning: You run through the data sequentially, make predictions, and update your models based on the accuracy of each sequential prediction. Online learning is fascinating because no randomness is required to derive theorems. It is fundamentally a non-stochastic theory. The key is to compare with the best predictions at the population level and to show that, on average, you match population-level performance as the sample grows. If patterns are there to be recognized, your algorithm will recognize them. The Perceptron mistake bound is the classic example of online learning, but similar bounds can be derived for gradient methods on linear models and many other machine learning applications. The downside here is the clunky, unintuitive evaluation metric that is regret.
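
For concreteness, here’s a minimal Perceptron sketch in Python. Nothing stochastic is assumed about the stream; the classic bound says that on linearly separable data with margin $\gamma$ and radius $R$, it makes at most $(R/\gamma)^2$ mistakes.

    import numpy as np

    def perceptron(stream, dim):
        """Classic Perceptron over a stream of (x, y) with y in {-1, +1}.
        Returns final weights and the number of mistakes (updates)."""
        w, mistakes = np.zeros(dim), 0
        for x, y in stream:
            if y * (w @ x) <= 0:   # wrong (or zero-margin) prediction
                w += y * x         # the Perceptron update
                mistakes += 1
        return w, mistakes

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    stream = [(x, 1 if x[0] > 0 else -1) for x in X]  # separable by w* = e_1
    w, m = perceptron(stream, dim=2)
    print(m)  # bounded by (R / gamma)^2 for margin gamma, radius R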

Empiricist Learning: Your score is at the population level, but you can act on the population to make decisions. You assume that you have a mechanism for selecting individuals from the population and acting on them (i.e., choosing some representative individuals from the group for an experiment). Based on the selection mechanisms available, you can design algorithms with ex-ante guarantees about population performance. This was the model I described in Lecture 19 on adaptive experiment design. When I do the class again next time, I’m going to clarify that you can model batch learning this way too. More broadly, this perspective connects empirical risk minimization in machine learning to decision theory without Bayesian modeling. I don’t know what to call this perspective, which lies between the batch and online models, but it’s the one I’m most excited about. A statistically minded reader might call it “design-based” machine learning.

The data-generating distribution may be a convenient mnemonic crutch for machine learning engineers. But it’s not necessary to understand machine learning. We use the same methods regardless of whether the world is producing randomness. Why that is the case is interesting. Or at least, it’s interesting to me. To make sense of our overly data-driven culture, we should figure out when we actually need statistical models of data. In machine learning, the answer could be never.

From the outset of our collaboration, Moritz and I discussed teaching this machine learning course without data-generating distributions. It took us a few iterations of teaching the class to get there. Perhaps I can entice him to write a second edition of PPA where we finish the job.

By Ben Recht

Virtual Qudits for Simon's Problem: Dimension-Lifted Algorithms on Qubit Hardware

from arXiv: Computational Complexity

Authors: Abed Semre, Steven Frankel

Simon's problem admits an exponential quantum speedup, but current quantum devices support only qubits. This work introduces a general construction for simulating qudit versions of Simon's algorithm on qubit hardware by defining virtual qudits implemented through controlled permutations and qudit phase operations. We build a dimension-lifted oracle that encodes the hidden shift in dimension $d$ and show how to realize its action using only qubit gates. We mathematically verify that the lifted circuit reproduces the correct measurement statistics, analyze the depth-overhead tradeoffs as a function of $d$, and provide numerical simulations in QuTiP for example values. Our approach demonstrates how higher-dimensional structures can be embedded into qubit devices and provides a general method for extending qudit algorithms to current hardware.
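
For the $d = 2$ base case, the measurement statistics being verified are easy to simulate classically: each measured string $y$ is uniform over $\{y : y \cdot s = 0 \pmod 2\}$ for hidden shift $s$. A small Python sketch of that (ours, not the paper's QuTiP code):

    import numpy as np

    def simon_samples(s, shots, rng):
        """Measurement statistics of the qubit (d = 2) Simon circuit:
        y is uniform over {y : y . s = 0 (mod 2)}; sample by rejection."""
        out = []
        while len(out) < shots:
            y = rng.integers(0, 2, size=len(s))
            if np.dot(y, s) % 2 == 0:
                out.append(y)
        return np.array(out)

    rng = np.random.default_rng(0)
    s = np.array([1, 0, 1])                 # hidden shift
    ys = simon_samples(s, shots=20, rng=rng)
    # brute-force the nonzero shifts consistent with every sample (small n)
    cands = [c for c in np.ndindex(2, 2, 2)
             if any(c) and all(np.dot(y, c) % 2 == 0 for y in ys)]
    print(cands)  # [(1, 0, 1)] with high probability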

Optimal Preconditioning is a Geodesically Convex Optimization Problem

from arXiv: Computational Complexity

Authors: M. Levent Doğan, Alperen Ergür, Elias Tsigaridas

We introduce a unified framework for computing approximately optimal preconditioners for solving linear and non-linear systems of equations. We demonstrate that the condition number minimization problem, under structured transformations such as diagonal and block-diagonal preconditioners, is geodesically convex with respect to unitarily invariant norms, including the Frobenius and Bombieri--Weyl norms. This allows us to introduce efficient first-order algorithms with precise convergence guarantees. For linear systems, we analyze the action of symmetric Lie subgroups $G \subseteq \mathrm{GL}_m(\mathbb{C}) \times \mathrm{GL}_n(\mathbb{C})$ on the input matrix and prove that the logarithm of the condition number is a smooth geodesically convex function on the associated Riemannian quotient manifold. We obtain explicit gradient formulas, show Lipschitz continuity, and prove convergence rates for computing the optimal Frobenius condition number: $\widetilde{O}(1/\varepsilon^2)$ iterations for general two-sided preconditioners and $\widetilde{O}(\kappa_F^2 \log(1/\varepsilon))$ for strongly convex cases such as left preconditioning. We extend our framework to consider preconditioning of polynomial systems $f(x) = 0$, where $f$ is a system of multivariate polynomials. We analyze the local condition number $\mu(f, \xi)$ at a root $\xi$ and prove that it also admits a geodesically convex formulation under appropriate group actions. We deduce explicit formulas for the Riemannian gradients and present convergence bounds for the corresponding optimization algorithms. To the best of our knowledge, this is the first preconditioning algorithm with theoretical guarantees for polynomial systems.
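
As a toy illustration of the objective only (the paper's geodesic first-order methods are not reproduced here), diagonal scaling can shrink the Frobenius condition number $\kappa_F$, e.g. via one pass of row/column equilibration in Python:

    import numpy as np

    def kappa_F(A):
        """Frobenius condition number ||A||_F * ||A^{-1}||_F."""
        return np.linalg.norm(A) * np.linalg.norm(np.linalg.inv(A))

    rng = np.random.default_rng(0)
    B = rng.normal(size=(5, 5))             # moderately conditioned core
    D1 = np.diag(10.0 ** rng.integers(-3, 4, size=5))
    D2 = np.diag(10.0 ** rng.integers(-3, 4, size=5))
    A = D1 @ B @ D2                         # badly scaled input

    # one pass of row, then column, equilibration as diagonal preconditioners
    L = np.diag(1.0 / np.linalg.norm(A, axis=1))
    M = L @ A
    R = np.diag(1.0 / np.linalg.norm(M, axis=0))
    print(kappa_F(A), kappa_F(M @ R))       # far smaller after scaling here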

Algebra in Algorithmic Coding Theory

from arXiv: Computational Complexity

Authors: Madhu Sudan

We survey the notion and history of error-correcting codes and the algorithms needed to make them effective in information transmission. We then give some basic as well as more modern constructions of, and algorithms for, error-correcting codes that depend on relatively simple elements of applied algebra. While the role of algebra in the constructions of codes has been widely acknowledged in texts and other writings, the role in the design of algorithms is often less widely understood, and this survey hopes to reduce this difference to some extent.
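
As a generic taste of the algebra involved (our illustration, not an excerpt from the survey): Reed-Solomon encoding is polynomial evaluation over a finite field, and recovery from erasures is Lagrange interpolation.

    p = 97  # a prime; we work over GF(p)

    def poly_mul(a, b):
        """Multiply two coefficient lists mod p."""
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] = (out[i + j] + ai * bj) % p
        return out

    def rs_encode(msg, points):
        """Reed-Solomon: evaluate the message polynomial at the points."""
        return [sum(c * pow(x, i, p) for i, c in enumerate(msg)) % p
                for x in points]

    def rs_decode_erasures(known, k):
        """Recover the degree < k message polynomial from any k surviving
        (point, value) pairs via Lagrange interpolation over GF(p)."""
        coeffs = [0] * k
        for i, (xi, yi) in enumerate(known):
            num, denom = [1], 1             # build Lagrange basis ell_i
            for j, (xj, _) in enumerate(known):
                if j != i:
                    num = poly_mul(num, [-xj % p, 1])
                    denom = denom * (xi - xj) % p
            scale = yi * pow(denom, -1, p) % p
            for d, c in enumerate(num):
                coeffs[d] = (coeffs[d] + scale * c) % p
        return coeffs

    msg = [5, 17, 42]                        # k = 3 message symbols
    code = rs_encode(msg, points=range(7))   # n = 7 codeword symbols
    survivors = [(0, code[0]), (2, code[2]), (5, code[5])]
    print(rs_decode_erasures(survivors, k=3))  # [5, 17, 42]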

On the hardness of recognizing graphs of small mim-width and its variants

from arXiv: Computational Complexity

Authors: Max Dupré la Tour, Manuel Lafond, Ndiamé Ndiaye

The mim-width of a graph is a powerful structural parameter that, when bounded by a constant, allows several hard problems to be polynomial-time solvable - with a recent meta-theorem encompassing a large class of problems [SODA2023]. Since its introduction, several variants such as sim-width and omim-width were developed, along with linear versions of these parameters. It was recently shown that mim-width and all these variants are paraNP-hard, a consequence of the NP-hardness of distinguishing between graphs of linear mim-width at most 1211 and graphs of sim-width at least 1216 [ICALP2025]. The complexity of recognizing graphs of small width, particularly those close to $1$, remained open, despite their especially attractive algorithmic applications. In this work, we show that the width recognition problems remain NP-hard even for small widths. Specifically, after introducing the novel parameter Omim-width, sandwiched between omim-width and mim-width, we show that: (1) deciding whether a graph has sim-width $= 1$, omim-width $= 1$, or Omim-width $= 1$ is NP-hard, and the same is true for their linear variants; (2) the problems of deciding whether mim-width $\leq 2$ or linear mim-width $\leq 2$ are both NP-hard. Interestingly, our reductions are relatively simple and are from the Unrooted Quartet Consistency problem, which is of great interest in computational biology but is rarely (if ever) used in the theory of algorithms.

The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds

from arXiv: Computational Complexity

Authors: Shahar Lutati

When should an autonomous agent commit resources to a task? We introduce the Agent Capability Problem (ACP), a framework for predicting whether an agent can solve a problem under resource constraints. Rather than relying on empirical heuristics, ACP frames problem-solving as information acquisition: an agent requires $I_{\mathrm{total}}$ bits to identify a solution and gains $I_{\mathrm{step}}$ bits per action at cost $C_{\mathrm{step}}$, yielding an effective cost $C_{\mathrm{eff}} = (I_{\mathrm{total}}/I_{\mathrm{step}}) \cdot C_{\mathrm{step}}$ that predicts resource requirements before search. We prove that $C_{\mathrm{eff}}$ lower-bounds expected cost and provide tight probabilistic upper bounds. Experimental validation shows that ACP predictions closely track actual agent performance, consistently bounding search effort while improving efficiency over greedy and random strategies. The framework generalizes across LLM-based and agentic workflows, linking principles from active learning, Bayesian optimization, and reinforcement learning through a unified information-theoretic lens.
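
The effective-cost formula is a one-line computation; in Python, with illustrative numbers of our own choosing:

    # The ACP effective cost, with made-up numbers: a task needing
    # I_total = 20 bits, an agent gaining I_step = 0.5 bits per action,
    # each action costing C_step = 3 units.
    I_total, I_step, C_step = 20.0, 0.5, 3.0
    C_eff = (I_total / I_step) * C_step
    print(C_eff)   # 120.0 -> commit only if the budget covers this bound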
