Last Update

OPML feed of all feeds.

Subscribe to the Atom feed or RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Wednesday, March 25

2nd Quantum Cambridge-Oxford-Warwick Colloquium (QCOW)

from CS Theory Events

April 23-24, 2026, University of Warwick. https://qcow.cs.ox.ac.uk/ QCOW is a new series of meetings dedicated to advancing the understanding of fundamental questions and open problems in quantum complexity theory. The meetings will rotate between the universities of Cambridge, Oxford, and Warwick, with each event focusing on a specific theme within the theoretical foundations of quantum computation.

By shacharlovett

My Oxford Term

from Computational Complexity

[Photo: High table dinner at Magdalen]
My time in Oxford has come to an end and I head back to Chicago this week. I was a visiting Fellow at Magdalen (pronounced "maudlin") College for the Hilary Term.
There's a six-week break between the eight-week Hilary and Trinity terms. They work the fellows hard during the terms with teaching, tutoring, admissions, hiring and various other administrative functions. All the events, seminars, workshops and high-table dinners are scheduled during the term. Pretty much nothing happens between terms; many domestic students are forced out of their housing, and many of the fellows and professors leave town as well. An interesting strategy when we Americans get just a week for spring break.

I came here for research, working mostly with my former PhD student Rahul Santhanam, a tutorial fellow at Magdalen, and his students. More on the research in a future post.

I took full advantage of Magdalen college life: working in the senior common room, having lunch in the winter common room, evensong in the chapel with an outstanding choir and organ, and high table dinner in the hall. I had the same experiences Magdalen fellows have had for centuries, including C.S. Lewis, Oscar Wilde and Erwin Schrödinger. There's also a summer common room with a secret door to the old library, and by old I mean it predates most American universities. Magdalen looks like such a traditional old college that some recent Oxford-set shows, including My Oxford Year and Young Sherlock, did extensive filming there.

As I mentioned earlier, community centers on the college, not the departments. I had an office in the CS building but didn't spend much time there. Every day at Magdalen, particularly at lunch and dinner, I had great conversations with lawyers, biologists, historians, archivists, literary scholars, music historians, stained-glass restorers and the numismatist who manages the 300,000-coin collection of Oxford's Ashmolean Museum.

At one dinner I sat next to the COO of the new Ellison Institute of Technology, a ten-billion-dollar venture in Oxford that is independent of the university, funded by the Oracle founder. She talked at length about the famous pub The Eagle and Child. The pub, nicknamed the Bird and Baby, was famous as the meeting place of the Inklings, a group of writers including Lewis and Tolkien. It never reopened after Covid; it was purchased by Ellison and is currently being renovated.

Another visiting fellow, Elaine Treharne, gave a talk on medieval poetry the same week I talked about complexity and machine learning, and we went to each other's talks. Hers was in the brand-new Schwarzman Centre for the Humanities (the same Schwarzman as MIT's College of Computing); mine was in a CS building that's a mishmash of other buildings. She outdrew me two to one.

By Lance Fortnow

Exponential Separation of Quantum and Classical One-Way Numbers-on-Forehead Communication

from arXiv: Computational Complexity

Authors: Guangxu Yang, Jiapeng Zhang

The Numbers-on-Forehead (NOF) communication model is a central model in communication complexity. As a restricted variant, the one-way NOF model is of particular interest. Establishing strong one-way NOF lower bounds would imply circuit lower bounds, resolve well-known problems in additive combinatorics, and yield wide-ranging applications in areas such as cryptography and distributed computing. However, proving strong lower bounds in one-way NOF communication remains highly challenging, and many fundamental questions remain wide open. One such question, proposed by Gavinsky and Pudlák (CCC 2008), is to establish an explicit exponential separation between quantum and classical one-way NOF communication. In this paper, we resolve this open problem by establishing the first exponential separation between quantum and randomized communication complexity in the one-way NOF model. Specifically, we define a lifted variant of the Hidden Matching problem of Bar-Yossef, Jayram, and Kerenidis (STOC 2004) and show that it admits an $O(\log n)$-cost quantum protocol in the one-way NOF setting. By contrast, we prove that any $k$-party one-way randomized protocol for this problem requires communication $\Omega(\frac{n^{1/3}}{2^{k/3}})$. Notably, our separation applies even to a generalization of $k$-player one-way communication, where the first player speaks once and all other $k-1$ players can communicate freely.

Covering and Partitioning Complex Objects with Small Pieces

from arXiv: Computational Geometry

Authors: Anders Aamand, Mikkel Abrahamsen, Reilly Browne, Mayank Goswami, Prahlad Narasimhan Kasthurirangan, Linda Kleist, Joseph S. B. Mitchell, Valentin Polishchuk, Jack Stade

We study the problems of covering or partitioning a polygon $P$ (possibly with holes) using a minimum number of small pieces, where a small piece is a connected sub-polygon contained in an axis-aligned unit square. For covering, we seek to write $P$ as a union of small pieces, and in partitioning, we furthermore require the pieces to be pairwise interior-disjoint. We show that these problems are in fact equivalent: Optimum covers and partitions have the same number of pieces. For covering, a natural local search algorithm repeatedly attempts to replace $k$ pieces from a candidate cover with $k-1$ pieces. In two dimensions and for sufficiently large $k$, we show that when no such swap is possible, the cover is a $1+O(1/\sqrt k)$-approximation, hence obtaining the first PTAS for the problem. Prior to our work, the only known algorithm was a $13$-approximation that only works for polygons without holes [Abrahamsen and Rasmussen, SODA 2025]. In contrast, in the three-dimensional version of the problem, for a polyhedron $P$ of complexity $n$, we show that it is NP-hard to approximate an optimal cover or partition to within a factor that is logarithmic in $n$, even if $P$ is simple, i.e., has genus $0$ and no holes.
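The $(k \to k-1)$ swap idea behind the local search can be illustrated in one dimension. The sketch below is not the paper's algorithm: it covers points on a line with unit intervals and repeatedly tries to replace two intervals with one, a toy stand-in (with hypothetical names) for the unit-square pieces of the 2-D problem. It assumes the input `cover` is valid, i.e. every point is covered.

```python
def covers(s, p):
    # Unit interval starting at s covers point p.
    return s <= p <= s + 1

def swap_local_search(points, cover):
    # Toy (2 -> 1) swap local search: replace two unit intervals with a single
    # one whenever the points only they cover fit inside one unit interval.
    cover = list(cover)
    improved = True
    while improved:
        improved = False
        for i in range(len(cover)):
            for j in range(i + 1, len(cover)):
                pair = {cover[i], cover[j]}
                # Points that would become uncovered if both intervals vanished.
                exclusive = [p for p in points
                             if all(s in pair for s in cover if covers(s, p))]
                if exclusive and max(exclusive) - min(exclusive) <= 1:
                    cover = [s for s in cover if s not in pair] + [min(exclusive)]
                    improved = True
                    break
            if improved:
                break
    return cover
```

In this 1-D toy, such swaps already reach an optimum cover; the paper's contribution is that in 2-D, swaps of up to $k$ pieces yield a $1+O(1/\sqrt k)$-approximation.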

Linear time single-source shortest path algorithms in Euclidean graph classes

from arXiv: Computational Geometry

Authors: Joachim Gudmundsson, Yuan Sha, Sampson Wong

In the celebrated paper of Henzinger, Klein, Rao and Subramanian (1997), it was shown that planar graphs admit a linear time single-source shortest path algorithm. Their algorithm unfortunately does not extend to Euclidean graph classes. We give criteria and prove that any Euclidean graph class satisfying the criteria admits a linear time single-source shortest path algorithm. As a main ingredient, we show that the contracted graphs of these Euclidean graph classes admit sublinear separators.

Simple but not Simpler: A Surface-Sliding Method for Finding the Minimum Distance between Two Ellipsoids

from arXiv: Computational Geometry

Authors: Dariush Amirkhani, Junfeng Zhang

We propose a novel iterative process to establish the minimum separation between two ellipsoids. The method maintains one point on each surface and updates their locations in the theta-phi parametric space. The tension along the connecting segment between the two surface points serves as the guidance for the sliding direction, and the distance between them decreases gradually. The minimum distance is established when the connecting segment becomes perpendicular to the ellipsoid surfaces, at which point the net effect of the segment tension disappears and the surface points no longer move. Demonstration examples are carefully designed, and excellent numerical performance is observed, including accuracy, consistency, stability, and robustness. Furthermore, compared to other existing techniques, this surface-sliding approach has several attractive features, such as clear geometric representation, concise formulation, a simple algorithm, and the potential to be extended straightforwardly to other situations. This method is expected to be useful for future studies in computer graphics, engineering design, material modeling, and scientific simulations.
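A minimal numerical sketch of the idea, assuming axis-aligned ellipsoids and substituting a simple shrinking-step coordinate descent in $(\theta, \phi)$ space for the paper's tension-guided sliding; all function names and the starting parameters are illustrative, not from the paper.

```python
import math

def ellipsoid_point(center, radii, th, ph):
    # Parametric point on an axis-aligned ellipsoid.
    cx, cy, cz = center
    a, b, c = radii
    return (cx + a * math.cos(th) * math.cos(ph),
            cy + b * math.cos(th) * math.sin(ph),
            cz + c * math.sin(th))

def min_distance(e1, e2, iters=2000, tol=1e-4):
    # Slide one point on each surface in (theta, phi) space, accepting any
    # single-parameter move that shortens the connecting segment; shrink the
    # step when the points stop moving, mimicking vanishing net tension.
    def f(ps):
        p = ellipsoid_point(*e1, ps[0], ps[1])
        q = ellipsoid_point(*e2, ps[2], ps[3])
        return math.dist(p, q)
    params, step = [0.3, 0.3, 0.3, 0.3], 0.1
    for _ in range(iters):
        improved = False
        for i in range(4):
            for s in (step, -step):
                trial = params.copy()
                trial[i] += s
                if f(trial) < f(params):
                    params, improved = trial, True
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return f(params)
```

For two unit spheres centered at the origin and at $(4,0,0)$, the sliding converges to the true minimum distance of $2$.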

Product Range Search Problem

from arXiv: Computational Geometry

Authors: Oliver Chubet, Niyathi Kukkapalli, Anvi Kudaraya, Don Sheehy

Given a metric space, a standard metric range search takes a query $(q,r)$ and finds all points within distance $r$ of the point $q$. Suppose now we have two different metrics $d_1$ and $d_2$. A product range query $(q, r_1, r_2)$ consists of a point $q$ and two radii $r_1$ and $r_2$. The output is all points that are within distance $r_1$ of $q$ with respect to $d_1$ and within distance $r_2$ of $q$ with respect to $d_2$. In other words, it is the intersection of two searches. We present two data structures for approximate product range search in doubling metrics. Both data structures use a net-tree variant, the greedy tree. The greedy tree is a data structure that can efficiently answer approximate range searches in doubling metrics. The first data structure is a generalization of the range tree from computational geometry using greedy trees rather than binary trees. The second data structure is a single greedy tree constructed on the product induced by the two metrics.
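As a baseline, the product range semantics can be realized by a linear scan; the tree-based structures of the paper answer the same (approximate) queries without touching every point. A hedged sketch of the query semantics only:

```python
import math

def product_range_query(points, q, r1, r2, d1, d2):
    # Report exactly the points within r1 of q under d1 AND within r2 under d2:
    # the intersection of the two single-metric range searches.
    return [p for p in points if d1(p, q) <= r1 and d2(p, q) <= r2]
```

For example, with $d_1$ Euclidean and $d_2$ the Chebyshev ($L_\infty$) metric, a point may pass one radius test and fail the other, so the product query is strictly stronger than either search alone.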

ETH Flippers Approach to Parallel Reconfiguration of Triangulations: SAT formulation and Heuristics

from arXiv: Computational Geometry

Authors: Lorenzo Battini, Marko Milenković

We describe the algorithms used by the ETH Flippers team in the CG:SHOP 2026 Challenge. Each instance consists of a set of triangulations on a common point set, and the objective is to find a central triangulation that minimizes the total parallel flip distance to the input set. Our strategy combines an exact solver for small and medium-sized instances with a suite of heuristics for larger instances. For the exact approach, we formulate the problem as a SAT instance with XOR clauses to model edge transitions across multiple rounds, further optimized by lower bounds derived from exact pairwise distances. For larger instances, we use a greedy local search and edge-coloring techniques to identify maximal sets of independent flips. Our approach ranked second overall and first in the junior category, computing provably optimal solutions for 186 out of 250 instances.

Testing Properties of Edge Distributions

from arXiv: Data Structures and Algorithms

Authors: Yumou Fei

We initiate the study of distribution testing for probability distributions over the edges of a graph, motivated by the closely related question of ``edge-distribution-free'' graph property testing. The main results of this paper are nearly-tight bounds on testing bipartiteness, triangle-freeness and square-freeness of edge distributions, whose sample complexities are shown to scale as $Θ(n)$, $n^{4/3\pm o(1)}$ and $n^{9/8\pm o(1)}$, respectively. The technical core of our paper lies in the proof of the upper bound for testing square-freeness, wherein we develop new techniques based on certain birthday-paradox-type lemmas that may be of independent interest. We will discuss how our techniques fit into the general framework of distribution-free property testing. We will also discuss how our results are conceptually connected with Turán problems and subgraph removal lemmas in extremal combinatorics.

Dynamic Light Spanners in Doubling Metrics

from arXiv: Data Structures and Algorithms

Authors: Sujoy Bhore, Jonathan Conroy, Arnold Filtser

A $t$-spanner of a point set $X$ in a metric space $(\mathcal{X}, δ)$ is a graph $G$ with vertex set $X$ such that, for any pair of points $u,v \in X$, the distance between $u$ and $v$ in $G$ is at most $t$ times $δ(u,v)$. We study the problem of maintaining a spanner for a dynamic point set $X$ -- that is, when $X$ undergoes a sequence of insertions and deletions -- in a metric space of constant doubling dimension. For any constant $\varepsilon>0$, we maintain a $(1+\varepsilon)$-spanner of $X$ whose total weight remains within a constant factor of the weight of the minimum spanning tree of $X$. Each update (insertion or deletion) can be performed in $\operatorname{poly}(\log Φ)$ time, where $Φ$ denotes the aspect ratio of $X$. Prior to our work, no efficient dynamic algorithm for maintaining a light-weight spanner was known even for point sets in low-dimensional Euclidean space.
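The defining property can be checked directly on small examples. The sketch below illustrates the definition rather than the paper's data structure: it verifies the $t$-spanner condition by comparing all-pairs graph distances against $t$ times the metric distance.

```python
import math

def is_t_spanner(points, edges, t, metric):
    # Floyd-Warshall over the spanner edges (weights = metric distances),
    # then test d_G(u, v) <= t * metric(u, v) for every pair.
    n = len(points)
    INF = float('inf')
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v in edges:
        w = metric(points[u], points[v])
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][j] <= t * metric(points[i], points[j]) + 1e-9
               for i in range(n) for j in range(n))
```

For instance, a path along the four corners of a unit square is a $3$-spanner but not a $2$-spanner, since the two endpoints are at metric distance $1$ but graph distance $3$.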

Dynamic k-center clustering with lifetimes

from arXiv: Data Structures and Algorithms

Authors: Simone Moretti, Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci

The $k$-center problem is a fundamental clustering variant with applications in learning systems and data summarization. In several real-world scenarios, the dataset to be clustered is not static, but evolves over time, as new data points arrive and old ones become stale. To account for dynamicity, the $k$-center problem has been mainly studied under the sliding window setting, where only the $N$ most recent points are considered non-stale, or the fully dynamic setting, where arbitrary sequences of point arrivals and deletions without prior notice may occur. In this paper, we introduce the dynamic setting with lifetimes, which bridges the two aforementioned classical settings by still allowing arbitrary arrivals and deletions, but making the deletion time of each point known upon its arrival. Under this new setting, we devise a deterministic $(2+\varepsilon)$-approximation algorithm with $\tilde{O}(k/\varepsilon)$ amortized update time and memory usage linear in the number of currently active points. Moreover, we develop a deterministic $(6+\varepsilon)$-approximation algorithm that, under tame update sequences, has $\tilde{O}(k/\varepsilon)$ worst-case update time and heavily sublinear working memory.
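For background, the static problem that all of these dynamic settings generalize admits a simple 2-approximation, Gonzalez's farthest-first traversal; a minimal sketch (not the paper's algorithm):

```python
import math

def greedy_k_center(points, k):
    # Gonzalez's farthest-first traversal: a 2-approximation for static k-center.
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from its nearest current center.
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def radius(points, centers):
    # Clustering cost: largest distance from any point to its nearest center.
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

The dynamic algorithms in the paper must maintain comparable guarantees while points arrive and expire, which is what makes the lifetime information valuable.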

Algorithms and Hardness for Geodetic Set on Tree-like Digraphs

from arXiv: Data Structures and Algorithms

Authors: Florent Foucaud, Narges Ghareghani, Lucas Lorieau, Morteza Mohammad-Noori, Rasa Parvini Oskuei, Prafullkumar Tale

In the GEODETIC SET problem, an input is a (di)graph $G$ and integer $k$, and the objective is to decide whether there exists a vertex subset $S$ of size $k$ such that any vertex in $V(G)\setminus S$ lies on a shortest (directed) path between two vertices in $S$. The problem has been studied on undirected and directed graphs from both algorithmic and graph-theoretical perspectives. We focus on directed graphs and prove that GEODETIC SET admits a polynomial-time algorithm on ditrees, that is, digraphs with possible 2-cycles when the underlying undirected graph is a tree (after deleting possible parallel edges). This positive result naturally leads us to investigate cases where the underlying undirected graph is "close to a tree". Towards this, we show that GEODETIC SET on digraphs without 2-cycles and whose underlying undirected graph has feedback edge set number $\textsf{fen}$, can be solved in time $2^{\mathcal{O}(\textsf{fen})} \cdot n^{\mathcal{O}(1)}$, where $n$ is the number of vertices. To complement this, we prove that the problem remains NP-hard on DAGs (which do not contain 2-cycles) even when the underlying undirected graph has constant feedback vertex set number. Our last result significantly strengthens the result of Araújo and Arraes~[Discrete Applied Mathematics, 2022] that the problem is NP-hard on DAGs when the underlying undirected graph is either bipartite, cobipartite or split.
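The definition can be made concrete with a brute-force solver for small digraphs; the hedged sketch below (exponential in the number of vertices, purely illustrative) uses the fact that $v$ lies on a shortest directed $s$-$t$ path iff $d(s,v)+d(v,t)=d(s,t)$.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    # Directed BFS distances from src (unit edge lengths).
    d = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
    return d

def smallest_geodetic_set(adj):
    # Try vertex subsets in increasing size; S is geodetic if every vertex
    # lies on some shortest directed path between two vertices of S.
    dist = {u: bfs_dist(adj, u) for u in adj}
    inf = float('inf')
    def on_geodesic(v, S):
        if v in S:
            return True
        for s in S:
            for t in S:
                dst = dist[s].get(t, inf)
                if dst < inf and dist[s].get(v, inf) + dist[v].get(t, inf) == dst:
                    return True
        return False
    for k in range(1, len(adj) + 1):
        for S in combinations(sorted(adj), k):
            if all(on_geodesic(v, set(S)) for v in adj):
                return set(S)
```

On an undirected path $a$-$b$-$c$ (encoded with symmetric arcs), the unique minimum geodetic set is the pair of endpoints.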

Compressing Dynamic Fully Indexable Dictionaries in Word-RAM

from arXiv: Data Structures and Algorithms

Authors: Gabriel Marques Domingues

We study the problem of constructing a dynamic fully indexable dictionary (FID) in the Word-RAM model using space close to the information-theoretic lower bound. A FID is a data structure that encodes a bit-vector $B$ of length $u$ and answers, for $b\in\{0,1\}$, $\texttt{rank}_b(B, x)=|\{y\leq x~|~B[y]=b\}|$ and $\texttt{select}_b(B, r)=\min\{0\leq x<u~|~\texttt{rank}_b(B, x)\geq r\}$ queries.
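The two queries are easy to state operationally; the naive $O(u)$-time versions below illustrate the semantics that a succinct FID answers in constant or near-constant time:

```python
def rank(B, b, x):
    # rank_b(B, x): number of positions y <= x with B[y] == b.
    return sum(1 for y in range(x + 1) if B[y] == b)

def select(B, b, r):
    # select_b(B, r): smallest index x whose prefix B[0..x] holds r copies of b,
    # i.e. the position of the r-th occurrence of b (1-indexed r).
    count = 0
    for x, bit in enumerate(B):
        if bit == b:
            count += 1
            if count == r:
                return x
    return None
```

The point of the paper is to support these queries, plus updates, in far less time and in space close to the information-theoretic minimum rather than storing $B$ explicitly.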

Accelerating Maximum Common Subgraph Computation by Exploiting Symmetries

from arXiv: Data Structures and Algorithms

Authors: Buddhi Kothalawala, Henning Koehler, Muhammad Farhan

The Maximum Common Subgraph (MCS) problem plays a key role in many applications, including cheminformatics, bioinformatics, and pattern recognition, where it is used to identify the largest shared substructure between two graphs. Although symmetry exploitation is a powerful means of reducing search space in combinatorial optimization, its potential in MCS algorithms has remained largely underexplored due to the challenges of detecting and integrating symmetries effectively. Existing approaches, such as RRSplit, partially address symmetry through vertex-equivalence reasoning on the variable graph, but symmetries in the value graph remain unexploited. In this work, we introduce a complete dual-symmetry breaking framework that simultaneously handles symmetries in both variable and value graphs. Our method identifies and exploits modular symmetries based on local neighborhood structures, allowing the algorithm to prune isomorphic subtrees during search while rigorously preserving optimality. Extensive experiments on standard MCS benchmarks show that our approach substantially outperforms the state-of-the-art RRSplit algorithm, solving more instances with significant reductions in both computation time and search space. These results highlight the practical effectiveness of comprehensive symmetry-aware pruning for accelerating exact MCS computation.

Gabow's $O(\sqrt{n}m)$ Maximum Cardinality Matching Algorithm, Revisited

from arXiv: Data Structures and Algorithms

Authors: Kurt Mehlhorn, Romina Nobahari

We revisit Gabow's $O(\sqrt{n} m)$ maximum cardinality matching algorithm (The Weighted Matching Approach to Maximum Cardinality Matching, Fundamenta Informaticae, 2017). It adapts the weighted matching algorithm of Gabow and Tarjan [GT91] to maximum cardinality matching. Gabow's algorithm works iteratively. In each iteration, it constructs a maximal number of edge-disjoint shortest augmenting paths with respect to the current matching and augments along them. It is well known that $O(\sqrt{n})$ iterations suffice. Each iteration consists of three parts. In the first part, the length of the shortest augmenting path is computed. In the second part, an auxiliary graph $H$ is constructed with the property that shortest augmenting paths in $G$ correspond to augmenting paths in $H$. In the third part, a maximal set of edge-disjoint augmenting paths in $H$ is determined, and the paths are lifted to $G$ and augmented. We give a new algorithm for the first part. Gabow's algorithm for the first part is derived from Edmonds' primal-dual algorithm for weighted matching. We believe that our approach is more direct and will be easier to teach. We have implemented the algorithm; the implementation is available at the companion webpage (https://people.mpi-inf.mpg.de/~mehlhorn/CompanionPageGenMatchingImplementation.html).

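As a point of reference for the augmenting-path framework (this is not Gabow's algorithm, which handles general graphs via blossoms), here is the textbook augmenting-path method in the bipartite special case, where no blossoms arise; the names and graph representation are illustrative.

```python
def max_bipartite_matching(adj_left, n_right):
    """Kuhn's algorithm: repeatedly search for an alternating path from a free
    left vertex and flip it. General-graph algorithms like Gabow's must
    additionally contract odd cycles (blossoms)."""
    match_right = [-1] * n_right  # match_right[v] = left partner of v, or -1

    def try_augment(u, seen):
        for v in adj_left[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or v's current partner can be rematched elsewhere
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj_left)):
        if try_augment(u, set()):
            size += 1
    return size

# left vertices 0, 1, 2; right vertices 0, 1
adj = [[0], [0, 1], [1]]
print(max_bipartite_matching(adj, 2))  # 2
```

This simple version runs one augmentation at a time; the $O(\sqrt{n}m)$ bound discussed above comes from augmenting a maximal set of edge-disjoint shortest paths per phase.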
On the Complexity of Secluded Path Problems

from arXiv: Data Structures and Algorithms

Authors: Tesshu Hanaka, Daisuke Tsuru

This paper investigates the complexity of finding secluded paths in graphs. We focus on the \textsc{Short Secluded Path} problem and a natural new variant we introduce, \textsc{Shortest Secluded Path}. Formally, given an undirected graph $G=(V, E)$, two vertices $s,t\in V$, and two integers $k,l$, the \textsc{Short Secluded Path} problem asks whether there exists an $s$-$t$ path of length at most $k$ with at most $l$ neighbors. This problem is known to be computationally hard: it is W[1]-hard when parameterized by the path length $k$ or by cliquewidth, and para-NP-complete when parameterized by the number $l$ of neighbors. Fixed-parameter tractability is known when parameterized by $k+l$ or by treewidth. In this paper, we expand the parameterized complexity landscape by designing (1) an XP algorithm parameterized by cliquewidth and (2) fixed-parameter algorithms parameterized by neighborhood diversity and twin cover number, respectively. As a byproduct, our results also yield parameterized algorithms for the classic \textsc{$s$-$t$ $k$-Path} problem under the considered parameters. Furthermore, we introduce the \textsc{Shortest Secluded Path} problem, which seeks a shortest $s$-$t$ path with the minimum number of neighbors. In contrast to the hardness of the original problem, we reveal that this variant is solvable in polynomial time on unweighted graphs. We complement this by showing that for edge-weighted graphs, the problem becomes W[1]-hard yet remains in XP when parameterized by the shortest path distance between $s$ and $t$.

Algorithmic warm starts for Hamiltonian Monte Carlo

from arXiv: Data Structures and Algorithms

Authors: Matthew S. Zhang, Jason M. Altschuler, Sinho Chewi

Generating samples from a continuous probability density is a central algorithmic problem across statistics, engineering, and the sciences. For high-dimensional settings, Hamiltonian Monte Carlo (HMC) is the default algorithm across mainstream software packages. However, despite the extensive line of work on HMC and its widespread empirical success, it remains unclear how many iterations of HMC are required as a function of the dimension $d$. On one hand, a variety of results show that Metropolized HMC converges in $O(d^{1/4})$ iterations from a warm start close to stationarity. On the other hand, Metropolized HMC is significantly slower without a warm start, e.g., requiring $\Omega(d^{1/2})$ iterations even for simple target distributions such as isotropic Gaussians. Finding a warm start is therefore the computational bottleneck for HMC. We resolve this issue for the well-studied setting of sampling from a probability distribution satisfying strong log-concavity (or isoperimetry) and third-order derivative bounds. We prove that \emph{non-Metropolized} HMC generates a warm start in $\tilde{O}(d^{1/4})$ iterations, after which we can exploit the warm start using Metropolized HMC. Our final complexity of $\tilde{O}(d^{1/4})$ is the fastest algorithm for high-accuracy sampling under these assumptions, improving over the prior best of $\tilde{O}(d^{1/2})$. This closes the long line of work on the dimensional complexity of Metropolized HMC for such settings, and also provides a simple warm-start prescription for practical implementations.

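For readers who have not seen HMC spelled out, here is a minimal sketch of one Metropolized HMC step (leapfrog integration followed by a Metropolis accept/reject) for a generic log-density; the step size and leapfrog count below are arbitrary illustrative choices, and nothing here reflects the warm-start analysis of the paper.

```python
import numpy as np

def hmc_step(x, grad_logp, logp, step, n_leapfrog, rng):
    """One Metropolized HMC step on target density proportional to exp(logp)."""
    p = rng.standard_normal(x.shape)       # resample momentum
    x_new, p_new = x.copy(), p.copy()
    # leapfrog integration: half kick, alternating drifts/kicks, half kick
    p_new += 0.5 * step * grad_logp(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step * p_new
        p_new += step * grad_logp(x_new)
    x_new += step * p_new
    p_new += 0.5 * step * grad_logp(x_new)
    # Metropolis correction for the discretization error of the integrator
    log_accept = (logp(x_new) - 0.5 * p_new @ p_new) - (logp(x) - 0.5 * p @ p)
    if np.log(rng.random()) < log_accept:
        return x_new
    return x

# isotropic Gaussian target: logp(x) = -||x||^2 / 2
rng = np.random.default_rng(0)
x = np.zeros(5)
for _ in range(1000):
    x = hmc_step(x, lambda z: -z, lambda z: -0.5 * z @ z, 0.2, 10, rng)
print(x.shape)  # (5,)
```

The non-Metropolized variant in the paper simply skips the accept/reject step, accepting every leapfrog trajectory.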
Computing and Enumerating Minimal Common Supersequences Between Two Strings

from arXiv: Data Structures and Algorithms

Authors: Braeden Sopp, Adiesha Liyanage, Mingyang Gong, Binhai Zhu

Given $k$ strings each of length at most $n$, computing the shortest common supersequence of them is a well-known NP-hard problem (when $k$ is unbounded). On the other hand, when $k=2$, such a shortest common supersequence can be computed in $O(n^2)$ time using dynamic programming as a textbook example. In this paper, we consider the problem of computing a \emph{minimal} common supersequence and enumerating all minimal common supersequences for $k=2$ input strings. Our results are summarized as follows. A minimal common supersequence of $k=2$ input strings can be computed in $O(n)$ time. (The method also works when $k$ is a constant.) All minimal common supersequences between two input strings can be enumerated with a data structure of $O(n^2)$ space and an $O(n)$ time delay, and the data structure can be constructed in $O(n^3)$ time.

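For contrast with the minimality results above, the textbook $O(n^2)$ dynamic program for the length of the *shortest* common supersequence of two strings looks as follows (an illustrative sketch, not the paper's linear-time minimal-supersequence algorithm):

```python
def scs_length(a, b):
    """Length of the shortest common supersequence of a and b.
    dp[i][j] = SCS length of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # supersequence of a[:i] and "" is a[:i]
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # share the common character
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(scs_length("abac", "cab"))  # 5, e.g. "cabac"
```

A minimal common supersequence need not be shortest (no character can be deleted while remaining a supersequence of both), which is why the paper's $O(n)$ bound does not contradict the quadratic DP above.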
Tuesday, March 24

Information Transit Got the Wrong Man

from Ben Recht

The mechanized tradition of peer review and the absurdity of bureaucratic conference review.

In case you hadn’t heard, people are using LLMs to create their peer reviews. I know, you’re as shocked as I am. To their credit, the program committee of the International Conference on Machine Learning (ICML) has been doing things to address the problem. Their attempts reveal the systematic problems here that are unfixable without a dramatic teardown.

Let’s recap the situation, though even typing it out makes me feel like a character in a Terry Gilliam movie.

In November 2025, the PC sent a series of surveys to past ICML reviewers to gauge their sentiment about LLMs in reviewing. They assumed that this list must also include a bunch of authors because they mandated a reciprocal reviewing policy for the 2025 conference, under which all submissions must have had at least one author who agreed to serve as a reviewer. In their final survey, they proposed the following policies for LLM reviewing at future conferences, and asked for preferences:

  • Policy A (Conservative): Use of LLMs for reviewing is strictly prohibited.

  • Policy B (Permissive): Reviewers may input the submission text into privacy-compliant LLMs. However, the assessment of the paper and the writing of the review must not be delegated to LLMs.

A random sample of 500 reviewers received the survey. 74 (15%) answered it. And the survey says!

This plot uses default matplotlib colors, so it must be science. I’m not sure what this measures, but whatever, surveys are the democratic way to make policy, amirite? Since the results were equivocal, they decided to make both policy options available in 2026. That’s democracy.

In the multistage reviewer enrollment process for the 2026 conference, they gave reviewers the option of adhering to either Policy A or Policy B or saying they didn’t care. The people who didn’t care were then assigned to one of the two policies in one of the many absurdly long emails they send you when you participate in these conferences.1

Now here’s where it gets weird. The program committee decided to run a sting operation, watermarking the PDFs to trap people who uploaded them directly into LLMs. You can read about their ornate sting operation here. The details are boring. What’s interesting is their response. If they found that someone assigned to Policy A used LLMs in reviewing, then they rejected all papers of which that person was an author. They had 800 reviews that they flagged as violating policy A. They checked every one of these by hand to avoid false positives. Every. Single. One. Because that’s a good use of human manual labor. They ultimately rejected 500 papers associated with these naughty reviewers.

Everyone is patting themselves on the back about this, and tut-tutting those horrible people who dared violate the sacred random assignment of policy, and thanking the committee for their cleverness and transparency. But come on, this whole thing is absurd. The PC argues:

“We hope that by taking strong action against violations of agreed-upon policy we will remind the community that as our field changes rapidly the thing we must protect most actively is our trust in each other. If we cannot adapt our systems in a setting based in trust, we will find that they soon become outdated and meaningless.”

I’m sorry, but it’s already meaningless! ICML received over 33,000 submissions. A random subset of 20-25% of these will be approved as “papers acceptable to go on one’s CV.” The process will churn forward. Everyone who attends the conference knows this process is impossibly bad, but the only proposed solutions make the paper-generation process more onerous for humans. This naturally leads people to offload work to LLMs. Next year, people will use watermark detection before they put the PDF into ChatGPT. The wheels of progress will continue rolling.

It’s unfortunate that the natural bureaucratic editorial mode is to assume everyone is cheating and to go on witch hunts to claim progress. The board wrote:

“Conferences must adapt, creating rules and policies to handle the new normal, and taking disciplinary action against those who break the rules and violate the trust that we all place in the review process.”

Psychology took this sort of approach in the 2010s. Though we all got to revel in the high-profile fraud schadenfreude, the field did not come out better for it.

Rejecting 500 of 33,000 papers because of possibly inappropriate LLM use, when your LLM use policy is based on the most bizarre, arbitrary decision-making built upon a semblance of objective quantitative social science, is farce. At this point, the AI reviewing process can be nothing but farce. As Kevin Baker succinctly put it in his essay on AI for science:

Systems can persist in dysfunction indefinitely, and absurdity is not self-correcting.

One nice thing about LLMs is that they show us which parts of our systems of intellect are mechanical traditions. LLMs are a good way to stress-test our systems for organizing experience and expertise. We’ll need to be more creative about what we want to do moving forward.

Moving forward requires us to talk more about the point of peer review. Yes, the AI conferences are the most absurd manifestation of this problem, but don’t think that your community is insulated from rampant LLM reviewing. At the Cultural AI conference, Mel Andrews showed us dozens of headlines across academia advocating for LLM review. Arguing that LLM review was better than human review. There are economists launching startups to do this as a service.

Andrews argued that the arguments in favor of LLM reviewing consistently conflated institutional and epistemic concerns. The institutional concerns are well known to us. Reviewing is an enormous burden of unpaid labor that further enriches rent-seeking publication houses, and reviewership is unfairly distributed across academia. The epistemic concerns worry that peer review doesn’t properly weed out invalid papers. At least in the sciences, peer review is supposedly meta-epistemic, judging the validity of papers that aim to get at scientific knowledge, understanding, and explanation. Many studies have found the current state of peer review unfit for this task.

Advocates for LLM peer review argue it solves both problems. Andrews took a hard line, claiming that it can’t solve the epistemic problems. Andrews’ boldest claim is that the relationship of the text generated by LLMs to semantic content and truth is always accidental or incidental. Hence, the mechanical aspects of peer review can only increase confabulation and error. Following tradition means not having to think, but peer review’s epistemic function demands thinking.

I don’t fully endorse Mel’s argument, but it’s a position worth airing and engaging with. By focusing solely on process and rules, tweaks to peer review make it more mechanical. Mechanization only makes LLMs better suited for the job. If epistemic cultivation of expertise and experience demands something beyond tradition, then more complex systems of checks and balances only stifle it.

Program committees in computer science used to be small groups of people who met in person to discuss every paper that would be presented at a conference. They are now ministries of truth that haruspicate the statistics of poorly designed surveys and build ornate policies for the masses. Program committees have become bureaucrats of the state, and they are forced to see like it. The bureaucratization of academic work product threatens its very epistemic nature. Perhaps the fix has to arise from spontaneous order.

1

Here’s an email I received this morning asking me to be an area chair for one of the other big conferences, NeurIPS. My AI detector flagged this email as “highly likely AI generated.”

By Ben Recht

Topological Collapse: P = NP Implies #P = FP via Solution-Space Homology

from arXiv: Computational Complexity

Authors: M. Alasli

We prove that P = NP implies #P = FP by exploiting the topological structure of 3SAT solution spaces. The argument proceeds via a dichotomy: any polynomial-time algorithm for 3SAT either operates without global knowledge of the solution-space topology, in which case it cannot certify unsatisfiability for instances with second Betti number b_2 = 2^{Omega(N)} (leading to contradiction), or it computes global topological invariants, which are #P-hard. As local information is provably insufficient and any useful global invariant is #P-hard, the dichotomy is exhaustive. The proof is non-relativizing, consistent with oracles separating P = NP from #P = FP, and therefore necessarily exploits non-oracle properties of computation. Combined with Toda's theorem, the result yields P = NP => #P = FP => PH = P, providing new structural evidence for P != NP via a topological mechanism. We complement the theoretical framework with empirical validation of solution-space shattering at scale (N up to 500), demonstrating that these topological barriers manifest as measurable hardness across five independent algorithm classes.

The color code, the surface code, and the transversal CNOT: NP-hardness of minimum-weight decoding

from arXiv: Computational Complexity

Authors: Shouzhen Gu, Lily Wang, Aleksander Kubica

The decoding problem is a ubiquitous algorithmic task in fault-tolerant quantum computing, and solving it efficiently is essential for scalable quantum computing. Here, we prove that minimum-weight decoding is NP-hard in three quintessential settings: (i) the color code with Pauli $Z$ errors, (ii) the surface code with Pauli $X$, $Y$ and $Z$ errors, and (iii) the surface code with a transversal CNOT gate, Pauli $Z$ and measurement bit-flip errors. Our results show that computational intractability already arises in basic and practically relevant decoding problems central to both quantum memories and logical circuit implementations, highlighting a sharp computational complexity separation between minimum-weight decoding and its approximate realizations.

The Descriptive Complexity of Relation Modification Problems

from arXiv: Computational Complexity

Authors: Florian Chudigiewitsch, Marlene Gründel, Christian Komusiewicz, Nils Morawietz, Till Tantau

A relation modification problem takes as input a logical structure and a natural number k and asks whether k modifications of the structure suffice to make it satisfy a predefined property. We provide a complete classification of the classical and parameterized complexity of relation modification problems - the latter w.r.t. the modification budget k - based on the descriptive complexity of the respective target property. We consider different types of logical structures on which modifications are performed: whereas monadic structures and undirected graphs without self-loops each yield their own complexity landscapes, we find that modifying undirected graphs with self-loops, directed graphs, or arbitrary logical structures is equally hard w.r.t. quantifier patterns. Moreover, we observe that all classes of problems considered in this paper are subject to a strong dichotomy in the sense that they are either very easy to solve (that is, they lie in paraAC^{0\uparrow} or TC^0) or intractable (that is, they contain W[2]-hard or NP-hard problems).

Critical window for approximate counting in dense Ising models

from arXiv: Computational Complexity

Authors: Andreas Galanis, Daniel Stefankovic, Eric Vigoda

We study the complexity of approximating the partition function of dense Ising models in the critical regime. Recent work of Chen, Chen, Yin, and Zhang (FOCS 2025) established fast mixing at criticality, and even beyond criticality in a window of width $N^{-1/2}$. We complement these algorithmic results by proving nearly tight hardness bounds, thus yielding the first instance of a sharp scaling window for the computational complexity of approximate counting. Specifically, for the dense Ising model we show that approximating the partition function is computationally hard within a window of width $N^{-1/2+\varepsilon}$ for any constant $\varepsilon>0$. Standard hardness reductions for non-critical regimes break down at criticality due to bigger fluctuations in the underlying gadgets, leading to suboptimal bounds. We overcome this barrier via a global approach which aggregates fluctuations across all gadgets rather than requiring tight concentration guarantees for each individually. This new approach yields the optimal exponent for the critical window.

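To fix ideas, the partition function whose approximation is at issue can be computed exactly by brute force for tiny systems; this illustrative sketch sums over all $2^n$ spin configurations (with the inverse temperature absorbed into the couplings $J$ and fields $h$) and is of course exponential-time, which is precisely why approximation algorithms and their hardness are studied.

```python
import itertools
import math

def ising_partition(J, h, n):
    """Exact partition function Z = sum over sigma in {-1,+1}^n of
    exp( sum_{i<j} J[i][j] s_i s_j + sum_i h[i] s_i )."""
    Z = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        energy = sum(J[i][j] * sigma[i] * sigma[j]
                     for i in range(n) for j in range(i + 1, n))
        energy += sum(h[i] * sigma[i] for i in range(n))
        Z += math.exp(energy)
    return Z

# two spins, coupling J = 1, no field: Z = 2e + 2/e
J = [[0, 1], [0, 0]]
print(ising_partition(J, [0, 0], 2))  # ~6.1723
```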
Classification of Non-redundancy of Boolean Predicates of Arity 4

from arXiv: Computational Complexity

Authors: Joshua Brakensiek, Venkatesan Guruswami, Aaron Putterman

Given a constraint satisfaction problem (CSP) predicate $P \subseteq D^r$, the non-redundancy (NRD) of $P$ is the maximum size of an instance on $n$ variables such that for every clause of the instance, there is an assignment which satisfies all but that clause. The study of NRD for various CSPs is an active area of research which combines ideas from extremal combinatorics, logic, lattice theory, and other techniques. Complete classifications are known in the cases $r=2$ and $(|D|=2, r=3)$. In this paper, we give a near-complete classification of the case $(|D|=2, r=4)$. Of the 400 distinct non-trivial Boolean predicates of arity 4, we implement an algorithmic procedure which perfectly classifies 397 of them. Of the remaining three, we solve two by reducing to extremal combinatorics problems -- leaving the last one as an open question. Along the way, we identify the first Boolean predicate whose non-redundancy asymptotics are non-polynomial.

Flip Distance of Non-Crossing Spanning Trees: NP-Hardness and Improved Bounds

from arXiv: Computational Geometry

Authors: Håvard Bakke Bjerkevik, Joseph Dorfer, Linda Kleist, Torsten Ueckerdt, Birgit Vogtenhuber

We consider the problem of reconfiguring non-crossing spanning trees on point sets. For a set $P$ of $n$ points in general position in the plane, the flip graph $F(P)$ has a vertex for each non-crossing spanning tree on $P$ and an edge between any two spanning trees that can be transformed into each other by the exchange of a single edge. This flip graph has been intensively studied, lately with an emphasis on determining its diameter diam$(F(P))$ for sets $P$ of $n$ points in convex position. The current best bounds are $\frac{14}{9}n-O(1) \leq$ diam$(F(P))<\frac{15}{9}n-3$ [Bjerkevik, Kleist, Ueckerdt, and Vogtenhuber; SODA 2025]. The crucial tool for both the upper and the lower bound is the notion of so-called *conflict graphs*, which the authors stated might be the key ingredient for determining the diameter (up to lower-order terms). In this paper, we pick up the concept of conflict graphs and show that this tool is even more versatile than previously hoped. As our first main result, we use conflict graphs to show that computing the flip distance between two non-crossing spanning trees is NP-hard, even for point sets in convex position. Interestingly, the result still holds for more constrained flip operations, concretely, compatible flips (where the removed and the added edge do not cross) and rotations (where the removed and the added edge share an endpoint). Extending the line of research from [BKUV SODA25], we present new insights on the diameter of the flip graph. Their lower bound is based on a constant-size pair of trees, one of which is *stacked*. We show that if one of the trees is stacked, then the lower bound is indeed optimal up to a constant term, that is, there exists a flip sequence of length at most $\frac{14}{9}(n-1)$ to any other tree. Lastly, we improve the lower bound on the diameter of the flip graph $F(P)$ for $n$ points in convex position to $\frac{11}{7}n-o(n)$.
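The combinatorial part of the flip relation can be sketched in a few lines (our own minimal encoding; it deliberately omits the geometric non-crossing test on the point set, which the actual flip graph requires): two spanning trees, given as edge sets, are one flip apart exactly when one edge is removed and one is added.

```python
def is_flip(t1, t2):
    """Two spanning trees (iterables of edges) are one flip apart iff
    exactly one edge is removed and one is added.  Assumes both inputs
    are spanning trees on the same point set; the non-crossing
    condition on the plane embedding is NOT checked in this sketch."""
    e1, e2 = set(t1), set(t2)
    return len(e1 - e2) == 1 and len(e2 - e1) == 1

# On points 0..3: exchanging edge (2,3) for (1,3) in the path is a
# single flip, while the path and the star differ in two edges.
path = {(0, 1), (1, 2), (2, 3)}
star = {(0, 1), (0, 2), (0, 3)}
print(is_flip(path, {(0, 1), (1, 2), (1, 3)}))  # True
print(is_flip(path, star))                      # False
```

The hardness result in the abstract is precisely about minimizing the number of such exchanges between two given trees.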

Separators for intersection graphs of spheres

from arXiv: Computational Geometry

Authors: Jacob Fox, Jonathan Tidor

We prove the existence of optimal separators for intersection graphs of balls and spheres in any dimension $d$. One of our results is that if an intersection graph of $n$ spheres in $\mathbb{R}^d$ has $m$ edges, then it contains a balanced separator of size $O_d(m^{1/d}n^{1-2/d})$. This bound is best possible in terms of the parameters involved. The same result holds if the balls and spheres are replaced by fat convex bodies and their boundaries.

Online Packing of Orthogonal Polygons

from arXiv: Computational Geometry

Authors: Tim Gerlach, Benjamin Hennies, Linda Kleist

While rectangular and box-shaped objects dominate the classic discourse of theoretic investigations, a fascinating frontier lies in packing more complex shapes. Given recent insights that convex polygons do not allow for constant competitive online algorithms for diverse variants under translation, we study orthogonal polygons, in particular of small complexity. For translational packings of orthogonal 6-gons, we show that the competitive ratio of any online algorithm that aims to pack the items into a minimal number of unit bins is in $\Omega(n / \log n)$, where $n$ denotes the number of objects. In contrast, we show that constant competitive algorithms exist when the orthogonal 6-gons are symmetric or small. For (orthogonally convex) orthogonal 8-gons, we show that the trivial $n$-competitive algorithm, which places each item in its own bin, is best-possible, i.e., every online algorithm has an asymptotic competitive ratio of at least $n$. This implies that for general orthogonal polygons, the trivial algorithm is best possible. Interestingly, for packing degenerate orthogonal polygons (with thickness $0$), called skeletons, the change in complexity is even more drastic. While constant competitive algorithms for 6-skeletons exist, no online algorithm for 8-skeletons achieves a competitive ratio better than $n$. For other packing variants of orthogonal 6-gons under translation, our insights imply the following consequences. The asymptotic competitive ratio of any online algorithm is in $\Omega(n / \log n)$ for strip packing, and there exist online algorithms with competitive ratios in $O(1)$ for perimeter packing, or in $O(\sqrt{n})$ for minimizing the area of the bounding box. Moreover, the critical packing density is positive (if every object individually fits into the interior of a unit bin).

Bollobás-Meir TSP Conjecture Holds Asymptotically

from arXiv: Computational Geometry

Authors: Alexey Gordeev

In 1992, Bollobás and Meir showed that for every $k \geq 1$ there exists a constant $c_k$ such that, for any $n$ points in the $k$-dimensional unit cube $[0, 1]^k$, one can find a tour $x_1, \dots, x_n$ through these $n$ points with $\sum_{i = 1}^n |x_i - x_{i + 1}|^k \leq c_k$, where $x_{n + 1} = x_1$ and $|x - y|$ is the Euclidean distance between $x$ and $y$. Remarkably, this bound does not depend on $n$, the number of points. They conjectured that the optimal constant is $c_k = 2 \cdot k^{k / 2}$ and showed that it cannot be taken lower than that. This conjecture was recently revised for $k = 3$ by Balogh, Clemen and Dumitrescu, who showed that $c_3 \geq 2^{7/2} > 2 \cdot 3^{3/2}$. It remains open for all $k > 2$, with the best known upper bound $c_k \leq 2.65^k \cdot k^{k / 2} \cdot (1 + o_k(1))$. We significantly narrow the gap between lower and upper bounds on $c_k$, reducing it from exponential to linear. Specifically, we prove that $c_k \leq 2\mathrm{e}(k + 1) \cdot k^{k / 2}$ and $c_k = k^{k / 2} \cdot (2 + o_k(1))$, the latter establishing the conjecture asymptotically. We also obtain analogous results for related problems on Hamiltonian paths, spanning trees and perfect matchings in the unit cube. Our main tool is a new generalization of the ball packing argument used in earlier works.
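The quantity bounded by $c_k$ is easy to compute directly. A minimal sketch (function name ours): for $k = 2$, the tour through the four corners of the unit square has four edges of length $1$, so the power-sum is $4$, matching the conjectured constant $c_2 = 2 \cdot 2^{2/2} = 4$.

```python
import math

def tour_power_sum(points, k):
    """Sum of |x_i - x_{i+1}|^k over a closed tour (x_{n+1} = x_1),
    with |.| the Euclidean distance."""
    n = len(points)
    total = 0.0
    for i in range(n):
        total += math.dist(points[i], points[(i + 1) % n]) ** k
    return total

# Four corners of the unit square, k = 2: each edge has length 1,
# so the sum is 4 = 2 * 2^(2/2), the conjectured c_2.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(tour_power_sum(square, 2))  # 4.0
```

The theorem says such a bound holds uniformly over all point sets and all $n$, which is what makes the independence from $n$ remarkable.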

Triangulating a Polygon with Holes in Optimal (Deterministic) Time

from arXiv: Computational Geometry

Authors: Timothy M. Chan

We consider the problem of triangulating a polygon with $n$ vertices and $h$ holes, or relatedly the problem of computing the trapezoidal decomposition of a collection of $h$ disjoint simple polygonal chains with $n$ vertices total. Clarkson, Cole, and Tarjan (1992) and Seidel (1991) gave randomized algorithms running in $O(n\log^*n + h\log h)$ time, while Bar-Yehuda and Chazelle (1994) described deterministic algorithms running in $O(n+h\log^{1+\varepsilon}h)$ or $O((n+h\log h)\log\log h)$ time, for an arbitrarily small positive constant $\varepsilon$. No improvements have been reported since. We describe a new $O(n + h\log h)$-time algorithm, which is optimal and deterministic. More generally, when the given polygonal chains are not necessarily simple and may intersect each other, we show how to compute their trapezoidal decomposition (and in particular, compute all intersections) in optimal $O(n + h\log h)$ deterministic time when the number of intersections is at most $n^{1-\varepsilon}$. To obtain these results, Chazelle's linear-time algorithm for triangulating a simple polygon is used as a black box.

Computing the Girth of a Segment Intersection Graph

from arXiv: Computational Geometry

Authors: Timothy M. Chan, Yuancheng Yu

We present an algorithm that computes the girth of the intersection graph of $n$ given line segments in the plane in $O(n^{1.483})$ expected time. This is the first such algorithm with $O(n^{3/2-\varepsilon})$ running time for a positive constant $\varepsilon$, and makes progress towards an open question posed by Chan (SODA 2023). The main techniques include (i)~the usage of recent subcubic algorithms for bounded-difference min-plus matrix multiplication, and (ii)~an interesting variant of the planar graph separator theorem. The result extends to intersection graphs of connected algebraic curves or semialgebraic sets of constant description complexity.
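For contrast with the $O(n^{1.483})$ bound, here is the naive baseline it improves on (our own sketch, not the paper's algorithm): build the intersection graph with a standard orientation-based segment intersection test, then compute the girth by deleting each edge in turn and running a BFS between its endpoints.

```python
from collections import deque

def segs_intersect(s, t):
    """Segment intersection test via orientation signs, including
    collinear touching via on-segment checks."""
    def orient(a, b, c):
        v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (v > 0) - (v < 0)
    def on_seg(a, b, c):
        return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))
    p1, p2 = s
    p3, p4 = t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 != d2 and d3 != d4:
        return True
    return ((d1 == 0 and on_seg(p3, p4, p1)) or (d2 == 0 and on_seg(p3, p4, p2))
            or (d3 == 0 and on_seg(p1, p2, p3)) or (d4 == 0 and on_seg(p1, p2, p4)))

def girth(segments):
    """Length of a shortest cycle in the intersection graph, or
    float('inf') if it is acyclic: for each edge (u, v), BFS from u
    to v in the graph minus that edge."""
    n = len(segments)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if segs_intersect(segments[i], segments[j]):
                adj[i].append(j)
                adj[j].append(i)
    best = float('inf')
    for u in range(n):
        for v in adj[u]:
            if u < v:  # consider each edge once
                dist = {u: 0}
                q = deque([u])
                while q:
                    x = q.popleft()
                    for y in adj[x]:
                        if (x, y) in ((u, v), (v, u)) or y in dist:
                            continue
                        dist[y] = dist[x] + 1
                        q.append(y)
                if v in dist:
                    best = min(best, dist[v] + 1)
    return best

# Three pairwise-crossing segments give a triangle in the graph.
tri = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((0, 1), (2, 1))]
print(girth(tri))  # 3
```

This runs in roughly $O(n^2)$ per edge in the worst case; the point of the paper is to beat the $O(n^{3/2})$ barrier without ever materializing the (possibly dense) intersection graph.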

A Fast Quasi-Linear Heuristic for the Close-Enough Traveling Salesman Problem

from arXiv: Computational Geometry

Authors: Khoi Duong

We introduce a fast, quasi-linear-time heuristic for the Close-Enough Traveling Salesman Problem (CETSP), a continuous generalization of the Euclidean TSP in which each target is a disk that must be intersected. The method adapts the pair-center clustering paradigm to circular neighborhoods: a hierarchical clustering phase merges nearby disks into proxy circles using an R*-tree for efficient spatial queries, and a construction phase incrementally expands the hierarchy into a feasible tour while maintaining and locally optimizing tour points. Lightweight local improvements (selective reinsertion and constrained point reoptimization) reduce local inefficiencies without compromising scalability. The algorithm runs in expected O(n log n) time and, on benchmark instances reconstructed from the Mennell dataset, produces solutions within roughly 0-2% of state-of-the-art best-known values while requiring orders-of-magnitude less runtime than population-based metaheuristics. The approach trades some final-solution optimality for dramatic gains in speed and scalability, making it suitable for very large CETSP instances.
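The feasibility condition underlying CETSP is simple to state and check (a sketch with our own names, not the paper's code): a closed tour is feasible iff it passes within radius $r$ of every disk center, i.e., the minimum distance from each center to the tour polyline is at most its radius.

```python
import math

def point_seg_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Project p onto the line through a-b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def cetsp_feasible(tour, disks):
    """A closed tour (list of points) is CETSP-feasible iff it comes
    within radius r of every disk, given as (center, radius) pairs."""
    n = len(tour)
    edges = [(tour[i], tour[(i + 1) % n]) for i in range(n)]
    return all(
        min(point_seg_dist(c, a, b) for a, b in edges) <= r
        for c, r in disks
    )

# A unit-square tour reaches a disk centered at (2, 0.5) of radius 1
# (nearest tour edge is x = 1, distance 1) but not one of radius 0.5.
tour = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(cetsp_feasible(tour, [((2, 0.5), 1.0)]))  # True
print(cetsp_feasible(tour, [((2, 0.5), 0.5)]))  # False
```

The heuristic's clustering and reoptimization phases then search over tours satisfying exactly this constraint while shortening total length.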

Shadoks Approach to Parallel Reconfiguration of Triangulations

from arXiv: Computational Geometry

Authors: Guilherme D. da Fonseca, Fabien Feschet, Yan Gerard

We describe the methods used by Team Shadoks to win the CG:SHOP 2026 Challenge on parallel reconfiguration of planar triangulations. An instance is a collection of triangulations of a common point set. We must select a center triangulation and find short parallel-flip paths from each input triangulation to the center, minimizing the sum of path lengths. Our approach combines exact methods based on SAT with several greedy heuristics, and also makes use of SAT and MaxSAT for solution improvement. We present a SAT encoding for bounded-length paths and a global formulation for fixed path-length vectors. We discuss how these components interact in practice and summarize the performance of our solvers on the benchmark instances.

Approximating Convex Hulls via Range Queries

from arXiv: Computational Geometry

Authors: T. Schibler, J. Xue, J. Zhu

Recently, motivated by the rapid increase of the data size in various applications, Monemizadeh [APPROX'23] and Driemel, Monemizadeh, Oh, Staals, and Woodruff [SoCG'25] studied geometric problems in the setting where the only access to the input point set is via querying a range-search oracle. Algorithms in this setting are evaluated on two criteria: (i) the number of queries to the oracle and (ii) the error of the output. In this paper, we continue this line of research and investigate one of the most fundamental geometric problems in the oracle setting, i.e., the convex hull problem. Let $P$ be an unknown set of points in $[0,1]^d$ equipped with a range-emptiness oracle. Via querying the oracle, the algorithm is supposed to output a convex polygon $C \subseteq [0,1]^d$ as an estimation of the convex hull $CH(P)$ of $P$. The error of the output is defined as the volume of the symmetric difference $C \oplus CH(P) = (C \backslash CH(P)) \cup (CH(P) \backslash C)$. We prove tight and near-tight tradeoffs between the number of queries and the error of the output for different variants of the problem, depending on the type of the range-emptiness queries and whether the queries are non-adaptive or adaptive. - Orthogonal emptiness queries in $d$-dimensional space: We show that the minimum error a deterministic algorithm can achieve with $q$ queries is $\Theta(q^{-1/d})$ if the queries are non-adaptive, and $\Theta(q^{-1/(d-1)})$ if the queries are adaptive. In particular, in 2D, the bounds are $\Theta(1/\sqrt{q})$ and $\Theta(1/q)$ for non-adaptive and adaptive queries, respectively. - Halfplane emptiness queries in 2D: We show that the minimum error a deterministic algorithm can achieve with $q$ queries is $\Theta(1/\sqrt{q})$ if the queries are non-adaptive, and $\widetilde{\Theta}(1/q^2)$ if the queries are adaptive. Here $\widetilde{\Theta}(\cdot)$ hides logarithmic factors.

A Dividing Line for Structural Kernelization of Component Order Connectivity via Distance to Bounded Pathwidth

from arXiv: Data Structures and Algorithms

Authors: Jakob Greilhuber, Roohani Sharma

In this work we study a classic generalization of the Vertex Cover (VC) problem, called the Component Order Connectivity (COC) problem. In COC, given an undirected graph $G$ and integers $d \geq 1$ and $k$, the goal is to determine if there is a set of at most $k$ vertices whose deletion results in a graph where each connected component has at most $d$ vertices. When $d=1$, this is exactly VC. This work is inspired by polynomial kernelization results with respect to structural parameters for VC. On one hand, Jansen & Bodlaender [TOCS 2013] show that VC admits a polynomial kernel when the parameter is the distance to treewidth-$1$ graphs; on the other hand, Cygan, Lokshtanov, Pilipczuk, Pilipczuk & Saurabh [TOCS 2014] showed that VC does not admit a polynomial kernel when the parameter is distance to treewidth-$2$ graphs. Greilhuber & Sharma [IPEC 2024] showed that, for any $d \geq 2$, $d$-COC cannot admit a polynomial kernel when the parameter is distance to a forest of pathwidth $2$. Here, $d$-COC is the same as COC except that $d$ is a fixed constant not part of the input. We complement this result and show that, just as distance to treewidth-$1$ graphs versus distance to treewidth-$2$ graphs is the dividing line between structural parameterizations that allow and disallow polynomial kernelization for VC, for COC this dividing line lies between distance to pathwidth-$1$ graphs and distance to pathwidth-$2$ graphs. The main technical result of this work is that COC admits a polynomial kernel parameterized by distance to pathwidth-$1$ graphs plus $d$.
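The COC solution condition can be verified directly (a minimal sketch with our own names): delete the candidate set $S$ and check by BFS that every remaining connected component has at most $d$ vertices.

```python
from collections import deque

def coc_ok(adj, S, d):
    """Check whether deleting vertex set S from the graph (adjacency
    dict) leaves only connected components of at most d vertices."""
    seen = set(S)
    for start in adj:
        if start in seen:
            continue
        # BFS one component of G - S and count its size.
        size, q = 0, deque([start])
        seen.add(start)
        while q:
            v = q.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    q.append(w)
        if size > d:
            return False
    return True

# A path on 5 vertices: deleting the middle vertex leaves two
# components of size 2, a valid COC solution for d = 2, k = 1.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(coc_ok(path5, {2}, 2))    # True
print(coc_ok(path5, set(), 2))  # False (one component of size 5)
```

For $d = 1$ the check degenerates to "no edge survives the deletion", i.e., $S$ is a vertex cover, matching the abstract's remark.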

Stable Algorithms Lower Bounds for Estimation

from arXiv: Data Structures and Algorithms

Authors: Xifan Yu, Ilias Zadik

In this work, we show that for all statistical estimation problems, a natural MMSE instability (discontinuity) condition implies the failure of stable algorithms, serving as a version of OGP for estimation tasks. Using this criterion, we establish separations between stable and polynomial-time algorithms for the following MMSE-unstable tasks: (i) Planted Shortest Path, where Dijkstra's algorithm succeeds, (ii) random Parity Codes, where Gaussian elimination succeeds, and (iii) Gaussian Subset Sum, where lattice-based methods succeed. For all three, we further show that all low-degree polynomials are stable, yielding separations against low-degree methods and a new method to bound the low-degree MMSE. In particular, our technique highlights that MMSE instability is a common feature of Planted Shortest Path, noiseless Parity Codes, and Gaussian Subset Sum. Lastly, we highlight that our work places rigorous algorithmic footing on the long-standing physics belief that first-order phase transitions (which in this setting translate to MMSE instability) impose fundamental limits on classes of efficient algorithms.

On the Complexity of Fundamental Problems for DAG-Compressed Graphs

from arXiv: Data Structures and Algorithms

Authors: Florian Chudigiewitsch, Till Tantau, Felix Winkler

A DAG compression of a (typically dense) graph is a simple data structure that stores how vertex clusters are connected, where the clusters are described indirectly as sets of reachable sinks in a directed acyclic graph (DAG). They generalize tree compressions, where the clusters form a tree-like hierarchy, and we give the first proof that DAG compressions can achieve better compressions than tree compressions. Our interest in DAG compression stems from the fact that several simple standard algorithms, like breadth-first search on graphs, can be implemented so that they work directly on the compressed rather than on the original graph and so that, crucially, the runtime is relative to the (typically small) size of the compressed graph. We add another entry to the list of algorithms where this is possible, by showing that Kruskal's algorithm for computing minimum spanning trees can be adapted to work directly on DAG compressions. On the negative side, we answer the central open problem from previous work, namely how hard it is to compute a minimum-size DAG compression for a given graph: This is NP-hard; and this is even the case for the dynamic setting, where we must update the DAG compression optimally when a single edge is added or deleted in the input graph.

Finding Minimum Distance Preservers: A Parameterized Study

from arXiv: Data Structures and Algorithms

Authors: Kirill Simonov, Farehe Soheil, Shaily Verma

For a given graph $G$ and a subset of vertices $S$, a distance preserver is a subgraph of $G$ that preserves shortest paths between the vertices of $S$. We distinguish between a subsetwise distance preserver, which preserves distances between all pairs in $S$, and a pairwise distance preserver, which preserves distances only between specific pairs of vertices in $S$, given in the input. While a large body of work is dedicated to upper and lower bounds on the size of distance preservers and, more generally, graph spanners, the computational complexity of finding the minimum distance preserver has received comparatively little attention. We consider the respective Subsetwise Distance Preserver (SDP) and Pairwise Distance Preserver (PDP) problems and initiate the study of their computational complexity. We provide a detailed complexity landscape with respect to natural parameters, including the number of terminals, solution size, vertex cover, and treewidth. Our main contributions are as follows:
- Both PDP and SDP are NP-hard even on subgraphs of the grid. Moreover, when parameterized by the number of terminals, the problems are W[1]-hard on subgraphs of the grid, while they become FPT on full grids.
- PDP is NP-hard on graphs of vertex cover $3$, while SDP is FPT when parameterized by the vertex cover of the graph. Thus, the vertex cover parameter distinguishes the two variants.
- Both problems are FPT when parameterized by the number of terminals and the treewidth of the graph.
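The object being minimized is easy to construct non-minimally: for unweighted graphs, the union of one BFS shortest path per terminal pair is a valid (though generally far from minimum) pairwise distance preserver. A sketch, with helper names that are ours:

```python
from collections import deque

def bfs_parents(adj, s):
    """BFS tree parents from s in an unweighted graph (dict: vertex -> neighbors)."""
    par = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in par:
                par[v] = u
                q.append(v)
    return par

def pairwise_preserver(adj, pairs):
    """A feasible (not necessarily minimum) pairwise distance preserver:
    the union of one BFS shortest path per terminal pair."""
    edges = set()
    for s, t in pairs:
        par = bfs_parents(adj, s)
        v = t
        while par[v] is not None:  # walk the BFS tree back from t to s
            edges.add(frozenset((v, par[v])))
            v = par[v]
    return edges
```

The hardness results above concern finding the smallest such edge set, not merely a feasible one.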

Charting the Diameter Computation Landscape of Geometric Intersection Graphs in Three Dimensions and Higher

from arXiv: Data Structures and Algorithms

Authors: Timothy M. Chan, Hsien-Chih Chang, Jie Gao, Sándor Kisfaludi-Bak, Hung Le, Da Wei Zheng

Recent research on computing the diameter of geometric intersection graphs has made significant strides, primarily focusing on the 2D case, where truly subquadratic-time algorithms were given for simple objects such as unit disks and (axis-aligned) squares. However, in three or higher dimensions, there is no known truly subquadratic-time algorithm for any intersection graph of non-trivial objects, even basic ones such as unit balls or (axis-aligned) unit cubes. This was partially explained by the pioneering work of Bringmann et al. [SoCG '22], which gave several truly subquadratic lower bounds, notably for unit balls or unit cubes in 3D when the graph diameter $Δ$ is at least $Ω(\log n)$, hinting at a pessimistic outlook for the complexity of the diameter problem in higher dimensions. In this paper, we substantially extend the landscape of diameter computation for objects in three and higher dimensions, giving a few positive results. Our highlighted findings include:
- A truly subquadratic-time algorithm for deciding if the diameter of unit cubes in 3D is at most 3 (Diameter-3 hereafter), the first algorithm of its kind for objects in 3D or higher dimensions. Our algorithm is based on a novel connection to pseudolines, which is of independent interest.
- A truly subquadratic-time lower bound for Diameter-3 of unit balls in 3D under the Orthogonal Vectors (OV) hypothesis, giving the first separation between unit balls and unit cubes in the small-diameter regime. Previously, computing the diameter for both objects was known to be truly subquadratic hard when the diameter is $Ω(\log n)$.
- A near-linear-time algorithm for Diameter-2 of unit cubes in 3D, generalizing the previous result for unit squares in 2D.
- A truly subquadratic-time algorithm and lower bound for Diameter-2 and Diameter-3 of rectangular boxes (of arbitrary dimensions and sizes), respectively.
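For contrast, the quadratic baseline these results improve on builds the intersection graph explicitly and runs a BFS from every vertex. A sketch for axis-aligned unit cubes (two side-1 cubes intersect iff their centers differ by at most 1 in every coordinate; function names are ours):

```python
from collections import deque
from itertools import combinations

def cube_graph(centers, side=1.0):
    """Adjacency lists of the intersection graph of axis-aligned cubes."""
    n = len(centers)
    adj = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if all(abs(a - b) <= side for a, b in zip(centers[i], centers[j])):
            adj[i].append(j)
            adj[j].append(i)
    return adj

def diameter_at_most(adj, k):
    """Naive Diameter-k check: BFS from every vertex, O(n * (n + m)) total."""
    n = len(adj)
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    q.append(v)
        if any(d < 0 or d > k for d in dist):  # disconnected counts as infinite
            return False
    return True
```

The paper's contribution is deciding Diameter-3 truly subquadratically, i.e. without ever materializing the graph like this.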

Optimal-Cost Construction of Shallow Cuttings for 3-D Dominance Ranges in the I/O-Model

from arXiv: Data Structures and Algorithms

Authors: Yakov Nekrich, Saladi Rahul

Shallow cuttings are a fundamental tool in computational geometry and spatial databases for solving offline and online range searching problems. For a set $P$ of $N$ points in 3-D, at SODA'14, Afshani and Tsakalidis designed an optimal $O(N\log_2N)$-time algorithm that constructs shallow cuttings for 3-D dominance ranges in internal memory. Even though shallow cuttings are used in the I/O-model to design space- and query-efficient range searching data structures, no efficient construction of them was known until now. In this paper, we design an optimal-cost algorithm to construct shallow cuttings for 3-D dominance ranges. The number of I/Os performed by the algorithm is $O\left(\frac{N}{B}\log_{M/B}\left(\frac{N}{B}\right) \right)$, where $B$ is the block size and $M$ is the memory size. As two applications of the optimal-cost construction algorithm, we design fast algorithms for offline 3-D dominance reporting and offline 3-D approximate dominance counting. We believe that our algorithm will find further applications in offline 3-D range searching problems and in improving the construction cost of data structures for 3-D range searching problems.

Fast Nearest Neighbor Search for $\ell_p$ Metrics

from arXiv: Data Structures and Algorithms

Authors: Robert Krauthgamer, Nir Petruschka

The Nearest Neighbor Search (NNS) problem asks to design a data structure that preprocesses an $n$-point dataset $X$ lying in a metric space $\mathcal{M}$, so that given a query point $q \in \mathcal{M}$, one can quickly return a point of $X$ minimizing the distance to $q$. The efficiency of such a data structure is evaluated primarily by the amount of space it uses and the time required to answer a query. We focus on the fast query-time regime, which is crucial for modern large-scale applications, where datasets are massive and queries must be processed online, and is often modeled by query time $\text{poly}(d \log n)$. Our main result is such a randomized data structure for NNS in $\ell_p$ spaces, $p>2$, that achieves $p^{O(1) + \log\log p}$ approximation with fast query time and $\text{poly}(dn)$ space. Our data structure improves upon, or is incomparable to, the state of the art for the fast query-time regime from [Bartal and Gottlieb, TCS 2019] and [Krauthgamer, Petruschka and Sapir, FOCS 2025].
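The baseline every NNS data structure competes with is a linear scan, which costs $O(nd)$ per query but is exact; a sketch (function names are ours):

```python
def lp_dist(x, y, p):
    """The l_p distance between two points given as coordinate sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def nearest(X, q, p):
    """Exact brute-force nearest neighbor: scan all of X for each query."""
    return min(X, key=lambda x: lp_dist(x, q, p))
```

Data structures in the fast query-time regime aim to replace this scan with poly(d log n) query time, at the cost of returning an approximate nearest neighbor.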

Optimal-Time Move Structure Balancing and LCP Array Computation from the RLBWT

from arXiv: Data Structures and Algorithms

Authors: Nathaniel K. Brown, Ahsan Sanaullah, Shaojie Zhang, Ben Langmead

On repetitive text collections of size $n$, the Burrows-Wheeler Transform (BWT) tends to have relatively few runs $r$, so its run-length encoding (the RLBWT) is small. This motivates many RLBWT-related algorithms and data structures that can be designed in compressed $O(r)$-space. These approaches often use the RLBWT-derived permutations LF, FL, $φ$, and $φ^{-1}$, which can be represented using a move structure to obtain optimal $O(1)$-time for each permutation step in $O(r)$-space. They are then used to construct compressed-space text indexes supporting efficient pattern matching queries. However, move structure construction in $O(r)$-space requires an $O(r \log r)$-time balancing stage. The longest common prefix (LCP) array of a text collection is used to support pattern matching queries and data structure construction. Recently, it was shown how to compute the LCP array in $O(n + r \log r)$-time and $O(r)$ additional space from an RLBWT. However, the bottleneck remains the $O(r \log r)$-time move structure balancing stage. In this paper, we describe an optimal $O(r)$-time and space algorithm to balance a move structure. This result is then applied to LCP construction from an RLBWT to obtain an optimal $O(n)$-time algorithm in $O(r)$-space in addition to the output, which implies an optimal-time algorithm for LCP array enumeration in compressed $O(r)$-space.
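To make $r$ concrete: a toy illustration (not the paper's machinery) that computes a small BWT by sorting rotations and run-length encodes it; the `$` sentinel and the quadratic rotation sort are assumptions of this sketch, fine for tiny inputs only:

```python
from itertools import groupby

def bwt(s):
    """Naive BWT via sorted rotations; assumes '$' does not occur in s."""
    s += "$"  # sentinel, lexicographically smallest
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def rlbwt(b):
    """Run-length encoding of a BWT string; its length is the run count r."""
    return [(c, len(list(g))) for c, g in groupby(b)]
```

On repetitive inputs the list returned by `rlbwt` is much shorter than the text, which is exactly what $O(r)$-space structures exploit.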

Approximate Butterfly Counting in Sublinear Time

from arXiv: Data Structures and Algorithms

Authors: Chi Luo, Jiaxin Song, Yuhao Zhang, Kai Wang, Zhixing He, Kuan Yang

Bipartite graphs serve as a natural model for representing relationships between two different types of entities. When analyzing bipartite graphs, butterfly counting is a fundamental research problem that aims to count the number of butterflies (i.e., 2x2 bicliques) in a given bipartite graph. While this problem has been extensively studied in the literature, existing algorithms usually necessitate access to a large portion of the entire graph, presenting challenges in real scenarios where graphs are extremely large and I/O costs are expensive. In this paper, we study the butterfly counting problem under the query model, where the following query operations are permitted: degree query, neighbor query, and vertex-pair query. We propose TLS, a practical two-level sampling algorithm that can estimate the butterfly count accurately while accessing only a limited graph structure, achieving significantly lower query costs under the standard query model. TLS also incorporates several key techniques to control the variance, including "small-degree-first sampling" and "wedge sampling via small subsets". To ensure theoretical guarantees, we further introduce two novel techniques: "heavy-light partition" and "guess-and-prove", integrated into TLS. With these techniques, we prove that the algorithm achieves (1+eps)-accuracy for any given approximation parameter 0 < eps < 1 on general bipartite graphs, with provable time and query complexity. In particular, the time is sublinear when the input graph is dense enough. Extensive experiments on 15 datasets demonstrate that TLS delivers robust estimates with up to three orders of magnitude lower query costs and runtime compared to existing solutions.
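The exact quantity being estimated can be computed from codegrees: every pair of left vertices with $c$ common right neighbors contributes $\binom{c}{2}$ butterflies. This exact counter is quadratic in one side of the bipartition, which is precisely the cost the sampling algorithm avoids; a sketch (names are ours):

```python
from itertools import combinations
from math import comb

def butterflies(adj_left):
    """Exact butterfly (2x2 biclique) count. adj_left maps each left vertex
    to its set of right neighbors. Each butterfly is counted exactly once:
    pick two left vertices, then two of their common right neighbors."""
    return sum(comb(len(adj_left[u] & adj_left[v]), 2)
               for u, v in combinations(adj_left, 2))
```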

Non-Exclusive Notifications for Ride-Hailing at Lyft I: Single-Cycle Approximation Algorithms

from arXiv: Data Structures and Algorithms

Authors: Farbod Ekbatani, Rad Niazadeh, Mehdi Golari, Romain Camilleri, Titouan Jehl, Chris Sholley, Matthew Leventi, Theresa Calderon, Angela Lam, Paul Havard Duclos, Tim Holland, James Koch, Shreya Reddy

Ride-hailing platforms increasingly rely on non-exclusive notifications, broadcasting a single request to multiple drivers simultaneously, to mitigate inefficiencies caused by uncertain driver acceptance. In this paper, the first in a two-part collaboration with Lyft, we formally model the 'Notification Set Selection Problem' for a single decision cycle, where the platform determines the optimal subset of drivers to notify for each incoming ride request. We analyze this combinatorial optimization problem under two contention-resolution protocols: 'First Acceptance (FA)', which prioritizes speed by assigning the ride to the first responder, and 'Best Acceptance (BA)', which prioritizes match quality by selecting the highest-valued accepting driver. We show that welfare maximization under both mechanisms is strongly NP-hard, ruling out a Fully Polynomial Time Approximation Scheme (FPTAS). Despite this, we derive several positive algorithmic results. For FA, we present a Polynomial Time Approximation Scheme (PTAS) for the single-rider case and a constant-factor approximation (factor 4) for the general matching setting. We highlight that the FA valuation function can be viewed as a novel discrete choice model with theoretical properties of independent interest. For BA, we prove that the objective is monotone and submodular, admitting a standard $(1 - 1/e)$-approximation. Moreover, using a polynomial-time demand oracle that we design for this problem, we show it is possible to surpass the $(1 - 1/e)$ barrier. Finally, in the special case of homogeneous acceptance probabilities, we show that the BA problem can be solved exactly in polynomial time via a linear programming formulation. We validate the empirical performance of our algorithms through numerical experiments on synthetic data and on instances calibrated using real ride-sharing data from Lyft.
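Under BA, the value of a notification set is the expected value of the highest-valued accepting driver; this is computable by a single sort, and being monotone submodular it admits the standard greedy. A minimal sketch, assuming independent acceptances and a single rider (the function names and the toy numbers are ours, not the paper's):

```python
def expected_best(drivers):
    """drivers: list of (value, accept_prob) pairs. Returns the expected
    value of the highest-valued accepting driver (0 if nobody accepts)."""
    total, none_better = 0.0, 1.0
    for v, p in sorted(drivers, reverse=True):  # highest value first
        total += none_better * p * v  # i wins iff all higher-valued decline
        none_better *= 1 - p
    return total

def greedy_notify(candidates, k):
    """Standard greedy for the monotone submodular objective expected_best,
    giving the (1 - 1/e) guarantee the abstract mentions."""
    chosen, rest = [], list(candidates)
    for _ in range(min(k, len(rest))):
        i = max(range(len(rest)), key=lambda j: expected_best(chosen + [rest[j]]))
        chosen.append(rest.pop(i))
    return chosen
```

For example, a sure-thing driver of value 6 beats a coin-flip driver of value 10 when only one may be notified, but notifying both yields expected value 8.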

Stationary Online Contention Resolution Schemes

from arXiv: Data Structures and Algorithms

Authors: Mohammad Reza Aminian, Rad Niazadeh, Pranav Nuti

Online contention resolution schemes (OCRSs) are a central tool in Bayesian online selection and resource allocation: they convert fractional ex-ante relaxations into feasible online policies while preserving each marginal probability up to a constant factor. Despite their importance, designing (near) optimal OCRSs is often technically challenging, and many existing constructions rely on indirect reductions to prophet inequalities and LP duality, resulting in algorithms that are difficult to interpret or implement. In this paper, we introduce "stationary online contention resolution schemes (S-OCRSs)," a permutation-invariant class of OCRSs in which the distribution of the selected feasible set is independent of arrival order. We show that S-OCRSs admit an exact distributional characterization together with a universal online implementation. We then develop a general "maximum-entropy" approach to construct and analyze S-OCRSs, reducing the design of online policies to constructing suitable distributions over feasible sets. This yields a new technical framework for designing simple and possibly improved OCRSs. We demonstrate the power of this framework across several canonical feasibility environments. In particular, we obtain an improved $(3-\sqrt{5})/2$-selectable OCRS for bipartite matchings, attaining the independence benchmark conjectured to be optimal and yielding the best known prophet inequality for this setting. We also obtain a $1-\sqrt{2/(πk)} + O(1/k)$-selectable OCRS for $k$-uniform matroids and a simple, explicit $1/2$-selectable OCRS for weakly Rayleigh matroids (including all $\mathbb{C}$-representable matroids such as graphic and laminar). While these guarantees match the best known bounds, our framework also yields concrete and systematic constructions, providing transparent algorithms in settings where previous OCRSs were implicit or technically involved.
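As a toy illustration of what "$c$-selectable" means (this is the classic single-item scheme, not one of the paper's matching or matroid constructions): for a single item with marginals summing to at most 1, one can accept each active element with a carefully rescaled probability so that every element is selected with probability exactly half its marginal:

```python
import random

def ocrs_single_item(xs, active):
    """Classic 1/2-selectable OCRS for a single item (rank-1 matroid).
    xs: marginals with sum(xs) <= 1; active[i]: whether element i is active.
    Selects each i with probability exactly xs[i] / 2."""
    # free_prob tracks the unconditional Pr[nothing selected before i],
    # which equals 1 - sum_{j<i} xs[j]/2 and stays >= 1/2.
    free_prob = 1.0
    for i, (x, a) in enumerate(zip(xs, active)):
        if a and random.random() < 1 / (2 * free_prob):
            return i
        free_prob -= x / 2
    return None
```

Note the per-realization outcome depends on arrival order, but the selection marginals do not; the paper's S-OCRSs strengthen this to order-invariance of the whole selected-set distribution.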

Hardening Confidential Federated Compute against Side-channel Attacks

from arXiv: Data Structures and Algorithms

Authors: James Bell-Clark, Albert Cheu, Adria Gascon, Jonathan Katz

In this work, we identify a set of side-channels in our Confidential Federated Compute platform that a hypothetical insider could exploit to circumvent differential privacy (DP) guarantees. We show how DP can mitigate two of the side-channels, one of which has been implemented in our open-source library.

The Library Theorem: How External Organization Governs Agentic Reasoning Capacity

from arXiv: Data Structures and Algorithms

Authors: Zachary F. Mainen

Externalized reasoning is already exploited by transformer-based agents through chain-of-thought, but structured retrieval -- indexing over one's own reasoning state -- remains underexplored. We formalize the transformer context window as an I/O page and prove that tool-augmented agents with indexed external memory achieve exponentially lower retrieval cost than agents restricted to sequential scanning: $O(\log_b N)$ versus $Ω(N)$ page reads per query, and $O(T \log_b T)$ versus $Θ(T^2)$ cumulative cost over $T$ reasoning steps -- a gap that widens as deliberation deepens. We test these predictions on a controlled lookup benchmark across three content types -- random hashes, ordered integers, and encyclopedia entries -- varying store size from 50 to 5,000 items, and replicate key conditions across two model generations (GPT-4o-mini and GPT-5.4). On abstract content, the indexed agent achieves median 1 page read regardless of store size, confirming the $O(1)$ prediction. Sorted pages without an index fail to close the gap: the weaker model cannot sustain binary search at scale, and the stronger model achieves near-optimal $\log_2 N$ search but still loses to the index by $5\times$. On familiar content (encyclopedia entries), a competing failure mode emerges: the model recognizes the domain, bypasses the retrieval protocol, and generates answers from parametric memory, producing catastrophic token expenditure even when the index is sound. This parametric memory competition dissociates the two cognitive operations that indexing combines: understanding content (where language models excel) and following navigational protocols (where they fail when understanding tempts them to shortcut). The result argues for a separation of concerns: use language models for index construction, where semantic understanding helps, and deterministic algorithms for index traversal, where it hurts.
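The claimed $O(\log_b N)$ versus $Ω(N)$ gap is simple page-read arithmetic; a sketch under an assumed page size $b$ (the function names are ours):

```python
import math

def scan_pages(n, b, pos):
    """Pages read by a sequential scan over n items (page size b) to reach
    the item at index pos: every page up to and including pos's page."""
    return pos // b + 1

def index_pages(n, b):
    """Pages read descending a b-ary index over n items: ceil(log_b n),
    one page per level."""
    return max(1, math.ceil(math.log(n, b)))
```

At the paper's largest store (5,000 items) with 50-item pages, the index costs 3 page reads while a worst-case scan costs 100, and the ratio grows with $N$.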

(Sets of) Complement Scattered Factors

from arXiv: Data Structures and Algorithms

Authors: Duncan Adamson, Pamela Fleischmann, Annika Huch

Starting in the 1970s with the fundamental work of Imre Simon, \emph{scattered factors} (also known as subsequences or scattered subwords) have remained a consistently and heavily studied object. The majority of work on scattered factors can be split into two broad classes of problems: given a word, which scattered factors does it contain, and which does it not. In this paper, we consider an intermediary problem, introducing the notion of \emph{complement scattered factors}. Given a word $w$ and a scattered factor $u$ of $w$, the set of complement scattered factors of $w$ with regard to $u$, $C(w, u)$, is the set of scattered factors of $w$ that can be formed by removing an embedding of $u$ from $w$. This is closely related to the \emph{shuffle} operation, in which two words are intertwined; i.e., we extend previous work on the shuffle operator using knowledge about scattered factors. Alongside introducing these sets, we provide combinatorial results on the size of the set $C(w, u)$, an algorithm to compute the set $C(w, u)$ from $w$ and $u$ in $O(\vert w \vert \cdot \vert u \vert \binom{w}{u})$ time, where $\binom{w}{u}$ denotes the number of embeddings of $u$ into $w$, an algorithm to construct $u$ from $w$ and $C(w, u)$ in $O(\vert w \vert^2 \binom{\vert w \vert}{\vert w \vert - \vert u \vert})$ time, and an algorithm to construct $w$ from $u$ and $C(w, u)$ in $O(\vert u \vert \cdot \vert w \vert^{\vert u \vert + 1})$ time.
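For small instances, $C(w, u)$ can be enumerated directly from the definition. The following brute-force sketch (exponential in $\vert w \vert$, not the paper's $O(\vert w \vert \cdot \vert u \vert \binom{w}{u})$ algorithm) simply tries every position set, keeps those that embed $u$, and collects the words left after deletion.

```python
from itertools import combinations

def complement_scattered_factors(w, u):
    """Brute-force C(w, u): words left after deleting some embedding of u from w.

    An embedding of u into w is a position set i_1 < ... < i_|u| with
    w[i_1]...w[i_|u|] = u; deleting those positions leaves a scattered
    factor of w of length |w| - |u|.
    """
    result = set()
    for positions in combinations(range(len(w)), len(u)):
        if all(w[i] == c for i, c in zip(positions, u)):
            result.add("".join(w[i] for i in range(len(w)) if i not in positions))
    return result

# "ab" embeds into "abab" at positions (0,1), (0,3), and (2,3); the three
# deletions leave "ab", "ba", and "ab", so C(abab, ab) = {"ab", "ba"}.
print(complement_scattered_factors("abab", "ab"))
```

The example also shows why $\vert C(w, u) \vert$ can be smaller than the number of embeddings $\binom{w}{u}$: distinct embeddings may leave the same residual word.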

Monday, March 23

Strong Chain Quality

from Decentralized Thoughts

Chain Quality (CQ) is a core blockchain property. Roughly speaking, it says: Owning $3\%$ of the stake gives you inclusion rights over valid inputs of your choice in roughly $3\%$ of the blockspace over time. For Nakamoto style chains, this is called Ideal CQ (see here). Chain quality was sufficient for early generations of blockchains that had low throughput, but modern blockchains have much higher bandwidth and can commit many...

By Ittai Abraham, Pranav Garimidi, Joachim Neu

The monotonicity of the Franz-Parisi potential is equivalent with Low-degree MMSE lower bounds

from arXiv: Computational Complexity

Authors: Konstantinos Tsirkas, Leda Wang, Ilias Zadik

Over the last decades, two distinct approaches have been instrumental to our understanding of the computational complexity of statistical estimation. The statistical physics literature predicts algorithmic hardness through local stability and monotonicity properties of the Franz--Parisi (FP) potential \cite{franz1995recipes,franz1997phase}, while the mathematically rigorous literature characterizes hardness via the limitations of restricted algorithmic classes, most notably low-degree polynomial estimators \cite{hopkins2017efficient}. For many inference models, these two perspectives yield strikingly consistent predictions, giving rise to a long-standing open problem of establishing a precise mathematical relationship between them. In this work, we show that for estimation problems the power of low-degree polynomials is equivalent to the monotonicity of the annealed FP potential for a broad family of Gaussian additive models (GAMs) with signal-to-noise ratio $\lambda$. In particular, subject to a low-degree conjecture for GAMs, our results imply that the polynomial-time limits of these models are directly implied by the monotonicity of the annealed FP potential, in conceptual agreement with predictions from the physics literature dating back to the 1990s.

Search-Driven Clause Learning for Product-State Quantum $k$-SAT (PRODSAT-QSAT)

from arXiv: Computational Complexity

Authors: Samuel González-Castillo, Joon Hyung Lee, Alfons Laarman

We study PRODSAT-QSAT($k$): given rank-one $k$-local projectors, determine whether a quantum $k$-SAT instance admits a satisfying product state. We present a CDCL-style refutation framework that searches a finite partition of each qubit's Bloch sphere while a sound theory solver checks region feasibility using a geometric overapproximation of the projection amplitudes for each constraint. When the theory solver proves that no state in a region can satisfy a constraint, it produces a sound conflict clause that blocks that region; accumulated blocking clauses can yield a global result of product-state unsatisfiability (UN-PRODSAT). We formalise the problem, prove the soundness of the clause-learning rule, and describe a practical algorithm and implementation.
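The clause-learning loop described in the abstract can be caricatured over any finite domain. In the toy sketch below, region labels stand in for the Bloch-sphere partition and `infeasible` is a hypothetical stand-in for the paper's geometric theory solver: it either returns a conflict (a partial assignment no state can satisfy), which becomes a blocking clause, or declares the regions feasible. None of this is the authors' actual algorithm; it only illustrates how accumulated blocking clauses can certify UN-PRODSAT.

```python
from itertools import product

def prodsat_region_search(num_qubits, regions, infeasible):
    """Toy clause-learning search over a finite region partition.

    `infeasible(assignment)` returns a conflict (a sub-assignment proven
    unsatisfiable) or None if the regions look feasible.  Learned conflicts
    block every assignment extending them; if every assignment is blocked,
    the instance is UN-PRODSAT over this partition.
    """
    learned = []  # blocking clauses: partial assignments proven infeasible
    for assign in product(regions, repeat=num_qubits):
        assignment = dict(enumerate(assign))
        if any(all(assignment[q] == r for q, r in clause.items())
               for clause in learned):
            continue  # pruned by a learned blocking clause
        conflict = infeasible(assignment)
        if conflict is None:
            return ("PRODSAT-candidate", assignment)
        learned.append(conflict)
    return ("UN-PRODSAT", learned)
```

With a toy oracle that conflicts whenever some qubit sits in the "N" region, the search learns two blocking clauses and then finds the all-"S" assignment as a candidate, mirroring the search/learn/prune cycle the abstract describes.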
