Last Update

OPML feed of all feeds.

Subscribe to the Atom feed or RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Tuesday, March 24

Information Transit Got the Wrong Man

from Ben Recht

The mechanized tradition of peer review and the absurdity of bureaucratic conference review.

In case you hadn’t heard, people are using LLMs to create their peer reviews. I know, you’re as shocked as I am. To their credit, the program committee of the International Conference on Machine Learning (ICML) has been doing things to address the problem. Their attempts reveal the systematic problems here that are unfixable without a dramatic teardown.

Let’s recap the situation, though even typing it out makes me feel like a character in a Terry Gilliam movie.

In November 2025, the PC sent a series of surveys to past ICML reviewers to gauge their sentiment about LLMs in reviewing. They assumed that this list must also include a bunch of authors because they mandated a reciprocal reviewing policy for the 2025 conference, under which all submissions must have had at least one author who agreed to serve as a reviewer. In their final survey, they proposed the following policies for LLM reviewing at future conferences, and asked for preferences:

  • Policy A (Conservative): Use of LLMs for reviewing is strictly prohibited.

  • Policy B (Permissive): Reviewers may input the submission text into privacy-compliant LLMs. However, the assessment of the paper and the writing of the review must not be delegated to LLMs.

500 people received the survey. 74 (15%) answered it. And the survey says!

This plot uses default matplotlib colors, so it must be science. I’m not sure what this measures, but whatever, surveys are the democratic way to make policy, amirite? Since the results were equivocal, they decided to make both policy options available in 2026. That’s democracy.

In the multistage reviewer enrollment process for the 2026 conference, they gave reviewers the option of adhering to either Policy A or Policy B or saying they didn’t care. The people who didn’t care were then assigned to one of the two policies in one of the many absurdly long emails they send you when you participate in these conferences.1

Now here’s where it gets weird. The program committee decided to run a sting operation, watermarking the PDFs to trap people who uploaded them directly into LLMs. You can read about their ornate sting operation here. The details are boring. What’s interesting is their response. If they found that someone assigned to Policy A used LLMs in reviewing, then they rejected all papers of which that person was an author. They had 800 reviews that they flagged as violating Policy A. They checked every one of these by hand to avoid false positives. Every. Single. One. Because that’s a good use of human manual labor. They ultimately rejected 500 papers associated with these naughty reviewers.

Everyone is patting themselves on the back about this, and tut-tutting those horrible people who dared violate the sacred random assignment of policy, and thanking the committee for their cleverness and transparency. But come on, this whole thing is absurd. The PC argues:

“We hope that by taking strong action against violations of agreed-upon policy we will remind the community that as our field changes rapidly the thing we must protect most actively is our trust in each other. If we cannot adapt our systems in a setting based in trust, we will find that they soon become outdated and meaningless.”

I’m sorry, but it’s already meaningless! ICML received over 33,000 submissions. A random subset of 20-25% of these will be approved as “papers acceptable to go on one’s CV.” The process will churn forward. Everyone who attends the conference knows this process is impossibly bad, but the only proposed solutions make the paper-generation process more onerous for humans. This naturally leads people to offload work to LLMs. Next year, people will use watermark detection before they put the PDF into ChatGPT. The wheels of progress will continue rolling.

It’s unfortunate that the natural bureaucratic editorial mode is to assume everyone is cheating and to go on witch hunts to claim progress. The board wrote:

“Conferences must adapt, creating rules and policies to handle the new normal, and taking disciplinary action against those who break the rules and violate the trust that we all place in the review process.”

Psychology took this sort of approach in the 2010s. Though we all got to revel in the high-profile fraud schadenfreude, the field did not come out better for it.

Rejecting 800 of 33,000 papers because of possibly inappropriate LLM use, when your LLM use policy is based on the most bizarre, arbitrary decision-making built upon a semblance of objective quantitative social science, is farce. At this point, the AI reviewing process can be nothing but farce. As Kevin Baker succinctly put it in his authoritatively inflectional essay on AI for science:

Systems can persist in dysfunction indefinitely, and absurdity is not self-correcting.

One nice thing about LLMs is that they show us which parts of our systems of intellect are mechanical traditions. LLMs are a good way to stress-test our systems for organizing experience and expertise. We’ll need to be more creative about what we want to do moving forward.

Moving forward requires us to talk more about the point of peer review. Yes, the AI conferences are the most absurd manifestation of this problem, but don’t think that your community is insulated from rampant LLM reviewing. At the Cultural AI conference, Mel Andrews showed us dozens of headlines across academia advocating for LLM review, arguing that it was better than human review. There are economists launching startups to do this as a service.

Andrews argued that the arguments in favor of LLM reviewing consistently conflated institutional and epistemic concerns. The institutional concerns are well known to us. Reviewing is an enormous burden of unpaid labor that further enriches rent-seeking publication houses, and reviewership is unfairly distributed across academia. The epistemic concerns worry that peer review doesn’t properly weed out invalid papers. At least in the sciences, peer review is supposedly meta-epistemic, judging the validity of papers that aim to get at scientific knowledge, understanding, and explanation. Many studies have found the current state of peer review unfit for this task.

Advocates for LLM peer review argue it solves both problems. Andrews took a hard line, claiming that it can’t solve the epistemic problems. Andrews’ boldest claim is that the relationship of the text generated by LLMs to semantic content and truth is always accidental or incidental. Hence, the mechanical aspects of peer review can only increase confabulation and error. Following tradition means not having to think, but peer review’s epistemic function demands thinking.

I don’t fully endorse Mel’s argument, but it’s a position worth airing and engaging with. By focusing solely on process and rules, tweaks to peer review make it more mechanical. Mechanization only makes LLMs better suited for the job. If epistemic cultivation of expertise and experience demands something beyond tradition, then more complex systems of checks and balances only stifle it.

Program committees in computer science used to be small groups of people who met in person to discuss every paper that would be presented at a conference. They are now ministries of truth that haruspicate the statistics of poorly designed surveys and build ornate policies for the masses. Program committees have become bureaucrats of the state, and they are forced to see like it. The bureaucratization of academic work product threatens its very epistemic nature. Perhaps the fix has to arise from spontaneous order.

1

Here’s an email I received this morning asking me to be an area chair for one of the other big conferences, NeurIPS. My AI detector flagged this email as “highly likely AI generated.”

By Ben Recht

Topological Collapse: P = NP Implies #P = FP via Solution-Space Homology

from arXiv: Computational Complexity

Authors: M. Alasli

We prove that P = NP implies #P = FP by exploiting the topological structure of 3SAT solution spaces. The argument proceeds via a dichotomy: any polynomial-time algorithm for 3SAT either operates without global knowledge of the solution-space topology, in which case it cannot certify unsatisfiability for instances with second Betti number b_2 = 2^{Omega(N)} (leading to contradiction), or it computes global topological invariants, which are #P-hard. As local information is provably insufficient and any useful global invariant is #P-hard, the dichotomy is exhaustive. The proof is non-relativizing, consistent with oracles separating P = NP from #P = FP, and therefore necessarily exploits non-oracle properties of computation. Combined with Toda's theorem, the result yields P = NP => #P = FP => PH = P, providing new structural evidence for P != NP via a topological mechanism. We complement the theoretical framework with empirical validation of solution-space shattering at scale (N up to 500), demonstrating that these topological barriers manifest as measurable hardness across five independent algorithm classes.

The color code, the surface code, and the transversal CNOT: NP-hardness of minimum-weight decoding

from arXiv: Computational Complexity

Authors: Shouzhen Gu, Lily Wang, Aleksander Kubica

The decoding problem is a ubiquitous algorithmic task in fault-tolerant quantum computing, and solving it efficiently is essential for scalable quantum computing. Here, we prove that minimum-weight decoding is NP-hard in three quintessential settings: (i) the color code with Pauli $Z$ errors, (ii) the surface code with Pauli $X$, $Y$ and $Z$ errors, and (iii) the surface code with a transversal CNOT gate, Pauli $Z$ and measurement bit-flip errors. Our results show that computational intractability already arises in basic and practically relevant decoding problems central to both quantum memories and logical circuit implementations, highlighting a sharp computational complexity separation between minimum-weight decoding and its approximate realizations.
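
To make the decoding task concrete (this is only the problem statement, not the paper's reductions or any practical decoder): minimum-weight decoding asks for a lowest-weight error consistent with an observed syndrome. The brute-force sketch below, over GF(2) with a made-up toy parity-check matrix, shows exactly the exponential search that the NP-hardness results say cannot in general be avoided.

```python
from itertools import combinations

import numpy as np


def min_weight_decode(H, syndrome):
    """Return a minimum-weight error e over GF(2) with H @ e = syndrome.

    Brute force: try all error patterns of weight 0, 1, 2, ... until one
    matches the syndrome. Exponential in the number of bits; this is an
    illustration of the problem, not any algorithm from the paper.
    """
    n = H.shape[1]
    for w in range(n + 1):
        for support in combinations(range(n), w):
            e = np.zeros(n, dtype=int)
            e[list(support)] = 1
            if np.array_equal(H @ e % 2, syndrome % 2):
                return e
    return None


if __name__ == "__main__":
    # Toy example: parity checks of a 3-bit repetition code (illustrative only).
    H = np.array([[1, 1, 0],
                  [0, 1, 1]])
    e_true = np.array([0, 1, 0])           # a single bit flip
    syndrome = H @ e_true % 2
    print(min_weight_decode(H, syndrome))   # -> [0 1 0]
```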

The Descriptive Complexity of Relation Modification Problems

from arXiv: Computational Complexity

Authors: Florian Chudigiewitsch, Marlene Gründel, Christian Komusiewicz, Nils Morawietz, Till Tantau

A relation modification problem gets a logical structure and a natural number k as input and asks whether k modifications of the structure suffice to make it satisfy a predefined property. We provide a complete classification of the classical and parameterized complexity of relation modification problems - the latter w. r. t. the modification budget k - based on the descriptive complexity of the respective target property. We consider different types of logical structures on which modifications are performed: Whereas monadic structures and undirected graphs without self-loops each yield their own complexity landscapes, we find that modifying undirected graphs with self-loops, directed graphs, or arbitrary logical structures is equally hard w. r. t. quantifier patterns. Moreover, we observe that all classes of problems considered in this paper are subject to a strong dichotomy in the sense that they are either very easy to solve (that is, they lie in paraAC^{0\uparrow} or TC^0) or intractable (that is, they contain W[2]-hard or NP-hard problems).

Critical window for approximate counting in dense Ising models

from arXiv: Computational Complexity

Authors: Andreas Galanis, Daniel Stefankovic, Eric Vigoda

We study the complexity of approximating the partition function of dense Ising models in the critical regime. Recent work of Chen, Chen, Yin, and Zhang (FOCS 2025) established fast mixing at criticality, and even beyond criticality in a window of width $N^{-1/2}$. We complement these algorithmic results by proving nearly tight hardness bounds, thus yielding the first instance of a sharp scaling window for the computational complexity of approximate counting. Specifically, for the dense Ising model we show that approximating the partition function is computationally hard within a window of width $N^{-1/2+\varepsilon}$ for any constant $\varepsilon>0$. Standard hardness reductions for non-critical regimes break down at criticality due to bigger fluctuations in the underlying gadgets, leading to suboptimal bounds. We overcome this barrier via a global approach which aggregates fluctuations across all gadgets rather than requiring tight concentration guarantees for each individually. This new approach yields the optimal exponent for the critical window.
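
For readers who want the object being approximated pinned down: the partition function sums a Boltzmann weight over all $2^n$ spin configurations, so exact evaluation is exponential and the question is when it can be approximated efficiently. Below is a brute-force sketch for a small dense (mean-field) Ising model; the coupling normalization and the value of beta are illustrative choices, not the critical parameters studied in the paper.

```python
import math
from itertools import product


def ising_partition_function(n, beta):
    """Exact partition function of a mean-field (dense) Ising model on n spins.

    Z = sum over all sigma in {-1, +1}^n of exp((beta / n) * sum_{i<j} sigma_i sigma_j).
    Brute force over 2^n configurations, so only feasible for tiny n; the paper
    is about when this quantity can or cannot be *approximated* efficiently.
    """
    Z = 0.0
    for sigma in product((-1, 1), repeat=n):
        interaction = sum(sigma[i] * sigma[j]
                          for i in range(n) for j in range(i + 1, n))
        Z += math.exp(beta / n * interaction)
    return Z


if __name__ == "__main__":
    print(ising_partition_function(n=10, beta=1.0))
```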

Classification of Non-redundancy of Boolean Predicates of Arity 4

from arXiv: Computational Complexity

Authors: Joshua Brakensiek, Venkatesan Guruswami, Aaron Putterman

Given a constraint satisfaction problem (CSP) predicate $P \subseteq D^r$, the non-redundancy (NRD) of $P$ is the maximum size of an instance on $n$ variables such that for every clause of the instance, there is an assignment which satisfies all but that clause. The study of NRD for various CSPs is an active area of research which combines ideas from extremal combinatorics, logic, lattice theory, and other techniques. Complete classifications are known in the cases $r=2$ and $(|D|=2, r=3)$. In this paper, we give a near-complete classification of the case $(|D|=2, r=4)$. Of the 400 distinct non-trivial Boolean predicates of arity 4, we implement an algorithmic procedure which perfectly classifies 397 of them. Of the remaining three, we solve two by reducing to extremal combinatorics problems -- leaving the last one as an open question. Along the way, we identify the first Boolean predicate whose non-redundancy asymptotics are non-polynomial.

Flip Distance of Non-Crossing Spanning Trees: NP-Hardness and Improved Bounds

from arXiv: Computational Geometry

Authors: Håvard Bakke Bjerkevik, Joseph Dorfer, Linda Kleist, Torsten Ueckerdt, Birgit Vogtenhuber

We consider the problem of reconfiguring non-crossing spanning trees on point sets. For a set $P$ of $n$ points in general position in the plane, the flip graph $F(P)$ has a vertex for each non-crossing spanning tree on $P$ and an edge between any two spanning trees that can be transformed into each other by the exchange of a single edge. This flip graph has been intensively studied, lately with an emphasis on determining its diameter diam$(F(P))$ for sets $P$ of $n$ points in convex position. The current best bounds are $\frac{14}{9}n-O(1) \leq$ diam$(F(P))<\frac{15}{9}n-3$ [Bjerkevik, Kleist, Ueckerdt, and Vogtenhuber; SODA 2025]. The crucial tool for both the upper and lower bound are so-called *conflict graphs*, which the authors stated might be the key ingredient for determining the diameter (up to lower-order terms). In this paper, we pick up the concept of conflict graphs and show that this tool is even more versatile than previously hoped. As our first main result, we use conflict graphs to show that computing the flip distance between two non-crossing spanning trees is NP-hard, even for point sets in convex position. Interestingly, the result still holds for more constrained flip operations, concretely, compatible flips (where the removed and the added edge do not cross) and rotations (where the removed and the added edge share an endpoint). Extending the line of research from [BKUV SODA25], we present new insights on the diameter of the flip graph. Their lower bound is based on a constant-size pair of trees, one of which is *stacked*. We show that if one of the trees is stacked, then the lower bound is indeed optimal up to a constant term, that is, there exists a flip sequence of length at most $\frac{14}{9}(n-1)$ to any other tree. Lastly, we improve the lower bound on the diameter of the flip graph $F(P)$ for $n$ points in convex position to $\frac{11}{7}n-o(n)$.

Separators for intersection graphs of spheres

from arXiv: Computational Geometry

Authors: Jacob Fox, Jonathan Tidor

We prove the existence of optimal separators for intersection graphs of balls and spheres in any dimension $d$. One of our results is that if an intersection graph of $n$ spheres in $\mathbb{R}^d$ has $m$ edges, then it contains a balanced separator of size $O_d(m^{1/d}n^{1-2/d})$. This bound is best possible in terms of the parameters involved. The same result holds if the balls and spheres are replaced by fat convex bodies and their boundaries.

Online Packing of Orthogonal Polygons

from arXiv: Computational Geometry

Authors: Tim Gerlach, Benjamin Hennies, Linda Kleist

While rectangular and box-shaped objects dominate the classic discourse of theoretic investigations, a fascinating frontier lies in packing more complex shapes. Given recent insights that convex polygons do not allow for constant competitive online algorithms for diverse variants under translation, we study orthogonal polygons, in particular of small complexity. For translational packings of orthogonal 6-gons, we show that the competitive ratio of any online algorithm that aims to pack the items into a minimal number of unit bins is in $Ω(n / \log n)$, where $n$ denotes the number of objects. In contrast, we show that constant competitive algorithms exist when the orthogonal 6-gons are symmetric or small. For (orthogonally convex) orthogonal 8-gons, we show that the trivial $n$-competitive algorithm, which places each item in its own bin, is best-possible, i.e., every online algorithm has an asymptotic competitive ratio of at least $n$. This implies that for general orthogonal polygons, the trivial algorithm is best possible. Interestingly, for packing degenerate orthogonal polygons (with thickness $0$), called skeletons, the change in complexity is even more drastic. While constant competitive algorithms for 6-skeletons exist, no online algorithm for 8-skeletons achieves a competitive ratio better than $n$. For other packing variants of orthogonal 6-gons under translation, our insights imply the following consequences. The asymptotic competitive ratio of any online algorithm is in $Ω(n / \log n)$ for strip packing, and there exist online algorithms with competitive ratios in $O(1)$ for perimeter packing, or in $O(\sqrt{n})$ for minimizing the area of the bounding box. Moreover, the critical packing density is positive (if every object individually fits into the interior of a unit bin).

Bollobás-Meir TSP Conjecture Holds Asymptotically

from arXiv: Computational Geometry

Authors: Alexey Gordeev

In 1992, Bollobás and Meir showed that for every $k \geq 1$ there exists a constant $c_k$ such that, for any $n$ points in the $k$-dimensional unit cube $[0, 1]^k$, one can find a tour $x_1, \dots, x_n$ through these $n$ points with $\sum_{i = 1}^n |x_i - x_{i + 1}|^k \leq c_k$, where $x_{n + 1} = x_1$ and $|x - y|$ is the Euclidean distance between $x$ and $y$. Remarkably, this bound does not depend on $n$, the number of points. They conjectured that the optimal constant is $c_k = 2 \cdot k^{k / 2}$ and showed that it cannot be taken lower than that. This conjecture was recently revised for $k = 3$ by Balogh, Clemen and Dumitrescu, who showed that $c_3 \geq 2^{7/2} > 2 \cdot 3^{3/2}$. It remains open for all $k > 2$, with the best known upper bound $c_k \leq 2.65^k \cdot k^{k / 2} \cdot (1 + o_k(1))$. We significantly narrow the gap between lower and upper bounds on $c_k$, reducing it from exponential to linear. Specifically, we prove that $c_k \leq 2\mathrm{e}(k + 1) \cdot k^{k / 2}$ and $c_k = k^{k / 2} \cdot (2 + o_k(1))$, the latter establishing the conjecture asymptotically. We also obtain analogous results for related problems on Hamiltonian paths, spanning trees and perfect matchings in the unit cube. Our main tool is a new generalization of the ball packing argument used in earlier works.
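
To fix the quantity in question: for a closed tour $x_1, \dots, x_n$ of points in $[0,1]^k$, the cost is $\sum_i |x_i - x_{i+1}|^k$, and the conjecture concerns the best constant bound $c_k$ independent of $n$. The sketch below evaluates that cost for a nearest-neighbor tour on random points and prints it next to the conjectured $2 \cdot k^{k/2}$; the nearest-neighbor ordering is an arbitrary stand-in, not the construction from the paper, so it need not respect the bound.

```python
import math
import random


def power_k_tour_cost(points, order, k):
    """Sum of |x_i - x_{i+1}|^k along the closed tour given by `order`."""
    total = 0.0
    n = len(order)
    for i in range(n):
        a = points[order[i]]
        b = points[order[(i + 1) % n]]
        total += math.dist(a, b) ** k
    return total


def nearest_neighbor_order(points):
    """A simple nearest-neighbor tour, used only to get *some* ordering."""
    remaining = set(range(1, len(points)))
    order = [0]
    while remaining:
        last = points[order[-1]]
        nxt = min(remaining, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    k, n = 3, 200
    random.seed(0)
    pts = [tuple(random.random() for _ in range(k)) for _ in range(n)]
    cost = power_k_tour_cost(pts, nearest_neighbor_order(pts), k)
    print(f"power-{k} cost of a nearest-neighbor tour: {cost:.3f}")
    print(f"conjectured constant 2*k^(k/2): {2 * k ** (k / 2):.3f}")
```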

Triangulating a Polygon with Holes in Optimal (Deterministic) Time

from arXiv: Computational Geometry

Authors: Timothy M. Chan

We consider the problem of triangulating a polygon with $n$ vertices and $h$ holes, or relatedly the problem of computing the trapezoidal decomposition of a collection of $h$ disjoint simple polygonal chains with $n$ vertices total. Clarkson, Cole, and Tarjan (1992) and Seidel (1991) gave randomized algorithms running in $O(n\log^*n + h\log h)$ time, while Bar-Yehuda and Chazelle (1994) described deterministic algorithms running in $O(n+h\log^{1+\varepsilon}h)$ or $O((n+h\log h)\log\log h)$ time, for an arbitrarily small positive constant $\varepsilon$. No improvements have been reported since. We describe a new $O(n + h\log h)$-time algorithm, which is optimal and deterministic. More generally, when the given polygonal chains are not necessarily simple and may intersect each other, we show how to compute their trapezoidal decomposition (and in particular, compute all intersections) in optimal $O(n + h\log h)$ deterministic time when the number of intersections is at most $n^{1-\varepsilon}$. To obtain these results, Chazelle's linear-time algorithm for triangulating a simple polygon is used as a black box.

Computing the Girth of a Segment Intersection Graph

from arXiv: Computational Geometry

Authors: Timothy M. Chan, Yuancheng Yu

We present an algorithm that computes the girth of the intersection graph of $n$ given line segments in the plane in $O(n^{1.483})$ expected time. This is the first such algorithm with $O(n^{3/2-\varepsilon})$ running time for a positive constant $\varepsilon$, and makes progress towards an open question posed by Chan (SODA 2023). The main techniques include (i)~the usage of recent subcubic algorithms for bounded-difference min-plus matrix multiplication, and (ii)~an interesting variant of the planar graph separator theorem. The result extends to intersection graphs of connected algebraic curves or semialgebraic sets of constant description complexity.
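
As a naive reference point (emphatically not the $O(n^{1.483})$ algorithm above): one can build the intersection graph with $\Theta(n^2)$ pairwise segment tests and then compute the girth exactly by deleting each edge and BFS-ing the shortest remaining path between its endpoints. A self-contained sketch:

```python
from collections import deque
from itertools import combinations


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, p):
    """True if collinear point p lies on segment ab."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(s, t):
    """Standard orientation test, including touching and collinear overlap."""
    a, b = s
    c, d = t
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    if o1 == 0 and _on_segment(a, b, c): return True
    if o2 == 0 and _on_segment(a, b, d): return True
    if o3 == 0 and _on_segment(c, d, a): return True
    if o4 == 0 and _on_segment(c, d, b): return True
    return False


def girth(adj):
    """Exact girth of an undirected graph: for each edge, remove it and
    BFS the shortest remaining path between its endpoints."""
    best = float("inf")
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    for u, v in edges:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if (x, y) in ((u, v), (v, u)) or y in dist:
                    continue
                dist[y] = dist[x] + 1
                queue.append(y)
        if v in dist:
            best = min(best, dist[v] + 1)
    return best


if __name__ == "__main__":
    # Three segments pairwise intersecting -> a triangle in the graph.
    segs = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((0, 1), (2, 1))]
    adj = {i: set() for i in range(len(segs))}
    for i, j in combinations(range(len(segs)), 2):
        if segments_intersect(segs[i], segs[j]):
            adj[i].add(j)
            adj[j].add(i)
    print(girth(adj))  # -> 3
```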

A Fast Quasi-Linear Heuristic for the Close-Enough Traveling Salesman Problem

from arXiv: Computational Geometry

Authors: Khoi Duong

We introduce a fast, quasi-linear-time heuristic for the Close-Enough Traveling Salesman Problem (CETSP), a continuous generalization of the Euclidean TSP in which each target is a disk that must be intersected. The method adapts the pair-center clustering paradigm to circular neighborhoods: a hierarchical clustering phase merges nearby disks into proxy circles using an R*-tree for efficient spatial queries, and a construction phase incrementally expands the hierarchy into a feasible tour while maintaining and locally optimizing tour points. Lightweight local improvements, selective reinsertion and constrained point reoptimization, reduce local inefficiencies without compromising scalability. The algorithm runs in expected O(n log n) time and, on benchmark instances reconstructed from the Mennell dataset, produces solutions within roughly 0-2% of state-of-the-art best-known values while requiring orders-of-magnitude less runtime than population-based metaheuristics. The approach trades some final-solution optimality for dramatic gains in speed and scalability, making it suitable for very large CETSP instances.
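
For orientation, here is a much cruder baseline than the pair-center heuristic described above, with no optimality claim: order the disks by nearest neighbor over their centers, then pull each visiting point from its center toward the previous stop until it reaches the disk boundary, which keeps the tour feasible. All names and parameters in this sketch are illustrative.

```python
import math
import random


def cetsp_baseline(disks):
    """Very simple feasible CETSP tour (illustration only, not the paper's method).

    disks: list of (cx, cy, r). Returns one visiting point per disk, in tour
    order; every returned point lies inside its disk, so the tour intersects
    every target.
    """
    # Nearest-neighbor ordering over the centers.
    remaining = list(range(1, len(disks)))
    order = [0]
    while remaining:
        cx, cy, _ = disks[order[-1]]
        nxt = min(remaining,
                  key=lambda j: math.hypot(disks[j][0] - cx, disks[j][1] - cy))
        order.append(nxt)
        remaining.remove(nxt)

    # Place each visit as close to the previous visit as the disk allows.
    points = []
    for idx in order:
        cx, cy, r = disks[idx]
        if not points:
            points.append((cx, cy))
            continue
        px, py = points[-1]
        d = math.hypot(px - cx, py - cy)
        if d <= r:                       # previous point is already inside
            points.append((px, py))
        else:                            # step from the center toward it
            t = r / d
            points.append((cx + t * (px - cx), cy + t * (py - cy)))
    return points


def tour_length(points):
    return sum(math.dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))


if __name__ == "__main__":
    random.seed(1)
    disks = [(random.random() * 10, random.random() * 10, 0.5) for _ in range(50)]
    print(f"baseline tour length: {tour_length(cetsp_baseline(disks)):.2f}")
```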

Shadoks Approach to Parallel Reconfiguration of Triangulations

from arXiv: Computational Geometry

Authors: Guilherme D. da Fonseca, Fabien Feschet, Yan Gerard

We describe the methods used by Team Shadoks to win the CG:SHOP 2026 Challenge on parallel reconfiguration of planar triangulations. An instance is a collection of triangulations of a common point set. We must select a center triangulation and find short parallel-flip paths from each input triangulation to the center, minimizing the sum of path lengths. Our approach combines exact methods based on SAT with several greedy heuristics, and also makes use of SAT and MaxSAT for solution improvement. We present a SAT encoding for bounded-length paths and a global formulation for fixed path-length vectors. We discuss how these components interact in practice and summarize the performance of our solvers on the benchmark instances.

Approximating Convex Hulls via Range Queries

from arXiv: Computational Geometry

Authors: T. Schibler, J. Xue, J. Zhu

Recently, motivated by the rapid increase of the data size in various applications, Monemizadeh [APPROX'23] and Driemel, Monemizadeh, Oh, Staals, and Woodruff [SoCG'25] studied geometric problems in the setting where the only access to the input point set is via querying a range-search oracle. Algorithms in this setting are evaluated on two criteria: (i) the number of queries to the oracle and (ii) the error of the output. In this paper, we continue this line of research and investigate one of the most fundamental geometric problems in the oracle setting, i.e., the convex hull problem. Let $P$ be an unknown set of points in $[0,1]^d$ equipped with a range-emptiness oracle. Via querying the oracle, the algorithm is supposed to output a convex polygon $C \subseteq [0,1]^d$ as an estimation of the convex hull $CH(P)$ of $P$. The error of the output is defined as the volume of the symmetric difference $C \oplus CH(P) = (C \backslash CH(P)) \cup (CH(P) \backslash C)$. We prove tight and near-tight tradeoffs between the number of queries and the error of the output for different variants of the problem, depending on the type of the range-emptiness queries and whether the queries are non-adaptive or adaptive. - Orthogonal emptiness queries in $d$-dimensional space: We show that the minimum error a deterministic algorithm can achieve with $q$ queries is $Θ(q^{-1/d})$ if the queries are non-adaptive, and $Θ(q^{-1/(d-1)})$ if the queries are adaptive. In particular, in 2D, the bounds are $Θ(1/\sqrt{q})$ and $Θ(1/q)$ for non-adaptive and adaptive queries, respectively. - Halfplane emptiness queries in 2D: We show that the minimum error a deterministic algorithm can achieve with $q$ queries is $Θ(1/\sqrt{q})$ if the queries are non-adaptive, and $\widetilde{Θ}(1/q^2)$ if the queries are adaptive. Here $\widetilde{Θ}(\cdot)$ hides logarithmic factors.
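
To illustrate the query model in the simplest case (2D, orthogonal emptiness queries, non-adaptive), one naive strategy is to probe every cell of a $g \times g$ grid and return the convex hull of the centers of the non-empty cells. This sketch only demonstrates the oracle interface; it is not claimed to match the bounds proved in the paper, and the oracle and grid size below are made-up examples.

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull (vertices in counterclockwise order)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def estimate_hull_nonadaptive(is_empty, g):
    """Non-adaptive estimate using g*g orthogonal emptiness queries: query every
    cell of a g x g grid over [0,1]^2 and hull the centers of non-empty cells."""
    centers = []
    for i in range(g):
        for j in range(g):
            cell = (i / g, (i + 1) / g, j / g, (j + 1) / g)  # (x0, x1, y0, y1)
            if not is_empty(cell):
                centers.append(((i + 0.5) / g, (j + 0.5) / g))
    return convex_hull(centers)


if __name__ == "__main__":
    import random
    random.seed(2)
    hidden = [(random.random(), random.random()) for _ in range(200)]

    def is_empty(cell):
        # Stand-in for the range-emptiness oracle over the hidden point set.
        x0, x1, y0, y1 = cell
        return not any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in hidden)

    print(estimate_hull_nonadaptive(is_empty, g=16))
```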

A Dividing Line for Structural Kernelization of Component Order Connectivity via Distance to Bounded Pathwidth

from arXiv: Data Structures and Algorithms

Authors: Jakob Greilhuber, Roohani Sharma

In this work we study a classic generalization of the Vertex Cover (VC) problem, called the Component Order Connectivity (COC) problem. In COC, given an undirected graph $G$, integers $d \geq 1$ and $k$, the goal is to determine if there is a set of at most $k$ vertices whose deletion results in a graph where each connected component has at most $d$ vertices. When $d=1$, this is exactly VC. This work is inspired by polynomial kernelization results with respect to structural parameters for VC. On one hand, Jansen & Bodlaender [TOCS 2013] show that VC admits a polynomial kernel when the parameter is the distance to treewidth-$1$ graphs, on the other hand Cygan, Lokshtanov, Pilipczuk, Pilipczuk & Saurabh [TOCS 2014] showed that VC does not admit a polynomial kernel when the parameter is distance to treewidth-$2$ graphs. Greilhuber & Sharma [IPEC 2024] showed that, for any $d \geq 2$, $d$-COC cannot admit a polynomial kernel when the parameter is distance to a forest of pathwidth $2$. Here, $d$-COC is the same as COC only that $d$ is a fixed constant not part of the input. We complement this result and show that like for the VC problem where distance to treewidth-$1$ graphs versus distance to treewidth-$2$ graphs is the dividing line between structural parameterizations that allow and respectively disallow polynomial kernelization, for COC this dividing line happens between distance to pathwidth-$1$ graphs and distance to pathwidth-$2$ graphs. The main technical result of this work is that COC admits a polynomial kernel parameterized by distance to pathwidth-$1$ graphs plus $d$.
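
For concreteness, the problem being kernelized: delete at most $k$ vertices so that every connected component of what remains has at most $d$ vertices. The brute-force checker below (feasible only for tiny graphs) just pins down the definition; it has nothing to do with the kernelization itself.

```python
from itertools import combinations


def components_small(adj, removed, d):
    """True if every component of the graph minus `removed` has at most d vertices."""
    seen = set(removed)
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if size > d:
            return False
    return True


def has_coc_solution(adj, d, k):
    """Component Order Connectivity by brute force: is there a set of at most
    k vertices whose deletion leaves only components of size <= d?"""
    vertices = list(adj)
    for size in range(k + 1):
        for removed in combinations(vertices, size):
            if components_small(adj, removed, d):
                return True
    return False


if __name__ == "__main__":
    # A path on 6 vertices: deleting vertex 2 leaves components of size 2 and 3,
    # so (d=3, k=1) is a yes-instance while (d=2, k=1) is not.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
    print(has_coc_solution(path, d=3, k=1))  # -> True
    print(has_coc_solution(path, d=2, k=1))  # -> False
```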

Stable Algorithms Lower Bounds for Estimation

from arXiv: Data Structures and Algorithms

Authors: Xifan Yu, Ilias Zadik

In this work, we show that for all statistical estimation problems, a natural MMSE instability (discontinuity) condition implies the failure of stable algorithms, serving as a version of OGP for estimation tasks. Using this criterion, we establish separations between stable and polynomial-time algorithms for the following MMSE-unstable tasks: (i) Planted Shortest Path, where Dijkstra's algorithm succeeds, (ii) random Parity Codes, where Gaussian elimination succeeds, and (iii) Gaussian Subset Sum, where lattice-based methods succeed. For all three, we further show that all low-degree polynomials are stable, yielding separations against low-degree methods and a new method to bound the low-degree MMSE. In particular, our technique highlights that MMSE instability is a common feature for Shortest Path and the noiseless Parity Codes and Gaussian subset sum. Last, we highlight that our work places rigorous algorithmic footing on the long-standing physics belief that first-order phase transitions--which in this setting translate to MMSE instability--impose fundamental limits on classes of efficient algorithms.

On the Complexity of Fundamental Problems for DAG-Compressed Graphs

from arXiv: Data Structures and Algorithms

Authors: Florian Chudigiewitsch, Till Tantau, Felix Winkler

A DAG compression of a (typically dense) graph is a simple data structure that stores how vertex clusters are connected, where the clusters are described indirectly as sets of reachable sinks in a directed acyclic graph (DAG). They generalize tree compressions, where the clusters form a tree-like hierarchy, and we give the first proof that DAG compressions can achieve better compressions than tree compressions. Our interest in DAG compression stems from the fact that several simple standard algorithms, like breadth-first search on graphs, can be implemented so that they work directly on the compressed rather than on the original graph and so that, crucially, the runtime is relative to the (typically small) size of the compressed graph. We add another entry to the list of algorithms where this is possible, by showing that Kruskal's algorithm for computing minimum spanning trees can be adapted to work directly on DAG compressions. On the negative side, we answer the central open problem from previous work, namely how hard it is to compute a minimum-size DAG compression for a given graph: This is NP-hard; and this is even the case for the dynamic setting, where we must update the DAG compression optimally when a single edge is added or deleted in the input graph.
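
A rough sketch of the data structure as one might read the abstract (the exact definition in the paper may differ): original vertices are the sinks of the DAG, an internal DAG node stands for the cluster of sinks reachable from it, and one stored edge between two DAG nodes encodes all original edges between the two clusters, i.e., a biclique. The tiny decompressor below illustrates that reading.

```python
def reachable_sinks(dag, node, cache=None):
    """Set of sinks of the DAG reachable from `node` (the node's cluster)."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    if not dag.get(node):                # no outgoing edges: a sink
        result = {node}
    else:
        result = set()
        for child in dag[node]:
            result |= reachable_sinks(dag, child, cache)
    cache[node] = result
    return result


def expand(dag, cluster_edges):
    """Decompress under the assumed encoding: each cluster edge (u, v) stands
    for every original-graph edge between the sinks reachable from u and the
    sinks reachable from v."""
    cache = {}
    edges = set()
    for u, v in cluster_edges:
        for a in reachable_sinks(dag, u, cache):
            for b in reachable_sinks(dag, v, cache):
                if a != b:
                    edges.add(frozenset((a, b)))
    return edges


if __name__ == "__main__":
    # Sinks 1..4 are the graph's vertices; internal nodes 'L' and 'R'
    # describe the clusters {1, 2} and {3, 4}.
    dag = {"L": ["1", "2"], "R": ["3", "4"], "1": [], "2": [], "3": [], "4": []}
    # One cluster edge encodes the complete bipartite graph between the clusters.
    print(sorted(tuple(sorted(e)) for e in expand(dag, [("L", "R")])))
    # -> [('1', '3'), ('1', '4'), ('2', '3'), ('2', '4')]
```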

Finding Minimum Distance Preservers: A Parameterized Study

from arXiv: Data Structures and Algorithms

Authors: Kirill Simonov, Farehe Soheil, Shaily Verma

For a given graph $G$ and a subset of vertices $S$, a *distance preserver* is a subgraph of $G$ that preserves shortest paths between the vertices of $S$. We distinguish between a *subsetwise* distance preserver, which preserves distances between all pairs in $S$, and a *pairwise* distance preserver, which preserves distances only between specific pairs of vertices in $S$, given in the input. While a large body of work is dedicated to upper and lower bounds on the size of distance preservers and, more generally, graph spanners, the computational complexity of finding the minimum distance preserver has received comparatively little attention. We consider the respective Subsetwise Distance Preserver (SDP) and Pairwise Distance Preserver (PDP) problems and initiate the study of their computational complexity. We provide a detailed complexity landscape with respect to natural parameters, including the number of terminals, solution size, vertex cover, and treewidth. Our main contributions are as follows:

  • Both PDP and SDP are NP-hard even on subgraphs of the grid. Moreover, when parameterized by the number of terminals, the problems are W[1]-hard on subgraphs of the grid, while they become FPT on full grids.

  • PDP is NP-hard on graphs of vertex cover $3$, while SDP is FPT when parameterized by the vertex cover of the graph. Thus, the vertex cover parameter distinguishes the two variants.

  • Both problems are FPT when parameterized by the number of terminals and the treewidth of the graph.
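
To pin down the pairwise variant: a pairwise distance preserver must reproduce, for each demand pair, the exact distance from the full graph. The brute-force sketch below searches edge subsets in increasing size and is exponential; it only illustrates the definition, not any of the algorithms or reductions in the paper.

```python
from collections import deque
from itertools import combinations


def dist(adj, s, t):
    """BFS distance from s to t (None if unreachable)."""
    seen, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return seen[v]
        for w in adj[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                queue.append(w)
    return None


def min_pairwise_preserver(vertices, edges, pairs):
    """Smallest edge set whose subgraph preserves dist(s, t) for every (s, t)
    in `pairs`. Pure brute force over edge subsets."""
    full = {v: set() for v in vertices}
    for u, v in edges:
        full[u].add(v)
        full[v].add(u)
    targets = {(s, t): dist(full, s, t) for s, t in pairs}
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            adj = {v: set() for v in vertices}
            for u, v in subset:
                adj[u].add(v)
                adj[v].add(u)
            if all(dist(adj, s, t) == targets[(s, t)] for s, t in pairs):
                return subset
    return None


if __name__ == "__main__":
    # A 4-cycle with one chord; only the pair (0, 2) must be preserved.
    vertices = [0, 1, 2, 3]
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    print(min_pairwise_preserver(vertices, edges, [(0, 2)]))  # -> ((0, 2),)
```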

Charting the Diameter Computation Landscape of Geometric Intersection Graphs in Three Dimensions and Higher

from arXiv: Data Structures and Algorithms

Authors: Timothy M. Chan, Hsien-Chih Chang, Jie Gao, Sándor Kisfaludi-Bak, Hung Le, Da Wei Zheng

Recent research on computing the diameter of geometric intersection graphs has made significant strides, primarily focusing on the 2D case where truly subquadratic-time algorithms were given for simple objects such as unit-disks and (axis-aligned) squares. However, in three or higher dimensions, there is no known truly subquadratic-time algorithm for any intersection graph of non-trivial objects, even basic ones such as unit balls or (axis-aligned) unit cubes. This was partially explained by the pioneering work of Bringmann et al. [SoCG '22] which gave several truly subquadratic lower bounds, notably for unit balls or unit cubes in 3D when the graph diameter $Δ$ is at least $Ω(\log n)$, hinting at a pessimistic outlook for the complexity of the diameter problem in higher dimensions. In this paper, we substantially extend the landscape of diameter computation for objects in three and higher dimensions, giving a few positive results. Our highlighted findings include: - A truly subquadratic-time algorithm for deciding if the diameter of unit cubes in 3D is at most 3 (Diameter-3 hereafter), the first algorithm of its kind for objects in 3D or higher dimensions. Our algorithm is based on a novel connection to pseudolines, which is of independent interest. - A truly subquadratic time lower bound for Diameter-3 of unit balls in 3D under the Orthogonal Vector (OV) hypothesis, giving the first separation between unit balls and unit cubes in the small diameter regime. Previously, computing the diameter for both objects was known to be truly subquadratic hard when the diameter is $Ω(\log n)$. - A near-linear-time algorithm for Diameter-2 of unit cubes in 3D, generalizing the previous result for unit squares in 2D. - A truly subquadratic-time algorithm and lower bound for Diameter-2 and Diameter-3 of rectangular boxes (of arbitrary dimension and sizes), respectively.

Optimal-Cost Construction of Shallow Cuttings for 3-D Dominance Ranges in the I/O-Model

from arXiv: Data Structures and Algorithms

Authors: Yakov Nekrich, Saladi Rahul

Shallow cuttings are a fundamental tool in computational geometry and spatial databases for solving offline and online range searching problems. For a set $P$ of $N$ points in 3-D, at SODA'14, Afshani and Tsakalidis designed an optimal $O(N\log_2N)$ time algorithm that constructs shallow cuttings for 3-D dominance ranges in internal memory. Even though shallow cuttings are used in the I/O-model to design space- and query-efficient range searching data structures, no efficient construction for them was known until now. In this paper, we design an optimal-cost algorithm to construct shallow cuttings for 3-D dominance ranges. The number of I/Os performed by the algorithm is $O\left(\frac{N}{B}\log_{M/B}\left(\frac{N}{B}\right) \right)$, where $B$ is the block size and $M$ is the memory size. As two applications of the optimal-cost construction algorithm, we design fast algorithms for offline 3-D dominance reporting and offline 3-D approximate dominance counting. We believe that our algorithm will find further applications in offline 3-D range searching problems and in improving the construction cost of data structures for 3-D range searching problems.

Fast Nearest Neighbor Search for $\ell_p$ Metrics

from arXiv: Data Structures and Algorithms

Authors: Robert Krauthgamer, Nir Petruschka

The Nearest Neighbor Search (NNS) problem asks to design a data structure that preprocesses an $n$-point dataset $X$ lying in a metric space $\mathcal{M}$, so that given a query point $q \in \mathcal{M}$, one can quickly return a point of $X$ minimizing the distance to $q$. The efficiency of such a data structure is evaluated primarily by the amount of space it uses and the time required to answer a query. We focus on the fast query-time regime, which is crucial for modern large-scale applications, where datasets are massive and queries must be processed online, and is often modeled by query time $\text{poly}(d \log n)$. Our main result is such a randomized data structure for NNS in $\ell_p$ spaces, $p>2$, that achieves $p^{O(1) + \log\log p}$ approximation with fast query time and $\text{poly}(dn)$ space. Our data structure improves, or is incomparable to, the state-of-the-art for the fast query-time regime from [Bartal and Gottlieb, TCS 2019] and [Krauthgamer, Petruschka and Sapir, FOCS 2025].

Optimal-Time Move Structure Balancing and LCP Array Computation from the RLBWT

from arXiv: Data Structures and Algorithms

Authors: Nathaniel K. Brown, Ahsan Sanaullah, Shaojie Zhang, Ben Langmead

On repetitive text collections of size $n$, the Burrows-Wheeler Transform (BWT) tends to have relatively fewer runs $r$ in its run-length encoded BWT (RLBWT). This motivates many RLBWT-related algorithms and data structures that can be designed in compressed $O(r)$-space. These approaches often use the RLBWT-derived permutations LF, FL, $φ$, and $φ^{-1}$, which can be represented using a move structure to obtain optimal $O(1)$-time for each permutation step in $O(r)$-space. They are then used to construct compressed space text indexes supporting efficient pattern matching queries. However, move structure construction in $O(r)$-space requires an $O(r \log r)$-time balancing stage. The longest common prefix array (LCP) of a text collection is used to support pattern matching queries and data structure construction. Recently, it was shown how to compute the LCP array in $O(n + r \log r)$-time and $O(r)$ additional space from an RLBWT. However, the bottleneck remains the $O(r \log r)$-time move structure balancing stage. In this paper, we describe an optimal $O(r)$-time and space algorithm to balance a move structure. This result is then applied to LCP construction from an RLBWT to obtain an optimal $O(n)$-time algorithm in $O(r)$-space in addition to the output, which implies an optimal-time algorithm for LCP array enumeration in compressed $O(r)$-space.
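
To make the $n$-versus-$r$ distinction concrete, here is a minimal run-length encoder applied to an already-computed BWT string. The example string is hypothetical, and neither BWT construction nor the paper's balancing and LCP algorithms are shown.

```python
# Minimal sketch: count the runs r of a (given) BWT string, i.e. compute its RLBWT length.
def run_length_encode(bwt: str) -> list[tuple[str, int]]:
    runs: list[tuple[str, int]] = []
    for ch in bwt:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

bwt = "annb$aa"             # hypothetical BWT of a short text, for illustration only
runs = run_length_encode(bwt)
print(runs)                 # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(len(bwt), len(runs))  # n = 7, r = 5
```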

Approximate Butterfly Counting in Sublinear Time

from arXiv: Data Structures and Algorithms

Authors: Chi Luo, Jiaxin Song, Yuhao Zhang, Kai Wang, Zhixing He, Kuan Yang

Bipartite graphs serve as a natural model for representing relationships between two different types of entities. When analyzing bipartite graphs, butterfly counting is a fundamental research problem that aims to count the number of butterflies (i.e., $2 \times 2$ bicliques) in a given bipartite graph. While this problem has been extensively studied in the literature, existing algorithms usually necessitate access to a large portion of the entire graph, presenting challenges in real scenarios where graphs are extremely large and I/O costs are expensive. In this paper, we study the butterfly counting problem under the query model, where the following query operations are permitted: degree query, neighbor query, and vertex-pair query. We propose TLS, a practical two-level sampling algorithm that can estimate the butterfly count accurately while accessing only a limited graph structure, achieving significantly lower query costs under the standard query model. TLS also incorporates several key techniques to control the variance, including "small-degree-first sampling" and "wedge sampling via small subsets". To ensure theoretical guarantees, we further introduce two novel techniques: "heavy-light partition" and "guess-and-prove", integrated into TLS. With these techniques, we prove that the algorithm can achieve a $(1+\epsilon)$ accuracy for any given approximation parameter $0 < \epsilon < 1$ on general bipartite graphs with a promised time and query complexity. In particular, the promised time is sublinear when the input graph is dense enough. Extensive experiments on 15 datasets demonstrate that TLS delivers robust estimates with up to three orders of magnitude lower query costs and runtime compared to existing solutions.
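
To pin down the object being counted, here is a naive exact butterfly counter for a tiny bipartite graph. It is a brute-force reference, not the sublinear sampling algorithm (TLS) described above, and the toy graph is made up.

```python
# Naive reference count of butterflies (2x2 bicliques) in a bipartite graph.
from itertools import combinations

def count_butterflies(adj: dict[str, set[str]]) -> int:
    """adj maps each left-side vertex to its set of right-side neighbors."""
    total = 0
    for u, v in combinations(adj, 2):
        common = len(adj[u] & adj[v])          # shared right-side neighbors
        total += common * (common - 1) // 2    # each pair of shared neighbors forms a butterfly
    return total

# Tiny example: left vertices a and b both connect to right vertices 1 and 2 -> one butterfly.
print(count_butterflies({"a": {"1", "2"}, "b": {"1", "2", "3"}}))  # 1
```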

Non-Exclusive Notifications for Ride-Hailing at Lyft I: Single-Cycle Approximation Algorithms

from arXiv: Data Structures and Algorithms

Authors: Farbod Ekbatani, Rad Niazadeh, Mehdi Golari, Romain Camilleri, Titouan Jehl, Chris Sholley, Matthew Leventi, Theresa Calderon, Angela Lam, Paul Havard Duclos, Tim Holland, James Koch, Shreya Reddy

Ride-hailing platforms increasingly rely on non-exclusive notifications (broadcasting a single request to multiple drivers simultaneously) to mitigate inefficiencies caused by uncertain driver acceptance. In this paper, the first in a two-part collaboration with Lyft, we formally model the 'Notification Set Selection Problem' for a single decision cycle, where the platform determines the optimal subset of drivers to notify for each incoming ride request. We analyze this combinatorial optimization problem under two contention-resolution protocols: 'First Acceptance (FA)', which prioritizes speed by assigning the ride to the first responder, and 'Best Acceptance (BA)', which prioritizes match quality by selecting the highest-valued accepting driver. We show that welfare maximization under both mechanisms is strongly NP-hard, ruling out a Fully Polynomial Time Approximation Scheme (FPTAS). Despite this, we derive several positive algorithmic results. For FA, we present a Polynomial Time Approximation Scheme (PTAS) for the single-rider case and a constant-factor approximation (factor 4) for the general matching setting. We highlight that the FA valuation function can be viewed as a novel discrete choice model with theoretical properties of independent interest. For BA, we prove that the objective is monotone and submodular, admitting a standard $(1 - 1/e)$-approximation. Moreover, using a polynomial-time demand oracle that we design for this problem, we show it is possible to surpass the $(1 - 1/e)$ barrier. Finally, in the special case of homogeneous acceptance probabilities, we show that the BA problem can be solved exactly in polynomial time via a linear programming formulation. We validate the empirical performance of our algorithms through numerical experiments on synthetic data and on instances calibrated using real ride-sharing data from Lyft.

Stationary Online Contention Resolution Schemes

from arXiv: Data Structures and Algorithms

Authors: Mohammad Reza Aminian, Rad Niazadeh, Pranav Nuti

Online contention resolution schemes (OCRSs) are a central tool in Bayesian online selection and resource allocation: they convert fractional ex-ante relaxations into feasible online policies while preserving each marginal probability up to a constant factor. Despite their importance, designing (near) optimal OCRSs is often technically challenging, and many existing constructions rely on indirect reductions to prophet inequalities and LP duality, resulting in algorithms that are difficult to interpret or implement. In this paper, we introduce "stationary online contention resolution schemes (S-OCRSs)," a permutation-invariant class of OCRSs in which the distribution of the selected feasible set is independent of arrival order. We show that S-OCRSs admit an exact distributional characterization together with a universal online implementation. We then develop a general `maximum-entropy' approach to construct and analyze S-OCRSs, reducing the design of online policies to constructing suitable distributions over feasible sets. This yields a new technical framework for designing simple and possibly improved OCRSs. We demonstrate the power of this framework across several canonical feasibility environments. In particular, we obtain an improved $(3-\sqrt{5})/2$-selectable OCRS for bipartite matchings, attaining the independence benchmark conjectured to be optimal and yielding the best known prophet inequality for this setting. We also obtain a $1-\sqrt{2/(πk)} + O(1/k)$-selectable OCRS for $k$-uniform matroids and a simple, explicit $1/2$-selectable OCRS for weakly Rayleigh matroids (including all $\mathbb{C}$-representable matroids such as graphic and laminar). While these guarantees match the best known bounds, our framework also yields concrete and systematic constructions, providing transparent algorithms in settings where previous OCRSs were implicit or technically involved.

Hardening Confidential Federated Compute against Side-channel Attacks

from arXiv: Data Structures and Algorithms

Authors: James Bell-Clark, Albert Cheu, Adria Gascon, Jonathan Katz

In this work, we identify a set of side-channels in our Confidential Federated Compute platform that a hypothetical insider could exploit to circumvent differential privacy (DP) guarantees. We show how DP can mitigate two of the side-channels, one of which has been implemented in our open-source library.

The Library Theorem: How External Organization Governs Agentic Reasoning Capacity

from arXiv: Data Structures and Algorithms

Authors: Zachary F. Mainen

Externalized reasoning is already exploited by transformer-based agents through chain-of-thought, but structured retrieval -- indexing over one's own reasoning state -- remains underexplored. We formalize the transformer context window as an I/O page and prove that tool-augmented agents with indexed external memory achieve exponentially lower retrieval cost than agents restricted to sequential scanning: $O(\log_b N)$ versus $Ω(N)$ page reads per query, and $O(T \log_b T)$ versus $Θ(T^2)$ cumulative cost over $T$ reasoning steps -- a gap that widens as deliberation deepens. We test these predictions on a controlled lookup benchmark across three content types -- random hashes, ordered integers, and encyclopedia entries -- varying store size from 50 to 5,000 items, and replicate key conditions across two model generations (GPT-4o-mini and GPT-5.4). On abstract content, the indexed agent achieves median 1 page read regardless of store size, confirming the $O(1)$ prediction. Sorted pages without an index fail to close the gap: the weaker model cannot sustain binary search at scale, and the stronger model achieves near-optimal $\log_2 N$ search but still loses to the index by $5\times$. On familiar content (encyclopedia entries), a competing failure mode emerges: the model recognizes the domain, bypasses the retrieval protocol, and generates answers from parametric memory, producing catastrophic token expenditure even when the index is sound. This parametric memory competition dissociates the two cognitive operations that indexing combines: understanding content (where language models excel) and following navigational protocols (where they fail when understanding tempts them to shortcut). The result argues for a separation of concerns: use language models for index construction, where semantic understanding helps, and deterministic algorithms for index traversal, where it hurts.

(Sets of) Complement Scattered Factors

from arXiv: Data Structures and Algorithms

Authors: Duncan Adamson, Pamela Fleischmann, Annika Huch

Starting in the 1970s with the fundamental work of Imre Simon, \emph{scattered factors} (also known as subsequences or scattered subwords) have remained a consistently and heavily studied object. The majority of work on scattered factors can be split into two broad classes of problems: given a word, what information, in the form of scattered factors, is contained, and what is not. In this paper, we consider an intermediary problem, introducing the notion of \emph{complement scattered factors}. Given a word $w$ and a scattered factor $u$ of $w$, the complement scattered factors of $w$ with respect to $u$, $C(w, u)$, is the set of scattered factors in $w$ that can be formed by removing any embedding of $u$ from $w$. This is closely related to the \emph{shuffle} operation in which two words are intertwined, i.e., we extend previous work relating to the shuffle operator, using knowledge about scattered factors. Alongside introducing these sets, we provide combinatorial results on the size of the set $C(w, u)$, an algorithm to compute the set $C(w, u)$ from $w$ and $u$ in $O(\vert w \vert \cdot \vert u \vert \binom{w}{u})$ time, where $\binom{w}{u}$ denotes the number of embeddings of $u$ into $w$, an algorithm to construct $u$ from $w$ and $C(w, u)$ in $O(\vert w \vert^2 \binom{\vert w \vert}{\vert w \vert - \vert u \vert})$ time, and an algorithm to construct $w$ from $u$ and $C(w, u)$ in $O(\vert u \vert \cdot \vert w \vert^{\vert u \vert + 1})$ time.
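
The definition of $C(w, u)$ is easy to make concrete: the brute-force sketch below enumerates every embedding of $u$ into $w$ and collects the word left behind when that embedding is deleted. It is exponential-time reference code for tiny inputs, not any of the paper's algorithms.

```python
# Brute-force computation of the complement scattered factor set C(w, u).
from itertools import combinations

def complement_scattered_factors(w: str, u: str) -> set[str]:
    """Return C(w, u): all words obtainable by deleting some embedding of u from w."""
    result = set()
    for positions in combinations(range(len(w)), len(u)):
        if all(w[p] == c for p, c in zip(positions, u)):   # positions form an embedding of u
            used = set(positions)
            result.add("".join(w[i] for i in range(len(w)) if i not in used))
    return result

print(complement_scattered_factors("abab", "ab"))  # {'ab', 'ba'}
```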

Monday, March 23

Strong Chain Quality

from Decentralized Thoughts

Chain Quality (CQ) is a core blockchain property. Roughly speaking, it says: Owning $3\%$ of the stake gives you inclusion rights over valid inputs of your choice in roughly $3\%$ of the blockspace over time. For Nakamoto style chains, this is called Ideal CQ (see here). Chain quality was sufficient for early generations of blockchains that had low throughput, but modern blockchains have much higher bandwidth and can commit many...

By Ittai Abraham, Pranav Garimidi, Joachim Neu

The monotonicity of the Franz-Parisi potential is equivalent with Low-degree MMSE lower bounds

from arXiv: Computational Complexity

Authors: Konstantinos Tsirkas, Leda Wang, Ilias Zadik

Over the last decades, two distinct approaches have been instrumental to our understanding of the computational complexity of statistical estimation. The statistical physics literature predicts algorithmic hardness through local stability and monotonicity properties of the Franz--Parisi (FP) potential \cite{franz1995recipes,franz1997phase}, while the mathematically rigorous literature characterizes hardness via the limitations of restricted algorithmic classes, most notably low-degree polynomial estimators \cite{hopkins2017efficient}. For many inference models, these two perspectives yield strikingly consistent predictions, giving rise to a long-standing open problem of establishing a precise mathematical relationship between them. In this work, we show that for estimation problems the power of low-degree polynomials is equivalent to the monotonicity of the annealed FP potential for a broad family of Gaussian additive models (GAMs) with signal-to-noise ratio $λ$. In particular, subject to a low-degree conjecture for GAMs, our results imply that the polynomial-time limits of these models are directly implied by the monotonicity of the annealed FP potential, in conceptual agreement with predictions from the physics literature dating back to the 1990s.

Search-Driven Clause Learning for Product-State Quantum $k$-SAT (PRODSAT-QSAT)

from arXiv: Computational Complexity

Authors: Samuel González-Castillo, Joon Hyung Lee, Alfons Laarman

We study PRODSAT-QSAT($k$): given rank-one $k$-local projectors, determine whether a quantum $k$-SAT instance admits a satisfying product state. We present a CDCL-style refutation framework that searches a finite partition of each qubit's Bloch sphere while a sound theory solver checks region feasibility using a geometric overapproximation of the projection amplitudes for each constraint. When the theory solver proves that no state in a region can satisfy a constraint, it produces a sound conflict clause that blocks that region; accumulated blocking clauses can yield a global result of product-state unsatisfiability (UN-PRODSAT). We formalise the problem, prove the soundness of the clause-learning rule, and describe a practical algorithm and implementation.

Constrained Nonnegative Gram Feasibility is $\exists\mathbb{R}$-Complete

from arXiv: Computational Complexity

Authors: Angshul Majumdar

We study the computational complexity of constrained nonnegative Gram feasibility. Given a partially specified symmetric matrix together with affine relations among selected entries, the problem asks whether there exists a nonnegative matrix $H \in \mathbb{R}_+^{n\times r}$ such that $W = HH^\top$ satisfies all specified entries and affine constraints. Such factorizations arise naturally in structured low-rank matrix representations and geometric embedding problems. We prove that this feasibility problem is $\exists\mathbb{R}$-complete already for rank $r=2$. The hardness result is obtained via a polynomial-time reduction from the arithmetic feasibility problem \textsc{ETR-AMI}. The reduction exploits a geometric encoding of arithmetic constraints within rank-$2$ nonnegative Gram representations: by fixing anchor directions in $\mathbb{R}_+^2$ and representing variables through vectors of the form $(x,1)$, addition and multiplication constraints can be realized through inner-product relations. Combined with the semialgebraic formulation of the feasibility conditions, this establishes $\exists\mathbb{R}$-completeness. We further show that the hardness extends to every fixed rank $r\ge 2$. Our results place constrained symmetric nonnegative Gram factorization among the growing family of geometric feasibility problems that are complete for the existential theory of the reals. Finally, we discuss limitations of the result and highlight the open problem of determining the complexity of unconstrained symmetric nonnegative factorization feasibility.

Communication Complexity of Disjointness under Product Distributions

from arXiv: Computational Complexity

Authors: Zach Hunter, Aleksa Milojević, Benny Sudakov, Istvan Tomon

Determining the randomized (or distributional) communication complexity of disjointness is a central problem in communication complexity, having roots in the foundational work of Babai, Frankl, and Simon in the 1980s and culminating in the famous works of Kalyanasundaram-Schnitger and Razborov in 1992. However, the question of obtaining tight bounds for product distributions persisted until the more recent work of Bottesch, Gavinsky, and Klauck resolved it. In this note we revisit this classical problem and give a short, streamlined proof of the best bounds, with improved quantitative dependence on the error parameter. Our approach is based on a simple combinatorial lemma that may be of independent interest: if two sets drawn independently from two distributions are disjoint with non-negligible probability, then one can extract two subfamilies of reasonably large measure that are fully cross-disjoint (equivalently, a large monochromatic rectangle for disjointness).

On the size of k-irreducible triangulations

from arXiv: Computational Geometry

Authors: Vincent Delecroix, Oscar Fontaine, Arnaud de Mesmay

A triangulation of a surface is k-irreducible if every non-contractible curve has length at least k and any edge contraction breaks this property. Equivalently, every edge belongs to a non-contractible curve of length k and there are no shorter non-contractible curves. We prove that a k-irreducible triangulation of a surface of genus g has $O(k^2g)$ triangles, which is optimal. This is an improvement over the previous best bound $k^{O(k)} g^2$ of Gao, Richter and Seymour [Journal of Combinatorial Theory, Series B, 1996].

The Voronoi Diagram of Four Lines in $\mathbb{R}^3$

from arXiv: Computational Geometry

Authors: Evanthia Papadopoulou, Zeyu Wang

We consider the Voronoi diagram of lines in $\mathbb{R}^3$ under the Euclidean metric, and give a full classification of its structure in the base case of four lines in general position. We first show that the number of vertices in the Voronoi diagram of four lines in general position is always even, between 0 and 8, and all such numbers can be realized. We identify a key structure for the diagram formation, called a \emph{twist}, which is a pair of consecutive intersections among trisector branches; only two types of twists are possible, so-called \emph{full} and \emph{partial} twists. A full twist is a purely local structure, which can be inserted or removed without affecting the rest of the diagram. Assuming no full twists, the nearest and the farthest Voronoi diagrams of four lines, each have 15 distinct topologies, which are in one-to-one correspondence; the two-dimensional faces are all unbounded, and the total number of vertices is at most six. The unbounded features of the farthest diagram, encoded in a two-dimensional spherical map, are also in one-to-one correspondence. The identified topologies are all realizable. Any Voronoi diagram of four lines in general position in $\mathbb{R}^3$ can be obtained from one of these topologies by inserting full twists; each twist induces a bounded face of exactly two vertices in both the nearest and farthest diagrams. We obtain the classification by an exhaustive search algorithm using some new structural and combinatorial observations of line Voronoi diagrams.

Better Sampling Bounds for Restricted Delaunay Triangulations and a Star-Shaped Property for Restricted Voronoi Cells

from arXiv: Computational Geometry

Authors: Jonathan Richard Shewchuk

The restricted Delaunay triangulation of a closed surface $Σ$ and a finite point set $V \subset Σ$ is a subcomplex of the Delaunay tetrahedralization of $V$ whose triangles approximate $Σ$. It is well known that if $V$ is a sufficiently dense sample of a smooth $Σ$, then the union of the restricted Delaunay triangles is homeomorphic to $Σ$. We show that an $ε$-sample with $ε\leq 0.3245$ suffices. By comparison, Dey proves it for a $0.18$-sample; our improved sampling bound reduces the number of sample points required by a factor of $3.25$. More importantly, we improve a related sampling bound of Cheng et al. for Delaunay surface meshing, reducing the number of sample points required by a factor of $21$. The first step of our homeomorphism proof is particularly interesting: we show that for a $0.44$-sample, the restricted Voronoi cell of each site $v \in V$ is homeomorphic to a disk, and the orthogonal projection of the cell onto $T_vΣ$ (the plane tangent to $Σ$ at $v$) is star-shaped.

Locality Sensitive Hashing in Hyperbolic Space

from arXiv: Computational Geometry

Authors: Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

For a metric space $(X, d)$, a family $\mathcal{H}$ of locality sensitive hash functions is called $(r, cr, p_1, p_2)$ sensitive if a randomly chosen function $h\in \mathcal{H}$ has probability at least $p_1$ (at most $p_2$) to map any $a, b\in X$ in the same hash bucket if $d(a, b)\leq r$ (or $d(a, b)\geq cr$). Locality Sensitive Hashing (LSH) is one of the most popular techniques for approximate nearest-neighbor search in high-dimensional spaces, and has been studied extensively for Hamming, Euclidean, and spherical geometries. An $(r, cr, p_1, p_2)$-sensitive hash function enables approximate nearest neighbor search (i.e., returning a point within distance $cr$ from a query $q$ if there exists a point within distance $r$ from $q$) with space $O(n^{1+ρ})$ and query time $O(n^ρ)$ where $ρ=\frac{\log 1/p_1}{\log 1/p_2}$. But LSH for hyperbolic spaces $\mathbb{H}^d$ remains largely unexplored. In this work, we present the first LSH construction native to hyperbolic space. For the hyperbolic plane $(d=2)$, we show a construction achieving $ρ\leq 1/c$, based on the hyperplane rounding scheme. For general hyperbolic spaces $(d \geq 3)$, we use dimension reduction from $\mathbb{H}^d$ to $\mathbb{H}^2$ and the 2D hyperbolic LSH to get $ρ\leq 1.59/c$. On the lower bound side, we show that the lower bound on $ρ$ of Euclidean LSH extends to the hyperbolic setting via local isometry, therefore giving $ρ\geq 1/c^2$.
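
A quick numeric illustration of the standard exponent $ρ=\frac{\log 1/p_1}{\log 1/p_2}$ and the resulting $O(n^{1+ρ})$ space and $O(n^ρ)$ query bounds mentioned above; the collision probabilities below are made-up placeholder values, not quantities from the paper.

```python
# Compute the LSH exponent rho and the implied space/query bounds.
import math

def lsh_exponent(p1: float, p2: float) -> float:
    """rho = log(1/p1) / log(1/p2) for an (r, cr, p1, p2)-sensitive family."""
    return math.log(1.0 / p1) / math.log(1.0 / p2)

rho = lsh_exponent(p1=0.8, p2=0.5)   # hypothetical collision probabilities
n = 10**6
print(f"rho = {rho:.3f}")                        # ~0.322
print(f"space ~ n^(1+rho) = {n ** (1 + rho):.3e}")
print(f"query ~ n^rho     = {n ** rho:.3e}")
```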

Unlabeled Multi-Robot Motion Planning with Improved Separation Trade-offs

from arXiv: Computational Geometry

Authors: Tsuri Farhana, Omrit Filtser, Shalev Goldshtein

We study unlabeled multi-robot motion planning for unit-disk robots in a polygonal environment. Although the problem is hard in general, polynomial-time solutions exist under appropriate separation assumptions on start and target positions. Banyassady et al. (SoCG'22) guarantee feasibility in simple polygons under start--start and target--target distances of at least $4$, and start--target distances of at least $3$, but without optimality guarantees. Solovey et al. (RSS'15) provide a near-optimal solution in general polygonal domains, under stricter conditions: start/target positions must have pairwise distance at least $4$, and at least $\sqrt{5}\approx2.236$ from obstacles. This raises the question of whether polynomial-time algorithms can be obtained in even more densely packed environments. In this paper we present a generalized algorithm that achieves different trade-offs on the robots-separation and obstacles-separation bounds, all significantly improving upon the state of the art. Specifically, we obtain polynomial-time constant-approximation algorithms to minimize the total path length when (i) the robots-separation is $2\tfrac{2}{3}$ and the obstacles-separation is $1\tfrac{2}{3}$, or (ii) the robots-separation is $\approx3.291$ and the obstacles-separation $\approx1.354$. Additionally, we introduce a different strategy yielding a polynomial-time solution when the robots-separation is only $2$, and the obstacles-separation is $3$. Finally, we show that without any robots-separation assumption, obstacles-separation of at least $1.5$ may be necessary for a solution to exist.

GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models

from arXiv: Computational Geometry

Authors: Tianyu Bell Pan, Damon L. Woodard

Large language models (LLMs) demonstrate strong performance, but they often lack transparency. We introduce GeoLAN, a training framework that treats token representations as geometric trajectories and applies stickiness conditions inspired by recent developments related to the Kakeya Conjecture. We have developed two differentiable regularizers, Katz-Tao Convex Wolff (KT-CW) and Katz-Tao Attention (KT-Attn), that promote isotropy and encourage diverse attention. Our experiments with Gemma-3 (1B, 4B, 12B) and Llama-3-8B show that GeoLAN frequently maintains task accuracy while improving geometric metrics and reducing certain fairness biases. These benefits are most significant in mid-sized models. Our findings reveal scale-dependent trade-offs between geometric precision and performance, suggesting that geometry-aware training is a promising approach to enhance mechanistic interpretability.

Lazy Kronecker Product

from arXiv: Data Structures and Algorithms

Authors: Zhao Song

In this paper, we show how to generalize the lazy update regime from dynamic matrix product [Cohen, Lee, Song STOC 2019, JACM 2021] to dynamic Kronecker product. We provide an algorithm that uses $n^{ω( \lceil k/2 \rceil, \lfloor k/2 \rfloor, a )-a}$ amortized update time and $n^{ω( \lceil(k-s)/2 \rceil, \lfloor (k-s)/2 \rfloor,a )}$ worst case query time for the dynamic Kronecker product problem. Unless the tensor MV conjecture is false, there is no algorithm that can use both $n^{ω( \lceil k/2 \rceil, \lfloor k/2 \rfloor, a )-a-Ω(1)}$ amortized update time, and $n^{ω( \lceil(k-s)/2 \rceil, \lfloor (k-s)/2 \rfloor,a )-Ω(1)}$ worst case query time.
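
As background for why one avoids materializing Kronecker products, the snippet below checks the classical vec-trick identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^\top)$ (column-major vec), which answers a matrix-vector query against $A \otimes B$ without forming it. This is a textbook identity, not the paper's lazy-update data structure, and the matrices are random placeholders.

```python
# Verify the vec trick: (A kron B) vec(X) == vec(B X A^T), with column-major vec.
import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(4, 4)
X = np.random.rand(4, 3)                     # vec(X) has length 12 = 4 * 3

x = X.flatten(order="F")                     # column-major vec(X)
lhs = np.kron(A, B) @ x                      # explicit Kronecker product (quadratic memory)
rhs = (B @ X @ A.T).flatten(order="F")       # same result without forming A kron B
print(np.allclose(lhs, rhs))                 # True
```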

Algorithms for Euclidean Distance Matrix Completion: Exploiting Proximity to Triviality

from arXiv: Data Structures and Algorithms

Authors: Fedor V. Fomin, Petr A. Golovach, M. S. Ramanujan, Saket Saurabh

In the d-Euclidean Distance Matrix Completion (d-EDMC) problem, one aims to determine whether a given partial matrix of pairwise distances can be extended to a full Euclidean distance matrix in d dimensions. This problem is a cornerstone of computational geometry with numerous applications. While classical work on this problem often focuses on exploiting connections to semidefinite programming typically leading to approximation algorithms, we focus on exact algorithms and propose a novel distance-from-triviality parameterization framework to obtain tractability results for d-EDMC. We identify key structural patterns in the input that capture entry density, including chordal substructures and coverability of specified entries by fully specified principal submatrices. We obtain: (1) The first fixed-parameter algorithm (FPT algorithm) for d-EDMC parameterized by d and the maximum number of unspecified entries per row/column. This is achieved through a novel compression algorithm that reduces a given instance to a submatrix on O(1) rows (for fixed values of the parameters). (2) The first FPT algorithm for d-EDMC parameterized by d and the minimum number of fully specified principal submatrices whose entries cover all specified entries of the given matrix. This result is also achieved through a compression algorithm. (3) A polynomial-time algorithm for d-EDMC when both d and the minimum fill-in of a natural graph representing the specified entries are fixed constants. This result is achieved by combining tools from distance geometry and algorithms from real algebraic geometry. Our work identifies interesting parallels between EDM completion and graph problems, with our algorithms exploiting techniques from both domains.
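
As background for the feasibility question, here is a sketch of the classical Schoenberg/Gram-matrix criterion for the fully specified case (not the paper's algorithm, which tackles the much harder partial-matrix completion problem): a complete matrix of squared pairwise distances embeds in $\mathbb{R}^d$ exactly when its double-centered Gram matrix is positive semidefinite of rank at most $d$.

```python
# Hedged illustration of the completed-matrix case only: checking whether a
# fully specified matrix of *squared* pairwise distances is realizable in R^d.
import numpy as np

def embeds_in_dimension(D_sq: np.ndarray, d: int, tol: float = 1e-9) -> bool:
    """D_sq: (n, n) symmetric matrix of squared pairwise distances."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ D_sq @ J                  # Gram matrix of a centered embedding
    eig = np.linalg.eigvalsh(G)
    psd = eig.min() >= -tol                  # no significantly negative eigenvalue
    rank = int(np.sum(eig > tol))
    return psd and rank <= d

# Example: four corners of a unit square embed in dimension 2 but not in 1.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D_sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
print(embeds_in_dimension(D_sq, 2), embeds_in_dimension(D_sq, 1))  # True False
```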

Power laws and power-of-two-choices

from arXiv: Data Structures and Algorithms

Authors: Amanda Redlich

This paper analyzes a variation on the well-known "power of two choices" allocation algorithms. Classically, the smallest of $d$ randomly-chosen options is selected. We investigate what happens when the largest of $d$ randomly-chosen options is selected instead. This process generates a power-law-like distribution: with high probability, the $i^{th}$-smallest value scales with $i^{d-1}$, where $d$ is the number of randomly-chosen options. We give a formula for the expectation and show that the distribution is concentrated around the expectation.
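
The process is easy to simulate. Here is a minimal sketch (not the paper's code) of the "largest of d choices" allocation, which makes the rough $i^{d-1}$ scaling of the sorted loads visible empirically for small $d$:

```python
# Minimal simulation of the max-of-d-choices allocation described above:
# each of m balls samples d bins uniformly at random and is placed in the
# MOST loaded of the d (the classical process would pick the least loaded).
import random

def max_of_d_choices(n_bins: int, n_balls: int, d: int, seed: int = 0):
    rng = random.Random(seed)
    loads = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(d)]
        target = max(candidates, key=lambda b: loads[b])  # ties broken arbitrarily
        loads[target] += 1
    return sorted(loads)

loads = max_of_d_choices(n_bins=1000, n_balls=100_000, d=2, seed=1)
# With d = 2 the sorted load profile should look roughly linear in the rank i
# (the abstract's i^{d-1} scaling); try d = 3 for a roughly quadratic profile.
print(loads[::100])
```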

Computational Complexity Analysis of Interval Methods in Solving Uncertain Nonlinear Systems

from arXiv: Data Structures and Algorithms

Authors: Rudra Prakash, S. Janardhanan, Shaunak Sen

This paper analyses the computational complexity of validated interval methods for uncertain nonlinear systems. Interval analysis produces guaranteed enclosures that account for uncertainty and round-off, but its adoption is often limited by computational cost in high dimensions. We develop an algorithm-level worst-case framework that makes the dependence on the initial search volume $\mathrm{Vol}(X_0)$, the target tolerance $\varepsilon$, and the costs of validated primitives explicit (inclusion-function evaluation, Jacobian evaluation, and interval linear algebra). Within this framework, we derive worst-case time and space bounds for interval bisection, subdivision$+$filter, interval constraint propagation, interval Newton, and interval Krawczyk. The bounds quantify the scaling with $\mathrm{Vol}(X_0)$ and $\varepsilon$ for validated steady-state enclosure and highlight dominant cost drivers. We also show that determinant and inverse computation for interval matrices via naive Laplace expansion is factorial in the matrix dimension, motivating specialised interval linear algebra. Finally, interval Newton and interval Krawczyk have comparable leading-order costs; Krawczyk is typically cheaper in practice because it inverts a real midpoint matrix rather than an interval matrix. These results support the practical design of solvers for validated steady-state analysis in applications such as biochemical reaction network modelling, robust parameter estimation, and other uncertainty-aware computations in systems and synthetic biology.
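
Of the methods analyzed, interval bisection is the simplest to write down. The following is a minimal one-dimensional sketch (not the paper's framework or code, and ignoring the outward rounding a truly validated implementation needs) of the branch-and-prune loop whose worst-case cost the paper bounds in terms of the initial volume and the tolerance $\varepsilon$:

```python
# Minimal 1-D interval-bisection sketch: enclose all roots of f(x) = x^2 - 2
# on [0, 4]. Boxes whose image enclosure excludes 0 are discarded; the rest
# are bisected until their width drops below eps.
from collections import deque

def interval_square(lo, hi):
    if lo <= 0.0 <= hi:
        return 0.0, max(lo * lo, hi * hi)
    return min(lo * lo, hi * hi), max(lo * lo, hi * hi)

def f_enclosure(lo, hi):
    """Natural interval extension of f(x) = x^2 - 2 on [lo, hi]."""
    slo, shi = interval_square(lo, hi)
    return slo - 2, shi - 2

def bisection_enclosures(lo, hi, eps):
    queue, out = deque([(lo, hi)]), []
    while queue:
        a, b = queue.popleft()
        flo, fhi = f_enclosure(a, b)
        if flo > 0 or fhi < 0:      # 0 not in f([a, b]): prune this box
            continue
        if b - a < eps:             # small enough: report as an enclosure
            out.append((a, b))
        else:                       # otherwise bisect and recurse
            m = (a + b) / 2
            queue.extend([(a, m), (m, b)])
    return out

print(bisection_enclosures(0.0, 4.0, 1e-3)[:2])  # boxes around sqrt(2) ~ 1.414
```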

Range-Based Set Reconciliation via Range-Summarizable Order-Statistics Stores

from arXiv: Data Structures and Algorithms

Authors: Elvio G. Amparore

Range-Based Set Reconciliation (RBSR) synchronizes ordered sets by recursively comparing summaries of contiguous ranges and refining only the mismatching parts. While its communication complexity is well understood, its local computational cost fundamentally depends on the storage backend that must answer repeated range-summary, rank, and enumeration queries during refinement. We argue that a natural storage abstraction for RBSR implementations based on composable range aggregates is a \emph{range-summarizable order-statistics store} (RSOS): a dynamic ordered-set structure supporting composable summaries of contiguous ranges together with rank/select navigation. This identifies and formalizes the backend contract needed for efficient recursive refinement, combining range-summary support with order-statistics navigation for balanced partitioning. We then show that a specific augmentation of B\textsuperscript{+}-trees with subtree counts and composable summaries realizes an RSOS, and we derive corresponding bounds on local reconciliation work in this abstract storage model. Finally, we introduce AELMDB, an extension of LMDB that realizes this design inside a persistent memory-mapped engine, and evaluate it through an integration with Negentropy. The results show that placing the reconciliation oracle inside the storage tree substantially reduces local reconciliation cost on the evaluated reconciliation-heavy workloads compared with an open-source persistent baseline based on auxiliary tree caches, while the window-subrange ablation further confirms the usefulness of the systems optimizations built on top of the core aggregate representation.
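
To make the backend contract concrete, here is a deliberately naive sketch of the RSOS interface (not AELMDB, and with none of its performance guarantees): insert, rank/select navigation, and a composable range summary, here an XOR-of-hashes fingerprint. A real backend such as the paper's augmented B+-tree would answer the same queries in logarithmic time.

```python
# Naive, non-performant sketch of the RSOS *contract* only.
import bisect
import hashlib

def _h(item: bytes) -> int:
    return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")

class NaiveRSOS:
    def __init__(self):
        self.keys: list[bytes] = []

    def insert(self, item: bytes) -> None:
        i = bisect.bisect_left(self.keys, item)
        if i == len(self.keys) or self.keys[i] != item:
            self.keys.insert(i, item)

    def rank(self, item: bytes) -> int:
        """Number of stored keys strictly below `item`."""
        return bisect.bisect_left(self.keys, item)

    def select(self, i: int) -> bytes:
        """The i-th smallest key (0-indexed)."""
        return self.keys[i]

    def range_summary(self, lo: bytes, hi: bytes) -> int:
        """Composable fingerprint (XOR of hashes) of all keys in [lo, hi)."""
        acc = 0
        for k in self.keys[self.rank(lo):self.rank(hi)]:
            acc ^= _h(k)
        return acc

# Composability is the point: summary([a, c)) == summary([a, b)) XOR summary([b, c)),
# which is what lets RBSR split a mismatching range and recurse on the halves.
s = NaiveRSOS()
for item in [b"apple", b"cherry", b"plum", b"quince"]:
    s.insert(item)
whole = s.range_summary(b"a", b"z")
split = s.range_summary(b"a", b"m") ^ s.range_summary(b"m", b"z")
print(whole == split)  # True
```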

Scalable Learning of Multivariate Distributions via Coresets

from arXiv: Data Structures and Algorithms

Authors: Zeyu Ding, Katja Ickstadt, Nadja Klein, Alexander Munteanu, Simon Omlor

Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistics and machine learning. However, available methods are limited in their ability to handle large-scale data. We address this issue by developing a novel coreset construction for multivariate conditional transformation models (MCTMs) to enhance their scalability and training efficiency. To the best of our knowledge, these are the first coresets for semi-parametric distributional models. Our approach yields substantial data reduction via importance sampling. It ensures with high probability that the log-likelihood remains within multiplicative error bounds of $(1\pm\varepsilon)$ and thereby maintains statistical model accuracy. Compared to conventional full-parametric models, where coresets have been incorporated before, our semi-parametric approach exhibits enhanced adaptability, particularly in scenarios where complex distributions and non-linear relationships are present, but not fully understood. To address numerical problems associated with normalizing logarithmic terms, we follow a geometric approximation based on the convex hull of input data. This ensures feasible, stable, and accurate inference in scenarios involving large amounts of data. Numerical experiments demonstrate substantially improved computational efficiency when handling large and complex datasets, thus laying the foundation for a broad range of applications within the statistics and machine learning communities.
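
The construction rests on importance sampling. As a generic illustration of that recipe only (the paper's MCTM-specific sensitivity scores are not reproduced here), the following sketch samples points with probability proportional to an importance score and reweights them so that the subsample's weighted objective is an unbiased estimate of the full one:

```python
# Hedged sketch of the generic importance-sampling coreset recipe, with
# placeholder scores; the paper derives proper sensitivities for MCTMs.
import numpy as np

def importance_coreset(scores: np.ndarray, m: int, seed: int = 0):
    """scores: per-point importance (e.g., sensitivity upper bounds), all > 0.

    Returns (indices, weights) so that E[sum_i w_i * loss_i] equals the
    sum of the loss over all points.
    """
    rng = np.random.default_rng(seed)
    p = scores / scores.sum()
    idx = rng.choice(len(scores), size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return idx, weights

# Toy check of unbiasedness on a synthetic per-point loss vector.
rng = np.random.default_rng(1)
loss = rng.exponential(size=10_000)
scores = loss + 1e-3                        # placeholder importance scores
idx, w = importance_coreset(scores, m=500, seed=2)
print(loss.sum(), (w * loss[idx]).sum())    # the two sums should be close
```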

Envy-Free School Redistricting Between Two Groups

from arXiv: Data Structures and Algorithms

Authors: Daisuke Shibatani, Yutaro Yamaguchi

We study an application of fair division theory to school redistricting. Procaccia, Robinson, and Tucker-Foltz (SODA 2024) recently proposed a mathematical model to generate redistricting plans that provide theoretically guaranteed fairness among demographic groups of students. They showed that an almost proportional allocation can be found by adding $O(g \log g)$ extra seats in total, where $g$ is the number of groups. In contrast, for three or more groups, adding $o(n)$ extra seats is not sufficient to obtain an almost envy-free allocation in general, where $n$ is the total number of students. In this paper, we focus on the case of two groups. We introduce a relevant relaxation of envy-freeness, termed 1-relaxed envy-freeness, which limits the capacity violation not in total but at each school to at most one. We show that there always exists a 1-relaxed envy-free allocation, which can be found in polynomial time.

Sunday, March 22

A $100 gift card could be legit. A $1000 is obviously a Scam. What should scammers do?

from Computational Complexity

If I get an email offering me $1000 for I DON'T KNOW WHAT, I ignore it and don't even bother looking for other signs it is a scam.

If I get an email offering me $100 I may look more carefully, and often they are legit (most common is to give a pre-publication review of a math book---sometimes just questions, but more often a written report).

Most offers I get are either $1000 or $100. Today I got one for $750 which inspired this post (I ignored the offer without checking). 

Which nets more people: $100 or $1000?

1) If people are like me then $100 fools more people. But people like me will still CHECK CAREFULLY. I sometimes feed the email into ChatGPT for an opinion to see if it's a scam. (Spellcheck still thinks ChatGPT should be spelled catgut.)

2) Are there people who would fill out the survey (or whatever) for $1000 but not for $100? I ask non-rhetorically, as always. Are such people more gullible?


Would scammers make more money if they offered $100 instead of $1000?

1) More people would fall for the $100 scam. Or maybe not---do some people not bother if it's only $100?

2) Depending on how they are scamming you, will they get less out of it if they only offer $100?

Here are types of scams:

1) They send you a check for $1000 + x and say WHOOPS- please email us a check for $x. I've heard of this in the Can you tutor our daughter in math? scam. For this one, offering $1000 nets the scammer more money, since the $x they ask you to send back scales with the check: the $x attached to a $1000 check will be bigger than the $x attached to a $100 check.

2) They want to harvest your personal information. For these I don't think they will gain more if they offer $1000 vs $100.

One more thought:

1) I said that for $100 I take it seriously but for $1000 I don't

2) I said that for $750 I do not take it seriously.

3) What's the cutoff?  Obviously $289.

By gasarch


Starting Today: Kazhdan Sunday seminar: “Boolean Functions, Hypercontractivity, and Applications”

from Gil Kalai

Sunday, 22 March, 2026 – 11:00 to 13:00

Today the seminar will take place via zoom. After Pesach we hope to make it in Ross 70 seminar room.  

Repeats every week every Sunday until the end of June 2026

Kazhdan Seminar: Boolean Functions, Hypercontractivity, and Applications

Instructors: Noam Lifshitz and Gil Kalai 

Abstract:

Boolean functions are central objects in combinatorics, complexity theory, probability theory, and other areas of mathematics and computer science. Fourier methods have come to play an important role in the analysis of Boolean functions, and so have hypercontractive inequalities. New hypercontractive inequalities for global functions were developed in the last decade and have led to the resolution of some long-standing problems and to further applications to group theory and representation theory.

Prerequisites

Basic knowledge of probability, linear algebra, and group theory.

Related blog posts: Joram’s seminar 2025: Hypercontractivity, Groups and Representations; Noam Lifshitz: A new hypercontractivity inequality — The proof!; To cheer you up in difficult times 3: A guest post by Noam Lifshitz on the new hypercontractivity inequality of Peter Keevash, Noam Lifshitz, Eoin Long and Dor Minzer; Nati’s Influence.


Tentative Weekly Schedule

Part I: Classical Tools on the Hypercube

1. Foundations: Fourier expansion on the Boolean cube \{0,1\}^n, the Noise Operator, and Total Influence. Basic isoperimetric inequalities. (A small illustrative computation follows this list.)

2. The Classical Hypercontractive Inequality: The Bonami-Beckner-Gross inequality. Proof and immediate consequences.

3. Fundamental Theorems: The KKL Theorem (Kahn-Kalai-Linial) and Friedgut’s Junta Theorem. Variance bounds for first passage percolation.

4. The Limits of the Classical Theory: Failure of hypercontractivity in non-product domains and sparse regimes.
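
To make item 1 concrete, here is a small brute-force computation (an editorial sketch, not course material) of the Fourier coefficients and the total influence of a Boolean function for small n:

```python
# Illustrative brute-force computation of the basic objects in item 1:
# Fourier coefficients and total influence of f: {0,1}^n -> {-1,+1}, small n.
from itertools import product

def fourier_coefficient(f, n, S):
    """hat f(S) = E_x[ f(x) * (-1)^{sum_{i in S} x_i} ] over uniform x in {0,1}^n."""
    points = list(product([0, 1], repeat=n))
    return sum(f(x) * (-1) ** sum(x[i] for i in S) for x in points) / len(points)

def total_influence(f, n):
    """Sum over coordinates i of Pr_x[ f(x) != f(x with bit i flipped) ]."""
    points = list(product([0, 1], repeat=n))
    total = 0.0
    for i in range(n):
        flipped = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in points)
        total += flipped / len(points)
    return total

maj3 = lambda x: 1 if sum(x) >= 2 else -1    # Majority of 3 bits, output in {-1,+1}
print(total_influence(maj3, 3))              # 1.5: each bit is pivotal with prob 1/2
for S in [(), (0,), (0, 1), (0, 1, 2)]:
    print(S, fourier_coefficient(maj3, 3, S))  # mass sits on singletons and {0,1,2}
```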

 

Part II: Hypercontractivity for Global Functions

5. The “Global” Philosophy: Defining global functions and the need to exclude “local” functions (dictators) to recover strong bounds.

6. The Global Hypercontractive Inequality: Proof of the inequality for global functions on the $p$-biased cube.

7. Extension to High-Rank Finite Groups:

* Hypercontractivity in the Symmetric Group S_n and Alternating Group A_n.

* The interplay between representation theory and small-set expansion.

 

8. Polynomial Bogolyubov in Finite Simple Groups:

* Growth and mixing of subsets of finite groups via hypercontractivity and representation theory.

* Finding polynomially large cosets inside A^3.

 

Part III: Applications to Additive Combinatorics & Geometry

9. Product-Free Sets in A_n:

* Using global hypercontractivity to determine the structure of the largest product-free subsets of the Alternating Group.

* Stability of global functions on A_n.

10. Circle Method Hypercontractivity & The Sárközy Problem:

* The Green-Sawhney approach: “Arithmetic level-d inequalities.”

* Applying hypercontractivity to the major arcs in the Circle Method to break the square-root barrier for Sárközy’s Theorem.

11. Geometry Meets Representation Theory in Compact Lie Groups:

* The Geometric Engine: How Ricci curvature and the Bakry-Émery criterion control the heat flow on compact Lie groups (SU(n)).

* From Geometry to Combinatorics: Using geometric hypercontractivity to control high-dimensional representations.

* Applications: Characterizing product-free sets in SU(n) and proving probabilistic Waring-type theorems.

Part IV: Applications to Character Theory

12. The Character Bound Problem:

* Introduction to character bounds and the Guralnick-Larsen-Tiep (GLT) bounds.

* The classical approach via algebraic geometry (Deligne-Lusztig theory).

13. A New Approach via Hypercontractivity:

* Re-obtaining the GLT bounds using global hypercontractivity.

* The “elementary” proof that bypasses Deligne-Lusztig machinery.


Key Bibliography

  1. O’Donnell, R. Analysis of Boolean Functions. Cambridge University Press, 2014.

  2. Keevash, P., Lifshitz, N., Long, E., & Minzer, D. Hypercontractivity for global functions and sharp thresholds. (2024).

  3. Keller, N., Lifshitz, N., & Marcus, O. Sharp Hypercontractivity for Global Functions.

  4. Green, B., & Sawhney, M. New bounds for the Furstenberg-Sárközy theorem. (2024).

  5. Ellis, D., Kindler, G., Lifshitz, N., & Minzer, D. Product mixing in compact Lie groups. (2024).

  6. Filmus, Y., Kindler, G., Lifshitz, N., & Minzer, D. Hypercontractivity on the symmetric group. (2023).


By Gil Kalai

Saturday, March 21

TR26-040 | Communication Complexity of Disjointness under Product Distributions | Zach Hunter, Aleksa Milojevic, Benny Sudakov, Istvan Tomon

from ECCC Papers

Determining the randomized (or distributional) communication complexity of disjointness is a central problem in communication complexity, having roots in the foundational work of Babai, Frankl, and Simon in the 1980s and culminating in the famous works of Kalyanasundaram-Schnitger and Razborov in 1992. However, the question of obtaining tight bounds for product distributions persisted until the more recent work of Bottesch, Gavinsky, and Klauck resolved it. In this note we revisit this classical problem and give a short, streamlined proof of the best bounds, with improved quantitative dependence on the error parameter. Our approach is based on a simple combinatorial lemma that may be of independent interest: if two sets drawn independently from two distributions are disjoint with non-negligible probability, then one can extract two subfamilies of reasonably large measure that are fully cross-disjoint (equivalently, a large monochromatic rectangle for disjointness).
