Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Wednesday, June 03

APX-Hardness of Computing Lipschitz Constants for Multi-Parametric Quadratic Programs

from arXiv: Computational Complexity

Authors: Xingchen Li, Kunpeng Liu, Keyou You

Computing the Lipschitz constant of the solution map of a multi-parametric quadratic program is important for the analysis of optimization-based control. This problem is governed by three factors: the parameter dimension, the number of decision variables, and the number of constraints. While empirical evidence has long suggested exponential complexity, a rigorous complexity-theoretic proof has been lacking. In this paper, we fill this gap by proving that this problem is not only NP-hard but also APX-hard. Furthermore, we reveal that: (a) the problem becomes polynomial-time solvable when the number of constraints or decision variables is fixed; and (b) both NP-hardness and APX-hardness persist even in the scalar parameter case. These results confirm that the complexity stems from the number of constraints and variables, rather than the parameter dimension. Numerical experiments further validate these theoretical findings.

Authors: Xingchen Li, Kunpeng Liu, Keyou You

Computing the Lipschitz constant of the solution map of a multi-parametric quadratic program is important for the analysis of optimization-based control. This problem is governed by three factors: the parameter dimension, the number of decision variables, and the number of constraints. While empirical evidence has long suggested exponential complexity, a rigorous complexity-theoretic proof has been lacking. In this paper, we fill this gap by proving that this problem is not only NP-hard but also APX-hard. Furthermore, we reveal that: (a) the problem becomes polynomial-time solvable when the number of constraints or decision variables is fixed; and (b) both NP-hardness and APX-hardness persist even in the scalar parameter case. These results confirm that the complexity stems from the number of constraints and variables, rather than the parameter dimension. Numerical experiments further validate these theoretical findings.

Collision Resistance of Single-Layer Neural Nets

from arXiv: Computational Complexity

Authors: Marco Benedetti, Andrej Bogdanov, Enrico M. Malatesta, Marc Mézard, Gianmarco Perrupato, Alon Rosen, Nikolaj I. Schwartzbach, Riccardo Zecchina

We initiate the study of the algorithmic complexity of finding collisions in single-layer binary neural networks. Given a random matrix $\mathbf{A} \in \mathbb{R}^{m\times n}$, an input $\mathbf{x} \in \{-1,1\}^n$ is mapped to a binary output vector $\varphi(\mathbf{A}\mathbf{x})\in \{-1,1\}^m$, where $\varphi$ is an activation function with constant behavior on $[κ, \infty)$ for some threshold $κ\geq 0$. We identify the threshold scale $κ=Θ(1/\sqrtα)$, where $α=m/n$, as separating two complementary phenomena. When $κ\ll 1/\sqrtα$, we give a simple online algorithm that efficiently produces extensive collisions. When $κ\gg 1/\sqrtα$, for a natural \emph{randomized} non-periodic activation and suitable oscillation complexity, we prove that the extensive-collision space exhibits an overlap gap property (OGP), yielding an exponential lower bound against online algorithms. Ours is the first work to use the overlap gap property as a rigorous criterion for collision resistance. The key difference between collision finding and average-case search is that collision finding has a new ``worst-case'' aspect: the collision finder has full control over the choice of colliding pairs. Our lower bound is proved in the online model; extending such guarantees to broader classes of algorithms, including spectral, algebraic, lattice-based, or quantum methods, remains an open direction.

Authors: Marco Benedetti, Andrej Bogdanov, Enrico M. Malatesta, Marc Mézard, Gianmarco Perrupato, Alon Rosen, Nikolaj I. Schwartzbach, Riccardo Zecchina

We initiate the study of the algorithmic complexity of finding collisions in single-layer binary neural networks. Given a random matrix $\mathbf{A} \in \mathbb{R}^{m\times n}$, an input $\mathbf{x} \in \{-1,1\}^n$ is mapped to a binary output vector $\varphi(\mathbf{A}\mathbf{x})\in \{-1,1\}^m$, where $\varphi$ is an activation function with constant behavior on $[κ, \infty)$ for some threshold $κ\geq 0$. We identify the threshold scale $κ=Θ(1/\sqrtα)$, where $α=m/n$, as separating two complementary phenomena. When $κ\ll 1/\sqrtα$, we give a simple online algorithm that efficiently produces extensive collisions. When $κ\gg 1/\sqrtα$, for a natural \emph{randomized} non-periodic activation and suitable oscillation complexity, we prove that the extensive-collision space exhibits an overlap gap property (OGP), yielding an exponential lower bound against online algorithms. Ours is the first work to use the overlap gap property as a rigorous criterion for collision resistance. The key difference between collision finding and average-case search is that collision finding has a new ``worst-case'' aspect: the collision finder has full control over the choice of colliding pairs. Our lower bound is proved in the online model; extending such guarantees to broader classes of algorithms, including spectral, algebraic, lattice-based, or quantum methods, remains an open direction.

Quantum-Classical Equivalence for AND-Functions

from arXiv: Computational Complexity

Authors: Sreejata Kishor Bhattacharya, Farzan Byramji, Arkadev Chattopadhyay, Yogesh Dahiya, Shachar Lovett

A major open problem in quantum communication complexity is whether quantum protocols can be exponentially more efficient than classical protocols for computing total Boolean functions; the prevailing conjecture is that they cannot be so. In a seminal work, Razborov (2002) resolved this question for AND-functions of the form $$ F(x,y) = f(x_1 \land y_1, \ldots, x_n \land y_n), $$ when the outer function $f$ is symmetric, by proving that their bounded-error quantum and classical communication complexities are polynomially related. Since then, extending this result to all AND-functions has remained open and has been posed by several authors. In this work, we settle this problem in a strong way. We show that for every Boolean function $f$, the bounded-error quantum and classical deterministic communication complexities of the function $f \circ \mathrm{AND}_2$ are polynomially related, up to polylogarithmic factors in $n$. We prove this by showing that both are characterized--up to polynomial loss--by the logarithm of the De Morgan sparsity of $f$. Our results build on the recent work of Chattopadhyay, Dahiya, and Lovett (2025) on structural characterizations of non-sparse Boolean functions, which we extend to resolve the conjecture for general AND-functions.

Authors: Sreejata Kishor Bhattacharya, Farzan Byramji, Arkadev Chattopadhyay, Yogesh Dahiya, Shachar Lovett

A major open problem in quantum communication complexity is whether quantum protocols can be exponentially more efficient than classical protocols for computing total Boolean functions; the prevailing conjecture is that they cannot be so. In a seminal work, Razborov (2002) resolved this question for AND-functions of the form $$ F(x,y) = f(x_1 \land y_1, \ldots, x_n \land y_n), $$ when the outer function $f$ is symmetric, by proving that their bounded-error quantum and classical communication complexities are polynomially related. Since then, extending this result to all AND-functions has remained open and has been posed by several authors. In this work, we settle this problem in a strong way. We show that for every Boolean function $f$, the bounded-error quantum and classical deterministic communication complexities of the function $f \circ \mathrm{AND}_2$ are polynomially related, up to polylogarithmic factors in $n$. We prove this by showing that both are characterized--up to polynomial loss--by the logarithm of the De Morgan sparsity of $f$. Our results build on the recent work of Chattopadhyay, Dahiya, and Lovett (2025) on structural characterizations of non-sparse Boolean functions, which we extend to resolve the conjecture for general AND-functions.

Lean 4 Machine-Verified Proof of P = NP via the Pedigree Polytope Membership Problem

from arXiv: Computational Complexity

Authors: T. S. Arthanari

The Membership Problem for Pedigree Polytope (M3P) asks, given $X\in\mathbb{Q}^{\binom{n}{3}}$, whether $X\in\mathrm{conv}(P_n)$, where $P_n$ is the set of all pedigrees. A pedigree is a structured encoding of a Hamiltonian cycle construction in $K_n$. We establish that M3P is solvable in strongly polynomial time via a recursively constructed layered network $(N_k, R_k, μ)$ and a multicommodity flow problem MCF$(k)$. The necessary and sufficient condition for membership established is that the optimal total flow in MCF$(n-1)$ equals the maximum possible flow $z_{\max}$. The complexity analysis, grounded in Tardos's strongly polynomial algorithm for combinatorial linear programs (1986), shows that this condition can be checked in strongly polynomial time in the dimension of the matrix involved. By sufficiency, this implies M3P~$\in$~P. Since the Symmetric Travelling Salesman Problem (STSP) reduces to M3P via the Multistage Insertion (MI) formulation (Arthanari 1983), STSP is solvable in polynomial time, and the P vs.NP question is resolved. The proofs leading to this result are fully machine-verified in Lean~4/Mathlib4, with zero unresolved \texttt{sorry}s in the main proof chain. The main contribution is the Lean~4 machine verification of all proofs in the main chain, resulting in \texttt{theorem p\_equals\_np}: P = NP. The Lean~4 formal verification covers the sufficiency of MCF(n-1) for membership in $\mathrm{conv}(P_n)$, and the P = NP chain via Maurras (2002), Grötschel--Lovász--Schrijver (1988), Cook (1971), and Karp (1972). The complete lean project (36 Lean~4 files, 2968/2968 build targets clean) is available at github.com/TiruArt/Pedigree-Polytopes-Lean4.

Authors: T. S. Arthanari

The Membership Problem for Pedigree Polytope (M3P) asks, given $X\in\mathbb{Q}^{\binom{n}{3}}$, whether $X\in\mathrm{conv}(P_n)$, where $P_n$ is the set of all pedigrees. A pedigree is a structured encoding of a Hamiltonian cycle construction in $K_n$. We establish that M3P is solvable in strongly polynomial time via a recursively constructed layered network $(N_k, R_k, μ)$ and a multicommodity flow problem MCF$(k)$. The necessary and sufficient condition for membership established is that the optimal total flow in MCF$(n-1)$ equals the maximum possible flow $z_{\max}$. The complexity analysis, grounded in Tardos's strongly polynomial algorithm for combinatorial linear programs (1986), shows that this condition can be checked in strongly polynomial time in the dimension of the matrix involved. By sufficiency, this implies M3P~$\in$~P. Since the Symmetric Travelling Salesman Problem (STSP) reduces to M3P via the Multistage Insertion (MI) formulation (Arthanari 1983), STSP is solvable in polynomial time, and the P vs.NP question is resolved. The proofs leading to this result are fully machine-verified in Lean~4/Mathlib4, with zero unresolved \texttt{sorry}s in the main proof chain. The main contribution is the Lean~4 machine verification of all proofs in the main chain, resulting in \texttt{theorem p\_equals\_np}: P = NP. The Lean~4 formal verification covers the sufficiency of MCF(n-1) for membership in $\mathrm{conv}(P_n)$, and the P = NP chain via Maurras (2002), Grötschel--Lovász--Schrijver (1988), Cook (1971), and Karp (1972). The complete lean project (36 Lean~4 files, 2968/2968 build targets clean) is available at https://github.com/TiruArt/Pedigree-Polytopes-Lean4.

Scaling Laws for Neural-Network Quantum States

from arXiv: Computational Complexity

Authors: Riccardo Rende, Alessandro Sinibaldi, Luciano Loris Viteritti, Roeland Wiersema, Antoine Georges, Giuseppe Carleo

Scaling laws, the power-law relations between loss, architecture size, and compute observed in modern neural networks, offer a quantitative way to characterize the complexity of a learning problem, with the exponent governing the decay of the loss reflecting how rapidly additional resources translate into improved accuracy, and thus how hard the target is to learn. Whether an analogous framework can characterize the complexity of physical problems remains open. We address this question for Neural-Network Quantum States, a leading variational approach for strongly correlated quantum many-body systems. Using transformer wave functions to approximate ground states of the $J_1$-$J_2$ Heisenberg model on triangular and square lattices with up to $20\times 20$ sites, we find that the $V$-score, a measure of accuracy of a variational state, decays as a power law in training compute. Under an appropriate rescaling of compute, results for different system sizes collapse onto a single curve, analogous to scaling collapse in critical phenomena. The resulting power law is, to a good approximation, independent of the number of sites, showing that the transformer Ansatz is size-consistent for the systems considered. The exponent decreases systematically with frustration, identifying it as a quantitative measure of representational difficulty of the ground state and establishing scaling laws as a general framework for benchmarking variational ansätze.

Authors: Riccardo Rende, Alessandro Sinibaldi, Luciano Loris Viteritti, Roeland Wiersema, Antoine Georges, Giuseppe Carleo

Scaling laws, the power-law relations between loss, architecture size, and compute observed in modern neural networks, offer a quantitative way to characterize the complexity of a learning problem, with the exponent governing the decay of the loss reflecting how rapidly additional resources translate into improved accuracy, and thus how hard the target is to learn. Whether an analogous framework can characterize the complexity of physical problems remains open. We address this question for Neural-Network Quantum States, a leading variational approach for strongly correlated quantum many-body systems. Using transformer wave functions to approximate ground states of the $J_1$-$J_2$ Heisenberg model on triangular and square lattices with up to $20\times 20$ sites, we find that the $V$-score, a measure of accuracy of a variational state, decays as a power law in training compute. Under an appropriate rescaling of compute, results for different system sizes collapse onto a single curve, analogous to scaling collapse in critical phenomena. The resulting power law is, to a good approximation, independent of the number of sites, showing that the transformer Ansatz is size-consistent for the systems considered. The exponent decreases systematically with frustration, identifying it as a quantitative measure of representational difficulty of the ground state and establishing scaling laws as a general framework for benchmarking variational ansätze.

Optimizing Explicit Unit-Distance Lower-Bound Certificates

from arXiv: Computational Geometry

Authors: Michael T. M. Emmerich

The 2026 disproof of Erdős's unit-distance conjecture and Sawin's subsequent explicit quantitative refinement show that the maximum number $u(n)$ of unit distances among $n$ planar points can exceed $n^{1+\varepsilon}$ for a fixed positive $\varepsilon$. Sawin's explicit bound gives more than $n^{1.014}$ unit distances for arbitrarily large $n$ and exposes finite parameters whose choice is not fully optimized. This report formulates the finite parameter-selection task as a variant of a nonlinear integer programming problem and proposes an open-source Python verification pipeline, first validated by reproducing Sawin's published parameter choice and then applied to computationally improved certificates. The main computational contribution is an integer optimization and checking procedure for the sets of primes $T$ and $S_Q$, the integer multiplicities $k(p)$, and a rationally encoded real parameter $R$. The optimization pipelines are intentionally lightweight and replicable on standard hardware: we propose a deterministic greedy construction heuristic, a Tailored Integer Evolution Strategy with repair operators for number-theoretic feasibility, and a two-parent discrete-recombination variant. Four certificate levels are compared: Sawin's published example with $δ=0.0141144286784982\ldots$, a greedy optimization certificate with $δ=0.0151718056372133\ldots$, a Tailored Integer Evolution Strategy certificate with rational $R=6672416/100000$ and $δ=0.0152616610684193\ldots$, and a Tailored Integer Evolution Strategy with discrete recombination, again with $R=6672416/100000$, giving $δ=0.0152628688170072\ldots$. Consequently, subject to Sawin's explicit criterion being applied exactly as cited, the best current certificate supports the cautious clean statement $u(n)>n^{1.0152}$ for arbitrarily large $n$.

Authors: Michael T. M. Emmerich

The 2026 disproof of Erdős's unit-distance conjecture and Sawin's subsequent explicit quantitative refinement show that the maximum number $u(n)$ of unit distances among $n$ planar points can exceed $n^{1+\varepsilon}$ for a fixed positive $\varepsilon$. Sawin's explicit bound gives more than $n^{1.014}$ unit distances for arbitrarily large $n$ and exposes finite parameters whose choice is not fully optimized. This report formulates the finite parameter-selection task as a variant of a nonlinear integer programming problem and proposes an open-source Python verification pipeline, first validated by reproducing Sawin's published parameter choice and then applied to computationally improved certificates. The main computational contribution is an integer optimization and checking procedure for the sets of primes $T$ and $S_Q$, the integer multiplicities $k(p)$, and a rationally encoded real parameter $R$. The optimization pipelines are intentionally lightweight and replicable on standard hardware: we propose a deterministic greedy construction heuristic, a Tailored Integer Evolution Strategy with repair operators for number-theoretic feasibility, and a two-parent discrete-recombination variant. Four certificate levels are compared: Sawin's published example with $δ=0.0141144286784982\ldots$, a greedy optimization certificate with $δ=0.0151718056372133\ldots$, a Tailored Integer Evolution Strategy certificate with rational $R=6672416/100000$ and $δ=0.0152616610684193\ldots$, and a Tailored Integer Evolution Strategy with discrete recombination, again with $R=6672416/100000$, giving $δ=0.0152628688170072\ldots$. Consequently, subject to Sawin's explicit criterion being applied exactly as cited, the best current certificate supports the cautious clean statement $u(n)>n^{1.0152}$ for arbitrarily large $n$.

Towards Efficient Synthesis of Quantum Graph States by Fusing Graph Motifs

from arXiv: Computational Geometry

Authors: Tingxiang Ji, Hansika Weerasena, Demitry Farfurnik, Jianqing Liu

Photonic graph states with advanced topologies can enable measurement-based quantum computing, distributed quantum sensing, and quantum interconnects. However, the efficient generation of photonic graph states is limited by the probabilistic nature of photonic entangling operations and the exponential dependence of generation rate on resource cost. In this work, we study photonic graph state synthesis as a cost-aware decomposition problem, exploiting local Clifford (LC) equivalence to identify more synthesis-friendly representations of the target graph state before decomposition. Specifically, we propose Cost-aware Fusion-based Decomposition (CFD), a three-stage heuristic framework that decomposes a target graph state into ring, star, and linear motifs, and assembles them via Type-I fusion operations to minimize fusion overhead and physical-qubit consumption. We further show that selecting the LC-equivalent graph state with the minimum number of edges provides a highly effective proxy for near-optimal synthesis: in many cases it matches the best generation rate observed within the LC equivalence class under CFD, and in most remaining cases it remains close to it. Numerical evaluations on graph state orbit data and 2D and 3D lattice graph states demonstrate that CFD achieves up to 84.6\% reduction in resource overhead compared to baseline constructions, and yields improvements in photonic generation rate spanning multiple orders of magnitude. These results suggest that combining structure-aware motif decomposition with LC equivalence is a practical and scalable strategy for photonic graph state synthesis.

Authors: Tingxiang Ji, Hansika Weerasena, Demitry Farfurnik, Jianqing Liu

Photonic graph states with advanced topologies can enable measurement-based quantum computing, distributed quantum sensing, and quantum interconnects. However, the efficient generation of photonic graph states is limited by the probabilistic nature of photonic entangling operations and the exponential dependence of generation rate on resource cost. In this work, we study photonic graph state synthesis as a cost-aware decomposition problem, exploiting local Clifford (LC) equivalence to identify more synthesis-friendly representations of the target graph state before decomposition. Specifically, we propose Cost-aware Fusion-based Decomposition (CFD), a three-stage heuristic framework that decomposes a target graph state into ring, star, and linear motifs, and assembles them via Type-I fusion operations to minimize fusion overhead and physical-qubit consumption. We further show that selecting the LC-equivalent graph state with the minimum number of edges provides a highly effective proxy for near-optimal synthesis: in many cases it matches the best generation rate observed within the LC equivalence class under CFD, and in most remaining cases it remains close to it. Numerical evaluations on graph state orbit data and 2D and 3D lattice graph states demonstrate that CFD achieves up to 84.6\% reduction in resource overhead compared to baseline constructions, and yields improvements in photonic generation rate spanning multiple orders of magnitude. These results suggest that combining structure-aware motif decomposition with LC equivalence is a practical and scalable strategy for photonic graph state synthesis.

Planar Perfect Matching Counting is as Hard as Determinants

from arXiv: Data Structures and Algorithms

Authors: Radu Curticapean, Jiaheng Wang

In the 1960s, Fisher, Kasteleyn and Temperley designed an ingenious algorithm for computing the partition function of the dimer model, or equivalently, for counting perfect matchings in edge-weighted planar graphs (Philos. Mag. 1961; J. Mathematical Phys. 1963). This FKT algorithm later became the foundation for Valiant's holographic algorithms (FOCS 2004; SIAM J. Comput. 2008), which motivated the study of counting problems under the Holant framework. Combined with an algorithm by Yuster (FOCS 2008), the FKT algorithm allows us to count edge-weighted perfect matchings in planar $n$-vertex graphs with $\tilde{O}(n^{ω/2})$ arithmetic operations, where $ω<2.372$ is the matrix multiplication exponent. We prove a corresponding lower bound: Over algebraic circuits and other sufficiently strong computational models, perfect matchings in edge-weighted $n$-vertex planar graphs $G$ cannot be counted in $O(n^{ω/2-ε})$ arithmetic operations. This confirms the optimality of Yuster's algorithm. Our bound holds even when $G$ is an edge-weighted square grid.

Authors: Radu Curticapean, Jiaheng Wang

In the 1960s, Fisher, Kasteleyn and Temperley designed an ingenious algorithm for computing the partition function of the dimer model, or equivalently, for counting perfect matchings in edge-weighted planar graphs (Philos. Mag. 1961; J. Mathematical Phys. 1963). This FKT algorithm later became the foundation for Valiant's holographic algorithms (FOCS 2004; SIAM J. Comput. 2008), which motivated the study of counting problems under the Holant framework. Combined with an algorithm by Yuster (FOCS 2008), the FKT algorithm allows us to count edge-weighted perfect matchings in planar $n$-vertex graphs with $\tilde{O}(n^{ω/2})$ arithmetic operations, where $ω<2.372$ is the matrix multiplication exponent. We prove a corresponding lower bound: Over algebraic circuits and other sufficiently strong computational models, perfect matchings in edge-weighted $n$-vertex planar graphs $G$ cannot be counted in $O(n^{ω/2-ε})$ arithmetic operations. This confirms the optimality of Yuster's algorithm. Our bound holds even when $G$ is an edge-weighted square grid.

The Grothendieck Constant is Less Than $\fracπ{2 \log (1+ \sqrt{2})} - 10^{-5}$

from arXiv: Data Structures and Algorithms

Authors: Alan Li, Rahul Saha, Anton Xue, Adam Klivans, Pravesh K Kothari, Raghu Meka, Swarat Chaudhury

We prove that the Grothendieck constant $K_G < $\fracπ{2 \log (1+ \sqrt{2})} - 10^{-5}$. This improves on the work of braverman et. al.

Authors: Alan Li, Rahul Saha, Anton Xue, Adam Klivans, Pravesh K Kothari, Raghu Meka, Swarat Chaudhury

We prove that the Grothendieck constant $K_G < $\fracπ{2 \log (1+ \sqrt{2})} - 10^{-5}$. This improves on the work of braverman et. al.

Ranked MSO-enumeration over compressed words

from arXiv: Data Structures and Algorithms

Authors: Markus Lohrey

It is shown that the ranked query enumeration problem for a fixed MSO-query on strings can be solved with linear preprocessing and constant delay in the grammar-compressed setting, where the input string is given by a so-called straight-line program, i.e., a context-free grammar that produces exactly one string. Moreover, `ranked' means that the output tuples of the MSO-query are printed in a specific order that has to be MSO-definable. This is the first result for ranked query enumeration on compressed data. A corollary of this result is that for a fixed polyregular function $f$ and a word $w$ that is given by a straight-line program of size $n$, one can list after preprocessing time $\mathcal{O}(n)$ the symbols in $f(w)$ from left to right with constant delay, which generalizes a result of Bojanczyk for the case where $w$ is uncompressed. The proofs for these results are based on factorization trees, which are made accessible to the grammar-compressed setting (a contribution of independent interest).

Authors: Markus Lohrey

It is shown that the ranked query enumeration problem for a fixed MSO-query on strings can be solved with linear preprocessing and constant delay in the grammar-compressed setting, where the input string is given by a so-called straight-line program, i.e., a context-free grammar that produces exactly one string. Moreover, `ranked' means that the output tuples of the MSO-query are printed in a specific order that has to be MSO-definable. This is the first result for ranked query enumeration on compressed data. A corollary of this result is that for a fixed polyregular function $f$ and a word $w$ that is given by a straight-line program of size $n$, one can list after preprocessing time $\mathcal{O}(n)$ the symbols in $f(w)$ from left to right with constant delay, which generalizes a result of Bojanczyk for the case where $w$ is uncompressed. The proofs for these results are based on factorization trees, which are made accessible to the grammar-compressed setting (a contribution of independent interest).

Revisiting $O(n \log \log n)$ chaining for anchored edit distance

from arXiv: Data Structures and Algorithms

Authors: Nicola Rizzo, Ragnar Groot Koerkamp

Colinear chaining is a classical heuristic for sequence alignment: it enables scalable genome comparison and is a main component of many state-of-the-art read mappers based on seed-chain-extend. The earliest $O(n \log \log n)$ time algorithms by Eppstein et al. (J. ACM, 1992) chained $n$ fragments between two sequences $T$ and $Q$ while minimizing a gap cost based on the diagonal distance $Δ_{\text{diag}}$ between consecutive fragments. They also forbid fragment overlaps, which are essential in current chaining formulations: in long-read mapping, overlaps improve sensitivity and avoid restrictions on the fragment class considered. Jain, Gibney, and Thankachan (J. Comput. Biol. 2022) recently combined a $Δ_{\text{diag}} = |Δ_T -Δ_Q|$ overlap cost with the classic $L_\infty = \max(Δ_T , Δ_Q)$ gap cost that takes the maximum between the horizontal and vertical gap between the fragments and they proved that chaining under this cost model is equivalent to the anchored edit distance. We improve the existing $O(n \log^3 n)$-time algorithm for anchored edit distance to $O(n \log \log n)$ time in $O(n)$ space, by combining the gap-cost computation of Chao and Miller (Algorithmica, 1995) with the overlap-cost computation of Baker and Giancarlo (ESA, 1998). By developing llchain, a simpler $O(n \log n)$-time implementation of our method, we show how chaining algorithms that might have been recently overlooked by the bioinformatics community scale competitively to millions of fragments and large genomes. On average, llchain is $10\times$ faster than other methods on instances with $3\,000\,000$ anchors, and over $3\times$ faster on MEMs between HiFi reads and a reference human genome.

Authors: Nicola Rizzo, Ragnar Groot Koerkamp

Colinear chaining is a classical heuristic for sequence alignment: it enables scalable genome comparison and is a main component of many state-of-the-art read mappers based on seed-chain-extend. The earliest $O(n \log \log n)$ time algorithms by Eppstein et al. (J. ACM, 1992) chained $n$ fragments between two sequences $T$ and $Q$ while minimizing a gap cost based on the diagonal distance $Δ_{\text{diag}}$ between consecutive fragments. They also forbid fragment overlaps, which are essential in current chaining formulations: in long-read mapping, overlaps improve sensitivity and avoid restrictions on the fragment class considered. Jain, Gibney, and Thankachan (J. Comput. Biol. 2022) recently combined a $Δ_{\text{diag}} = |Δ_T -Δ_Q|$ overlap cost with the classic $L_\infty = \max(Δ_T , Δ_Q)$ gap cost that takes the maximum between the horizontal and vertical gap between the fragments and they proved that chaining under this cost model is equivalent to the anchored edit distance. We improve the existing $O(n \log^3 n)$-time algorithm for anchored edit distance to $O(n \log \log n)$ time in $O(n)$ space, by combining the gap-cost computation of Chao and Miller (Algorithmica, 1995) with the overlap-cost computation of Baker and Giancarlo (ESA, 1998). By developing llchain, a simpler $O(n \log n)$-time implementation of our method, we show how chaining algorithms that might have been recently overlooked by the bioinformatics community scale competitively to millions of fragments and large genomes. On average, llchain is $10\times$ faster than other methods on instances with $3\,000\,000$ anchors, and over $3\times$ faster on MEMs between HiFi reads and a reference human genome.

Deterministic Distance Approximation in MPC via Improved Hitting Sets

from arXiv: Data Structures and Algorithms

Authors: Kyungjin Cho, Michal Dory, Yannic Maus, Tijn de Vos

In this paper, we provide the first deterministic algorithms with sublogarithmic round complexity for spanners and approximate shortest paths in various MPC models. Moreover, we significantly improve upon the state of the art in the deterministic Congested Clique. In particular, we obtain the following four results on undirected graphs: 1. In both linear MPC and Congested Clique, we obtain an $O(k)$ stretch-spanner of a weighted graph of size $O(n^{1+1/k})$ in $O(1)$ rounds, for some parameter $k\ge 0$. For $k=O(\log{n})$, this leads to an $O(\log n)$ approximation of APSP in constant rounds in both models. 2. In sublinear MPC, we obtain an $O(k^{1+\varepsilon})$-stretch spanner of a weighted graph of size $O(n^{1+1/k})$ in $O(\log k)$ rounds, for any fixed constant $\varepsilon>0$. 3. In Congested Clique, we obtain $O(1)$-approximate APSP for weighted graphs in $O(\log \log \log n)$ rounds. 4. In near-linear MPC, we obtain $(1+\varepsilon)$-approximate single-source shortest paths and $O(1)$-approximate all-pairs shortest paths for unweighted graphs in $\textsf{poly}\log \log n$ rounds. Our algorithm only requires a single near-linear memory machine, where the rest can have sublinear memory. Our deterministic algorithms obtain similar guarantees to the state of the art randomized algorithms without incurring additional factors in the round complexity. To obtain these results, we inspect the randomized algorithms and isolate a randomized sampling routine. Then we derandomize these sampling routines by using a deterministic hitting set. Hereto, we develop a versatile deterministic hitting set algorithm, which we hope will have further derandomization applications.

Authors: Kyungjin Cho, Michal Dory, Yannic Maus, Tijn de Vos

In this paper, we provide the first deterministic algorithms with sublogarithmic round complexity for spanners and approximate shortest paths in various MPC models. Moreover, we significantly improve upon the state of the art in the deterministic Congested Clique. In particular, we obtain the following four results on undirected graphs: 1. In both linear MPC and Congested Clique, we obtain an $O(k)$ stretch-spanner of a weighted graph of size $O(n^{1+1/k})$ in $O(1)$ rounds, for some parameter $k\ge 0$. For $k=O(\log{n})$, this leads to an $O(\log n)$ approximation of APSP in constant rounds in both models. 2. In sublinear MPC, we obtain an $O(k^{1+\varepsilon})$-stretch spanner of a weighted graph of size $O(n^{1+1/k})$ in $O(\log k)$ rounds, for any fixed constant $\varepsilon>0$. 3. In Congested Clique, we obtain $O(1)$-approximate APSP for weighted graphs in $O(\log \log \log n)$ rounds. 4. In near-linear MPC, we obtain $(1+\varepsilon)$-approximate single-source shortest paths and $O(1)$-approximate all-pairs shortest paths for unweighted graphs in $\textsf{poly}\log \log n$ rounds. Our algorithm only requires a single near-linear memory machine, where the rest can have sublinear memory. Our deterministic algorithms obtain similar guarantees to the state of the art randomized algorithms without incurring additional factors in the round complexity. To obtain these results, we inspect the randomized algorithms and isolate a randomized sampling routine. Then we derandomize these sampling routines by using a deterministic hitting set. Hereto, we develop a versatile deterministic hitting set algorithm, which we hope will have further derandomization applications.

HRNN: A Hybrid Graph Index for Approximate Reverse k-Nearest Neighbor Search on High-Dimensional Vectors

from arXiv: Data Structures and Algorithms

Authors: Wenxuan Xia, Mingyu Yang, Wentao Li, Wei Wang

Reverse k-nearest neighbor (RkNN) search returns all data points that regard a query vector as one of their k-nearest neighbors (kNNs). Existing RkNN methods typically follow a filter-and-verification framework: vectors near the query vector are first collected as candidates and then verified against their kNN-radius (i.e., the distance to their k-th nearest neighbor). However, existing methods face two key limitations in high-dimensional spaces. First, nearby vectors often do not belong to the query's true RkNN set, resulting in excessive candidate expansion overhead. Second, existing methods compute kNN-radius online during verification, incurring substantial query-processing cost. To address these limitations, we propose HRNN, a hybrid graph index for approximate RkNN search. (1) Rather than directly treating nearby vectors as RkNN candidates, HRNN uses them as proxy points based on the assumption that a query's RkNN results can often be discovered through the RkNN results of its nearby vectors. (2) To reduce verification cost, HRNN materializes high-fidelity kNN-radius offline, eliminating expensive online reconstruction while preserving accuracy. HRNN combines a navigation graph, a ranked KNN graph, and reverse-neighbor lists into a hybrid index that supports efficient proxy retrieval, candidate generation, and kNN-radius access. We also develop efficient index construction and append-only maintenance algorithms. Extensive experiments show that HRNN consistently outperforms existing methods, achieving up to one order of magnitude higher throughput. Moreover, HRNN scales to datasets containing up to 10 million high-dimensional vectors while supporting efficient dynamic index maintenance.

Authors: Wenxuan Xia, Mingyu Yang, Wentao Li, Wei Wang

Reverse k-nearest neighbor (RkNN) search returns all data points that regard a query vector as one of their k-nearest neighbors (kNNs). Existing RkNN methods typically follow a filter-and-verification framework: vectors near the query vector are first collected as candidates and then verified against their kNN-radius (i.e., the distance to their k-th nearest neighbor). However, existing methods face two key limitations in high-dimensional spaces. First, nearby vectors often do not belong to the query's true RkNN set, resulting in excessive candidate expansion overhead. Second, existing methods compute kNN-radius online during verification, incurring substantial query-processing cost. To address these limitations, we propose HRNN, a hybrid graph index for approximate RkNN search. (1) Rather than directly treating nearby vectors as RkNN candidates, HRNN uses them as proxy points based on the assumption that a query's RkNN results can often be discovered through the RkNN results of its nearby vectors. (2) To reduce verification cost, HRNN materializes high-fidelity kNN-radius offline, eliminating expensive online reconstruction while preserving accuracy. HRNN combines a navigation graph, a ranked KNN graph, and reverse-neighbor lists into a hybrid index that supports efficient proxy retrieval, candidate generation, and kNN-radius access. We also develop efficient index construction and append-only maintenance algorithms. Extensive experiments show that HRNN consistently outperforms existing methods, achieving up to one order of magnitude higher throughput. Moreover, HRNN scales to datasets containing up to 10 million high-dimensional vectors while supporting efficient dynamic index maintenance.

Parallel Metric Skiplists and Nearest Neighbor Search

from arXiv: Data Structures and Algorithms

Authors: Xiangyun Ding, Rohin Garg, Yan Gu, Yihan Sun

The metric skip-list is a data structure designed for efficient nearest and $k$-nearest neighbor search in metric spaces. For many real-world datasets with reasonable distributions - specifically, those with a constant expansion rate - it supports $\tilde{O}(n)$ construction time and $O(k\log n)$ query time, where $n$ is the input size and $k$ is the number of nearest neighbors in queries. Notably, unlike alternative approaches, it does not require a bounded aspect ratio, making it more flexible for input data distributions. However, the inherently sequential nature of its original construction has, to our knowledge, precluded any existing parallel algorithm. In this paper, we present highly parallel and work-efficient algorithms for constructing metric skip lists. Under the assumption of a constant expansion rate, our approach achieves an expected work of $O(n \log n)$ and a polylogarithmic span with high probability. Our design is based on novel algorithmic insights that improves the sequential procedure, enabling a divide-and-conquer strategy that facilitates parallelism while maintaining efficiency. With our algorithms, we can also support improved bounds for relevant applications using nearest neighbor as building blocks, including bichromatic closest pair (BCP), density-based clustering, and $k$-NN graph construction, among others. To our knowledge, many of these results represent the first solutions to achieve both work efficiency and polylogarithmic span, relying solely on the assumption of a constant expansion rate.

Authors: Xiangyun Ding, Rohin Garg, Yan Gu, Yihan Sun

The metric skip-list is a data structure designed for efficient nearest and $k$-nearest neighbor search in metric spaces. For many real-world datasets with reasonable distributions - specifically, those with a constant expansion rate - it supports $\tilde{O}(n)$ construction time and $O(k\log n)$ query time, where $n$ is the input size and $k$ is the number of nearest neighbors in queries. Notably, unlike alternative approaches, it does not require a bounded aspect ratio, making it more flexible for input data distributions. However, the inherently sequential nature of its original construction has, to our knowledge, precluded any existing parallel algorithm. In this paper, we present highly parallel and work-efficient algorithms for constructing metric skip lists. Under the assumption of a constant expansion rate, our approach achieves an expected work of $O(n \log n)$ and a polylogarithmic span with high probability. Our design is based on novel algorithmic insights that improves the sequential procedure, enabling a divide-and-conquer strategy that facilitates parallelism while maintaining efficiency. With our algorithms, we can also support improved bounds for relevant applications using nearest neighbor as building blocks, including bichromatic closest pair (BCP), density-based clustering, and $k$-NN graph construction, among others. To our knowledge, many of these results represent the first solutions to achieve both work efficiency and polylogarithmic span, relying solely on the assumption of a constant expansion rate.

From Non-Convex to Strongly Convex: Curvature-Adaptive FTPL for Online Optimization

from arXiv: Data Structures and Algorithms

Authors: Moses Charikar, Chirag Pabbaraju, Ambuj Tewari

Curvature adaptivity is a classical theme in online optimization: for convex Lipschitz losses, adaptive methods interpolate between the optimal $O(\sqrt{T})$ regret for general convex losses and $O(\log T)$ regret under strong convexity. Recent work has shown that Follow-the-Perturbed-Leader (FTPL) achieves optimal $O(\sqrt{T})$ regret even for online non-convex Lipschitz losses, assuming access to an approximate offline-optimization oracle, but these guarantees do not exploit curvature. We show that FTPL can be made curvature-adaptive in the non-convex setting, without knowing in advance how curvature will accumulate over time. Our algorithm replaces the fixed perturbation scale of standard FTPL with a time-varying scale chosen using only past information. We give a simple follow-the-leader tuning rule for this scale and show that it competes, up to constants, with the best choice in hindsight. The resulting method achieves $O(\sqrt{T})$ regret for arbitrary non-convex Lipschitz losses and improves as cumulative curvature grows; with sufficiently accurate oracle calls, it achieves $O(\log T)$ regret when cumulative curvature grows linearly, which includes the classical strongly convex regime. We complement these upper bounds with matching lower bounds for prescribed cumulative-curvature sequences, already for one-dimensional convex losses, showing that the tradeoff between worst-case non-convex regret and curvature-driven fast rates is intrinsic.

Authors: Moses Charikar, Chirag Pabbaraju, Ambuj Tewari

Curvature adaptivity is a classical theme in online optimization: for convex Lipschitz losses, adaptive methods interpolate between the optimal $O(\sqrt{T})$ regret for general convex losses and $O(\log T)$ regret under strong convexity. Recent work has shown that Follow-the-Perturbed-Leader (FTPL) achieves optimal $O(\sqrt{T})$ regret even for online non-convex Lipschitz losses, assuming access to an approximate offline-optimization oracle, but these guarantees do not exploit curvature. We show that FTPL can be made curvature-adaptive in the non-convex setting, without knowing in advance how curvature will accumulate over time. Our algorithm replaces the fixed perturbation scale of standard FTPL with a time-varying scale chosen using only past information. We give a simple follow-the-leader tuning rule for this scale and show that it competes, up to constants, with the best choice in hindsight. The resulting method achieves $O(\sqrt{T})$ regret for arbitrary non-convex Lipschitz losses and improves as cumulative curvature grows; with sufficiently accurate oracle calls, it achieves $O(\log T)$ regret when cumulative curvature grows linearly, which includes the classical strongly convex regime. We complement these upper bounds with matching lower bounds for prescribed cumulative-curvature sequences, already for one-dimensional convex losses, showing that the tradeoff between worst-case non-convex regret and curvature-driven fast rates is intrinsic.

On the gap of quiver representations

from arXiv: Data Structures and Algorithms

Authors: John Maar

The nullcone membership problem, deciding whether an orbit closure contains the origin, is fundamental in computational invariant theory. For self-adjoint groups, Bürgisser, Franks, Garg, Oliveira, Walter and Wigderson gave a geodesic optimization algorithm whose complexity is controlled by the gap, a condition number of the representation. We study the gap for quiver representations under the action of the special linear group. We prove that the inverse gap is polynomially bounded in the number of vertices and the maximum dimension for type A and $\hat{A}$, as well as tree quivers with uniform dimension vectors. Consequently, the algorithm of Bürgisser et al. solves the nullcone membership problem in polynomial time for these families. In contrast, we construct families of quivers and dimension vectors where the gap is exponentially small in the number of leaves, furthermore, for every connected quiver we exhibit dimension vectors such that the weight margin (a related condition number) is exponentially small in the number of vertices. We also extend our results to $σ$-semistability, thereby giving a new proof of a recent result of Iwamasa, Oki, and Soma.

Authors: John Maar

The nullcone membership problem, deciding whether an orbit closure contains the origin, is fundamental in computational invariant theory. For self-adjoint groups, Bürgisser, Franks, Garg, Oliveira, Walter and Wigderson gave a geodesic optimization algorithm whose complexity is controlled by the gap, a condition number of the representation. We study the gap for quiver representations under the action of the special linear group. We prove that the inverse gap is polynomially bounded in the number of vertices and the maximum dimension for type A and $\hat{A}$, as well as tree quivers with uniform dimension vectors. Consequently, the algorithm of Bürgisser et al. solves the nullcone membership problem in polynomial time for these families. In contrast, we construct families of quivers and dimension vectors where the gap is exponentially small in the number of leaves, furthermore, for every connected quiver we exhibit dimension vectors such that the weight margin (a related condition number) is exponentially small in the number of vertices. We also extend our results to $σ$-semistability, thereby giving a new proof of a recent result of Iwamasa, Oki, and Soma.

Online K-d tree for approximate neighborhood search in data streams

from arXiv: Data Structures and Algorithms

Authors: Eduardo V. L. Barboza, Robert Sabourin, Rafael M. O. Cruz

The k-Nearest Neighbors (kNN) algorithm has long been widely used in Machine Learning (ML) applications. However, the main concern when using it is the computational cost required for neighborhood search, which can make it unfeasible for large-scale applications. Optimization algorithms, such as the K-d tree, become an option in such scenarios. Under data streams, it can be challenging to maintain the properties of the K-d tree, as it requires inserting and deleting nodes on the fly. These operations can make maintaining the tree's balance and invariants difficult. Additionally, traditional K-d trees were initially designed for Minkowski-based distance functions. In this work, we describe an Online K-d tree and its adaptation to the Canberra distance that supports dynamic updates over data streams while preserving the structural invariants required for efficient traversal. Experimental analysis demonstrates that the Online K-d tree algorithm achieves faster processing time under data streams, and that adapting to the Canberra distance enabled effective subtree pruning, as evidenced by a minor loss in average accuracy and a substantial gain in instances processed per second. Our implementation can be found in our GitHub repository

Authors: Eduardo V. L. Barboza, Robert Sabourin, Rafael M. O. Cruz

The k-Nearest Neighbors (kNN) algorithm has long been widely used in Machine Learning (ML) applications. However, the main concern when using it is the computational cost required for neighborhood search, which can make it unfeasible for large-scale applications. Optimization algorithms, such as the K-d tree, become an option in such scenarios. Under data streams, it can be challenging to maintain the properties of the K-d tree, as it requires inserting and deleting nodes on the fly. These operations can make maintaining the tree's balance and invariants difficult. Additionally, traditional K-d trees were initially designed for Minkowski-based distance functions. In this work, we describe an Online K-d tree and its adaptation to the Canberra distance that supports dynamic updates over data streams while preserving the structural invariants required for efficient traversal. Experimental analysis demonstrates that the Online K-d tree algorithm achieves faster processing time under data streams, and that adapting to the Canberra distance enabled effective subtree pruning, as evidenced by a minor loss in average accuracy and a substantial gain in instances processed per second. Our implementation can be found in our GitHub repository

Tuesday, June 02

On hope

from Scott Aaronson

The comments on my previous post, on recent AI breakthroughs in solving Erdös problems and beyond, must’ve set some sort of record for the number of separate reasons commenters offered me to despair about the future of humanity. All this in a post that I saw as relatively nerdy and anodyne, goring few oxen, when […]

The comments on my previous post, on recent AI breakthroughs in solving Erdös problems and beyond, must’ve set some sort of record for the number of separate reasons commenters offered me to despair about the future of humanity. All this in a post that I saw as relatively nerdy and anodyne, goring few oxen, when I clicked “Publish”!

According to some persistent commenters, the only reason why I wrote about recent AI-enabled math breakthroughs is that I’m a shameless shill for the AI companies, my loud public criticisms of those companies being nothing more than a cynical smokescreen. Except that I’m also a laughable dupe of the AI companies.

See, AI, despite all appearances to the contrary, has not solved the Erdös unit distance problem or any other important math problems at all. It’s merely produced vast amounts of garbage via brute-force search, and then human mathematicians, sifting through the digital garbage pile, found some things they could call “proofs.” Except also, those human mathematicians aren’t even real mathematicians! They’re merely Hungarian combinatorialists, the kind obsessed with trivial, uninteresting Erdös problems, which it stands to reason that AI can now solve. AI will never touch the truly deep, creative parts of math, epitomized by Grothendieck-style algebraic geometry.

(When I relayed this to a world-leading algebraic geometer of my acquaintance, he laughed and said that everyone has to tell themselves whatever it takes to cope with the situation. He himself has been using LLMs in his research, and while they can’t yet write his papers for him, he expects them to improve very rapidly.)

When pressed, my commenters made it explicit: Timothy Gowers, the Fields Medalist who told his fellow mathematicians that he hopes they’re sitting down before he broke the news about the Erdös distance problem, is not a real mathematician, just a combinatorial puzzle-solver. Paul Erdös himself was not a real mathematician either.

Oh, and also, I’m a genocidal Judeofascist Zionist. That entered the comments as well, with the pretext being that I had shared a GPT-written story about ancient Israelites.

(Note: For every comment that I allowed to appear in the thread, assume as usual that there are many worse ones that I didn’t.)

Does anyone wonder why I blog so much less than I used to? Seeing what humanity has to offer, as reflected in my comment section, I feel like maybe we should take our chances with our future AI overlords. Except that some of my comments are—ironically, given their content—likely to be AI-generated as well.

These days friend after friend of mine, colleague after colleague, acquaintance after acquaintance is becoming a multimillionaire or even a billionaire from startup equity. Meanwhile, I’ve scrupulously abstained from all of that.

Why? Well, probably the single biggest reason has been Shtetl-Optimized. I’ve zealously sought to protect my “neutrality” and “objectivity” as a commentator, on this free (and even ad-free) public forum—the one where I try every week to reason with anonymous trolls with “proton.me” email addresses who show up to call me a hack and a shill and a baby-killer and a dunce. Ironically, the actual billionaires who I know hardly ever get called those sorts of names, mostly because they don’t offer the world a huge attack surface like I do. Or if they occasionally do get called them, they don’t care.

On reflection, all the commenters calling me a dunce have a point. When one looks at how I chose to spend my life, versus how all these billionaires did, I am kind of a moron.

And yet I titled this post “On Hope.” In a situation like the present, one needs to find hope wherever I can. And right now, I’m choosing to find it in this open letter, which has been signed by over 1,250 professors at the University of California. Let me quote the beginning of it:

To the UC Regents, UCOP, Academic Senate leadership, and the people of California:

We write as University of California mathematics faculty, joined by faculty from other STEM disciplines. UC has long served students from every background and has been a powerful engine of social mobility for the people of California. That public trust must be protected for future generations. Today, UC’s mission is at risk. To preserve that mission:

We call for the reinstatement of the SAT/ACT mathematics requirement for applicants to STEM majors beginning with the 2027 admissions cycle, alongside STEM faculty oversight of readiness standards and admissions practices affecting those majors.

Over the past five years, we have seen a widening divergence in mathematical preparation levels within the same classroom. This trend indicates that current admissions practices do not provide a sufficiently reliable check on mathematical readiness for STEM majors. The UC San Diego Senate–Administration Workgroup on Admissions report documents this crisis in stark terms: in the last five years, the number of students whose mathematics skills fall below high school level increased nearly thirtyfold; moreover, 70% of those students fall below middle school levels, reaching roughly one in twelve members of the entering cohort. These findings are corroborated by data across our campuses…

Everywhere one looks right now, and on every part of the political spectrum, doofuses and blankfaces strut across the earth triumphantly. Yet there remain pockets of sanity. What reading this open letter told me is that the University of California STEM faculty is one of them.

With enough such pockets, I could live a perfectly reasonable rest of my life, from now until my natural death (or until AI changes all our lives beyond recognition), regardless of who shows up in 3 … 2 … 1 … with a “proton.me” email address to confidently tell me otherwise.

By Scott

TR26-090 | Multilinear Formula Lower Bounds for Sparse Determinants | Pruthvi Boyapati, Suryajith Chillara, Pratyush Vempati

from ECCC Papers

Raz (2009) proved that multilinear formulas computing the determinant of a generic $n \times n$ matrix require size $n^{\Omega(\log n)}$. A fundamental question in understanding this lower bound is identifying which structural properties of the determinant drive this hardness. In pursuit of this question, we prove the existence of $n \times n$ symbolic matrices with only $\Theta(n\log^6 n)$ nonzero entries—reducing the variable count by a factor of $n/\log^6 n$—such that any multilinear formula computing their determinants still requires size $n^{\Omega(\log n)}$. Our construction uses rectangle sampling from the complete bipartite graph to generate sparse matrices that simultaneously maintain perfect matchings (ensuring nonzero determinant) while exhibiting diagonal imbalance under random vertex permutations—a geometric property we identify as the key driver of factor imbalance in Raz's framework. This demonstrates that Raz's partial derivatives method is remarkably robust to sparsification, and suggests that the fundamental source of multilinear hardness for determinant lies in expansion-like combinatorial structure rather than density. Our techniques combine concentration inequalities for dependent random variables with insights from random graph theory.
Raz (2009) proved that multilinear formulas computing the determinant of a generic $n \times n$ matrix require size $n^{\Omega(\log n)}$. A fundamental question in understanding this lower bound is identifying which structural properties of the determinant drive this hardness. In pursuit of this question, we prove the existence of $n \times n$ symbolic matrices with only $\Theta(n\log^6 n)$ nonzero entries—reducing the variable count by a factor of $n/\log^6 n$—such that any multilinear formula computing their determinants still requires size $n^{\Omega(\log n)}$. Our construction uses rectangle sampling from the complete bipartite graph to generate sparse matrices that simultaneously maintain perfect matchings (ensuring nonzero determinant) while exhibiting diagonal imbalance under random vertex permutations—a geometric property we identify as the key driver of factor imbalance in Raz's framework. This demonstrates that Raz's partial derivatives method is remarkably robust to sparsification, and suggests that the fundamental source of multilinear hardness for determinant lies in expansion-like combinatorial structure rather than density. Our techniques combine concentration inequalities for dependent random variables with insights from random graph theory.

Mesmerized by My Own Beat

from Ben Recht

On the history of anticoagulation and the role of randomized evidence in cardiology.

You might have noticed a pattern in last week’s blog history of the megatrials in cardiology. All the breakthrough trials I covered were essentially drug trials of anticoagulant interventions like aspirin, streptokinase, and heparin. This, of course, wasn’t just a coincidence.

Though randomized trials are often pitched as the ultimate arbiters of causality, you need to have a good reason to do one. You can’t (and don’t) just propose a random trial with a control group. Not only must there be considerable uncertainty about the outcome to get a trial approved, but there must also be some reason to do the trial in the first place. There has to be some intuition that leads some investigator to propose some intervention, and they have to have enough faith in that intervention to devote massive resources to proving it’s efficacious. Though randomized trials sit far above expert opinion in the hierarchy of clinical evidence, expert opinion necessarily must still drive the operation.

So what was the core hypothesis driving the series of anti-coagulant megatrials? Whether clots were a primary cause of heart attacks was a hot debate in cardiology well into the 1980s. The prominent theory for much of the 20th century posited that it was arterial hardening and valve narrowing that caused heart attacks. In this model, blood clots were caused by the heart attack and not the other way around. This felt consistent with the evidence, as most of those who died from heart attacks didn’t have blood clots in their autopsies.

It wasn’t until 1980 that a paradigm-shifting study showed that blocked arteries were overwhelmingly common in the early stages of heart attacks. The investigators corroborated earlier findings that the clots were less prevalent the following day. This suggested that some biological mechanism was naturally breaking down the clots during heart attacks, and suggested a mode of attack for treatment. Cardiologists spent a decade finding new and creative ways to break down blood clots in the heart and intervene as early as possible, leading to marked improvements in survival.

The large pragmatic drug trials were important for building practice. They did find a fairly robust suite of therapies for treating heart attacks. Could an individual cardiologist have determined a better regimen? Possibly. But the megatrial infrastructure of cardiology trials proved that, for the right clinical hypothesis, professional societies of trialists could maximize population outcomes to achieve a reasonable standard of care with 3-6 times lower mortality than the prior one.

Now, I also mentioned a trial that wasn’t about coagulation: CAST. As a reminder, the CAST trial studied the value of drugs that quelled irregular heartbeats in heart attack patients. The trial found that these therapies led to a major increase in mortality, and it ended the practice. But why were doctors convinced that this was a good idea in the first place?

I suppose it might sound reasonable to make a sick heart look “healthier” by messing with its rhythm. But what’s the model behind why that would be a good idea? I’m sure there is some grand rounds somewhere that lays out the full hypothesis, but I couldn’t find one, and the literature associated with the CAST trial doesn’t provide it. Instead, the trial report cites a bunch of correlational studies showing that arrhythmia is correlated with mortality in heart attack patients. The main driving hypothesis seems to be this statistical correlation. Expert opinion on anti-arrhythmia drugs was decidedly mixed before CAST, and that’s part of the reason the trial went forward.

It’s worth dwelling on one remarkable feature of the CAST trial. The investigators were allowed to compare against a placebo. Without the placebo arm, the bad practice of treating arrhythmias would not have been ended. It remains a heated debate in medical ethics as to when it’s acceptable to compare an intervention to a placebo rather than to the standard of care. There is a strong case for placebos. A cascade of RCTs has a great deal of path dependence. Every trial slightly adjusts the standard of care along a single axis. This is what optimizers call random search, and there is nothing to prevent it from ending up in a disadvantageous local minimum. Thus, trials against the standard of care might keep you in an iatrogenic region of medical policy. It is quite possible that the standard of care needs to be completely reimagined to achieve the next major therapeutic improvement. As evidence, this was exactly what was needed to force cardiology to shift to anticoagulant therapy. CAST remains a classic reminder that it is not just ethical but often imperative to test against placebos.

Though often held up as a poster child for the megatrial, the full history of cardiology looks an awful lot like the rest of science and engineering. There isn’t a single magic bullet that automatically led to improved therapies. There was a mixture of expert opinion and institutional debate. Small studies inspired huge randomized trials. The megatrials were guided by biological plausibility and mechanistic intuition. The RCT was clearly an important part of improving practice, but it was part of a complex curation of expertise, rather than a cut-and-dried gold standard.

However, some people stubbornly maintain that RCTs are the only pure way to get to the truth of what works. This belief led to one of the weirder moves in medical history that’s still with us today: The rise of the postmodern thinking of evidence-based medicine. This is the topic of my next post.

Subscribe now

By Ben Recht

Structure-Informed Multiple Sequence Alignment: A Formal Model and Hardness Results

from arXiv: Computational Complexity

Authors: Yoshiki Kanazawa, Naphan Benchasattabuse, Michal Hajdušek, Rodney Van Meter

We formulate a structure-informed multiple sequence alignment problem, denoted MSA-S. The model abstracts biological sequences as strings and structural information as designated position-pairs. It augments a fixed pairwise string score, defined by a fixed non-gap symbol-pair scoring rule and fixed affine gap penalties, with a binary overlap score on designated position-pairs, which can be interpreted as a contact-map overlap score in structural applications. This yields a fixed-score, integer-valued optimization model suitable for complexity-theoretic analysis. Under this formulation, we show that the decision problem MSA-S-DEC is NP-complete for a broad class of fixed pairwise string scoring schemes. We also show that NP-hardness persists even under the restriction that every designated position-pair set is nonempty and the pair-overlap threshold is strictly positive. For the associated scalarized optimization problem MSA-S-OPT(lambda) with any fixed rational constant lambda >= 1, we further show that, under the canonical unit scheme for the non-gap symbol-pair scoring rule, MSA-S-OPT(lambda) admits no polynomial-time approximation scheme (PTAS) even for two input strings (k = 2), unless P = NP. These results establish a formal complexity-theoretic baseline for structure-informed multiple sequence alignment.

Authors: Yoshiki Kanazawa, Naphan Benchasattabuse, Michal Hajdušek, Rodney Van Meter

We formulate a structure-informed multiple sequence alignment problem, denoted MSA-S. The model abstracts biological sequences as strings and structural information as designated position-pairs. It augments a fixed pairwise string score, defined by a fixed non-gap symbol-pair scoring rule and fixed affine gap penalties, with a binary overlap score on designated position-pairs, which can be interpreted as a contact-map overlap score in structural applications. This yields a fixed-score, integer-valued optimization model suitable for complexity-theoretic analysis. Under this formulation, we show that the decision problem MSA-S-DEC is NP-complete for a broad class of fixed pairwise string scoring schemes. We also show that NP-hardness persists even under the restriction that every designated position-pair set is nonempty and the pair-overlap threshold is strictly positive. For the associated scalarized optimization problem MSA-S-OPT(lambda) with any fixed rational constant lambda >= 1, we further show that, under the canonical unit scheme for the non-gap symbol-pair scoring rule, MSA-S-OPT(lambda) admits no polynomial-time approximation scheme (PTAS) even for two input strings (k = 2), unless P = NP. These results establish a formal complexity-theoretic baseline for structure-informed multiple sequence alignment.

From Time to Space: The Impact of Linearity in Higher-Order Datalog

from arXiv: Computational Complexity

Authors: Angelos Charalambidis, Babis Kostopoulos, Panos Rondogiannis

We consider a fragment of Higher-Order Datalog with negation and argue that it generalizes the familiar and important fragment of Linear Datalog. We investigate the expressive power of this fragment, establishing a tight connection with the hierarchy of space complexity classes. In particular, we demonstrate that for all $k \ge 1$, the $(k+1)$-order fragment of Stratified Linear Higher-Order Datalog$^\neg$ captures $(k-1)$-EXPSPACE. This result suggests that restricting programs to linear recursion shifts the expressive power of the corresponding fragments from time to space, generalizing the classical result that (Stratified) Linear Datalog captures NL. Unlike the first-order setting where an ordering assumption is required to capture NL, our results hold without any such assumption on the input database. The proof relies on simulating space-bounded Turing machines using Stratified Linear Higher-Order Datalog$^\neg$ programs and providing a space-efficient evaluation of the query program. We argue that identifying such computationally well-behaved fragments is a crucial step towards paving the way for practical implementations of Higher-Order Datalog.

Authors: Angelos Charalambidis, Babis Kostopoulos, Panos Rondogiannis

We consider a fragment of Higher-Order Datalog with negation and argue that it generalizes the familiar and important fragment of Linear Datalog. We investigate the expressive power of this fragment, establishing a tight connection with the hierarchy of space complexity classes. In particular, we demonstrate that for all $k \ge 1$, the $(k+1)$-order fragment of Stratified Linear Higher-Order Datalog$^\neg$ captures $(k-1)$-EXPSPACE. This result suggests that restricting programs to linear recursion shifts the expressive power of the corresponding fragments from time to space, generalizing the classical result that (Stratified) Linear Datalog captures NL. Unlike the first-order setting where an ordering assumption is required to capture NL, our results hold without any such assumption on the input database. The proof relies on simulating space-bounded Turing machines using Stratified Linear Higher-Order Datalog$^\neg$ programs and providing a space-efficient evaluation of the query program. We argue that identifying such computationally well-behaved fragments is a crucial step towards paving the way for practical implementations of Higher-Order Datalog.

Attention Dynamics and Adaptive Decision Support in C5ISR: A Recurrence Quantification Analysis of Visual and Multimodal Attention Guidance Effects on Mission Performance

from arXiv: Computational Complexity

Authors: Hyun-Gee Jei, Caleb J. Armstrong, Farzan Sasangohar

Modern command, control, communications, computers, cyber, intelligence, surveillance, and reconnaissance (C5ISR) environments place substantial attentional demands on mission commanders. Failures in attention allocation in these high-risk settings can have severe operational consequences. This study investigates the efficacy of gaze-driven, attention-guided adaptive decision support tools, including visual-only and multimodal designs, in a high-fidelity simulated military command center. To characterize gaze and attentional dynamics during interaction with these tools, recurrence quantification analysis was applied to eye-tracking data. Stepwise regression using the Bayesian information criterion was then used to identify recurrence-based gaze metrics associated with performance. Results showed that the multimodal adaptive decision support tool was associated with significantly higher performance than the visual-only attention-guided tool. Average diagonal line length showed a negative linear association with performance, whereas entropy showed a positive linear association. Recurrence rate, determinism, and entropy also showed nonlinear quadratic relationships with performance. In particular, recurrence rate and determinism followed an inverted-U pattern consistent with the Yerkes-Dodson law. These findings suggest that effective performance in dynamic C5ISR contexts depends on a balance between structured and flexible visual scanning, and that recurrence-based gaze metrics can help characterize attentional dynamics during interaction with adaptive decision support systems.

Authors: Hyun-Gee Jei, Caleb J. Armstrong, Farzan Sasangohar

Modern command, control, communications, computers, cyber, intelligence, surveillance, and reconnaissance (C5ISR) environments place substantial attentional demands on mission commanders. Failures in attention allocation in these high-risk settings can have severe operational consequences. This study investigates the efficacy of gaze-driven, attention-guided adaptive decision support tools, including visual-only and multimodal designs, in a high-fidelity simulated military command center. To characterize gaze and attentional dynamics during interaction with these tools, recurrence quantification analysis was applied to eye-tracking data. Stepwise regression using the Bayesian information criterion was then used to identify recurrence-based gaze metrics associated with performance. Results showed that the multimodal adaptive decision support tool was associated with significantly higher performance than the visual-only attention-guided tool. Average diagonal line length showed a negative linear association with performance, whereas entropy showed a positive linear association. Recurrence rate, determinism, and entropy also showed nonlinear quadratic relationships with performance. In particular, recurrence rate and determinism followed an inverted-U pattern consistent with the Yerkes-Dodson law. These findings suggest that effective performance in dynamic C5ISR contexts depends on a balance between structured and flexible visual scanning, and that recurrence-based gaze metrics can help characterize attentional dynamics during interaction with adaptive decision support systems.

Recursive Jump Operators and Optimal Proof Systems

from arXiv: Computational Complexity

Authors: Fabian Egidy

We study the relationship between the existence of optimal proof systems and recursive jump operators, two central open problems in proof complexity. For a set L, an optimal proof system is a strongest proof system in terms of proof length, whereas a recursive jump operator uniformly transforms any proof system for L into a stronger one with respect to proof length, thereby witnessing non-optimality. It is clear that the existence of a recursive jump operator for L rules out optimal proof systems for L. Khaniki (FOCS 2024) is interested in the converse of this implication and explicitly poses the following question, where TAUT denotes the set of propositional tautologies. Q: Does the non-existence of optimal proof systems for TAUT imply the existence of recursive jump operators for TAUT? We generalize and address this question from both a relativized and an unrelativized perspective. We show that proving a positive answer for Q is provably hard by constructing the following oracle. O: The polynomial-time hierarchy is infinite, TAUT has no optimal proof systems, and TAUT has no recursive jump operators. This shows that Khaniki's question can not be answered in the positive by relativizable means, even under the standard complexity-theoretic assumption that the polynomial-time hierarchy is infinite. In contrast, we obtain positive results when the question Q is posed for sets different from TAUT. We prove that the existence of recursive jump operators is upward closed under $\leq_{\text{m}}^{\text{p}}$-reducibility, a result that so far was only known for the non-existence of optimal proof systems. Furthermore, we show that the sets known to have no optimal proof systems by Messner (STACS 1999) in fact admit recursive jump operators. Thus, essentially all sets currently known to have no optimal proof systems have recursive jump operators.

Authors: Fabian Egidy

We study the relationship between the existence of optimal proof systems and recursive jump operators, two central open problems in proof complexity. For a set L, an optimal proof system is a strongest proof system in terms of proof length, whereas a recursive jump operator uniformly transforms any proof system for L into a stronger one with respect to proof length, thereby witnessing non-optimality. It is clear that the existence of a recursive jump operator for L rules out optimal proof systems for L. Khaniki (FOCS 2024) is interested in the converse of this implication and explicitly poses the following question, where TAUT denotes the set of propositional tautologies. Q: Does the non-existence of optimal proof systems for TAUT imply the existence of recursive jump operators for TAUT? We generalize and address this question from both a relativized and an unrelativized perspective. We show that proving a positive answer for Q is provably hard by constructing the following oracle. O: The polynomial-time hierarchy is infinite, TAUT has no optimal proof systems, and TAUT has no recursive jump operators. This shows that Khaniki's question can not be answered in the positive by relativizable means, even under the standard complexity-theoretic assumption that the polynomial-time hierarchy is infinite. In contrast, we obtain positive results when the question Q is posed for sets different from TAUT. We prove that the existence of recursive jump operators is upward closed under $\leq_{\text{m}}^{\text{p}}$-reducibility, a result that so far was only known for the non-existence of optimal proof systems. Furthermore, we show that the sets known to have no optimal proof systems by Messner (STACS 1999) in fact admit recursive jump operators. Thus, essentially all sets currently known to have no optimal proof systems have recursive jump operators.

On the Complexity of Recurrence Evaluation

from arXiv: Computational Complexity

Authors: Artem Parfenov, Michael Vyalyi

In this paper, we study the complexity of the recurrence evaluation problem. We are interested in finitely valued recurrent functions. We present two results in this direction. First, we study the recurrence problem for sequences, assuming that a recurrence relation is defined by a fixed function, while the offsets are part of the input. Depending on the form of presentation (whether the offsets are given in unary or in binary), the problem is PSPACE-complete or EXP-complete. Second, we study recurrences defined by the NAND function. They are related to impartial games. We prove PP-hardness of the recurrence evaluation problem for a very simple 3-dimensional game, in which the offset vectors are coordinate vectors (1,0,0), (0,1,0) and (0,0,1) but the boundary conditions are arbitrary. In other words, we consider generalized winning conditions for the game extending the normal and the misère winning conditions.

Authors: Artem Parfenov, Michael Vyalyi

In this paper, we study the complexity of the recurrence evaluation problem. We are interested in finitely valued recurrent functions. We present two results in this direction. First, we study the recurrence problem for sequences, assuming that a recurrence relation is defined by a fixed function, while the offsets are part of the input. Depending on the form of presentation (whether the offsets are given in unary or in binary), the problem is PSPACE-complete or EXP-complete. Second, we study recurrences defined by the NAND function. They are related to impartial games. We prove PP-hardness of the recurrence evaluation problem for a very simple 3-dimensional game, in which the offset vectors are coordinate vectors (1,0,0), (0,1,0) and (0,0,1) but the boundary conditions are arbitrary. In other words, we consider generalized winning conditions for the game extending the normal and the misère winning conditions.

Hardness of Approximate Hylland-Zeckhauser Equilibria

from arXiv: Computational Complexity

Authors: Mark Braverman, Jingyi Liu, Eric Xue, Chenghan Zhou

In this paper, we investigate the computational hardness of finding fractional allocations to unit-demand players using competitive equilibria from equal incomes (CEEI), where we allow a small constant error in players' response to market prices (also known as an approximate Hylland-Zeckhauser equilibrium). We show that assuming the $\mathbf{(\varepsilon,δ)}$-Generalized Circuits problem is PPAD-hard (the "PCP for PPAD" conjecture), finding an approximate HZ equilibrium is also PPAD-hard. This result provides additional motivation for trying to prove the PCP for PPAD conjecture as a tool for obtaining robust computational hardness results about markets. Further, we introduce a natural restriction on approximate HZ equilibria, where players' bundles may still only be approximately optimal given the prices, but may not contain positive-price items for which the player has zero utility. We show unconditionally that there exists a constant $ε$ such that finding a restricted $ε$-HZ equilibrium is PPAD-hard.

Authors: Mark Braverman, Jingyi Liu, Eric Xue, Chenghan Zhou

In this paper, we investigate the computational hardness of finding fractional allocations to unit-demand players using competitive equilibria from equal incomes (CEEI), where we allow a small constant error in players' response to market prices (also known as an approximate Hylland-Zeckhauser equilibrium). We show that assuming the $\mathbf{(\varepsilon,δ)}$-Generalized Circuits problem is PPAD-hard (the "PCP for PPAD" conjecture), finding an approximate HZ equilibrium is also PPAD-hard. This result provides additional motivation for trying to prove the PCP for PPAD conjecture as a tool for obtaining robust computational hardness results about markets. Further, we introduce a natural restriction on approximate HZ equilibria, where players' bundles may still only be approximately optimal given the prices, but may not contain positive-price items for which the player has zero utility. We show unconditionally that there exists a constant $ε$ such that finding a restricted $ε$-HZ equilibrium is PPAD-hard.

Grid Programs: A Two-Dimensional, Variable-Free Model of Computation

from arXiv: Computational Complexity

Authors: Ezequiel López-Rubio

We introduce Grid Programs, a novel model of computation in which programs are finite two-dimensional arrangements of instructions on an integer grid rather than linear sequences of statements. Three properties distinguish this model fundamentally from classical frameworks: (i) programs are planar structures through which an instruction pointer moves in the four cardinal directions; (ii) there are no syntax constraints, so any assignment of instructions to grid cells constitutes a valid program; and (iii) the model uses no named variables or explicit memory addresses. Program state is maintained through a data stack, an address stack, and a circularly doubly linked list accessed via three named pointers. Control flow is achieved spatially, with branching encoded as perpendicular turns of the instruction pointer. The address stack stores triplets (cell row, cell column, direction), enabling precise restoration of both position and heading after branches, loops, and function calls. We give a formal operational semantics, present a representative instruction set covering arithmetic, control flow, and linked-list manipulation, and work through several detailed examples, including an absolute-value function, a factorial computation, a linear-search algorithm, a string-reversal program, and a while-loop summation. We establish that Grid Programs are Turing-complete by simulating an arbitrary register machine, and we discuss their relationship to prior two-dimensional languages such as Befunge and Funge-98, to stack-based languages such as Forth and PostScript, and to dataflow and spatial computation models. Grid Programs offer a fresh vantage point for exploring the design space of computation, with potential applications in visual programming environments, cellular-automaton-inspired hardware, and obfuscation-resistant code.

Authors: Ezequiel López-Rubio

We introduce Grid Programs, a novel model of computation in which programs are finite two-dimensional arrangements of instructions on an integer grid rather than linear sequences of statements. Three properties distinguish this model fundamentally from classical frameworks: (i) programs are planar structures through which an instruction pointer moves in the four cardinal directions; (ii) there are no syntax constraints, so any assignment of instructions to grid cells constitutes a valid program; and (iii) the model uses no named variables or explicit memory addresses. Program state is maintained through a data stack, an address stack, and a circularly doubly linked list accessed via three named pointers. Control flow is achieved spatially, with branching encoded as perpendicular turns of the instruction pointer. The address stack stores triplets (cell row, cell column, direction), enabling precise restoration of both position and heading after branches, loops, and function calls. We give a formal operational semantics, present a representative instruction set covering arithmetic, control flow, and linked-list manipulation, and work through several detailed examples, including an absolute-value function, a factorial computation, a linear-search algorithm, a string-reversal program, and a while-loop summation. We establish that Grid Programs are Turing-complete by simulating an arbitrary register machine, and we discuss their relationship to prior two-dimensional languages such as Befunge and Funge-98, to stack-based languages such as Forth and PostScript, and to dataflow and spatial computation models. Grid Programs offer a fresh vantage point for exploring the design space of computation, with potential applications in visual programming environments, cellular-automaton-inspired hardware, and obfuscation-resistant code.

Verifying global identifiability of parametric linear ODE models is NP-hard

from arXiv: Computational Complexity

Authors: Alexey Ovchinnikov, Pedro Soto

Global parameter identifiability is a property of a parametric ODE model to recover the parameter values uniquely from the input-output data. Not all parametric ODE models have this property, and checking for parameter identifiability is a prerequisite to perform numerical parameter estimation. There are many algorithms and software packages for global parameter identifiability, and frequently the runtime is large. However, the computational complexity for this problem has not been analyzed yet, though there are complexity results for local (finitely many values fit the data) parameter identifiability. In this paper, we estimate the complexity of checking global parameter identifiability over real fields for ODE models that depend linearly on the state variables and rationally on the parameters. In particular, we prove that it is equivalent to the injectivity problem.

Authors: Alexey Ovchinnikov, Pedro Soto

Global parameter identifiability is a property of a parametric ODE model to recover the parameter values uniquely from the input-output data. Not all parametric ODE models have this property, and checking for parameter identifiability is a prerequisite to perform numerical parameter estimation. There are many algorithms and software packages for global parameter identifiability, and frequently the runtime is large. However, the computational complexity for this problem has not been analyzed yet, though there are complexity results for local (finitely many values fit the data) parameter identifiability. In this paper, we estimate the complexity of checking global parameter identifiability over real fields for ODE models that depend linearly on the state variables and rationally on the parameters. In particular, we prove that it is equivalent to the injectivity problem.

On Fréchet Traveling Salesmen Problems

from arXiv: Computational Geometry

Authors: Omrit Filtser, Tzalik Maimon, Michal Moiseev

The Fréchet distance is a well-studied distance measure between two curves. In this work, we demonstrate that the merit of Fréchet distance extends beyond evaluating similarity, and introduce a new setting in which it proves useful. Consider a situation where two agents are required to visit a given set of sites, while staying close to each other throughout their traversal. In this paper, we study problems where the goal is to construct two curves whose vertices are from a given set of points, under the constraint that the Fréchet distance between the curves is kept as small as possible. This problem can be viewed as a variant of the Traveling Salesman Problem (TSP), and thus may be of interest in routing, network planning and more. We present a near-linear algorithm for this problem under the discrete Fréchet distance, and explore several variants of the problem, including minimizing the lengths of the curves and balancing the number of sites assigned to each agent. Lastly, we prove that the problem is NP-hard under the continuous Fréchet Distance.

Authors: Omrit Filtser, Tzalik Maimon, Michal Moiseev

The Fréchet distance is a well-studied distance measure between two curves. In this work, we demonstrate that the merit of Fréchet distance extends beyond evaluating similarity, and introduce a new setting in which it proves useful. Consider a situation where two agents are required to visit a given set of sites, while staying close to each other throughout their traversal. In this paper, we study problems where the goal is to construct two curves whose vertices are from a given set of points, under the constraint that the Fréchet distance between the curves is kept as small as possible. This problem can be viewed as a variant of the Traveling Salesman Problem (TSP), and thus may be of interest in routing, network planning and more. We present a near-linear algorithm for this problem under the discrete Fréchet distance, and explore several variants of the problem, including minimizing the lengths of the curves and balancing the number of sites assigned to each agent. Lastly, we prove that the problem is NP-hard under the continuous Fréchet Distance.

Subgrid Marching Tetrahedra

from arXiv: Computational Geometry

Authors: Hossein Baktash, Mark Gillespie, Keenan Crane

We describe a method for recovering a manifold, intersection-free triangle mesh from the points where edges of a tetrahedral grid pierce a continuous surface. Unlike classic marching cubes or tets, our subgrid marching scheme allows arbitrarily many surface patches within a single cell, capturing fine features and thin sheets. Moreover, it requires neither a well-defined inside/outside (allowing surfaces with boundary), nor consistently-oriented input geometry. Yet we retain the local, parallel nature of classic marching: reconstruction is performed independently per tet, yielding a conforming mesh across tet boundaries. Our key innovation is a generalization of normal coordinates from geometric topology, which encode surface connectivity via arbitrary integer intersection counts along each grid edge. This encoding sidesteps the usual Nyquist--Shannon limit, putting no lower bound on the size of features that can be resolved on a fixed grid. In practice, for similar compute time and equal grid resolution -- or even an equal number of output triangles -- meshes produced by subgrid marching are far more accurate than those from classic marching. Beyond standard contouring, our method can be used to convert polygon soup into a manifold, intersection-free mesh.

Authors: Hossein Baktash, Mark Gillespie, Keenan Crane

We describe a method for recovering a manifold, intersection-free triangle mesh from the points where edges of a tetrahedral grid pierce a continuous surface. Unlike classic marching cubes or tets, our subgrid marching scheme allows arbitrarily many surface patches within a single cell, capturing fine features and thin sheets. Moreover, it requires neither a well-defined inside/outside (allowing surfaces with boundary), nor consistently-oriented input geometry. Yet we retain the local, parallel nature of classic marching: reconstruction is performed independently per tet, yielding a conforming mesh across tet boundaries. Our key innovation is a generalization of normal coordinates from geometric topology, which encode surface connectivity via arbitrary integer intersection counts along each grid edge. This encoding sidesteps the usual Nyquist--Shannon limit, putting no lower bound on the size of features that can be resolved on a fixed grid. In practice, for similar compute time and equal grid resolution -- or even an equal number of output triangles -- meshes produced by subgrid marching are far more accurate than those from classic marching. Beyond standard contouring, our method can be used to convert polygon soup into a manifold, intersection-free mesh.

Towards fast computation of higher discrete homology

from arXiv: Computational Geometry

Authors: Jacob Ender, Chris Kapulkin

We develop a new algorithm for computing the second discrete homology group of a graph which is much faster when compared to existing algorithms. To do so, we identify five basic shapes, which are quotient graphs of the 3-cube with the property that the injective maps from them detect all possible 2-boundaries in the singular chain complex computing discrete homology.

Authors: Jacob Ender, Chris Kapulkin

We develop a new algorithm for computing the second discrete homology group of a graph which is much faster when compared to existing algorithms. To do so, we identify five basic shapes, which are quotient graphs of the 3-cube with the property that the injective maps from them detect all possible 2-boundaries in the singular chain complex computing discrete homology.

$O(n +f(k))$: Truly Linear FPT

from arXiv: Data Structures and Algorithms

Authors: Benjamin Merlin Bumpus, Rod Downey, Tala Eagling-Vose, Jessica Enright, Michael R. Fellows, David C. Kutner, Laura Larios-Jones, Barnaby Martin, Frances Rosamond, Ella Yates

Parameterized complexity has always been concerned with practical computing: by confining combinatorial explosion to a secondary parameter $k$, one can uncover why and how many NP-hard problems are effectively tackled in practice. Today, however, the scale of data has changed: scientists study Big Data, which is so large that even quadratic dependence in the total input size $n$ is unaffordable. Therefore, what constitutes a practical algorithm has also changed. Classically, parameterized complexity is blind to the difference between defining fixed parameter tractability multiplicatively (i.e. $f(k) \cdot n^c$) or additively (i.e. $f(k) + n^c$). But what if the constant $c$ is one and we require true linearity, is this distinction still inconsequential? Here, we define and explore Truly Linear FPT (TLFPT) -- that is $O(n)+f(k)$ -- and show that it is a strict subset of Linear FPT (LFPT) -- that is $O(n) \cdot f(k)$ -- via diagonalization. Populating TLFPT requires careful consideration of linear-time algorithmics and data structures. We meet many inhabitants of TLFPT: SAT, Vertex Cover, Min-Max Matching, $(n-k)$-Coloring, Diverse Pair of Matchings, $k$-Path, and $H$-Coloring. Our parameterizations are equally varied. Beyond classical parameters like solution size, we leverage two parameters, treedepth and BFS-width, which are particularly well-suited to the TLFPT regime. We do so by developing techniques based on depth- and breadth-first search. For parameterized complexity to be of service to the scientific community, we need to contend with Big Data. For sufficiently large inputs, FPT beyond linear may not suffice. Thus, there is a practical and theoretical need for more ambitious goals. TLFPT is a first step forward.

Authors: Benjamin Merlin Bumpus, Rod Downey, Tala Eagling-Vose, Jessica Enright, Michael R. Fellows, David C. Kutner, Laura Larios-Jones, Barnaby Martin, Frances Rosamond, Ella Yates

Parameterized complexity has always been concerned with practical computing: by confining combinatorial explosion to a secondary parameter $k$, one can uncover why and how many NP-hard problems are effectively tackled in practice. Today, however, the scale of data has changed: scientists study Big Data, which is so large that even quadratic dependence in the total input size $n$ is unaffordable. Therefore, what constitutes a practical algorithm has also changed. Classically, parameterized complexity is blind to the difference between defining fixed parameter tractability multiplicatively (i.e. $f(k) \cdot n^c$) or additively (i.e. $f(k) + n^c$). But what if the constant $c$ is one and we require true linearity, is this distinction still inconsequential? Here, we define and explore Truly Linear FPT (TLFPT) -- that is $O(n)+f(k)$ -- and show that it is a strict subset of Linear FPT (LFPT) -- that is $O(n) \cdot f(k)$ -- via diagonalization. Populating TLFPT requires careful consideration of linear-time algorithmics and data structures. We meet many inhabitants of TLFPT: SAT, Vertex Cover, Min-Max Matching, $(n-k)$-Coloring, Diverse Pair of Matchings, $k$-Path, and $H$-Coloring. Our parameterizations are equally varied. Beyond classical parameters like solution size, we leverage two parameters, treedepth and BFS-width, which are particularly well-suited to the TLFPT regime. We do so by developing techniques based on depth- and breadth-first search. For parameterized complexity to be of service to the scientific community, we need to contend with Big Data. For sufficiently large inputs, FPT beyond linear may not suffice. Thus, there is a practical and theoretical need for more ambitious goals. TLFPT is a first step forward.

Search-space Reduction for Boolean MinCSPs via Essential Constraints

from arXiv: Data Structures and Algorithms

Authors: Bart M. P. Jansen, Ruben F. A. Verhaegh

For a fixed set $\mathcal{F}$ of Boolean constraint types, a MinCSP$(\mathcal{F})$-instance consists of a formula $F$ that applies $m$ constraints from $\mathcal{F}$ to a set of $n$ Boolean variables. The goal is to remove a minimum subset of constraint applications from $F$ to make the remaining formula satisfiable. Previous work characterized how the choice of $\mathcal{F}$ affects its polynomial-time solvability and approximability. We extend a recently introduced preprocessing framework for graph problems to the problem above. Rephrased in the context of CSPs, this framework defines a constraint application from a given formula $F$ as $c$-essential if it is contained in all $c$-approximate solutions to $F$. Being able to efficiently detect these essential parts of a solution reduces the search space of any follow-up FPT algorithms parameterized by the solution size and yields an immediate asymptotic improvement to the runtime of such algorithms. In this work, we present a dichotomy theorem that distinguishes constraint sets $\mathcal{F}$ for which $c_\mathcal{F}$-essential constraint applications can be detected efficiently for some $c_{\mathcal{F}} \in \mathcal{O}(1)$, from those for which this task is intractable under established complexity-theoretic conjectures. Our results show that for any set $\mathcal{F}$ of bijunctive constraints, there is a polynomial-time algorithm that detects $\mathcal{O}(1)$-essential constraint applications. This contrasts the fact that constant-factor approximating a bijunctive MinCSP$(\mathcal{F})$-problem is intractable under the Unique Games Conjecture.

Authors: Bart M. P. Jansen, Ruben F. A. Verhaegh

For a fixed set $\mathcal{F}$ of Boolean constraint types, a MinCSP$(\mathcal{F})$-instance consists of a formula $F$ that applies $m$ constraints from $\mathcal{F}$ to a set of $n$ Boolean variables. The goal is to remove a minimum subset of constraint applications from $F$ to make the remaining formula satisfiable. Previous work characterized how the choice of $\mathcal{F}$ affects its polynomial-time solvability and approximability. We extend a recently introduced preprocessing framework for graph problems to the problem above. Rephrased in the context of CSPs, this framework defines a constraint application from a given formula $F$ as $c$-essential if it is contained in all $c$-approximate solutions to $F$. Being able to efficiently detect these essential parts of a solution reduces the search space of any follow-up FPT algorithms parameterized by the solution size and yields an immediate asymptotic improvement to the runtime of such algorithms. In this work, we present a dichotomy theorem that distinguishes constraint sets $\mathcal{F}$ for which $c_\mathcal{F}$-essential constraint applications can be detected efficiently for some $c_{\mathcal{F}} \in \mathcal{O}(1)$, from those for which this task is intractable under established complexity-theoretic conjectures. Our results show that for any set $\mathcal{F}$ of bijunctive constraints, there is a polynomial-time algorithm that detects $\mathcal{O}(1)$-essential constraint applications. This contrasts the fact that constant-factor approximating a bijunctive MinCSP$(\mathcal{F})$-problem is intractable under the Unique Games Conjecture.

High-Dimensional Expanders, the Sparsest Cut Problem, and Steurer's Conjecture

from arXiv: Data Structures and Algorithms

Authors: Farzam Ebrahimnejad, Shayan Oveis Gharan

In 2010, Steurer conjectured that any family of $n$ unit-norm vectors $v_1,\dots,v_n$ with polynomially small average correlation $\mathbb{E}_{i,j}|\langle v_i,v_j\rangle|\leq n^{-ε}$ contains linear-sized constant-separated sets. We refute this conjecture in a strong sense using the machinery of sparse high-dimensional expanders: such vector families do not even have linear-sized $\frac{1}{\log^{1/4-o(1)}(n)}$-separated sets. Consequently, we show that there are families of vertex expanders on $n$ vertices for which the (average) $L_2$-mixing time to the uniform distribution of any reweighted simple random walk is at least $\log^{5/4-o(1)} n$.

Authors: Farzam Ebrahimnejad, Shayan Oveis Gharan

In 2010, Steurer conjectured that any family of $n$ unit-norm vectors $v_1,\dots,v_n$ with polynomially small average correlation $\mathbb{E}_{i,j}|\langle v_i,v_j\rangle|\leq n^{-ε}$ contains linear-sized constant-separated sets. We refute this conjecture in a strong sense using the machinery of sparse high-dimensional expanders: such vector families do not even have linear-sized $\frac{1}{\log^{1/4-o(1)}(n)}$-separated sets. Consequently, we show that there are families of vertex expanders on $n$ vertices for which the (average) $L_2$-mixing time to the uniform distribution of any reweighted simple random walk is at least $\log^{5/4-o(1)} n$.

Terminal Steiner tree problem : Complexity and Algorithms

from arXiv: Data Structures and Algorithms

Authors: Jyothish S, Sadagopan Narasimhan

Given a connected graph $G$ and a terminal set $R \subseteq V(G)$, the Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G)\backslash R$, for some integer $r\geq 0$. It is known from (Garey et al.,1977 ) that ST is NP-complete. A Steiner tree in which all terminal vertices are constrained to be leaves is called a terminal Steiner tree. Our study addresses the existence of a terminal Steiner tree, its complexity across various graph classes, black-box applications of the ST, and a fixed-parameter tractable (FPT) algorithm with respect to the number of terminals.

Authors: Jyothish S, Sadagopan Narasimhan

Given a connected graph $G$ and a terminal set $R \subseteq V(G)$, the Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G)\backslash R$, for some integer $r\geq 0$. It is known from (Garey et al.,1977 ) that ST is NP-complete. A Steiner tree in which all terminal vertices are constrained to be leaves is called a terminal Steiner tree. Our study addresses the existence of a terminal Steiner tree, its complexity across various graph classes, black-box applications of the ST, and a fixed-parameter tractable (FPT) algorithm with respect to the number of terminals.

Exact Sampling of Permutations with a Fixed Longest Increasing Subsequence

from arXiv: Data Structures and Algorithms

Authors: Peter Clifford, Raphaël Clifford

We study exact uniform sampling of permutations of length $n$ whose longest increasing subsequence (LIS) has prescribed length $k$. For $k \in Θ(n)$, we give a direct rejection sampler whose expected running time is $O(n\log\log n)$ in the word-RAM model. The sampler uses an expanded proposal space consisting of permutations together with a specified increasing subsequence, and accepts exactly those proposals whose specified subsequence is the leftmost LIS. For arbitrary $1\le k\le n$, we give an exact sampler based on the Robinson--Schensted correspondence. The algorithm samples the corresponding Plancherel-conditioned shape by computing exact completion counts via determinant identities, and then samples two uniform tableaux of that shape. The direct implementation runs in $\tilde O(n^4k^5)$ expected time. We then show that the same sampler can be implemented in expected $\tilde O(n^3k^4)$ time by evaluating a determinant oracle through Hankel moment matrices.

Authors: Peter Clifford, Raphaël Clifford

We study exact uniform sampling of permutations of length $n$ whose longest increasing subsequence (LIS) has prescribed length $k$. For $k \in Θ(n)$, we give a direct rejection sampler whose expected running time is $O(n\log\log n)$ in the word-RAM model. The sampler uses an expanded proposal space consisting of permutations together with a specified increasing subsequence, and accepts exactly those proposals whose specified subsequence is the leftmost LIS. For arbitrary $1\le k\le n$, we give an exact sampler based on the Robinson--Schensted correspondence. The algorithm samples the corresponding Plancherel-conditioned shape by computing exact completion counts via determinant identities, and then samples two uniform tableaux of that shape. The direct implementation runs in $\tilde O(n^4k^5)$ expected time. We then show that the same sampler can be implemented in expected $\tilde O(n^3k^4)$ time by evaluating a determinant oracle through Hankel moment matrices.

Efficiently Listing Projected Trees, and Equivalence of Listing and Enumeration

from arXiv: Data Structures and Algorithms

Authors: Karl Bringmann, Nick Fischer, Yanheng Wang

The subgraph isomorphism problem and its generalizations such as conjunctive queries, where some nodes are projected, are among the most fundamental problems in graph algorithms and database theory. In this paper, we study the listing and enumeration variants of these problems and present two main results. (1) We present the first algorithms for enumerating projected trees with polynomial preprocessing time ($\widetilde{O}(n^{17.42})$) and polylogarithmic delay ($\mathrm{polylog}(n)$). Prior to this work, all algorithms in the literature required time $Ω(n^{Ω(k)} + t)$ or $t \cdot n^{Ω(1)}$ to list all copies of a $k$-node tree with projections, where $t$ is the number of solutions. Our result generalizes to arbitrary projected hypergraphs, achieving enumeration in preprocessing time $\widetilde{O}(m^{17.42 \cdot \mathrm{subw}(H)})$ and polylogarithmic delay, where $\mathrm{subw}(H)$ is the submodular width of the pattern hypergraph $H$. We heavily rely on fast (rectangular and output-sensitive) matrix multiplication, which we complement by fine-grained lower bounds indicating that any algorithm beating time $Ω(n^{Ω(k)} + t)$ must rely on fast matrix multiplication. (2) As our second main result, we present a generic enumeration-to-listing reduction, establishing that listing and enumeration are equivalent under natural assumptions. For (colored) subgraph isomorphism, our reduction transforms any listing algorithm running in time $O(f(n,m) + t \cdot g(n,m))$ into an enumeration algorithm with preprocessing time $O((f(n,m)+g(n,m)+m) \log^2 n)$ and delay $O(g(n,m))$. We utilize this equivalence as a tool for proving our first main result, and we expect that our generic reduction will find many future applications.

Authors: Karl Bringmann, Nick Fischer, Yanheng Wang

The subgraph isomorphism problem and its generalizations such as conjunctive queries, where some nodes are projected, are among the most fundamental problems in graph algorithms and database theory. In this paper, we study the listing and enumeration variants of these problems and present two main results. (1) We present the first algorithms for enumerating projected trees with polynomial preprocessing time ($\widetilde{O}(n^{17.42})$) and polylogarithmic delay ($\mathrm{polylog}(n)$). Prior to this work, all algorithms in the literature required time $Ω(n^{Ω(k)} + t)$ or $t \cdot n^{Ω(1)}$ to list all copies of a $k$-node tree with projections, where $t$ is the number of solutions. Our result generalizes to arbitrary projected hypergraphs, achieving enumeration in preprocessing time $\widetilde{O}(m^{17.42 \cdot \mathrm{subw}(H)})$ and polylogarithmic delay, where $\mathrm{subw}(H)$ is the submodular width of the pattern hypergraph $H$. We heavily rely on fast (rectangular and output-sensitive) matrix multiplication, which we complement by fine-grained lower bounds indicating that any algorithm beating time $Ω(n^{Ω(k)} + t)$ must rely on fast matrix multiplication. (2) As our second main result, we present a generic enumeration-to-listing reduction, establishing that listing and enumeration are equivalent under natural assumptions. For (colored) subgraph isomorphism, our reduction transforms any listing algorithm running in time $O(f(n,m) + t \cdot g(n,m))$ into an enumeration algorithm with preprocessing time $O((f(n,m)+g(n,m)+m) \log^2 n)$ and delay $O(g(n,m))$. We utilize this equivalence as a tool for proving our first main result, and we expect that our generic reduction will find many future applications.

The Completion-Threshold Framework for Obligatory-Test Scheduling on Multiple Machines

from arXiv: Data Structures and Algorithms

Authors: Kao-Chuan Liang, Ya-Chun Liang

We study online scheduling with obligatory testing on $m$ identical machines with the objective of minimizing the sum of completion times. In this model, every job must undergo a test before its actual processing time is revealed. Consequently, the central algorithmic challenge is no longer whether to acquire information, but how to optimally balance machine capacity between revealing unknown jobs and processing currently known ones. While this tradeoff becomes structurally richer in the multiple-machine setting, the only prior explicit deterministic lower bound for this objective was $\sqrt{2}$, established strictly for a single machine in 2024 by Dogeas et al. [ESA 2024: 48:1-48:14]. Our core conceptual contribution is demonstrating that completion-threshold quantities, denoted $T_X$, serve as the fundamental analytical metric for this setting. Because every completed job must first pass through the testing phase, delayed revelation inherently forces delayed completion. By bounding these $T_X$ thresholds, we systematically derive strong lower bounds on the total completion time. Utilizing this framework, we establish the first substantial deterministic lower bounds for multiple machines, including a three-type bound of $1.4811$ and a multi-type dyadic construction that asymptotically approaches $3/2$. Finally, we complement these theoretical limits with a deterministic $2$-competitive list-scheduling algorithm for arbitrary test times.

Authors: Kao-Chuan Liang, Ya-Chun Liang

We study online scheduling with obligatory testing on $m$ identical machines with the objective of minimizing the sum of completion times. In this model, every job must undergo a test before its actual processing time is revealed. Consequently, the central algorithmic challenge is no longer whether to acquire information, but how to optimally balance machine capacity between revealing unknown jobs and processing currently known ones. While this tradeoff becomes structurally richer in the multiple-machine setting, the only prior explicit deterministic lower bound for this objective was $\sqrt{2}$, established strictly for a single machine in 2024 by Dogeas et al. [ESA 2024: 48:1-48:14]. Our core conceptual contribution is demonstrating that completion-threshold quantities, denoted $T_X$, serve as the fundamental analytical metric for this setting. Because every completed job must first pass through the testing phase, delayed revelation inherently forces delayed completion. By bounding these $T_X$ thresholds, we systematically derive strong lower bounds on the total completion time. Utilizing this framework, we establish the first substantial deterministic lower bounds for multiple machines, including a three-type bound of $1.4811$ and a multi-type dyadic construction that asymptotically approaches $3/2$. Finally, we complement these theoretical limits with a deterministic $2$-competitive list-scheduling algorithm for arbitrary test times.

A Near-Optimal Offline Algorithm for Dynamic All-Pairs Shortest Paths in Planar Digraphs

from arXiv: Data Structures and Algorithms

Authors: Debarati Das, Maximilian Probst Gutenberg, Christian Wulff-Nilsen

In the planar, dynamic All-Pairs Shortest Paths (APSP) problem, a planar, weighted digraph $G$ undergoes a sequence of edge weight updates and the goal is to maintain a data structure on $G$, that can quickly answer distance queries between any two vertices $x,y \in V(G)$. The currently best algorithms for this problem require $\tilde{O}(n^{2/3})$ worst-case update and query time, while conditional lower bounds show that either update or query time $n^{0.5-δ}$ is needed for any constant $δ> 0$. In this article, we present the first algorithm with near-optimal $\tilde{O}(\sqrt{n})$ worst-case update and query time for the offline setting, where the update sequence is given initially. This result is obtained by giving the first offline dynamic algorithm for maintaining dense distance graphs (DDGs) faster than recomputing from scratch after each update. Further, we also present an \emph{online} algorithm for the incremental APSP problem with $\tilde{O}(\sqrt{n})$ worst-case update/ query time. This allows us to reduce the online dynamic APSP problem to the online decremental APSP problem, which constitutes partial progress even for the online version of this notorious problem.

Authors: Debarati Das, Maximilian Probst Gutenberg, Christian Wulff-Nilsen

In the planar, dynamic All-Pairs Shortest Paths (APSP) problem, a planar, weighted digraph $G$ undergoes a sequence of edge weight updates and the goal is to maintain a data structure on $G$, that can quickly answer distance queries between any two vertices $x,y \in V(G)$. The currently best algorithms for this problem require $\tilde{O}(n^{2/3})$ worst-case update and query time, while conditional lower bounds show that either update or query time $n^{0.5-δ}$ is needed for any constant $δ> 0$. In this article, we present the first algorithm with near-optimal $\tilde{O}(\sqrt{n})$ worst-case update and query time for the offline setting, where the update sequence is given initially. This result is obtained by giving the first offline dynamic algorithm for maintaining dense distance graphs (DDGs) faster than recomputing from scratch after each update. Further, we also present an \emph{online} algorithm for the incremental APSP problem with $\tilde{O}(\sqrt{n})$ worst-case update/ query time. This allows us to reduce the online dynamic APSP problem to the online decremental APSP problem, which constitutes partial progress even for the online version of this notorious problem.

Scalable Concurrent Queues for GPU

from arXiv: Data Structures and Algorithms

Authors: Pratheek Prakash Shetty, Thomas R. W. Scogland, Wu-chun Feng

Concurrent queues can significantly impact supercomputing performance by being critical bottlenecks for task distribution, load balancing, and resource utilization. As HPC systems move beyond 10-million processor cores, the ability to rapidly move items between producer and consumer threads without excessive locking is essential for efficient queues, preventing idle cores, maximizing utilization, and achieving high parallel speedup. While concurrent queues are well studied on CPUs, they remain largely unexplored on modern GPUs, where SIMT execution, massive parallelism, and atomic contention reshape the design space. We present three linearizable GPU concurrent queues spanning from lock-free to wait-free guarantees: (1) G-WFQ-YMC, an adaptation of Yang and Mellor-Crummey's wait-free queue using preallocated segments; (2) G-LFQ, a bounded lock-free queue that uses wave-batched fast paths to maximize throughput; and (3) G-WFQ, a bounded wait-free queue that packs shared state into 64-bit compare-and-swap operations while preserving linearizability and bounded memory.

Authors: Pratheek Prakash Shetty, Thomas R. W. Scogland, Wu-chun Feng

Concurrent queues can significantly impact supercomputing performance by being critical bottlenecks for task distribution, load balancing, and resource utilization. As HPC systems move beyond 10-million processor cores, the ability to rapidly move items between producer and consumer threads without excessive locking is essential for efficient queues, preventing idle cores, maximizing utilization, and achieving high parallel speedup. While concurrent queues are well studied on CPUs, they remain largely unexplored on modern GPUs, where SIMT execution, massive parallelism, and atomic contention reshape the design space. We present three linearizable GPU concurrent queues spanning from lock-free to wait-free guarantees: (1) G-WFQ-YMC, an adaptation of Yang and Mellor-Crummey's wait-free queue using preallocated segments; (2) G-LFQ, a bounded lock-free queue that uses wave-batched fast paths to maximize throughput; and (3) G-WFQ, a bounded wait-free queue that packs shared state into 64-bit compare-and-swap operations while preserving linearizability and bounded memory.

Towards Optimal Robustness in Learning-Augmented Paging

from arXiv: Data Structures and Algorithms

Authors: Peng Chen, Hailiang Zhao, Xueyan Tang, Yixuan Wang, Shuiguang Deng

Learning-augmented paging has been extensively studied in recent years. A key advantage over naive ML-based approaches is \emph{bounded robustness}, which guarantees worst-case performance even when predictions are inaccurate, making these algorithms valuable for real-world systems. Prior work achieves robustness bounds of $2H_k + O(1)$ in the randomized setting, leaving a gap to the optimal competitive ratio $H_k$. In this paper, we study how to close this gap. We begin by reviewing online optimality and proving a new property of the latest $H_k$-competitive algorithm, which facilitates our analysis in the learning-augmented setting. Then, we review existing learning-augmented paging algorithms and introduce a unifying primitive, the \emph{relative prediction budget}, which captures the essence of establishing robustness and reveals that prior algorithms either overuse or underutilize predictions. Guided by the above analysis, we develop a new framework that achieves the best-possible robustness up to an additive constant for learning-augmented paging: $H_k + O(1)$. Experiments further demonstrate strong practical performance.

Authors: Peng Chen, Hailiang Zhao, Xueyan Tang, Yixuan Wang, Shuiguang Deng

Learning-augmented paging has been extensively studied in recent years. A key advantage over naive ML-based approaches is \emph{bounded robustness}, which guarantees worst-case performance even when predictions are inaccurate, making these algorithms valuable for real-world systems. Prior work achieves robustness bounds of $2H_k + O(1)$ in the randomized setting, leaving a gap to the optimal competitive ratio $H_k$. In this paper, we study how to close this gap. We begin by reviewing online optimality and proving a new property of the latest $H_k$-competitive algorithm, which facilitates our analysis in the learning-augmented setting. Then, we review existing learning-augmented paging algorithms and introduce a unifying primitive, the \emph{relative prediction budget}, which captures the essence of establishing robustness and reveals that prior algorithms either overuse or underutilize predictions. Guided by the above analysis, we develop a new framework that achieves the best-possible robustness up to an additive constant for learning-augmented paging: $H_k + O(1)$. Experiments further demonstrate strong practical performance.

Adversarial Configurations for the ReCom Transition Function

from arXiv: Data Structures and Algorithms

Authors: Micah Gold

ReCom is a leading Markov Chain Monte Carlo algorithm for sampling balanced graph partitions in computational redistricting. At each step, its transition function proposes a new partition by merging two adjacent districts and if possible re-splitting the conjoined region. The transition function is efficient in practice, however, it is unknown whether it is guaranteed to run in polynomial time. In this report we exhibit an explicit family of 3-partitions on planar square grid graphs from which ReCom requires an exponentially large expected number of steps to re-split the graph (even if we admit approximately balanced splits), showing that in the worst case ReCom does not run in polynomial time. Notably, this result implies that ReCom is not technically rapidly mixing (if started from an adversarial configuration, ReCom requires exponential many steps to reach the stationary distribution).

Authors: Micah Gold

ReCom is a leading Markov Chain Monte Carlo algorithm for sampling balanced graph partitions in computational redistricting. At each step, its transition function proposes a new partition by merging two adjacent districts and if possible re-splitting the conjoined region. The transition function is efficient in practice, however, it is unknown whether it is guaranteed to run in polynomial time. In this report we exhibit an explicit family of 3-partitions on planar square grid graphs from which ReCom requires an exponentially large expected number of steps to re-split the graph (even if we admit approximately balanced splits), showing that in the worst case ReCom does not run in polynomial time. Notably, this result implies that ReCom is not technically rapidly mixing (if started from an adversarial configuration, ReCom requires exponential many steps to reach the stationary distribution).

On Thin Perfect Matchings up to Polylogarithmic Factors

from arXiv: Data Structures and Algorithms

Authors: Alireza Haqi, Shayan Oveis Gharan

We resolve the thin matching problem proposed by Anari, Charikar and Ramakrishnan [ACR23] up to polylogarithmic factors. Given a fractional perfect matching $x$, we say a perfect matching $M$ is $α$-thin w.r.t. $x$ if for any cut $(S,\overline{S})$, we have $$ |M \cap E(S,\overline{S})| \leq α\cdot x(S,\overline{S}).$$ [ACR23] conjectured that for any fractional perfect matching $x$, there exists a perfect matching $M$ which is $O(1)$-thin w.r.t. $x$. First, we show that if $M$ is restricted to be in the support of $x$, then $α\geq Ω(n)$ and we complement this by designing an efficient algorithm that outputs an $O(n\log n)$-thin perfect matching where $n$ is the number of vertices. Then, we relax this constraint and show that for any fractional perfect matching $x$, there is a perfect matching $M$ (which is not necessarily in the support of $x$) such that $M$ is $\text{polylog}(n)$-thin w.r.t. $x$. All results work for both bipartite and non-bipartite graphs. We also discuss applications to the metric distortion problem.

Authors: Alireza Haqi, Shayan Oveis Gharan

We resolve the thin matching problem proposed by Anari, Charikar and Ramakrishnan [ACR23] up to polylogarithmic factors. Given a fractional perfect matching $x$, we say a perfect matching $M$ is $α$-thin w.r.t. $x$ if for any cut $(S,\overline{S})$, we have $$ |M \cap E(S,\overline{S})| \leq α\cdot x(S,\overline{S}).$$ [ACR23] conjectured that for any fractional perfect matching $x$, there exists a perfect matching $M$ which is $O(1)$-thin w.r.t. $x$. First, we show that if $M$ is restricted to be in the support of $x$, then $α\geq Ω(n)$ and we complement this by designing an efficient algorithm that outputs an $O(n\log n)$-thin perfect matching where $n$ is the number of vertices. Then, we relax this constraint and show that for any fractional perfect matching $x$, there is a perfect matching $M$ (which is not necessarily in the support of $x$) such that $M$ is $\text{polylog}(n)$-thin w.r.t. $x$. All results work for both bipartite and non-bipartite graphs. We also discuss applications to the metric distortion problem.

Multiagent Matroid Upgrading: Greedy is Fair and Efficient

from arXiv: Data Structures and Algorithms

Authors: Qingwen Ma, Chao Peng, Changfeng Xu, Chenyang Xu, Ruilong Zhang

This paper introduces a general multiagent matroid upgrading problem that models a broad class of real-world resource allocation tasks. In this setting, there are multiple agents and a ground set of elements, where each element is assigned to a specific agent and has two associated costs: a default cost and a reduced (upgraded) cost. Upgrading an element lowers its cost to the upgraded value, while non-upgraded elements retain their default costs. Each agent is associated with its own matroid, with the goal of finding a minimum-cost basis. The central task is to select at most k elements to upgrade so as to minimize a non-decreasing convex function over the agents' minimum basis costs, capturing both efficiency and fairness objectives in multiagent systems.

Authors: Qingwen Ma, Chao Peng, Changfeng Xu, Chenyang Xu, Ruilong Zhang

This paper introduces a general multiagent matroid upgrading problem that models a broad class of real-world resource allocation tasks. In this setting, there are multiple agents and a ground set of elements, where each element is assigned to a specific agent and has two associated costs: a default cost and a reduced (upgraded) cost. Upgrading an element lowers its cost to the upgraded value, while non-upgraded elements retain their default costs. Each agent is associated with its own matroid, with the goal of finding a minimum-cost basis. The central task is to select at most k elements to upgrade so as to minimize a non-decreasing convex function over the agents' minimum basis costs, capturing both efficiency and fairness objectives in multiagent systems.

Dynamic Breadth First Search with Predictions

from arXiv: Data Structures and Algorithms

Authors: Shahbaz Khan, Shubham Kumar Verma, Utkarsh Lohiya

Given a graph $G(V,E)$ having $n$ vertices and $m$ edges, we maintain its Breadth-First Search (BFS) tree from source $s$ under an online sequence of edge updates in the prediction model. Our approach leverages a predicted update sequence aiding online processing. We present algorithms for incremental (insertions-only), decremental (deletions-only), and fully dynamic (insertions and deletions) settings that maintain a BFS tree (parent and level information). Classically, the incremental and decremental BFS tree requires total $O(mn)$ time [JACM81], with amortized $O(n)$ and worst-case $O(m)$ update time. The combinatorial BMM conjecture restricts any polynomial improvement [FOCS14] even when the updates are known in advance [STOC15]. For fully dynamic BFS trees, only the trivial $O(m)$ time recomputation is known. Our complexity bounds are expressed in prediction error measures, where error vertices are those having incorrectly predicted distances, with the corresponding difference as their error. The vertex prediction error $η_{v}$ is the sum of degrees of error vertices, weighted vertex prediction error $η^*_{v}$ is error-weighted sum of degrees of error vertices, and $η_e$ counts the incorrectly predicted updates. For incremental and decremental BFS, our algorithm requires respectively $O(η_v + η_e)$ and $O(\min\{m,η^*_v + η_e\})$ worst case update time using $O(mn)$ preprocessing time and space, and total update time of $O(η^*_v + η_e)$. For fully-dynamic updates, our algorithm requires $O(\min\{m,η^*_v+η_e\})$ worst case update time. At its core, we extend the classical ES Trees [JACM81] for batch updates and fully dynamic updates. This simple extension is sufficient to give a competitive prediction algorithm, which may be generalized to other graph problems. We also consider space optimizations and error correction to improve our results.

Authors: Shahbaz Khan, Shubham Kumar Verma, Utkarsh Lohiya

Given a graph $G(V,E)$ having $n$ vertices and $m$ edges, we maintain its Breadth-First Search (BFS) tree from source $s$ under an online sequence of edge updates in the prediction model. Our approach leverages a predicted update sequence aiding online processing. We present algorithms for incremental (insertions-only), decremental (deletions-only), and fully dynamic (insertions and deletions) settings that maintain a BFS tree (parent and level information). Classically, the incremental and decremental BFS tree requires total $O(mn)$ time [JACM81], with amortized $O(n)$ and worst-case $O(m)$ update time. The combinatorial BMM conjecture restricts any polynomial improvement [FOCS14] even when the updates are known in advance [STOC15]. For fully dynamic BFS trees, only the trivial $O(m)$ time recomputation is known. Our complexity bounds are expressed in prediction error measures, where error vertices are those having incorrectly predicted distances, with the corresponding difference as their error. The vertex prediction error $η_{v}$ is the sum of degrees of error vertices, weighted vertex prediction error $η^*_{v}$ is error-weighted sum of degrees of error vertices, and $η_e$ counts the incorrectly predicted updates. For incremental and decremental BFS, our algorithm requires respectively $O(η_v + η_e)$ and $O(\min\{m,η^*_v + η_e\})$ worst case update time using $O(mn)$ preprocessing time and space, and total update time of $O(η^*_v + η_e)$. For fully-dynamic updates, our algorithm requires $O(\min\{m,η^*_v+η_e\})$ worst case update time. At its core, we extend the classical ES Trees [JACM81] for batch updates and fully dynamic updates. This simple extension is sufficient to give a competitive prediction algorithm, which may be generalized to other graph problems. We also consider space optimizations and error correction to improve our results.

The World's Fastest Matching Engine Algorithm

from arXiv: Data Structures and Algorithms

Authors: Jake Yoon

Every electronic exchange relies on an order book whose storage layer determines matching latency. The dominant implementation -- linked lists chained through a balanced tree -- imposes two costs on every operation: pointer-chased traversal to reach the insertion point, and root-to-leaf search to locate the target price level. Under micro-burst conditions these costs produce tail-latency spikes that degrade market quality when liquidity is most needed. We present two data-structure contributions that eliminate these costs. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, each carrying a per-slot indicator encoding the entry's global priority. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. The second addresses a broader inefficiency: balanced search trees search root-to-leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors -- as in ordered event streams, incremental index already knows the key's in-order neighbors -- as in ordered event streams, incremental index maintenance, and electronic trading. Neighbor-aware insertion and deletion exploit known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, uniformly across red-black, AVL, and B/B+-tree variants. A single CPU core sustains 32 million order messages per second with sub-microsecond tail latency under multi-million message-per-second micro-bursts, and is 5-11x faster than the best available open-source matching engines on the same hardware. Scaled to a single 96-core instance, the engine sustains 640 million messages per second across 10,000 symbols.

Authors: Jake Yoon

Every electronic exchange relies on an order book whose storage layer determines matching latency. The dominant implementation -- linked lists chained through a balanced tree -- imposes two costs on every operation: pointer-chased traversal to reach the insertion point, and root-to-leaf search to locate the target price level. Under micro-burst conditions these costs produce tail-latency spikes that degrade market quality when liquidity is most needed. We present two data-structure contributions that eliminate these costs. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, each carrying a per-slot indicator encoding the entry's global priority. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. The second addresses a broader inefficiency: balanced search trees search root-to-leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors -- as in ordered event streams, incremental index already knows the key's in-order neighbors -- as in ordered event streams, incremental index maintenance, and electronic trading. Neighbor-aware insertion and deletion exploit known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, uniformly across red-black, AVL, and B/B+-tree variants. A single CPU core sustains 32 million order messages per second with sub-microsecond tail latency under multi-million message-per-second micro-bursts, and is 5-11x faster than the best available open-source matching engines on the same hardware. Scaled to a single 96-core instance, the engine sustains 640 million messages per second across 10,000 symbols.

Repeated Descent: A Framework for Online Budget-Feasible Auctions

from arXiv: Data Structures and Algorithms

Authors: Andreas Charalampopoulos, Dimitris Fotakis, Thanos Tolias

We study budget feasible procurement auctions, in which $n$ agents, each with a privately held service cost, offer their services to an employer. The employer seeks to maximize a public submodular valuation function over the set of hired agents, while facing a hard budget constraint. We consider an online posted-price setting, in which agents arrive in a uniformly random order (a.k.a. \emph{secretary arrivals}) and the employer must make irrevocable take-it-or-leave-it offers upon their arrival. The employer does not get any feedback about the agent service costs other than whether they accept the offer or not. We introduce Repeated Descent (a.k.a. \RED), a deterministic framework based on adaptive linear posted pricing. \RED enforces budget feasibility by adaptively adjusting its pricing and balancing each pricing level with the number of agents considered in it. Using \RED as the main building block, we obtain a $1046$-competitive posted-price mechanism for online budget feasible auctions with secretary agent arrivals and submodular valuations, thus improving on the previously best known ratio of (Charalampopoulos et al., EC 2025) by several orders of magnitude. Combining \RED with random subsampling, we obtain the first constant-competitive posted-price budget feasible mechanism for non-monotone submodular valuations. On the negative side, we show that every online budget feasible mechanism with XOS valuations has a competitive ratio of $Ω\!\left(\tfrac{\log n}{(\log\log n)^2}\right)$.

Authors: Andreas Charalampopoulos, Dimitris Fotakis, Thanos Tolias

We study budget feasible procurement auctions, in which $n$ agents, each with a privately held service cost, offer their services to an employer. The employer seeks to maximize a public submodular valuation function over the set of hired agents, while facing a hard budget constraint. We consider an online posted-price setting, in which agents arrive in a uniformly random order (a.k.a. \emph{secretary arrivals}) and the employer must make irrevocable take-it-or-leave-it offers upon their arrival. The employer does not get any feedback about the agent service costs other than whether they accept the offer or not. We introduce Repeated Descent (a.k.a. \RED), a deterministic framework based on adaptive linear posted pricing. \RED enforces budget feasibility by adaptively adjusting its pricing and balancing each pricing level with the number of agents considered in it. Using \RED as the main building block, we obtain a $1046$-competitive posted-price mechanism for online budget feasible auctions with secretary agent arrivals and submodular valuations, thus improving on the previously best known ratio of (Charalampopoulos et al., EC 2025) by several orders of magnitude. Combining \RED with random subsampling, we obtain the first constant-competitive posted-price budget feasible mechanism for non-monotone submodular valuations. On the negative side, we show that every online budget feasible mechanism with XOS valuations has a competitive ratio of $Ω\!\left(\tfrac{\log n}{(\log\log n)^2}\right)$.

Constant-Stretch Rounding on the Hypersimplex

from arXiv: Data Structures and Algorithms

Authors: Nima Anari, Alireza Haqi, Eric Ma

We study correlated rounding on the hypersimplex, the base polytope of the uniform matroid. For each point $x$ in the hypersimplex, the goal is to sample a $k$-subset $A(x)$ with marginals $x$, while coupling the samples for all choices of $x$ so that nearby inputs produce nearby sets. We give a constant-stretch scheme. Our scheme samples the maximum-entropy $k$-subset distribution with prescribed marginals using a common random ordering and common uniform thresholds. For every $x,y\in[0,1]^n$ with $\sum_i x_i=\sum_i y_i=k$, it satisfies $\mathbb{E}[|A(x)\triangle A(y)|]\le 6\|x-y\|_1$. This improves the previous $O(\log k)$ bound for hypersimplex correlated rounding and answers an open question raised by Naor, Raju, Shetty, Srinivasan, Valieva, and Wajc. By adding dummy coordinates, the same result gives stretch at most $12$ for the at-most-$k$ polytope. The proof was found in a GPT 5.5 Pro Extended conversation prompted by the authors, and Codex was used to help assemble the manuscript under the authors' supervision.

Authors: Nima Anari, Alireza Haqi, Eric Ma

We study correlated rounding on the hypersimplex, the base polytope of the uniform matroid. For each point $x$ in the hypersimplex, the goal is to sample a $k$-subset $A(x)$ with marginals $x$, while coupling the samples for all choices of $x$ so that nearby inputs produce nearby sets. We give a constant-stretch scheme. Our scheme samples the maximum-entropy $k$-subset distribution with prescribed marginals using a common random ordering and common uniform thresholds. For every $x,y\in[0,1]^n$ with $\sum_i x_i=\sum_i y_i=k$, it satisfies $\mathbb{E}[|A(x)\triangle A(y)|]\le 6\|x-y\|_1$. This improves the previous $O(\log k)$ bound for hypersimplex correlated rounding and answers an open question raised by Naor, Raju, Shetty, Srinivasan, Valieva, and Wajc. By adding dummy coordinates, the same result gives stretch at most $12$ for the at-most-$k$ polytope. The proof was found in a GPT 5.5 Pro Extended conversation prompted by the authors, and Codex was used to help assemble the manuscript under the authors' supervision.

Eulerian-spanning set and coboundary operator: An investigation of maxcut beyond planar graphs

from arXiv: Data Structures and Algorithms

Authors: Qiming Fang, Sihong Shao, Yuxuan Wu

Using the concepts of Eulerian-spanning set and coboundary operator, we generalize Hadlock's conversion of the maxcut problem on planar graphs to one on general graphs with non-negative weights. Using our conversion, we can explore algorithms for maxcut beyond the class of planar graphs. We obtain a Fixed-Parameter Tractable algorithm for $k$-contraction apex graphs. Specifically, our algorithm can be applied to graphs with crossing number $k$, giving an $O(2^k(n+k)^{3/2}\log (n+k))$-time algorithm that matches the best known results when restricted to non-negative weights.

Authors: Qiming Fang, Sihong Shao, Yuxuan Wu

Using the concepts of Eulerian-spanning set and coboundary operator, we generalize Hadlock's conversion of the maxcut problem on planar graphs to one on general graphs with non-negative weights. Using our conversion, we can explore algorithms for maxcut beyond the class of planar graphs. We obtain a Fixed-Parameter Tractable algorithm for $k$-contraction apex graphs. Specifically, our algorithm can be applied to graphs with crossing number $k$, giving an $O(2^k(n+k)^{3/2}\log (n+k))$-time algorithm that matches the best known results when restricted to non-negative weights.

Easy, robust approximate message passing for planted spike models

from arXiv: Data Structures and Algorithms

Authors: Misha Ivkov, Tselil Schramm

We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor. Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that, given access only to the corrupted matrix $Y = X + E$, computes a vector $v_{\mathrm{ALG}}(Y)$ which is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$, for any of a class of AMP iterations which includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm consists of a spectral pre-processing step combined with a robust spectral initialization procedure; given these inputs, we prove that (perhaps surprisingly) AMP is robust out-of-the-box.

Authors: Misha Ivkov, Tselil Schramm

We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor. Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that, given access only to the corrupted matrix $Y = X + E$, computes a vector $v_{\mathrm{ALG}}(Y)$ which is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$, for any of a class of AMP iterations which includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm consists of a spectral pre-processing step combined with a robust spectral initialization procedure; given these inputs, we prove that (perhaps surprisingly) AMP is robust out-of-the-box.