
Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Wednesday, February 18

Devezer's Urn

from Ben Recht

LLMs make metascience easier, but that doesn't increase metascientific validity.

In a week of incredibly annoying and inaccurate AI discourse, I’m instead drawn to write about metascientific squabbles. Engaging with AI discourse means ignoring ignorant bait articles that falsely cast academia as a unipolar axis defined by a professor in the Pacific Northwest and looking at how AI is shaping or being shaped by our social systems.

So yes, I’m intrigued by this new AI-driven political science paper by Ryan Briggs, Jonathan Mellon, and Vincent Arel-Bundock. They use large language models to scour over one hundred thousand political science papers published after 2010 and find that only 2% report null findings in the abstract. They use a statistical model to argue that this number should be 80%. As Briggs said on social media, the authors think this finding is “disastrous” for political science.

This is the latest paper that invariably appears on my timeline claiming to have scientific proof that all science is wrong. Longtime readers know I’m no fan of quantitative social science. However, I’m even less of a fan of quantitative meta-science. I’ve written multiple posts about this bizarre practice. There’s no shorter path to a paper than data mining existing papers and tut-tutting a scientific community for its questionable research practices that pad CVs while failing science. People keep writing these metascience papers, and LLMs make writing them infinitely easier. I guess that means I have to keep writing these blogposts.

So let me ask you, argmin reader: what should the statistical summarization of political science abstracts look like? What is the stochastic process that generates pseudorandom numbers and produces arXiv PDFs? What are the sufficient statistics for this distribution? The authors don’t say. Instead, their paper posits a simplistic statistical model that Berna Devezer pejoratively calls the “urn model of science.” Political scientists reach into a magical Polya urn to find a hypothesis. The urn contains 75% true hypotheses and 25% false hypotheses. Then they test the hypothesis using a perfectly normally distributed experiment. The null hypotheses are rejected based on the nebulous holy parameters statisticians call “power” and “size.” Under this model, where the authors set the power to 25% (a number plucked from a previous meta-scientific screed by one of the coauthors), they compute that eighty percent number I mentioned above.
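For concreteness, here is the arithmetic behind that eighty percent figure as I read the model, in a short Python sketch. The 75% prior and the 25% power are stated above; the conventional test size of 0.05 is my assumption, since the description doesn't pin it down.

```python
# Urn model of science: a sketch of where the 80% figure comes from.
p_true = 0.75   # share of true hypotheses in the urn (stated above)
power = 0.25    # probability of rejecting the null when the hypothesis is true (stated above)
alpha = 0.05    # test size; an assumed conventional value, not stated above

# Probability that a randomly drawn hypothesis produces a rejection (a "positive" result).
p_reject = p_true * power + (1 - p_true) * alpha

# Share of studies expected to report a null finding under this model.
print(f"{1 - p_reject:.0%}")  # -> 80%
```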

The authors think the gap between 80% and 2% is large and evidence of widespread research malpractice. They conclude, “The publication filter that we document here is not a benign pathology. It is a structural feature of our reporting practices that threatens the credibility and accumulation of knowledge in political science.”

Now, here’s the thing. I understand that metascientists think they can just appeal to a caricatured version of Karl Popper’s philosophy of science: get a finding like this, and you have refuted the idea that people are practicing good research. But the one thing every scientist should learn is the immediate counterargument to Popperian falsification, the Quine-Duhem problem. When formulating a prediction about an experimental outcome, a hypothesis never stands alone. To make any prediction, you need to append a long chain of auxiliary hypotheses about your mathematical models, your theoretical constructs, your conditions of ceteris paribus, your measurement devices, and so on. That is, a prediction rests on a logical conjunction of hypotheses, so when you see the opposite of what your hypothesis predicts, any clause in that conjunction could be to blame: the hypothesis itself or one of the auxiliaries.

In the case of statistical meta-science, if your toy statistical model predicts a certain curve under “ideal” research practices, and you find a different curve, it’s possible that the curve derived from undergraduate probability has nothing to do with scientific practice.

I mean, come on, the urn model of scientific practice is more insulting than the stochastic parrots model of LLMs. We don’t do research by picking random experiments independently. Test choice is informed by past practice, advisors’ tastes, measurement constraints, and the whims of journal reviewers. We certainly don’t write papers about random tests.

And we should ask ourselves why these significance tests litter social science papers. It’s an unfortunate convention that everyone knows is harmful. To first order, everyone hates null hypothesis significance tests. Most people realize that there’s no faster way to strangle a discipline than with the logically incoherent garrote of the significance test.

Unfortunately, some die-hard proponents still believe that null-hypothesis significance testing will prevent people from being fooled by things that are too good to be true. Bizarrely, 100 years of significance testing has not yet convinced them that the promise of significance testing was always too good to be true.

Indeed, we’ve known it’s too good to be true since Neyman proposed the potential outcomes model. Even in the social sciences, we’ve known the statistical testing framework is useless for over fifty years. Significance testing “is never a sufficient condition for claiming that a theory has been usefully corroborated, that a meaningful empirical fact has been established, or that an experimental report ought to be published.” The null ritual of power analyses and 0.05 rejections is incoherent, and it’s just a game evolutionarily designed to grease the publication market. As Mark Copelovitch perfectly put it, “the entire edifice causal identification champions have built over the last decade is mainly barriers to entry and primarily about methodological tastes.”

The hardcore statistical wing of metascience has strong, peculiar normative beliefs about what science should be. Somehow, we fix all of science if we preregister every possible significance test and publish all observed outcomes. This view is not scientific, of course. There is no evidence whatsoever that ornate causal identification strategies, complex regressions, and dozens of pages of robustness checks “fix” quantitative social science. And yet, the proponents disguise their irrational normative claims about mathematical statistics in a language of rationality. They claim all knowledge apparently hinges on significance tests and complex causal inference machinery.

However, as Copelovitch put it, the credibility revolution’s central tenet that the only meaningful test of causation is a randomized clinical trial, whether real or imagined, is a matter of taste. There are many confusing mathematical tools you can learn to play this charade of credibility, but it’s just a framework of expertise that crowds out alternative explanation, interpretation, and intervention.

There is an allure to the quantitative end of metascience. Scientists feel they are best equipped to clean their own house. They think the historians, qualitative sociologists, and theorists who have described the nuances, idiosyncrasies, and power structures of contemporary science are all postmodernist weirdos ready to eat up the next Sokal Hoax. And yet, maybe we need those historians, philosophers, and sociologists and their disciplinary norms to understand more clearly the normative world that mainstream metascience wants.

By Ben Recht

Joe Halpern (1953-2026)

from Computational Complexity

Computer Science Professor Joseph Halpern passed away on Friday after a long battle with cancer. He was a leader in mathematical reasoning about knowledge. His paper with Yoram Moses, Knowledge and Common Knowledge in a Distributed Environment, received both the 1997 Gödel Prize and the 2009 Dijkstra Prize. Halpern also co-authored a comprehensive book on the topic.

Halpern helped create a model of knowledge representation consisting of a set of states of the world, where each person has a partition of the states into cells: two states lie in the same cell if that person can't distinguish between them. You can use this system to define knowledge and common knowledge, and to model problems like the muddy children puzzle. It also serves as a great framework for temporal logic.
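To make the partition model concrete, here is a toy Python sketch with hypothetical states and partitions of my own invention (not an example from Halpern's work): both agents know an event, yet it is not common knowledge between them.

```python
# Toy partition model of knowledge: a set of states and, for each agent,
# a partition of the states into indistinguishability cells.
states = {"rain", "sun", "snow"}
partitions = {
    "alice": [{"rain", "snow"}, {"sun"}],   # Alice can't tell rain from snow
    "bob":   [{"rain"}, {"sun", "snow"}],   # Bob can't tell sun from snow
}

def cell(agent, state):
    """The set of states the agent considers possible at `state`."""
    return next(c for c in partitions[agent] if state in c)

def knows(agent, event, state):
    """An agent knows event E at state s iff E holds on the agent's whole cell."""
    return cell(agent, state) <= event

def everyone_knows(event, state):
    return all(knows(a, event, state) for a in partitions)

def common_knowledge(event, state, depth=10):
    """Iterate 'everyone knows' to a fixed depth (enough for this tiny state space)."""
    e = event
    for _ in range(depth):
        e = {s for s in states if everyone_knows(e, s)}
    return state in e

wet = {"rain", "snow"}
print(knows("alice", wet, "rain"))     # True: Alice's cell {rain, snow} is inside wet
print(knows("bob", wet, "rain"))       # True: Bob's cell {rain} is inside wet
print(common_knowledge(wet, "rain"))   # False: both know it, but it is not common knowledge
```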

Halpern led the creation of the Computing Research Repository (CoRR), a forerunner of arXiv, and would later moderate CS papers for arXiv. 

Joe Halpern was the driving force behind the Theoretical Aspects of Rationality and Knowledge (TARK) conference, which attracts philosophers, economists, computer scientists and others to discuss what it means to know stuff. I had two papers in TARK 2009 at Stanford. But my favorite TARK memory came from a debate at the 1998 TARK conference at Northwestern.

Consider the centipede game, where two players alternate turns where each can either play to the right (R/r), or defect (D/d) to end the game immediately, with payouts in the diagram below.

[Diagram: centipede game payoffs omitted.]

The game is solved by backward induction: working backward, in each subgame the player moving does better by defecting.
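Here is a minimal backward-induction sketch in Python. The payoff numbers are a standard textbook choice standing in for the diagram above, which did not survive aggregation; with them, every subgame resolves to defection, as described.

```python
# Backward induction on a small centipede game. Each stage: the mover can defect
# (D/d), ending the game with the listed payoffs (player 1's payoff first), or
# play right (R/r), passing the move on.
defect_payoffs = [(1, 0), (0, 2), (3, 1), (2, 4)]   # payoffs if the mover defects at stage i
end_payoffs = (5, 3)                                # payoffs if everyone plays right

def backward_induction(stage=0):
    """Return (payoffs, move) for optimal play from `stage` onward."""
    if stage == len(defect_payoffs):
        return end_payoffs, None
    mover = stage % 2                                # player 1 at even stages, player 2 at odd
    continue_payoffs, _ = backward_induction(stage + 1)
    if defect_payoffs[stage][mover] >= continue_payoffs[mover]:
        return defect_payoffs[stage], "defect"
    return continue_payoffs, "right"

payoffs, first_move = backward_induction()
print(first_move, payoffs)   # -> defect (1, 0): player 1 defects at the first move
```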

The debate asked the following: to reason by backward induction, player 1 has to consider the subgame in which player 2 played right on its first move. But that is an irrational move, so why should you assume player 2 will be rational when playing its second move later on?

Someone said such reasoning is fine, like when we assume that the square root of two is rational in order to get a contradiction. The counterargument: the square root of two does not "choose" to be irrational.

Thank you Joe for helping us think about knowledge and giving us the forums to do so.

By Lance Fortnow

postdoc at ENS Lyon (apply by April 3, 2026)

from CCI: jobs

The LIP laboratory is opening a one-year post-doctoral fellowship in Computer Science at Ecole Normale Supérieure de Lyon, France. All research themes of the laboratory are eligible.

Website: https://www.ens-lyon.fr/LIP/index.php/latest-news/725-offre-post-doc-lip-2026-red
Email: [isabelle.guerin-lassous,loris.marchal]@ens-lyon.fr

By shacharlovett

TR26-023 | Separations above TFNP from Sherali-Adams Lower Bounds | Anna Gal, Noah Fleming, Deniz Imrek, Christophe Marciot

from ECCC Papers

Unlike in TFNP, for which there is an abundance of problems capturing natural existence principles which are incomparable (in the black-box setting), Kleinberg et al. [KKMP21] observed that many of the natural problems considered so far in the second level of the total function polynomial hierarchy (TF$\Sigma_2$) reduce to the Strong Avoid problem. In this work, we prove that the Linear Ordering Principle does not reduce to Strong Avoid in the black-box setting, exhibiting the first TF$\Sigma_2$ problem that lies outside of the class of problems reducible to Strong Avoid. The proof of our separation exploits a connection between total search problems in the polynomial hierarchy and proof complexity, recently developed by Fleming, Imrek, and Marciot [FIM25]. In particular, this implies that to show our separation, it suffices to show that there is no small proof of the Linear Ordering Principle in a $\Sigma_2$-variant of the Sherali-Adams proof system. To do so, we extend the classical pseudo-expectation method to the $\Sigma_2$ setting, showing that the existence of a $\Sigma_2$ pseudo-expectation precludes a $\Sigma_2$ Sherali-Adams proof. The main technical challenge is in proving the existence of such a pseudo-expectation; we manage to do so by solving a combinatorial covering problem about permutations. We also show that the extended pseudo-expectation bound implies that the Linear Ordering Principle cannot be reduced to any problem admitting a low-degree Sherali-Adams refutation.

Polynomial-time isomorphism test for $k$-generated extensions of abelian groups

from arXiv: Computational Complexity

Authors: Saveliy V. Skresanov

The group isomorphism problem asks whether two finite groups given by their Cayley tables are isomorphic or not. Although there are polynomial-time algorithms for some specific group classes, the best known algorithm for testing isomorphism of arbitrary groups of order $ n $ has time complexity $ n^{O(\log n)} $. We consider the group isomorphism problem for some extensions of abelian groups by $ k $-generated groups for bounded $ k $. In particular, we prove that one can decide isomorphism of abelian-by-cyclic extensions in polynomial time, generalizing a 2009 result of Le Gall for coprime extensions. As another application, we give a polynomial-time isomorphism test for abelian-by-simple group extensions, generalizing a 2017 result of Grochow and Qiao for central extensions. The main novelty of the proof is a polynomial-time algorithm for computing the unit group of a finite ring, which might be of independent interest.
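As a concrete (and exponential-time) illustration of the problem statement only, not of the paper's polynomial-time algorithms, here is a brute-force isomorphism test on Cayley tables of tiny groups; the example groups are my own.

```python
from itertools import permutations

# Brute-force isomorphism test for tiny groups given by Cayley tables.
# table[i][j] encodes the product of elements i and j.
def isomorphic(table1, table2):
    n = len(table1)
    if n != len(table2):
        return False
    for perm in permutations(range(n)):              # candidate bijection phi(i) = perm[i]
        if all(perm[table1[i][j]] == table2[perm[i]][perm[j]]
               for i in range(n) for j in range(n)):
            return True
    return False

# Z4 (cyclic of order 4) vs the Klein four-group: same order, not isomorphic.
z4    = [[(i + j) % 4 for j in range(4)] for i in range(4)]
klein = [[i ^ j for j in range(4)] for i in range(4)]
print(isomorphic(z4, z4), isomorphic(z4, klein))     # True False
```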

Efficient quantum circuits for high-dimensional representations of SU(n) and Ramanujan quantum expanders

from arXiv: Computational Complexity

Authors: Vishnu Iyer, Siddhartha Jain, Stephen Jordan, Rolando Somma

We present efficient quantum circuits that implement high-dimensional unitary irreducible representations (irreps) of $SU(n)$, where $n \ge 2$ is constant. For dimension $N$ and error $ε$, the number of quantum gates in our circuits is polynomial in $\log(N)$ and $\log(1/ε)$. Our construction relies on the Jordan-Schwinger representation, which allows us to realize irreps of $SU(n)$ in the Hilbert space of $n$ quantum harmonic oscillators. Together with a recent efficient quantum Hermite transform, which allows us to map the computational basis states to the eigenstates of the quantum harmonic oscillator, this allows us to implement these irreps efficiently. Our quantum circuits can be used to construct explicit Ramanujan quantum expanders, a longstanding open problem. They can also be used to fast-forward the evolution of certain quantum systems.

Natural Privacy Filters Are Not Always Free: A Characterization of Free Natural Filters

from arXiv: Data Structures and Algorithms

Authors: Matthew Regehr, Bingshan Hu, Ethan Leeman, Pasin Manurangsi, Pierre Tholoniat, Mathias Lécuyer

We study natural privacy filters, which enable the exact composition of differentially private (DP) mechanisms with adaptively chosen privacy characteristics. Earlier privacy filters consider only simple privacy parameters such as Rényi-DP or Gaussian DP parameters. Natural filters account for the entire privacy profile of every query, promising greater utility for a given privacy budget. We show that, contrary to other forms of DP, natural privacy filters are not free in general. Indeed, we show that only families of privacy mechanisms that are well-ordered when composed admit free natural privacy filters.

Local Node Differential Privacy

from arXiv: Data Structures and Algorithms

Authors: Sofya Raskhodnikova, Adam Smith, Connor Wagaman, Anatoly Zavyalov

We initiate an investigation of node differential privacy for graphs in the local model of private data analysis. In our model, dubbed LNDP, each node sees its own edge list and releases the output of a local randomizer on this input. These outputs are aggregated by an untrusted server to obtain a final output. We develop a novel algorithmic framework for this setting that allows us to accurately answer arbitrary linear queries on a blurry approximation of the input graph's degree distribution. For some natural problems, the resulting algorithms match the accuracy achievable with node privacy in the central model, where data are held and processed by a trusted server. We also prove lower bounds on the error required by LNDP that imply the optimality of our algorithms for several fundamental graph statistics. We then lift these lower bounds to the interactive LNDP setting, demonstrating the optimality of our algorithms even when constantly many rounds of interaction are permitted. Obtaining our lower bounds requires new approaches, since those developed for the usual local model do not apply to the inherently overlapping inputs that arise from graphs. Finally, we prove structural results that reveal qualitative differences between local node privacy and the standard local model for tabular data.
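To give a feel for the model only (this is not the paper's framework, and the accuracy of this toy protocol is deliberately poor), here is a sketch of the LNDP shape in Python: each node adds Laplace noise, scaled for the worst-case change of its entire edge list, to its degree, and the untrusted server sums the reports to estimate the edge count. The graph and parameters are my own example.

```python
import numpy as np

# Toy local-node-DP-style protocol (illustrative only, not the paper's framework).
# A node's private input is its whole edge list; its degree can change by up to
# n-1 between any two inputs, so Laplace noise of scale (n-1)/eps makes the local
# report eps-differentially private. The server never sees the raw edge lists.
rng = np.random.default_rng(0)

def local_randomizer(edge_list, n, eps):
    return len(edge_list) + rng.laplace(scale=(n - 1) / eps)

def server_estimate_edges(reports):
    return sum(reports) / 2                          # every edge is counted by both endpoints

n, eps = 1000, 1.0
graph = {v: [u for u in range(n) if u != v and (u + v) % 7 == 0] for v in range(n)}

reports = [local_randomizer(graph[v], n, eps) for v in range(n)]
true_edges = sum(len(adj) for adj in graph.values()) // 2
print(true_edges, round(server_estimate_edges(reports)))
# The estimate is extremely noisy -- costs of this kind are exactly what the
# paper's framework and lower bounds are about.
```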

A Weighted-to-Unweighted Reduction for Matroid Intersection

from arXiv: Data Structures and Algorithms

Authors: Aditi Dudeja, Mara Grilnberger

Given two matroids $\mathcal{M}_1$ and $\mathcal{M}_2$ over the same ground set, the matroid intersection problem is to find the maximum cardinality common independent set. In the weighted version of the problem, the goal is to find a maximum weight common independent set. It has been a matter of interest to find efficient approximation algorithms for this problem in various settings. In many of these models, there is a gap between the best known results for the unweighted and weighted versions. In this work, we address the question of closing this gap. Our main result is a reduction which converts any $α$-approximate unweighted matroid intersection algorithm into an $α(1-\varepsilon)$-approximate weighted matroid intersection algorithm, while increasing the runtime of the algorithm by a $\log W$ factor, where $W$ is the aspect ratio. Our framework is versatile and translates to settings such as streaming and one-way communication complexity where matroid intersection is well-studied. As a by-product of our techniques, we derive new results for weighted matroid intersection in these models.
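As a toy illustration of the (unweighted) problem itself, not of the paper's reduction, here is a brute-force search over a tiny ground set, with the two matroids given as independence oracles; the example matroids are my own.

```python
from itertools import combinations

# Brute-force unweighted matroid intersection for tiny ground sets, given the
# two matroids as independence oracles (exponential time, purely illustrative).
def max_common_independent(ground, indep1, indep2):
    for size in range(len(ground), -1, -1):
        for subset in combinations(ground, size):
            s = frozenset(subset)
            if indep1(s) and indep2(s):
                return s
    return frozenset()

# Example: two partition matroids on the edges of a bipartite graph
# (independent = at most one edge per left vertex / per right vertex),
# so a common independent set is exactly a matching.
edges = [(0, "a"), (0, "b"), (1, "a"), (2, "b")]
left_ok  = lambda s: len({u for u, _ in s}) == len(s)
right_ok = lambda s: len({v for _, v in s}) == len(s)
print(max_common_independent(edges, left_ok, right_ok))
# e.g. frozenset({(0, 'a'), (2, 'b')}) -- a maximum matching of size 2
```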

Fair Correlation Clustering Meets Graph Parameters

from arXiv: Data Structures and Algorithms

Authors: Johannes Blaha, Robert Ganian, Katharina Gillig, Jonathan S. Højlev, Simon Wietheger

We study the generalization of Correlation Clustering which incorporates fairness constraints via the notion of fairlets. The corresponding Fair Correlation Clustering problem has been studied from several perspectives to date, but has so far lacked a detailed analysis from the parameterized complexity paradigm. We close this gap by providing tractability results for the problem under a variety of structural graph parameterizations, including treewidth, treedepth and the vertex cover number; our results lie at the very edge of tractability given the known NP-hardness of the problem on severely restricted inputs.

Memory Reallocation with Polylogarithmic Overhead

from arXiv: Data Structures and Algorithms

Authors: Ce Jin

The Memory Reallocation problem asks to dynamically maintain an assignment of given objects of various sizes to non-overlapping contiguous chunks of memory, while supporting updates (insertions/deletions) in an online fashion. The total size of live objects at any time is guaranteed to be at most a $1-ε$ fraction of the total memory. To handle an online update, the allocator may rearrange the objects in memory to make space, and the overhead for this update is defined as the total size of moved objects divided by the size of the object being inserted/deleted. Our main result is an allocator with worst-case expected overhead $\mathrm{polylog}(ε^{-1})$. This exponentially improves the previous worst-case expected overhead $\tilde O(ε^{-1/2})$ achieved by Farach-Colton, Kuszmaul, Sheffield, and Westover (2024), narrowing the gap towards the $Ω(\logε^{-1})$ lower bound. Our improvement is based on an application of the sunflower lemma previously used by Erdős and Sárközy (1992) in the context of subset sums. Our allocator achieves polylogarithmic overhead only in expectation, and sometimes performs expensive rebuilds. Our second technical result shows that this is necessary: it is impossible to achieve subpolynomial overhead with high probability.
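To make the overhead measure concrete, here is a deliberately naive Python allocator (not the paper's algorithm): it appends objects at the end of memory and compacts everything when it runs out of room, reporting the overhead of each insert as defined above. The object sizes and memory size are my own toy example.

```python
# Toy allocator illustrating the overhead measure: overhead of an update =
# (total size of moved objects) / (size of the inserted/deleted object).
# This naive compact-when-full strategy can be very expensive; the paper shows
# that polylog(1/eps) expected overhead is achievable.
class NaiveAllocator:
    def __init__(self, memory_size):
        self.memory_size = memory_size
        self.layout = []                  # list of (object_id, size, start) in memory order

    def _end(self):
        return self.layout[-1][2] + self.layout[-1][1] if self.layout else 0

    def insert(self, obj_id, size):
        moved = 0
        if self._end() + size > self.memory_size:    # no room at the end: compact
            pos, new_layout = 0, []
            for oid, sz, start in self.layout:
                if start != pos:
                    moved += sz                       # object physically moved
                new_layout.append((oid, sz, pos))
                pos += sz
            self.layout = new_layout
        self.layout.append((obj_id, size, self._end()))
        return moved / size                           # overhead of this update

    def delete(self, obj_id):
        self.layout = [t for t in self.layout if t[0] != obj_id]
        return 0.0                                    # lazy delete: nothing moved

alloc = NaiveAllocator(memory_size=100)
for i in range(10):
    alloc.insert(f"big{i}", 10)          # fill memory with ten size-10 objects
alloc.delete("big0")                     # free a size-10 hole at the front
print(alloc.insert("new", 10))           # forces a compaction: overhead = 90/10 = 9.0
```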

Self-dual Stacked Quantum Low-Density Parity-Check Codes

from arXiv: Data Structures and Algorithms

Authors: Ze-Chuan Liu, Chong-Yuan Xu, Yong Xu

Quantum low-density parity-check (qLDPC) codes are promising candidates for fault-tolerant quantum computation due to their high encoding rates and distances. However, implementing logical operations using qLDPC codes presents significant challenges. Previous research has demonstrated that self-dual qLDPC codes facilitate the implementation of transversal Clifford gates. Here we introduce a method for constructing self-dual qLDPC codes by stacking non-self-dual qLDPC codes. Leveraging this methodology, we develop double-chain bicycle codes, double-layer bivariate bicycle (BB) codes, double-layer twisted BB codes, and double-layer reflection codes, many of which exhibit favorable code parameters. Additionally, we conduct numerical calculations to assess the performance of these codes as quantum memory under the circuit-level noise model, revealing that the logical failure rate can be significantly reduced with high pseudo-thresholds.

Testing Monotonicity of Real-Valued Functions on DAGs

from arXiv: Data Structures and Algorithms

Authors: Yuichi Yoshida

We study monotonicity testing of real-valued functions on directed acyclic graphs (DAGs) with $n$ vertices. For every constant $δ>0$, we prove a $Ω(n^{1/2-δ}/\sqrt{\varepsilon})$ lower bound against non-adaptive two-sided testers on DAGs, nearly matching the classical $O(\sqrt{n/\varepsilon})$-query upper bound. For constant $\varepsilon$, we also prove an $Ω(\sqrt n)$ lower bound for randomized adaptive one-sided testers on explicit bipartite DAGs, whereas previously only an $Ω(\log n)$ lower bound was known. A key technical ingredient in both lower bounds is positive-matching Ruzsa--Szemerédi families. On the algorithmic side, we give simple non-adaptive one-sided testers with query complexity $O(\sqrt{m\,\ell}/(\varepsilon n))$ and $O(m^{1/3}/\varepsilon^{2/3})$, where $m$ is the number of edges in the transitive reduction and $\ell$ is the number of edges in the transitive closure. For constant $\varepsilon>0$, these improve over the previous $O(\sqrt{n/\varepsilon})$ bound when $m\ell=o(n^3)$ and $m=o(n^{3/2})$, respectively.
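For readers unfamiliar with the model, here is a simple non-adaptive one-sided pair tester in Python, included only to make the query model concrete; it is not one of the testers whose query complexity is analyzed above, and the example DAG and function are my own.

```python
import random

# Simple one-sided monotonicity tester on a DAG: sample comparable pairs (u, v)
# with u below v in the partial order and reject only on hard evidence f(u) > f(v).
# A monotone function is therefore always accepted (one-sided error).
def pair_tester(comparable_pairs, f, queries):
    for _ in range(queries):
        u, v = random.choice(comparable_pairs)
        if f[u] > f[v]:
            return "reject"
    return "accept"

# Tiny example: the path DAG 0 -> 1 -> 2 -> 3, with one violation on the pair (1, 2).
comparable = [(i, j) for i in range(4) for j in range(4) if i < j]
f = {0: 1.0, 1: 3.0, 2: 2.0, 3: 4.0}
print(pair_tester(comparable, f, queries=20))   # rejects with high probability
```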

Revisiting the Sparse Matrix Compression Problem

from arXiv: Data Structures and Algorithms

Authors: Vincent Jugé, Dominik Köppl, Vincent Limouzy, Andrea Marino, Jannik Olblich, Giulia Punzi, Takeaki Uno

The sparse matrix compression problem asks for a one-dimensional representation of a binary $n \times \ell$ matrix, formed by an integer array of row indices and a shift function for each row, such that accessing a matrix entry is possible in constant time by consulting this representation. It has been shown that the decision problem for finding an integer array of length $\ell+ρ$ or restricting the shift function up to values of $ρ$ is NP-complete (cf. the textbook of Garey and Johnson). As a practical heuristic, a greedy algorithm has been proposed to shift the $i$-th row until it forms a solution with its predecessor rows. Although this greedy algorithm is cherished for its good approximation in practice, we show that it actually exhibits an approximation ratio of $Θ(\sqrt{\ell+ρ})$. We give further hardness results for parameterizations such as the number of distinct rows or the maximum number of non-zero entries per row. Finally, we devise a DP-algorithm that solves the problem for double-logarithmic matrix widths or logarithmic widths for further restrictions. We also study all these findings under a new perspective by introducing a variant of the problem, in which we wish to minimize the length of the resulting integer array by trimming the non-zero borders; this variant has not been studied in the literature before but has practical motivations.
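To make the representation and the greedy heuristic concrete, here is a short Python sketch of first-fit row displacement as I read the problem description above (a standard reading, not code taken from the paper):

```python
# Greedy first-fit row displacement: shift each row until its non-zero columns
# do not collide with entries already placed, and record the owning row index
# so that a matrix entry can be recovered in constant time.
def compress(matrix):
    shifts, array = [], []
    for i, row in enumerate(matrix):
        cols = [j for j, bit in enumerate(row) if bit]
        shift = 0
        while any(shift + j < len(array) and array[shift + j] is not None
                  for j in cols):
            shift += 1                               # first fit: smallest collision-free shift
        shifts.append(shift)
        needed = shift + len(row)
        array.extend([None] * (needed - len(array)))
        for j in cols:
            array[shift + j] = i                     # store the row index as a witness
    return array, shifts

def access(array, shifts, i, j):
    pos = shifts[i] + j
    return pos < len(array) and array[pos] == i      # constant-time lookup

matrix = [[1, 0, 0, 1],
          [0, 1, 0, 0],
          [0, 0, 1, 0]]
array, shifts = compress(matrix)
assert all(access(array, shifts, i, j) == bool(matrix[i][j])
           for i in range(3) for j in range(4))
print(len(array), shifts)   # the three rows overlay into a single array of length 4
```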

Near-real-time Solutions for Online String Problems

from arXiv: Data Structures and Algorithms

Authors: Dominik Köppl, Gregory Kucherov

Based on the Breslauer-Italiano online suffix tree construction algorithm (2013) with double logarithmic worst-case guarantees on the update time per letter, we develop near-real-time algorithms for several classical problems on strings, including the computation of the longest repeating suffix array, the (reversed) Lempel-Ziv 77 factorization, and the maintenance of minimal unique substrings, all in an online manner. Our solutions improve over the best known running times for these problems in terms of the worst-case time per letter, for which we achieve a poly-log-logarithmic time complexity, within a linear space. Best known results for these problems require a poly-logarithmic time complexity per letter or only provide amortized complexity bounds. As a result of independent interest, we give conversions between the longest previous factor array and the longest repeating suffix array in space and time bounds based on their irreducible representations, which can have sizes sublinear in the length of the input string.

Tensor Decomposition for Non-Clifford Gate Minimization

from arXiv: Data Structures and Algorithms

Authors: Kirill Khoruzhii, Patrick Gelß, Sebastian Pokutta

Fault-tolerant quantum computation requires minimizing non-Clifford gates, whose implementation via magic state distillation dominates the resource costs. While $T$-count minimization is well-studied, dedicated $CCZ$ factories shift the natural target to direct Toffoli minimization. We develop algebraic methods for this problem, building on a connection between Toffoli count and tensor decomposition over $\mathbb{F}_2$. On standard benchmarks, these methods match or improve all reported results for both Toffoli and $T$-count, with most circuits completing in under a minute on a single CPU instead of thousands of TPUs used by prior work.

Tuesday, February 17

The antiferromagnetic Ising model beyond line graphs

from arXiv: Computational Complexity

Authors: Mark Jerrum

Both the antiferromagnetic Ising model and the hard-core model could be said to be tractable on line graphs of bounded degree. For example, Glauber dynamics is rapidly mixing in both cases. In the case of the hard-core model, we know that tractability extends further, to claw-free graphs and somewhat beyond. In contrast, it is shown here that the corresponding extensions are not possible in the case of the antiferromagnetic Ising model.
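For context, here is a generic single-site Glauber (heat-bath) dynamics sketch for the antiferromagnetic Ising model on an arbitrary graph; it only illustrates the dynamics being discussed, and nothing in it is specific to line graphs or to the paper's results. The example graph and temperature are my own.

```python
import math, random

# Single-site Glauber (heat-bath) dynamics for the antiferromagnetic Ising model
# with Gibbs weight exp(-beta * sum_{uv in E} s_u * s_v).
def glauber_step(spins, adj, beta):
    v = random.choice(list(spins))
    field = sum(spins[u] for u in adj[v])
    # Conditional on the neighbours, P(s_v = +1) = e^{-beta*field} / (e^{-beta*field} + e^{beta*field}).
    p_plus = 1.0 / (1.0 + math.exp(2.0 * beta * field))
    spins[v] = 1 if random.random() < p_plus else -1

# Example: a 4-cycle (itself a line graph) at inverse temperature beta = 1.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
spins = {v: random.choice([-1, 1]) for v in adj}
for _ in range(10_000):
    glauber_step(spins, adj, beta=1.0)
print(spins)   # typically close to an alternating configuration at this temperature
```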

Geometric Characterization of Context-Free Intersections via the Inner Segment Dichotomy

from arXiv: Computational Complexity

Authors: Jorge Miguel Silva

The intersection of two context-free languages is not generally context-free, but no geometric criterion has characterized when it remains so. The crossing gap (max(i'-i, j'-j) for two crossing push-pop arcs) is the natural candidate. We refute this: we exhibit CFLs whose intersection is CFL despite unbounded-gap crossings. The governing quantity is the inner segment measure: for crossing arcs inducing a decomposition w = P1 P2 P3 P4, it is max(|P2|,|P3|), the length of the longer inner segment between interleaved crossing endpoints. We prove a dichotomy for this measure: bounded inner segments imply context-freeness via a finite buffer construction; growing inner segments with pump-sensitive linkages imply non-context-freeness. The inner segment concept applies to all CFL intersections; the strictness of the resulting characterization depends on the language class. For block-counting CFLs (languages requiring equality among designated pairs of block lengths), the dichotomy is complete: the intersection is CFL if and only if the combined arcs are jointly well-nested. For general CFLs, the CFL direction is unconditional; the non-CFL direction requires pump-sensitive linkages whose necessity is the main open problem, reducing the general CFL intersection problem to a specific property of pump-sensitive decompositions.

Graph Homomorphisms and Universal Algebra

from arXiv: Computational Complexity

Authors: Manuel Bodirsky

Constraint satisfaction problems are computational problems that naturally appear in many areas of theoretical computer science. One of the central themes is their computational complexity, and in particular the border between polynomial-time tractability and NP-hardness. In this course we introduce the universal-algebraic approach to study the computational complexity of finite-domain CSPs. The course covers in particular the cyclic terms and bounded width theorems. To keep the presentation accessible, we start the course in the tangible setting of directed graphs and graph homomorphism problems.
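As a warm-up in the spirit of the course's starting point, here is a brute-force Python check for the existence of a graph homomorphism (the H-coloring problem); it is exponential in the size of the source graph and purely illustrative, with example graphs of my own choosing.

```python
from itertools import product

# Brute-force test for a graph homomorphism G -> H. Graphs are given as sets of
# directed edges; undirected graphs are encoded symmetrically.
def has_homomorphism(G_vertices, G_edges, H_vertices, H_edges):
    for assignment in product(H_vertices, repeat=len(G_vertices)):
        h = dict(zip(G_vertices, assignment))
        if all((h[u], h[v]) in H_edges for u, v in G_edges):
            return True
    return False

K3 = {(i, j) for i in range(3) for j in range(3) if i != j}          # template for 3-coloring
K2 = {(0, 1), (1, 0)}                                                # template for 2-coloring
C5 = {(i, (i + 1) % 5) for i in range(5)} | {((i + 1) % 5, i) for i in range(5)}

print(has_homomorphism(range(5), C5, range(3), K3))   # True: C5 is 3-colorable
print(has_homomorphism(range(5), C5, range(2), K2))   # False: an odd cycle is not 2-colorable
```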

Affine Rank Minimization is ER Complete

from arXiv: Computational Complexity

Authors: Angshul Majumdar

We study the decision problem Affine Rank Minimization, denoted ARM(k). The input consists of rational matrices A_1,...,A_q in Q^{m x n} and rational scalars b_1,...,b_q in Q. The question is whether there exists a real matrix X in R^{m x n} such that trace(A_l^T X) = b_l for all l in {1,...,q} and rank(X) <= k. We first prove membership: for every fixed k >= 1, ARM(k) lies in the existential theory of the reals by giving an explicit existential encoding of the rank constraint using a constant-size factorization witness. We then prove existential-theory-of-reals hardness via a polynomial-time many-one reduction from ETR to ARM(k), where the target instance uses only affine equalities together with a single global constraint rank(X) <= k. The reduction compiles an ETR formula into an arithmetic circuit in gate-equality normal form and assigns each circuit quantity to a designated entry of X. Affine semantics (constants, copies, addition, and negation) are enforced by linear constraints, while multiplicative semantics are enforced by constant-size rank-forcing gadgets. Soundness is certified by a fixed-rank gauge submatrix that removes factorization ambiguity. We prove a composition lemma showing that gadgets can be embedded without unintended interactions, yielding global soundness and completeness while preserving polynomial bounds on dimension and bit-length. Consequently, ARM(k) is complete for the existential theory of the reals; in particular, ARM(3) is complete. This shows that feasibility of purely affine constraints under a fixed constant rank bound captures the full expressive power of real algebraic feasibility.

An Algebraic Rigidity Framework for Order-Oblivious Deterministic Black-Box PIT of ROABPs

from arXiv: Computational Complexity

Authors: Shalender Singh, Vishnupriya Singh

Deterministic black-box polynomial identity testing (PIT) for read-once oblivious algebraic branching programs (ROABPs) is a central open problem in algebraic complexity, particularly in the absence of variable ordering. Prior deterministic algorithms either rely on order information or incur significant overhead through combinatorial isolation techniques. In this paper, we introduce an algebraic rigidity framework for ROABPs based on the internal structure of their associated matrix word algebras. We show that nonzero width-$w$ ROABPs induce word algebras whose effective algebraic degrees of freedom collapse to dimension at most $w^2$, independent of the number of variables. This rigidity enables deterministic witness construction via intrinsic algebraic invariants, bypassing rank concentration, isolation lemmas, and probabilistic tools used in previous work. Thus, we obtain the first order-oblivious deterministic black-box PIT algorithm for ROABPs, running in quasi-polynomial time $n\cdot(wd)^{O(w^2)}$. This establishes that algebraic rigidity alone suffices to derandomize PIT in this model, without assuming ordering information. The framework further isolates a single remaining obstacle to full polynomial-time complexity. We formulate a Modular Stability Conjecture, asserting that width-$w$ ROABPs are stable under hashing into cyclic quotient rings $\mathbb{K}[λ]/\langle λ^r-1 \rangle$ once the modulus exceeds a polynomial threshold in $w$ and the individual degree. This conjecture arises naturally from the low-dimensional coefficient structure revealed by rigidity and is supported by extensive empirical evidence. Assuming the conjecture, our methods yield a fully polynomial-time deterministic black-box PIT algorithm for ROABPs, matching the complexity of the best-known white-box algorithms and reducing the black-box problem to a concrete algebraic stability question.

Morphing of and writing with a scissor linkage mechanism

from arXiv: Computational Geometry

Authors: Mohanraj A, S Ganga Prasath

Kinematics of mechanisms is intricately coupled to their geometry and their utility often arises out of the ability to perform reproducible motion with fewer actuating degrees of freedom. In this article, we explore the assembly of scissor-units, each made of two rigid linear members connected by a pin joint. The assembly has a single degree of freedom, where actuating any single unit results in a shape change of the entire assembly. We derive expressions for the effective curvature of the unit and the trajectory of the mechanism's tip as a function of the geometric variables which we then use as the basis to program two tasks in the mechanism: shape morphing and writing. By phrasing these tasks as optimization problems and utilizing the differentiable simulation framework, we arrive at solutions that are then tested in table-top experiments. Our results show that the geometry of scissor assemblies can be leveraged for automated navigation and inspection in complex domains, in light of the optimization framework. However, we highlight that the challenges associated with rapid programming and error-free implementation in experiments without feedback still remain.

Lower Estimates for $L_1$-Distortion of Transportation Cost Spaces

from arXiv: Computational Geometry

Authors: Chris Gartland, Mikhail Ostrovskii

Quantifying the degree of dissimilarity between two probability distributions on a finite metric space is a fundamental task in Computer Science and Computer Vision. A natural dissimilarity measure based on optimal transport is the Earth Mover's Distance (EMD). A key technique for analyzing this metric, pioneered by Charikar (2002) and Indyk and Thaper (2003), involves constructing low-distortion embeddings of EMD(X) into the Lebesgue space $L_1$. It became a key problem to investigate whether the upper bound of $O(\log n)$ can be improved for important classes of metric spaces known to admit low-distortion embeddings into $L_1$. In the context of Computer Vision, grid graphs, especially planar grids, are among the most fundamental. Indyk posed the related problem of estimating the $L_1$-distortion of the space of uniform distributions on $n$-point subsets of $\mathbb{R}^2$. The Progress Report, last updated in August 2011, highlighted two key results: first, the work of Khot and Naor (2006) on Hamming cubes, which showed that the $L_1$-distortion for Hamming cubes meets the upper estimate described above, and second, the result of Naor and Schechtman (2007) for planar grids, which established that the $L_1$-distortion of a planar $n$ by $n$ grid is $Ω(\sqrt{\log n})$. Our first result is the improvement of the lower bound on the $L_1$-distortion for grids to $Ω(\log n)$, matching the universal upper bound up to multiplicative constants. The key ingredient allowing us to obtain these sharp estimates is a new Sobolev-type inequality for scalar-valued functions on the grid graphs. Our method is also applicable to many recursive families of graphs, such as diamond and Laakso graphs. We obtain the sharp distortion estimates of $\log n$ in these cases as well.
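
As a concrete reference for the object being embedded: EMD on a finite metric space is just a small linear program and can be computed exactly as below. This baseline (using scipy.optimize.linprog; the function name emd and the toy instance are my own) is unrelated to the embedding and distortion arguments of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def emd(p, q, D):
    """Exact Earth Mover's Distance between distributions p and q on a
    k-point metric space with distance matrix D (k x k).

    Variables: flow f[i, j] >= 0 with row sums p and column sums q;
    objective: minimize sum_ij D[i, j] * f[i, j]."""
    k = len(p)
    c = D.reshape(-1)                          # cost vector over f[i, j]
    A_eq = np.zeros((2 * k, k * k))
    for i in range(k):
        A_eq[i, i * k:(i + 1) * k] = 1.0       # row i of f sums to p[i]
    for j in range(k):
        A_eq[k + j, j::k] = 1.0                # column j of f sums to q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# EMD between two distributions on the four corners of a unit square
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(emd(np.array([0.5, 0.5, 0, 0]), np.array([0, 0, 0.5, 0.5]), D))  # 1.0
```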

Fine-Grained Complexity for Quantum Problems from Size-Preserving Circuit-to-Hamiltonian Constructions

from arXiv: Data Structures and Algorithms

Authors: Nai-Hui Chia, Atsuya Hasegawa, François Le Gall, Yu-Ching Shen

The local Hamiltonian (LH) problem is the canonical $\mathsf{QMA}$-complete problem introduced by Kitaev. In this paper, we show its hardness in a very strong sense: we show that the 3-local Hamiltonian problem on $n$ qubits cannot be solved classically in time $O(2^{(1-\varepsilon)n})$ for any $\varepsilon>0$ under the Strong Exponential-Time Hypothesis (SETH), and cannot be solved quantumly in time $O(2^{(1-\varepsilon)n/2})$ for any $\varepsilon>0$ under the Quantum Strong Exponential-Time Hypothesis (QSETH). These lower bounds give evidence that the currently known classical and quantum algorithms for LH cannot be significantly improved. Furthermore, we are able to demonstrate fine-grained complexity lower bounds for approximating the quantum partition function (QPF) with an arbitrary constant relative error. Approximating QPF with relative error is known to be equivalent to approximately counting the dimension of the solution subspace of $\mathsf{QMA}$ problems. We show the SETH and QSETH hardness to estimate QPF with constant relative error. We then provide a quantum algorithm that runs in $O(\sqrt{2^n})$ time for an arbitrary $1/\mathrm{poly}(n)$ relative error, matching our lower bounds and improving the state-of-the-art algorithm by Bravyi, Chowdhury, Gosset, and Wocjan (Nature Physics 2022) in the low-temperature regime. To prove our fine-grained lower bounds, we introduce the first size-preserving circuit-to-Hamiltonian construction that encodes the computation of a $T$-time quantum circuit acting on $N$ qubits into a $(d+1)$-local Hamiltonian acting on $N+O(T^{1/d})$ qubits. This improves the standard construction based on the unary clock, which uses $N+O(T)$ qubits.
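
The lower bounds are measured against the obvious brute force, which is worth seeing once: a naive exponential-time computation in the full $2^n$-dimensional space. The toy sketch below is my own illustration (random 3-local Hamiltonian from Pauli terms on 8 qubits, ground energy by dense diagonalization); it is exactly the kind of exponential-time baseline that, under SETH/QSETH, cannot be substantially improved.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def local_term(n, ops):
    """Tensor product over n qubits; ops maps qubit index -> 2x2 operator."""
    M = np.array([[1.0 + 0j]])
    for q in range(n):
        M = np.kron(M, ops.get(q, I2))
    return M

def random_3local_hamiltonian(n, n_terms, rng):
    """Sum of random-weight 3-local Pauli terms (X/Z on 3 random qubits)."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for _ in range(n_terms):
        qubits = rng.choice(n, size=3, replace=False)
        paulis = [[X, Z][i] for i in rng.integers(0, 2, size=3)]
        H += rng.normal() * local_term(n, dict(zip(qubits, paulis)))
    return H

rng = np.random.default_rng(1)
H = random_3local_hamiltonian(n=8, n_terms=20, rng=rng)   # a 256 x 256 matrix
ground_energy = np.linalg.eigvalsh(H).min()               # exponential brute force
print(ground_energy)
```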

Expander Decomposition with Almost Optimal Overhead

from arXiv: Data Structures and Algorithms

Authors: Nikhil Bansal, Arun Jambulapati, Thatchaphol Saranurak

We present the first polynomial-time algorithm for computing a near-optimal \emph{flow}-expander decomposition. Given a graph $G$ and a parameter $φ$, our algorithm removes at most a $φ\log^{1+o(1)}n$ fraction of edges so that every remaining connected component is a $φ$-\emph{flow}-expander (a stronger guarantee than being a $φ$-\emph{cut}-expander). This achieves overhead $\log^{1+o(1)}n$, nearly matching the $Ω(\log n)$ graph-theoretic lower bound that already holds for cut-expander decompositions, up to a $\log^{o(1)}n$ factor. Prior polynomial-time algorithms required removing $O(φ\log^{1.5}n)$ and $O(φ\log^{2}n)$ fractions of edges to guarantee $φ$-cut-expander and $φ$-flow-expander components, respectively.

Robust Value Maximization in Challenge the Champ Tournaments with Probabilistic Outcomes

from arXiv: Data Structures and Algorithms

Authors: Umang Bhaskar, Juhi Chaudhary, Sushmita Gupta, Pallavi Jain, Sanjay Seetharaman

Challenge the Champ is a simple tournament format, where an ordering of the players -- called a seeding -- is decided. The first player in this order is the initial champ, and faces the next player. The outcome of each match decides the current champion, who faces the next player in the order. Each player also has a popularity, and the value of each match is the popularity of the winner. Value maximization in tournaments has been previously studied when each match has a deterministic outcome. However, match outcomes are often probabilistic, rather than deterministic. We study robust value maximization in Challenge the Champ tournaments, when the winner of a match may be probabilistic. That is, we seek to maximize the total value that is obtained, irrespective of the outcome of probabilistic matches. We show that even in simple binary settings, for non-adaptive algorithms, the optimal robust value -- which we term the \textsc{VnaR}, or the value not at risk -- is hard to approximate. However, if we allow adaptive algorithms that determine the order of challengers based on the outcomes of previous matches, or restrict the matches with probabilistic outcomes, we can obtain good approximations to the optimal \textsc{VnaR}.
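
The robust objective has a compact recursive definition, which the brute-force sketch below makes explicit (exponential time, illustration only; the names vnar_of_seeding and possible_winners are my own, not the paper's algorithms). For a fixed seeding, the value not at risk is the worst case, over all resolutions of the probabilistic matches, of the total popularity of the match winners; the non-adaptive optimum maximizes this over all seedings.

```python
from itertools import permutations

def vnar_of_seeding(seeding, popularity, possible_winners):
    """Worst-case (robust) total value of a fixed challenger order.

    possible_winners(a, b) returns the set of players who might win when
    the current champ a faces challenger b ({a}, {b}, or {a, b})."""
    def rec(champ, rest):
        if not rest:
            return 0
        challenger, remaining = rest[0], rest[1:]
        return min(popularity[w] + rec(w, remaining)
                   for w in possible_winners(champ, challenger))
    return rec(seeding[0], list(seeding[1:]))

def optimal_vnar(players, popularity, possible_winners):
    """Non-adaptive optimum: best worst-case value over all seedings."""
    return max(vnar_of_seeding(s, popularity, possible_winners)
               for s in permutations(players))

# toy instance: player 2 always beats everyone; 0 vs 1 is a coin flip
popularity = {0: 5, 1: 3, 2: 1}
def possible_winners(a, b):
    return {2} if 2 in (a, b) else {a, b}
print(optimal_vnar([0, 1, 2], popularity, possible_winners))   # 4
```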

On the Parameterized Tractability of Packing Vertex-Disjoint A-Paths with Length Constraints

from arXiv: Data Structures and Algorithms

Authors: Susobhan Bandopadhyay, Aritra Banik, Diptapriyo Majumdar, Abhishek Sahu

Given an undirected graph $G$ and a set $A \subseteq V(G)$, an $A$-path is a path in $G$ that starts and ends at two distinct vertices of $A$ with intermediate vertices in $V(G) \setminus A$. An $A$-path is called an $(A,\ell)$-path if the length of the path is exactly $\ell$. In the $(A, \ell)$-Path Packing problem (ALPP), we seek to determine whether there exist $k$ vertex-disjoint $(A, \ell)$-paths in $G$ or not. We pursue this problem with respect to structural parameters. We prove that ALPP is W[1]-hard when it is parameterized by the combined parameter distance to path (dtp) and $|A|$. In addition, we consider the combined parameters distance to cluster (cvd) $+ |A|$ and distance to cluster (cvd) $+ \ell$. For both these combined parameters, we provide FPT algorithms. Finally, we consider the vertex cover number (vc) as the parameter and provide a kernel with $O(\mathrm{vc}^2)$ vertices.
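
A brute-force checker makes the definitions concrete (exponential time, toy instances only; this is my own illustration, not one of the parameterized algorithms of the paper). It enumerates candidate $(A,\ell)$-paths with networkx and then backtracks over vertex-disjoint selections.

```python
import itertools
import networkx as nx

def has_k_disjoint_A_ell_paths(G, A, ell, k):
    """Check whether G contains k pairwise vertex-disjoint (A, ell)-paths:
    paths with exactly ell edges, endpoints two distinct vertices of A,
    and all interior vertices outside A.  Brute force, small graphs only."""
    A = set(A)
    candidates = []
    for s, t in itertools.combinations(A, 2):
        for p in nx.all_simple_paths(G, s, t, cutoff=ell):
            if len(p) - 1 == ell and not (set(p[1:-1]) & A):
                candidates.append(set(p))

    def extend(chosen, used, start):
        if chosen == k:
            return True
        for i in range(start, len(candidates)):
            if not (candidates[i] & used):
                if extend(chosen + 1, used | candidates[i], i + 1):
                    return True
        return False

    return extend(0, set(), 0)

# toy: a 12-cycle with A = {0, 3, 6, 9} admits two disjoint paths of length 3
G = nx.cycle_graph(12)
print(has_k_disjoint_A_ell_paths(G, A={0, 3, 6, 9}, ell=3, k=2))  # True
```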

Constant-Time Dynamic Enumeration of Word Infixes in a Regular Language

from arXiv: Data Structures and Algorithms

Authors: Antoine Amarilli, Sven Dziadek, Luc Segoufin

For a fixed regular language $L$, the enumeration of $L$-infixes is the following task: we are given an input word $w = a_1 \cdots a_n$ and we must enumerate the infixes of $w$ that belong to $L$, i.e., the pairs $i \leq j$ such that $a_i \cdots a_j \in L$. We are interested in dynamic enumeration of $L$-infixes, where we must additionally support letter substitution updates on $w$ (e.g., "replace the $i$-th letter of $w$ by a letter $a$"). Each update changes the set of infixes to enumerate, and resets the enumeration state. We study for which regular languages $L$ we can perform dynamic enumeration of $L$-infixes in constant delay (i.e., the next infix is always produced in constant time) and constant additional memory throughout the enumeration, while supporting each update in constant time. We show that, for languages $L$ with a neutral letter, if the language $L$ belongs to the class ZG and is extensible (i.e., if $u \in L$ and $u$ is a factor of $v$ then $v \in L$), then dynamic enumeration of $L$-infixes can be achieved with a simple algorithm that ensures constant-time updates and constant delay, but not constant additional memory. Our main contribution is then to show an algorithm that additionally uses only constant additional memory, and applies to a more general class of semi-extensible ZG languages for which we give several equivalent characterizations. We further discuss whether our results can be generalized to larger language classes and show some (conditional) lower bounds.
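
The quadratic baseline the paper improves on is worth keeping in mind: without any preprocessing, one can enumerate the $L$-infixes of $w$ by testing every pair $(i, j)$. A sketch with Python's re module (illustration only; it has none of the constant-delay, constant-memory, or constant-time-update guarantees discussed above):

```python
import re

def enumerate_infixes(w, pattern):
    """Yield all pairs (i, j), 1-indexed and inclusive, such that
    w[i..j] belongs to the regular language given by `pattern`.
    Naive O(n^2) scan -- the baseline, not the paper's algorithm."""
    L = re.compile(pattern)
    n = len(w)
    for i in range(n):
        for j in range(i, n):
            if L.fullmatch(w, i, j + 1):
                yield (i + 1, j + 1)

# infixes of the form a b* a
print(list(enumerate_infixes("abba", r"ab*a")))   # [(1, 4)]
```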

Evaluation of Dynamic Vector Bin Packing for Virtual Machine Placement

from arXiv: Data Structures and Algorithms

Authors: Zong Yu Lee, Xueyan Tang

Virtual machine placement is a crucial challenge in cloud computing for efficiently utilizing physical machine resources in data centers. Virtual machine placement can be formulated as a MinUsageTime Dynamic Vector Bin Packing (DVBP) problem, aiming to minimize the total usage time of the physical machines. This paper evaluates state-of-the-art MinUsageTime DVBP algorithms in non-clairvoyant, clairvoyant and learning-augmented online settings, where item durations (virtual machine lifetimes) are unknown, known and predicted, respectively. Besides the algorithms taken from the literature, we also develop several new algorithms or enhancements. Empirical experimentation is carried out with real-world datasets of Microsoft Azure. The insights from the experimental results are discussed to explore the structures of algorithms and promising design elements that work well in practice.
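
As a concrete reference point (my own baseline, not necessarily one of the algorithms evaluated in the paper): a non-clairvoyant First Fit heuristic for MinUsageTime DVBP places each arriving VM on the first machine with enough residual capacity in every resource dimension at that moment, and a machine accrues usage time whenever it hosts at least one VM.

```python
def first_fit_dvbp(items, capacity):
    """Non-clairvoyant First Fit for MinUsageTime DVBP (sketch).

    items    : list of (arrival, departure, demand_vector); placement is
               decided at arrival time, in arrival order, without looking
               at the departure time (non-clairvoyant).
    capacity : per-machine capacity vector.

    Returns the total usage time: the sum over machines of the length of
    time during which the machine hosts at least one item."""
    dims = len(capacity)
    machines = []       # per machine: list of (arrival, departure, demand)
    for arrival, departure, demand in sorted(items):
        placed = False
        for hosted in machines:
            load = [sum(d[k] for a, b, d in hosted if a <= arrival < b)
                    for k in range(dims)]
            if all(load[k] + demand[k] <= capacity[k] for k in range(dims)):
                hosted.append((arrival, departure, demand))
                placed = True
                break
        if not placed:
            machines.append([(arrival, departure, demand)])

    def usage(hosted):   # total time the machine is non-empty (interval union)
        total, cur_start, cur_end = 0.0, None, None
        for a, b in sorted((a, b) for a, b, _ in hosted):
            if cur_end is None or a > cur_end:
                if cur_end is not None:
                    total += cur_end - cur_start
                cur_start, cur_end = a, b
            else:
                cur_end = max(cur_end, b)
        return total + (cur_end - cur_start if cur_end is not None else 0.0)

    return sum(usage(h) for h in machines)

# two small overlapping VMs share a machine; the big one opens a second
items = [(0, 10, (0.5,)), (2, 6, (0.4,)), (3, 12, (0.8,))]
print(first_fit_dvbp(items, capacity=(1.0,)))   # 10 + 9 = 19
```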

Near-Linear Time Computation of Welzl Orders on Graphs with Linear Neighborhood Complexity

from arXiv: Data Structures and Algorithms

Authors: Jan Dreier, Clemens Kuske

Orders with low crossing number, introduced by Welzl, are a fundamental tool in range searching and computational geometry. Recently, they have found important applications in structural graph theory: set systems with linear shatter functions correspond to graph classes with linear neighborhood complexity. For such systems, Welzl's theorem guarantees the existence of orders with only $\mathcal{O}(\log^2 n)$ crossings. A series of works has progressively improved the runtime for computing such orders, from Chazelle and Welzl's original $\mathcal{O}(|U|^3 |\mathcal{F}|)$ bound, through Har-Peled's $\mathcal{O}(|U|^2|\mathcal{F}|)$, to the recent sampling-based methods of Csikós and Mustafa. We present a randomized algorithm that computes Welzl orders for set systems with linear primal and dual shatter functions in time $\mathcal{O}(\|S\| \log \|S\|)$, where $\|S\| = |U| + \sum_{X \in \mathcal{F}} |X|$ is the size of the canonical input representation. As an application, we compute compact neighborhood covers in graph classes with (near-)linear neighborhood complexity in time \(\mathcal{O}(n \log n)\) and improve the runtime of first-order model checking on monadically stable graph classes from $\mathcal{O}(n^{5+\varepsilon})$ to $\mathcal{O}(n^{3+\varepsilon})$.

FO and MSO Model Checking on Temporal Graphs

from arXiv: Data Structures and Algorithms

Authors: Michelle Döring, Jessica Enright, Laura Larios-Jones, George Skretas

Algorithmic meta-theorems provide an important tool for showing tractability of graph problems on graph classes defined by structural restrictions. While such results are well established for static graphs, corresponding frameworks for temporal graphs are comparatively limited. In this work, we revisit past applications of logical meta-theorems to temporal graphs and develop an extended unifying logical framework. Our first contribution is the introduction of logical encodings for the parameters vertex-interval-membership (VIM) width and tree-interval-membership (TIM) width, parameters which capture the signature of vertex and component activity over time. Building on this, we extend existing monadic second-order (MSO) meta-theorems for bounded lifetime and temporal degree to the parameters VIM and TIM width, and establish novel first-order (FO) meta-theorems for all four parameters. Finally, we signpost a modular lexicon of reusable FO and MSO formulas for a broad range of temporal graph problems, and give an example. This lexicon allows new problems to be expressed compositionally and directly yields fixed-parameter tractability results across the four parameters we consider.
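
For readers unfamiliar with the parameters, here is a naive computation of one of them under the definition I recall from Bumpus and Meeks (treat this as illustrative, not as part of the paper's meta-theorems): a vertex is active at time $t$ if $t$ lies between the first and last time steps at which an edge incident to it appears, and the VIM width is the maximum number of simultaneously active vertices.

```python
def vim_width(temporal_edges):
    """Vertex-interval-membership (VIM) width of a temporal graph, given as
    an iterable of (u, v, t) triples: the maximum, over time steps t, of the
    number of vertices v whose incident-edge times span t."""
    first, last, times = {}, {}, set()
    for u, v, t in temporal_edges:
        times.add(t)
        for x in (u, v):
            first[x] = min(first.get(x, t), t)
            last[x] = max(last.get(x, t), t)
    return max(sum(1 for x in first if first[x] <= t <= last[x])
               for t in times)

# a path whose edges appear one after another keeps few vertices active at once
print(vim_width([(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 4, 4)]))   # 2
```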

Faster Pseudo-Deterministic Minimum Cut

from arXiv: Data Structures and Algorithms

Authors: Yotam Kenneth-Mordoch

Pseudo-deterministic algorithms are randomized algorithms that, with high constant probability, output a fixed canonical solution. The study of pseudo-deterministic algorithms for the global minimum cut problem was recently initiated by Agarwala and Varma [ITCS'26], who gave a black-box reduction incurring an $O(\log n \log \log n)$ overhead. We introduce a natural graph-theoretic tie-breaking mechanism that uniquely selects a canonical minimum cut. Using this mechanism, we obtain: (i) A pseudo-deterministic minimum cut algorithm for weighted graphs running in $O(m\log^2 n)$ time, eliminating the $O(\log n \log \log n)$ overhead of prior work and matching existing randomized algorithms. (ii) The first pseudo-deterministic algorithm for maintaining a canonical minimum cut in a fully-dynamic unweighted graph, with $\mathrm{polylog}(n)$ update time and $\tilde{O}(n)$ query time. (iii) Improved pseudo-deterministic algorithms for unweighted graphs in the dynamic streaming and cut-query models of computation, matching the best randomized algorithms.

Constrained and Composite Sampling via Proximal Sampler

from arXiv: Data Structures and Algorithms

Authors: Thanh Dang, Jiaming Liang

We study two log-concave sampling problems: constrained sampling and composite sampling. First, we consider sampling from a target distribution with density proportional to $\exp(-f(x))$ supported on a convex set $K \subset \mathbb{R}^d$, where $f$ is convex. The main challenge is enforcing feasibility without degrading mixing. Using an epigraph transformation, we reduce this task to sampling from a nearly uniform distribution over a lifted convex set in $\mathbb{R}^{d+1}$. We then solve the lifted problem using a proximal sampler. Assuming only a separation oracle for $K$ and a subgradient oracle for $f$, we develop an implementation of the proximal sampler based on the cutting-plane method and rejection sampling. Unlike existing constrained samplers that rely on projection, reflection, barrier functions, or mirror maps, our approach enforces feasibility using only minimal oracle access, resulting in a practical and unbiased sampler without knowing the geometry of the constraint set. Second, we study composite sampling, where the target is proportional to $\exp(-f(x)-h(x))$ with closed and convex $f$ and $h$. This composite structure is standard in Bayesian inference with $f$ modeling data fidelity and $h$ encoding prior information. We reduce composite sampling via an epigraph lifting of $h$ to constrained sampling in $\mathbb{R}^{d+1}$, which allows direct application of the constrained sampling algorithm developed in the first part. This reduction results in a double epigraph lifting formulation in $\mathbb{R}^{d+2}$, on which we apply a proximal sampler. By keeping $f$ and $h$ separate, we further demonstrate how different combinations of oracle access (such as subgradient and proximal) can be leveraged to construct separation oracles for the lifted problem. For both sampling problems, we establish mixing time bounds measured in Rényi and $χ^2$ divergences.
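
For contrast with the oracle-efficient proximal approach above, the most naive constrained sampler is plain rejection sampling from a bounding box, which needs full knowledge of the geometry and degrades badly with dimension. A toy sketch (baseline only, my own illustration, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(f, in_K, box_lo, box_hi, f_min, n_samples):
    """Sample from the density proportional to exp(-f(x)) restricted to a
    convex set K, by rejection from the uniform distribution on the box
    [box_lo, box_hi] (which must contain K); f_min must lower-bound f on K.
    Baseline only: the acceptance rate decays quickly with dimension."""
    out = []
    while len(out) < n_samples:
        x = rng.uniform(box_lo, box_hi)
        if in_K(x) and rng.uniform() < np.exp(-(f(x) - f_min)):
            out.append(x)
    return np.array(out)

# standard Gaussian restricted to the unit ball in 2D
f = lambda x: 0.5 * np.dot(x, x)
in_ball = lambda x: np.dot(x, x) <= 1.0
samples = rejection_sample(f, in_ball, [-1, -1], [1, 1], f_min=0.0, n_samples=1000)
print(samples.mean(axis=0))   # close to the origin by symmetry
```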

Sensitivity of Repetitiveness Measures to String Reversal

from arXiv: Data Structures and Algorithms

Authors: Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, Cristian Urbina

We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings where the number, $r$, of runs in the run-length encoding of the Burrows--Wheeler transform (BWT) can increase additively by $Θ(n)$ when reversing the string. This substantially improves the known $Ω(\log n)$ lower-bound for the additive sensitivity of $r$ and it is asymptotically tight. We generalize our result to other variants of the BWT, including the variant with an appended end-of-string symbol and the bijective BWT. We show that an analogous result holds for the size $z$ of the Lempel--Ziv 77 (LZ) parsing of the text, and also for some of its variants, including the non-overlapping LZ parsing, and the LZ-end parsing. Moreover, we describe a family of strings for which the ratio $z(w^R)/z(w)$ approaches $3$ from below as $|w|\rightarrow \infty$. We also show an asymptotically tight lower-bound of $Θ(n)$ for the additive sensitivity of the size $v$ of the smallest lexicographic parsing to string reversal. Finally, we show that the multiplicative sensitivity of $v$ to reversing the string is $Θ(\log n)$, and this lower-bound is also tight. Overall, our results expose the limitations of repetitiveness measures that are widely used in practice, against string reversal -- a simple and natural data transformation.
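
The measure $r$ is easy to experiment with directly. A small sketch (naive BWT via sorted rotations with an appended end-of-string symbol, fine for toy strings only) that compares the number of BWT runs of a word and of its reversal:

```python
def bwt(s):
    """Burrows-Wheeler transform via sorted rotations, with an appended
    end-of-string symbol (naive construction, small strings only)."""
    s = s + "\x00"                      # sentinel smaller than all letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def runs(s):
    """Number of maximal runs of equal symbols (the measure r)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])

w = "abracadabra" * 3
print(runs(bwt(w)), runs(bwt(w[::-1])))   # r(w) and r(reverse(w)) can differ
```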

High-accuracy log-concave sampling with stochastic queries

from arXiv: Data Structures and Algorithms

Authors: Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries.

Sublinear-Time Lower Bounds for Approximating Matching Size using Non-Adaptive Queries

from arXiv: Data Structures and Algorithms

Authors: Vihan Shah

We study the problem of estimating the size of the maximum matching in the sublinear-time setting. This problem has been extensively studied, with several known upper and lower bounds. A notable result by Behnezhad (FOCS 2021) established a 2-approximation in $\tilde{O}(n)$ time. However, all known upper and lower bounds are in the adaptive query model, where each query can depend on previous answers. In contrast, non-adaptive query models -- where the distribution over all queries must be fixed in advance -- are widely studied in property testing, often revealing fundamental gaps between adaptive and non-adaptive complexities. This raises the natural question: is adaptivity also necessary for approximating the maximum matching size in sublinear time? This motivates the goal of achieving a constant or even a polylogarithmic approximation using $\tilde{O}(n)$ non-adaptive adjacency list queries, similar to what was done by Behnezhad using adaptive queries. We show that this is not possible by proving that any randomized non-adaptive algorithm achieving an $n^{1/3 - γ}$-approximation, for any constant $γ > 0$, with probability at least $2/3$, must make $Ω(n^{1 + ε})$ adjacency list queries, for some constant $ε > 0$ depending on $γ$. This result highlights the necessity of adaptivity in achieving strong approximations. However, non-trivial upper bounds are still achievable: we present a simple randomized algorithm that achieves an $n^{1/2}$-approximation in $O(n \log^2 n)$ queries. Moreover, our lower bound also extends to the newly defined variant of the non-adaptive model, where queries are issued according to a fixed query tree, introduced by Azarmehr, Behnezhad, Ghafari, and Sudan (FOCS 2025) in the context of Local Computation Algorithms.

Catalytic Tree Evaluation From Matching Vectors

from arXiv: Data Structures and Algorithms

Authors: Alexandra Henzinger, Edward Pyne, Seyoon Ragavan

We give new algorithms for tree evaluation (S. Cook et al. TOCT 2012) in the catalytic-computing model (Buhrman et al. STOC 2014). Two existing approaches aim to solve tree evaluation (TreeEval) in low space: on the one hand, J. Cook and Mertz (STOC 2024) give an algorithm for TreeEval running in super-logarithmic space $O(\log n\log\log n)$ and super-polynomial time $n^{O(\log\log n)}$. On the other hand, a simple reduction from TreeEval to circuit evaluation, combined with the result of Buhrman et al. (STOC 2014), gives a catalytic algorithm for TreeEval running in logarithmic $O(\log n)$ free space and polynomial time, but with polynomial catalytic space. We show that the latter result can be improved. We give a catalytic algorithm for TreeEval with logarithmic $O(\log n)$ free space, polynomial runtime, and subpolynomial $2^{\log^εn}$ catalytic space (for any $ε> 0$). Our result gives the first natural problem known to be solvable with logarithmic free space and even $n^{1-ε}$ catalytic space, that is not known to be in standard logspace even under assumptions. Our result immediately implies an improved simulation of time by catalytic space, by the reduction of Williams (STOC 2025). Our catalytic TreeEval algorithm is inspired by a connection to matching vector families and private information retrieval, and improved constructions of (uniform) matching vector families would imply improvements to our algorithm.

Counting Balanced Triangles on Social Networks With Uncertain Edge Signs

from arXiv: Data Structures and Algorithms

Authors: Alexander Zhou, Haoyang Li, Anxin Tian, Zhiyuan Li, Yue Wang

On signed social networks, balanced and unbalanced triangles are a critical motif due to their role as the foundations of Structural Balance Theory. The uses for these motifs have been extensively explored in networks with known edge signs; however, in the real world, graphs with ground-truth signs are nearly non-existent, particularly at large scale. In reality, edge signs are inferred via various techniques with differing levels of confidence, meaning the edge signs on these graphs should be modelled with a probability value. In this work, we adapt balanced and unbalanced triangles to a setting with uncertain edge signs and explore the problems of triangle counting and enumeration. We provide a baseline and an improved method (leveraging the inherent information provided by the edge probabilities in order to reduce the search space) for fast exact counting and enumeration. We also explore approximate solutions for counting via different sampling approaches, including leveraging insights from our improved exact solution to significantly reduce the runtime of each sample, resulting in upwards of two orders of magnitude more queries executed per second. We evaluate the efficiency of all our solutions as well as examine the effectiveness of our sampling approaches on real-world topological networks with a variety of probability distributions.
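
One quantity in this setting has a clean closed form when edge signs are independent: encoding signs as $\pm 1$ with $\Pr[+1] = p_e$, a triangle is balanced iff the product of its three signs is $+1$, and since $\mathbb{E}[\prod_e s_e] = \prod_e (2p_e - 1)$ for independent edges, $\Pr[\text{balanced}] = (1 + \prod_e (2p_e - 1))/2$. The sketch below computes the expected number of balanced triangles this way; it is my own illustration of the probabilistic model, not the counting or enumeration algorithms of the paper.

```python
import itertools
import networkx as nx

def expected_balanced_triangles(G):
    """Expected number of balanced triangles in a graph whose edges carry
    independent probabilities G[u][v]['p'] of being positive.  A triangle is
    balanced iff the product of its three +/-1 signs is +1, and for
    independent signs P(product = +1) = (1 + prod_e (2 p_e - 1)) / 2."""
    total = 0.0
    for u, v, w in itertools.combinations(G.nodes, 3):
        if G.has_edge(u, v) and G.has_edge(v, w) and G.has_edge(u, w):
            bias = 1.0
            for a, b in ((u, v), (v, w), (u, w)):
                bias *= 2.0 * G[a][b]["p"] - 1.0
            total += (1.0 + bias) / 2.0
    return total

G = nx.complete_graph(4)
nx.set_edge_attributes(G, 0.9, "p")      # every edge positive with prob 0.9
print(expected_balanced_triangles(G))    # 4 triangles, each balanced w.p. 0.756
```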

Faster Parameterized Vertex Multicut

from arXiv: Data Structures and Algorithms

Authors: Huairui Chu, Yuxi Liu, Daniel Lokshtanov, Junqiang Peng, Kangyi Tian, Mingyu Xiao

In the {\sc Vertex Multicut} problem the input consists of a graph $G$, integer $k$, and a set $\mathbf{T} = \{(s_1, t_1), \ldots, (s_p, t_p)\}$ of pairs of vertices of $G$. The task is to find a set $X$ of at most $k$ vertices such that, for every $(s_i, t_i) \in \mathbf{T}$, there is no path from $s_i$ to $t_i$ in $G - X$. Marx and Razgon [STOC 2011 and SICOMP 2014] and Bousquet, Daligault, and Thomassé [STOC 2011 and SICOMP 2018] independently and simultaneously gave the first algorithms for {\sc Vertex Multicut} with running time $f(k)n^{O(1)}$. The running time of their algorithms is $2^{O(k^3)}n^{O(1)}$ and $2^{O(k^{O(1)})}n^{O(1)}$, respectively. As part of their result, Marx and Razgon introduce the {\em shadow removal} technique, which was subsequently applied in algorithms for several parameterized cut and separation problems. The shadow removal step is the only step of the algorithm of Marx and Razgon which requires $2^{O(k^3)}n^{O(1)}$ time. Chitnis et al. [TALG 2015] gave an improved version of the shadow removal step, which, among other results, led to a $k^{O(k^2)}n^{O(1)}$ time algorithm for {\sc Vertex Multicut}. We give a faster algorithm for the {\sc Vertex Multicut} problem with running time $k^{O(k)}n^{O(1)}$. Our main technical contribution is a refined shadow removal step for vertex separation problems that only introduces an overhead of $k^{O(k)}\log n$ time. The new shadow removal step implies a $k^{O(k^2)}n^{O(1)}$ time algorithm for {\sc Directed Subset Feedback Vertex Set} and a $k^{O(k)}n^{O(1)}$ time algorithm for {\sc Directed Multiway Cut}, improving over the previously best known algorithms of Chitnis et al. [TALG 2015].

Min-Max Connected Multiway Cut

from arXiv: Data Structures and Algorithms

Authors: Hans Raj Tiwary, Petr Kolman

We introduce a variant of the multiway cut that we call the min-max connected multiway cut. Given a graph $G=(V,E)$ and a set $Γ\subseteq V$ of $t$ terminals, partition $V$ into $t$ parts such that each part is connected and contains exactly one terminal; the objective is to minimize the maximum weight of the edges leaving any part of the partition. This problem is a natural modification of the standard multiway cut problem and it differs from it in two ways: first, the cost of a partition is defined to be the maximum size of the boundary of any part, as opposed to the sum of all boundaries, and second, the subgraph induced by each part is required to be connected. Although the modified objective function has been considered before in the literature under the name min-max multiway cut, the requirement on each component to be connected has not been studied as far as we know. We show various hardness results for this problem, including a proof of weak NP-hardness of the weighted version of the problem on graphs with tree-width two, and provide a pseudopolynomial time algorithm as well as an FPTAS for the weighted problem on trees. As a consequence of our investigation we also show that the (unconstrained) min-max multiway cut problem is NP-hard even for three terminals, strengthening the known results.

Spanning tree congestion of proper interval graphs

from arXiv: Data Structures and Algorithms

Authors: Yota Otachi

We show that the spanning tree congestion problem is NP-complete even for proper interval graphs of linear clique-width at most 4.

Compressed Index with Construction in Compressed Space

from arXiv: Data Structures and Algorithms

Authors: Dmitry Kosolobov

Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $δ$ is a compression measure for $s$ called string complexity. We describe an index on $s$ with $O(δ\log\frac{n}δ)$ space, measured in $O(\log n)$-bit machine words, that can search in $s$ any string of length $m$ in $O(m + (\mathrm{occ} + 1)\log^εn)$ time, where $\mathrm{occ}$ is the number of found occurrences and $ε> 0$ is any fixed constant (the big-O in the space bound hides factor $\frac{1}ε$). Crucially, the index can be built within this space in $O(n\log n)$ expected time by one left-to-right pass on the string $s$ in a streaming fashion. The index does not use the Karp--Rabin fingerprints, and the randomization in the construction time can be eliminated by using deterministic dictionaries instead of hash tables (with a slowdown). The search time matches currently best results and the space is almost optimal (the known optimum is $O(δ\log\frac{n}{δα})$, where $α= \log_σn$ and $σ$ is the alphabet size, and it coincides with $O(δ\log\frac{n}δ)$ when $δ= O(n / α^2)$). This is the first index that can be constructed within such space and with such time guarantees. To avoid uninteresting marginal cases, all above bounds are stated for $δ\ge Ω(\log\log n)$.
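
The measure $δ$ referred to here is commonly defined as the substring complexity $δ(s) = \max_k d_k(s)/k$, where $d_k(s)$ is the number of distinct length-$k$ substrings of $s$. A naive computation (quadratic space, toy strings only; my own illustration, not the paper's index):

```python
def delta(s):
    """Substring complexity: delta(s) = max over k of (number of distinct
    length-k substrings of s) / k.  Naive computation for small strings."""
    n = len(s)
    return max(len({s[i:i + k] for i in range(n - k + 1)}) / k
               for k in range(1, n + 1))

print(delta("abababab"))    # highly repetitive: delta = 2 (at k = 1)
print(delta("abcdefgh"))    # all symbols distinct: delta = 8 (at k = 1)
```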

Probabilistic RNA Designability via Interpretable Ensemble Approximation and Dynamic Decomposition

from arXiv: Data Structures and Algorithms

Authors: Tianshuo Zhou, David H. Mathews, Liang Huang

Motivation: RNA design aims to find RNA sequences that fold into a given target secondary structure, a problem also known as RNA inverse folding. However, not all target structures are designable. Recent advances in RNA designability have focused primarily on minimum free energy (MFE)-based criteria, while ensemble-based notions of designability remain largely underexplored. To address this gap, we introduce a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. We further develop a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the optimal one that yields the tightest probabilistic bound for a given structure. Results: Applying our methods to both native and artificial RNA structures in the ArchiveII and Eterna100 benchmarks, we obtained probability bounds that are much tighter than prior approaches. In addition, our methods further provide anatomical tools for analyzing RNA structures and understanding the sources of design difficulty at the motif level. Availability: Source code and data are available at https://github.com/shanry/RNA-Undesign. Supplementary information: Supplementary text and data are available in a separate PDF.

Quantum Speedups for Group Relaxations of Integer Linear Programs

from arXiv: Data Structures and Algorithms

Authors: Brandon Augustino, Dylan Herman, Guneykan Ozgul, Jacob Watkins, Atithi Acharya, Enrico Fontana, Junhyung Lyle Kim, Shouvanik Chakrabarti

Integer Linear Programs (ILPs) are a flexible and ubiquitous model for discrete optimization problems. Solving ILPs is \textsf{NP-Hard} yet of great practical importance. Super-quadratic quantum speedups for ILPs have been difficult to obtain because classical algorithms for many-constraint ILPs are global and exhaustive, whereas quantum frameworks that offer super-quadratic speedup exploit local structure of the objective and feasible set. We address this via quantum algorithms for Gomory's group relaxation. The group relaxation of an ILP is obtained by dropping nonnegativity on variables that are positive in the optimal solution of the linear programming (LP) relaxation, while retaining integrality of the decision variables. We present a competitive feasibility-preserving classical local-search algorithm for the group relaxation, and a corresponding quantum algorithm that, under reasonable technical conditions, achieves a super-quadratic speedup. When the group relaxation satisfies a nondegeneracy condition analogous to, but stronger than, LP non-degeneracy, our approach yields the optimal solution to the original ILP. Otherwise, the group relaxation tightens bounds on the optimal objective value of the ILP, and can improve downstream branch-and-cut by reducing the integrality gap; we numerically observe this on several practically relevant ILPs. To achieve these results, we derive efficiently constructible constraint-preserving mixers for the group relaxation with favorable spectral properties, which are of independent interest.
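
As a refresher on the relaxation this work builds on, a textbook statement of Gomory's group (corner) relaxation is sketched below; the notation is generic and the exact form used in the paper may differ. Given an ILP $\min\{\,c^\top x : Ax = b,\ x \ge 0,\ x \in \mathbb{Z}^n\,\}$ with optimal LP basis $B$ (and nonbasic columns $N$), the group relaxation retains integrality but drops nonnegativity on the basic variables $x_B$:

$$\min\ \bar c_N^{\top} x_N \quad \text{s.t.} \quad x_B = B^{-1}b - B^{-1}N\,x_N \in \mathbb{Z}^{m}, \qquad x_N \in \mathbb{Z}_{\ge 0}^{\,n-m},$$

where $\bar c_N^{\top} = c_N^{\top} - c_B^{\top} B^{-1} N \ge 0$ are the LP reduced costs; any feasible $x_N$, together with the induced $x_B$, solves the original ILP whenever $x_B$ happens to be nonnegative.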

How to Train Your Filter: Should You Learn, Stack or Adapt?

from arXiv: Data Structures and Algorithms

Authors: Diandre Miguel Sabale, Wolfgang Gatterbauer, Prashant Pandey

Filters are ubiquitous in computer science, enabling space-efficient approximate membership testing. Since Bloom filters were introduced in 1970, decades of work have improved their space efficiency and performance. Recently, three new paradigms have emerged that offer orders-of-magnitude improvements in false positive rates (FPRs) by using information beyond the input set: (1) learned filters train a model to distinguish (non)members, (2) stacked filters use negative workload samples to build cascading layers, and (3) adaptive filters update their internal representation in response to false-positive feedback. Yet each paradigm targets specific use cases, introduces complex configuration tuning, and has been evaluated in isolation. This results in unclear trade-offs and a gap in understanding how these approaches compare and when each is most appropriate. This paper presents the first comprehensive evaluation of learned, stacked, and adaptive filters across real-world datasets and query workloads. Our results reveal critical trade-offs: (1) Learned filters achieve up to 10^2 times lower FPRs but exhibit high variance and lack robustness under skewed or dynamic workloads. Critically, model inference overhead leads to up to 10^4 times slower query latencies than stacked or adaptive filters. (2) Stacked filters reliably achieve up to 10^3 times lower FPRs on skewed workloads but require workload knowledge. (3) Adaptive filters are robust across settings, achieving up to 10^3 times lower FPRs under adversarial queries without workload assumptions. Based on our analysis, learned filters suit stable workloads where input features enable effective model training and space constraints are paramount, stacked filters excel when reliable query distributions are known, and adaptive filters are the most generalizable, providing robust, theoretically bounded guarantees even in dynamic or adversarial environments.
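
As a point of reference for "approximate membership testing," here is a minimal Python sketch of a plain Bloom filter baseline; it is not any of the learned, stacked, or adaptive designs evaluated in the paper, and the SHA-256-based hashing is an illustrative choice rather than a recommendation.

    import hashlib

    class BloomFilter:
        # plain Bloom filter baseline: a bit array plus k hash functions
        def __init__(self, num_bits, num_hashes):
            self.m = num_bits
            self.k = num_hashes
            self.bits = bytearray(num_bits)

        def _indexes(self, item):
            # derive k indexes by hashing "<i>:<item>" (illustrative hashing scheme)
            for i in range(self.k):
                digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item):
            for idx in self._indexes(item):
                self.bits[idx] = 1

        def __contains__(self, item):
            # may answer True for non-members (false positive), never False for members
            return all(self.bits[idx] for idx in self._indexes(item))

    # usage:
    # bf = BloomFilter(num_bits=8192, num_hashes=4)
    # bf.add("item-a")
    # "item-a" in bf   # True
    # "item-b" in bf   # False with high probability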

Optimal-Time Mapping in Run-Length Compressed PBWT

from arXiv: Data Structures and Algorithms

Authors: Paola Bonizzoni, Davide Cozzi, Younan Gao

The Positional Burrows--Wheeler Transform (PBWT) is a data structure designed for efficiently representing and querying large collections of sequences, such as haplotype panels in genomics. Forward and backward stepping operations -- analogues of LF- and FL-mapping in the traditional BWT -- are fundamental to the PBWT, underpinning many PBWT-based algorithms for haplotype matching and related analyses. Although the run-length encoded variant of the PBWT (also known as the $μ$-PBWT) achieves $O(r)$-word space usage, where $r$ is the total number of runs, no data structure supporting both forward and backward stepping in constant time within this space bound was previously known. In this paper, we consider the multi-allelic PBWT, extended from its original binary form to a general ordered alphabet $\{0, \dots, σ-1\}$. We first establish bounds on the size $r$ and then introduce a new $O(r)$-word data structure, built over a list of haplotypes $\{S_1, \dots, S_h\}$, each of length $n$, that supports constant-time forward and backward stepping. We further revisit two key applications -- haplotype retrieval and prefix search -- leveraging our efficient forward stepping technique. Specifically, we design an $O(r)$-word space data structure that supports haplotype retrieval in $O(\log \log_{w} h + n)$ time, where $w$ is the machine word size. For prefix search, we present an $O(h + r)$-word data structure that answers queries in $O(m' \log\log_{w} σ + \mathrm{occ})$ time, where $m'$ denotes the length of the longest common prefix returned and $\mathrm{occ}$ denotes the number of haplotypes prefixed by that longest common prefix.
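
To fix ideas, here is a minimal Python sketch of the classical uncompressed, binary-alphabet positional prefix-array construction underlying the PBWT, in the spirit of Durbin's original algorithm; it is an illustration only and is not the run-length compressed structure or the constant-time stepping machinery of this paper.

    def pbwt_prefix_arrays(panel):
        # panel: list of equal-length haplotypes over {0, 1}
        # returns, for each column k, the positional prefix array a_k:
        # haplotype indices sorted (stably) by their reversed prefixes panel[i][:k]
        M = len(panel)
        a = list(range(M))
        arrays = [a[:]]
        for k in range(len(panel[0])):
            zeros = [i for i in a if panel[i][k] == 0]
            ones = [i for i in a if panel[i][k] == 1]
            a = zeros + ones   # stable partition by allele preserves the order
            arrays.append(a[:])
        return arrays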

Differentially private graph coloring

from arXiv: Data Structures and Algorithms

Authors: Michael Xie, Jiayi Wu, Dung Nguyen, Aravind Srinivasan

Differential privacy is the gold standard in privacy-preserving data analysis. This paper addresses the challenge of producing a differentially edge-private vertex coloring, and we present two novel algorithms for this problem. Both algorithms first color each vertex at random from a fixed-size palette and then apply the exponential mechanism to locally resample colors for either all vertices or a chosen subset. Any non-trivial differentially edge-private coloring of a graph must be defective: a coloring is $k$-defective if every vertex shares its assigned color with at most $k$ of its neighbors. Defectiveness is the metric by which we measure the utility of our algorithms. Our first algorithm applies to $d$-inductive graphs. Assume we have a $d$-inductive graph with $n$ vertices and maximum degree $Δ$. We show that our algorithm provides a \(3ε\)-differentially private coloring with \(O(\frac{\log n}ε+d)\) maximum defectiveness, given a palette of size $Θ(\frac{Δ}{\log n}+\frac{1}ε)$. Furthermore, we show that this algorithm generalizes to $O(\frac{Δ}{cε}+d)$ defectiveness, where $c$ is the size of the palette and $c=O(\frac{Δ}{\log n})$. Our second algorithm uses noisy thresholding to guarantee \(O(\frac{\log n}ε)\) maximum defectiveness, given a palette of size $Θ(\frac{Δ}{\log n}+\frac{1}ε)$, and applies to all graphs rather than just $d$-inductive ones.
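
To make the high-level recipe above concrete, here is a schematic Python sketch of exponential-mechanism color resampling against conflict counts, assuming a utility equal to minus the number of same-colored neighbors and edge-level sensitivity 1; the single resampling pass, the parameter choices, and the function names are illustrative assumptions, and this is not the paper's algorithm or its privacy analysis.

    import math, random

    def resample_color(graph, colors, v, palette_size, eps):
        # utility of color c at vertex v: minus the number of neighbors currently colored c
        counts = [0] * palette_size
        for u in graph[v]:
            counts[colors[u]] += 1
        # exponential mechanism: Pr[c] proportional to exp(eps * u(c) / (2 * sensitivity)),
        # assuming sensitivity 1 under edge-level neighboring
        weights = [math.exp(-eps * counts[c] / 2.0) for c in range(palette_size)]
        r = random.uniform(0.0, sum(weights))
        for c, w in enumerate(weights):
            r -= w
            if r <= 0:
                return c
        return palette_size - 1

    def private_defective_coloring(graph, palette_size, eps):
        # graph: dict mapping each vertex to a list of its neighbors
        # start from a uniformly random coloring, then resample every vertex once
        colors = {v: random.randrange(palette_size) for v in graph}
        for v in graph:
            colors[v] = resample_color(graph, colors, v, palette_size, eps)
        return colors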

Monday, February 16

9th Workshop on Algebraic Complexity Theory

from CS Theory Events

June 2-5, 2026 University of Copenhagen https://sites.google.com/view/wact2026/ Registration deadline: February 28, 2026 Algebraic Complexity Theory investigates the computational complexity of algebraic problems, focusing on arithmetic circuits, polynomial computation, and algebraic models of computation. The goal of this workshop is to present recent advances in the field of algebraic complexity and to highlight the deep underlying … Continue reading 9th Workshop on Algebraic Complexity Theory

By shacharlovett

TR26-022 | Catalytic Tree Evaluation From Matching Vectors | Alexandra Henzinger, Edward Pyne, Seyoon Ragavan

from ECCC Papers

We give new algorithms for tree evaluation (S. Cook et al., TOCT 2012) in the catalytic-computing model (Buhrman et al., STOC 2014). Two existing approaches aim to solve tree evaluation in low space: on the one hand, J. Cook and Mertz (STOC 2024) give an algorithm for TreeEval running in super-logarithmic space $O(\log n\log\log n)$ and super-polynomial time $n^{O(\log\log n)}$. On the other hand, a simple reduction from TreeEval to circuit evaluation, combined with the result of Buhrman et al. (STOC 2014), gives a catalytic algorithm for TreeEval running in logarithmic $O(\log n)$ free space and polynomial time, but with polynomial catalytic space. We show that the latter result can be improved. We give a catalytic algorithm for TreeEval with logarithmic $O(\log n)$ free space, polynomial runtime, and subpolynomial $2^{\log^\epsilon n}$ catalytic space (for any $\epsilon > 0$). Our result opens a new line of attack on putting TreeEval in logspace, and immediately implies an improved simulation of time by catalytic space, by the reduction of Williams (STOC 2025). Our catalytic TreeEval algorithm is inspired by a connection to matching-vector families and private information retrieval, and improved constructions of (uniform) matching-vector families would imply improvements to our algorithm.
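
For orientation, the tree evaluation problem itself is easy to state: a complete binary tree has leaves labeled by values in $[k]$ and internal nodes labeled by functions $[k]\times[k]\to[k]$, and the task is to compute the root's value. The Python sketch below is the obvious recursive evaluator, included only to pin down the problem; it uses far more than logarithmic space and is not the catalytic algorithm of the paper.

    def tree_eval(node):
        # node is either ('leaf', value) with value in range(k), or
        # ('internal', table, left, right) where table is a k-by-k nested list
        # encoding a function [k] x [k] -> [k]
        if node[0] == 'leaf':
            return node[1]
        _, table, left, right = node
        return table[tree_eval(left)][tree_eval(right)]

    # example with k = 2 and XOR at the root:
    # tree_eval(('internal', [[0, 1], [1, 0]], ('leaf', 1), ('leaf', 1)))  # -> 0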

TR26-021 | Failure of Symmetry of Information for Randomized Computations | Jinqiao Hu, Yahel Manor, Igor Oliveira

from ECCC Papers

Symmetry of Information (SoI) is a fundamental result in Kolmogorov complexity stating that for all $n$-bit strings $x$ and $y$, $K(x,y) = K(y) + K(x \mid y)$ up to an additive error of $O(\log n)$ [ZL70]. In contrast, understanding whether SoI holds for time-bounded Kolmogorov complexity measures is closely related to longstanding open problems in complexity theory and cryptography, such as the P versus NP question [LW95, Hir22] and the existence of one-way functions [HILNO23, HLO24, HLN24]. In this paper, we prove that SoI fails for $rKt$ complexity, the randomized analogue of Levin's $Kt$ complexity [Lev84]. This is the first unconditional result of this type for a randomized notion of time-bounded Kolmogorov complexity. More generally, we establish a close relationship between the validity of SoI for $rKt$ and the existence of randomized algorithms approximating $rKt(x)$. Motivated by applications in cryptography, we also establish the failure of SoI for a related notion called $pKt$ complexity [HLO24], and provide an extension of the results to the average-case setting. Finally, we prove a near-optimal lower bound on the complexity of estimating conditional $rKt$, a result that might be of independent interest. Our findings complement those of [Ron04], who demonstrated the failure of SoI for $Kt$ complexity. In contrast, the randomized setting poses a significant challenge, which we overcome using insights from [KK25], structural results about $rKt$ implied by SoI, and techniques from meta-complexity [Oli19] and the theory of computational pseudorandomness [TV07].
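
For readers who want the objects pinned down, the following are standard, paraphrased definitions of Levin's $Kt$ and its randomized analogue $rKt$ (with $U$ a universal machine); the paper's exact formalization, including the success-probability threshold, may differ in details.

$$Kt(x) \;=\; \min\big\{\, |p| + \lceil \log t \rceil \;:\; U(p) \text{ outputs } x \text{ within } t \text{ steps} \,\big\},$$

$$rKt(x) \;=\; \min\big\{\, |p| + \lceil \log t \rceil \;:\; \Pr\big[\,U(p) \text{ outputs } x \text{ within } t \text{ steps}\,\big] \ge 2/3 \,\big\},$$

and symmetry of information for $rKt$ would assert $rKt(x,y) \approx rKt(y) + rKt(x \mid y)$ up to small additive terms, which is what the paper shows can fail.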
