Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Friday, March 06

For All The Cows

from Ben Recht

Steampunk Data Science: An afterword and some thoughts on Cyberpunk Data Science.

This is the final part of a blogged essay “Steampunk Data Science.” A table of contents is here.

To be a practicing scientist, you need to maintain a state of vigilant cognitive dissonance. If you get too deep into the history and philosophy of your discipline, it becomes very hard to keep writing papers. A little perspective reveals the absurdity and irrationality of the system. And once you see it, you can’t unsee it. Participating requires a lot more effort.

Of course, I speak from experience. Here’s my super brief history. My research from 2015-2018 convinced me that machine learning was lying to itself about its foundations. I went to other disciplines for answers. I first looked to statistics and disappointingly found its epistemology even more confused than machine learning’s. David Blei sent me David Freedman’s Statistical Models and Shoe Leather, and I became a Freedman completionist. The last chapter of Freedman’s posthumously collected works, Statistical Models and Causal Inference, is a sequel to this famous Shoe Leather essay. It details several examples of “scientific shoe leather,” and devotes a page to the history of vitamins. I was a bit surprised that he wrote a lot about Eijkman, but didn’t mention Wisconsin. I wondered what else was missing in this summary. And here we are, six years later.

Though I wrote the first draft of this essay four years ago, I have never been able to figure out what to do with it. Initially, I thought it would be part of The Irrational Decision, but that project took me in a very different direction. The Irrational Decision is about science, engineering, and decision-making after the computer. This story was about how we did things before the standardizing forces of the computer and statistics. It didn’t cleanly fit, and I ended up jettisoning this chapter.

Henry Farrell warned me that, as a standalone piece, it lived in an uncanny valley. The writing wasn’t academic enough to get published in a journal or pop enough to appear in a magazine. But you know where that sort of stuff can go? Directly to you, my argmin readers. Maybe this blog is the uncanny valley between NeurIPS and The New Yorker. I should make that the masthead.

Whatever the case, this piece not finding an external home is fine. Writing through a project is an exercise in itself. I finished this essay before I started substacking, and the recurring themes on here are built upon my research and writing about vitamins. This unpublished project laid the foundation of a research program. We’ll see where it ends up.

Now, for all of you scientists reading this, I’m not suggesting you follow me down this path. Scientific breakthroughs are uncommon, but they still happen, and looking for them can be thrilling. Keep looking.

The most dramatic plot in this essay is F. Gowland Hopkins’ crossover curves, in which he showed that some factor in milk was needed to sustain the growth of rats.

You might look at that plot and say, “Gee, it would have been nice to be a scientist back in 1900, before all of that low-hanging fruit was picked.” You might think that we never find such clear success stories in our modern, complex world. This couldn’t be further from the truth. I mean, this plot is from 2021:

This graph shows the effect of starting and stopping semaglutide (Ozempic). The evidence for a revolutionary breakthrough in the management of weight loss tracks the average growth with a visualization scheme identical to F. Gowland Hopkins’ plots from a century earlier.

Yes, interventions like GLP-1 agonists come along rarely. But they do come along!

And for all of our results that are not robust and large? We should accept that, though they are likely illusory, the many small results help us build a scientific record. They form a scattered pile of puzzle pieces that we can all try to assemble. It’s only from the incremental pieces that aren’t precisely reproducible or clean that we can see the big picture. Don’t worry about the distribution of p-values. Do think hard about how to produce reproducible data and coding artifacts.

I’ll repeat what I said yesterday. The most important thing we can learn from the discovery of vitamins is that discovery starts with a mess. It’s in learning from the mess that we find the undeniable effects that completely transform our understanding.


Oh, one more thing. You might be wondering what happened to those cows in the Single-Grain Experiment. The cows were fed their monotonous diets for years, and the research team closely monitored the cows’ growth and health. They weighed each animal monthly and took a photograph once every six months. The cows all grew at similar rates, but there were noticeable differences in appearance, offspring, and milk production.

The corn-fed animals looked smooth of coat, fuller through the barrel; and as expressed by experienced feeders and judges of domestic animals, they were in a better state of nutrition. On the other extreme stood the wheat-fed group with rough coats, gaunt and thin in appearance, small of girth and barrel, and to the practiced eye, in rather a lower state of nutrition.

While those fed on corn gave birth to healthy calves, the wheat-fed cows’ offspring all died within a day. The milk of the cows had different fat contents depending on the diet. The corn-fed had somewhat less fat than the oat-fed, but the wheat-fed had almost no fat content whatsoever. Wheat alone was a perplexingly poor diet for cows.

Elmer McCollum was no dummy. He and Marguerite Davis began publishing papers about their rat colony well before the Single-Grain experiment had finished. Hart eventually terminated the Single-Grain experiment in 1911, and the team published their results in the Proceedings of the National Academy of Sciences, several years after McCollum and Davis announced their discovery of Vitamin A and confirmed the existence of Vitamin B.1

The Wisconsin researchers never figured out what was wrong with the wheat. Following McCollum and Davis’ discovery of Vitamin A, they tried adding butter to the rations, but this didn’t seem to improve the cows’ health. Their most likely hypothesis was that there was something toxic in their wheat ration. In his autobiography, McCollum later reflected that the harvested wheat itself was just of poor quality. Due to the way they grew, processed, and stored the wheat on the Madison campus, the cows ended up being fed only wheat grain and straw.

Had the cows eaten their full quota of leaf, as the corn- and oat-fed animals did, they would not have been in such poor nutritive condition. Through four years we had been inexcusably uncritical of some important details.

The experiment, though wildly influential, yielded nothing but inconclusive results.

I’d like to thank Mihaela Curmei, Jessica Dai, Shamik Dasgupta, Henry Farrell, Sara Fridovich-Keil, Paula Gradu, Chris Harshaw, Lauren Kroiz, Kevin Munger, Deb Raji, Philippe Rigollet, Scott Shenker, and Chris Wiggins for many helpful comments and suggestions. Special thanks to the students in the 2024 Spring seminar “The Philosophy and History of Automated Decision Making,” who participated in a lively discussion about an earlier draft of this essay.

1. Even back then, PNAS was the journal for articles “Previously rejected from Nature And Science.”

By Ben Recht

The Fully Depolarizing Noise Conjecture for Physical Cat States is Twenty Years Old!

from Gil Kalai

Amidst the war, I am holding onto hope for a future of peace and tolerance. At the end of the post I include a few pictures from the safe room (closet). Writing this also made me reflect on related posts from 2017 and 2024.

 

The Fully Depolarizing Noise Conjecture (2006)

Conjecture (Fully Depolarizing Noise for Bell States).
For every physical implementation of a quantum computer, whenever a Bell state on a pair of physical qubits is created, there exists a small probability t>0 that both qubits are replaced by the maximally mixed state.

Equivalently: the preparation of a Bell state (i.e., a two-qubit cat state) on two physical qubits necessarily carries a non-negligible probability that both qubits undergo fully depolarizing noise.

Twenty years ago I proposed that this phenomenon cannot be avoided by any method of preparing a Bell state on a pair of physical qubits. In particular, the conjecture applies to any pair of transmons in a Bell state in superconducting architectures. As far as I know, the conjecture remains open.

What is a Bell state?

A Bell state (also called a two-qubit cat state) is a maximally entangled state of two qubits. A canonical example is

\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)

The other Bell states are obtained from this one by local unitary transformations (for example, by applying Pauli operations to one of the qubits).
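As a small illustration of this point, here is a numpy sketch (mine, not from the post) that applies each Pauli to the second qubit of the canonical Bell state above and prints the other three Bell states (up to a global phase in the Y case):

import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

for name, P in [("X", X), ("Y", Y), ("Z", Z)]:
    other = np.kron(I2, P) @ bell            # apply the Pauli to the second qubit
    print(name, "on qubit 2 gives:", np.round(other, 3))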

Bell states represent the simplest and most fundamental form of quantum entanglement. They are the basic resource behind quantum teleportation, entanglement swapping, and many constructions in quantum error correction.

Remark: In its original formulation, the conjecture was stated more broadly: it applied to every entangled pair of physical qubits, not necessarily maximally entangled ones. Moreover, the strength of the correlated depolarizing component was conjectured to depend linearly on the amount of entanglement between the qubits. The discussion below implicitly relies on this more general formulation.

Direct versus indirect creation of entanglement

For directly gated qubits, this behavior is already built into standard stochastic noise models. The novelty of the conjecture is the claim that this remains true, with a similar error level, even when the cat state is created indirectly. Quantitatively, I conjecture that the value of t in the conjecture is in the ballpark of the 2-gate fully depolarizing error rate.

For example, one may create a cat state on qubits 1 and 3 by:

  • applying a CNOT on (1,2),
  • then a CNOT on (2,3),
  • and measuring qubit 2 (or uncomputing it).

In standard gate-level noise models, if each two-qubit gate independently depolarizes its participating pair with probability t, then:

  • qubit 1 is replaced with probability t,
  • qubit 3 is replaced with probability t,
  • both are replaced only with probability t^2.

Thus, in such models, the probability that both entangled qubits are hit drops from order t (direct gate) to order t^2 (indirect construction).
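To make the t versus t^2 comparison concrete, here is a minimal Monte Carlo sketch of the gate-level event model just described (the parameter values and variable names are illustrative choices of mine, not from the post):

import numpy as np

rng = np.random.default_rng(0)
t, trials = 0.05, 1_000_000

gate12_fires = rng.random(trials) < t   # the CNOT on (1,2) depolarizes qubits 1 and 2
gate23_fires = rng.random(trials) < t   # the CNOT on (2,3) depolarizes qubits 2 and 3

q1_hit = gate12_fires                   # qubit 1 is hit only by the (1,2) gate
q3_hit = gate23_fires                   # qubit 3 is hit only by the (2,3) gate

print("Pr[qubit 1 replaced] ~", q1_hit.mean())            # ~ t
print("Pr[qubit 3 replaced] ~", q3_hit.mean())            # ~ t
print("Pr[both replaced]    ~", (q1_hit & q3_hit).mean()) # ~ t^2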

My conjecture asserts that nature does not permit such quadratic suppression. Even when entanglement is generated indirectly—through mediating qubits, measurement, feedforward, or clever circuit identities—the resulting Bell state on the two physical qubits still carries an intrinsic order-t probability that both qubits are replaced by the maximally mixed state.

Formulating the conjecture for state-of-the-art superconducting devices makes it more concrete. For superconducting quantum computers, the best 2-gate error rates are around 0.005, and we can assume that a large fraction of this noise is a depolarizing channel hitting both qubits. If the conjecture is correct for indirectly entangled qubits, we should be able to identify fully depolarizing errors for pairs of entangled qubits at a similar rate, rather than two orders of magnitude smaller as suggested by the computation above.

Physical intuition

The conjecture expresses, in a mathematically sharp way, a common physical intuition:

Entanglement between two physical systems requires a genuine physical interaction, and such interaction inevitably exposes both systems to correlated noise.

For example, there is no indirect method of performing an entangling gate between two transmons (say) that is much more reliable than just directly interacting them. Standard gate-level noise modeling does not enforce this. Indeed, in simple circuit models (as we computed above), indirect constructions reduce fully depolarizing errors from order t to order t^2. Thus the conjecture postulates an additional, more subtle “nasty” property of physical noise—one that goes beyond standard local stochastic models. (I was myself recently confused about whether this stronger behavior follows from standard simulations; it does not.)

Localizable entanglement

Given a quantum state, the localizable entanglement (Verstraete–Popp–Cirac, 2004) of a pair of qubits is defined as the maximum amount of entanglement that can be obtained between them when single-qubit measurements are performed on all remaining qubits and the outcomes are conditioned upon.
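For concreteness, here is a short numpy sketch (an illustration of the definition, not taken from the post) for the standard 3-qubit GHZ state: measuring the middle qubit in the X basis leaves the outer pair maximally entangled, i.e., one full ebit of entanglement is localized on that pair.

import numpy as np

ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)                      # (|000> + |111>)/sqrt(2)
psi = ghz.reshape(2, 2, 2)                            # indices: qubit 1, qubit 2, qubit 3

plus = np.array([1, 1]) / np.sqrt(2)                  # X-basis outcome |+> on the middle qubit
post = np.einsum('ijk,j->ik', psi, plus)              # project the middle qubit onto |+>
post /= np.linalg.norm(post)                          # renormalized state of the outer pair

rho1 = post @ post.conj().T                           # reduced density matrix of qubit 1
evals = np.linalg.eigvalsh(rho1)
entropy = -sum(p * np.log2(p) for p in evals if p > 1e-12)
print("entanglement entropy of the outer pair:", entropy)   # ~1.0, i.e., maximal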

The conjecture extends to such localizable entangled pairs: whenever a pair of qubits exhibits non-zero localizable entanglement, there exists a small probability t>0 (depending linearly on the value of the localizable entanglement) that both qubits are replaced by the maximally mixed (maximum-entropy) state.

Implications for larger entangled states

For more complicated entangled states—surface-code states, GHZ states, random-circuit sampling states, and cluster states—the extended conjecture applies to every pair of physical qubits that can exhibit localized entanglement. In several canonical families (such as GHZ states, cluster states, and surface-code states), this includes essentially every pair of qubits.

If true, this would have severe consequences for quantum error correction: correlated depolarizing noise on pairs of qubits is far more damaging than the quasi-independent noise assumed in threshold theorems. The reason is that a noise channel in which the events “qubit i is depolarized” and “qubit j is depolarized” are positively correlated for all pairs (i,j) necessarily leads to large-scale error synchronization.

The state of the conjecture

As far as I know, the conjecture remains open.

Current NISQ-scale devices could in principle test it experimentally—even at noise levels above the fault-tolerance threshold. The challenge is not the scale, but the identification of the specific correlated fully-depolarizing component in the noise.

Recent demonstrations of distance-3, distance-5, and even distance-7 surface codes appear, at least superficially, to be in tension with the conjecture. Whether this tension is genuine or only apparent deserves careful examination. It will also be interesting to examine the situation for experimentally realized “large” GHZ states.

I look forward to the conjecture being carefully tested—and, of course, possibly refuted.

Motivation

The original motivation for the conjecture was to “reverse engineer” natural structural conditions on noise that would cause quantum fault tolerance to fail.

For this reason, I did not view the conjecture itself as a reason for people to revise their a priori beliefs about quantum computers. Rather, I regarded it as a concrete and testable benchmark for quantum devices — one that is meaningful both for small systems with only a handful of qubits and for larger systems. The conjecture is relevant even to systems operating at noise levels above the threshold required for fault tolerance.

Sources

In March 2006 I raised an early version of the conjecture on Dave Bacon’s blog and discussed it with Dave Bacon, John Sidles, and others. The conjecture later appeared as Postulate 1 in my 2006 paper How quantum computers can fail and in several subsequent works. The related “noise synchronization” consequence for highly entangled states appeared as Postulate 2 and it connects back to my 2005 paper.

These postulates became Conjecture 3 and Conjecture 4 in my 2012 debate with Aram Harrow, where they played a central role. Conjectures 3 and 4 from the debate, together with a further consequence for the error rate (in terms of qubit errors), are Predictions 1, 2, and 3 in my 2016 paper “The quantum computer Puzzle”.

Remark: In my papers I struggled to formulate the conjecture precisely and to identify the most appropriate notion of entanglement. Formulating the conjecture for localized entanglement (which I referred to as emergent entanglement) was proposed in my 2009 paper Quantum computers: Noise propagation and adversarial noise models.

I also considered a weaker notion where the maximum amount of entanglement over single-qubit measurements is replaced by the average entanglement obtained from such measurements.

In some early papers I also formulated related conjectures for correlated classical noisy systems, asserting that correlation for the “signal” implies correlation for the noise. In this generality, classical computation provides simple counterexamples. Nevertheless, suitably adjusted versions of the conjecture appear to apply to several natural physical systems.

Some counter arguments (and responses)

1) “This may be true for physical qubits, but it is false for logical qubits.”

Yes — the conjecture refers to physical qubits.

If the conjecture holds, it lowers the prospects for achieving reliable logical qubits. In fact, good-quality logical qubits would violate the Fully Depolarizing Noise Conjecture at the physical level. Thus the conjecture asserts that sufficiently good logical qubits are out of reach because the necessary physical behavior is unavailable.

2) “The conjecture does not describe a legitimate quantum channel (namely a channel supported by quantum mechanics).”

The conjecture does not specify a complete noise model. It imposes constraints on whatever the correct physical noise model is.

Even if universally true, the physical mechanisms leading to the conjectured behavior may differ between implementations. The conjecture is structural, not mechanistic.

3) “The conjecture violates linearity of QM. It is possible that it will apply to one initial state of your quantum computer but not to another one.”

This is incorrect (while interesting). As I wrote above, the conjecture proposes constraints on the noise model rather than a complete description. If, for the same circuit, the conjecture applies under one initial condition but not under another, this does not violate linearity. Instead, it implies that if preparing these initial conditions makes no difference for the noise, then correlated depolarizing errors will appear in both cases.

4) The noise (or Nature) cannot “know” if the state is entangled or not. Entanglement cannot cause correlations for the noise.

The conjecture does not assume that noise “detects” entanglement or that entanglement directly “causes” correlation. It asserts that the physical processes required to generate entanglement inevitably produce correlated noise.

5) “The conjectured noise resembles nonphysical random-unitary models.”

Even if certain effective noise models implied by the conjecture appear unphysical, that does not refute the conjecture. It may instead indicate that certain large-scale entangled states are themselves physically unrealizable.

If creating a distance-15 surface code necessarily produces nonphysical types of noise, that would suggest that building such a code is itself nonphysical. The figure in my 2009 paper When Noise Accumulates was meant to illustrate precisely this point.

6) “The threshold theorem extends to general long-range noise.”

Yes — but these extensions still violate the conjecture. Mathematically speaking, many small long-range interaction terms are not the same as imposing substantial correlated depolarization.

7) “Entanglement is basis dependent.”

The conjecture refers to correlations in a fixed computational basis. There is no ambiguity here.

8) “Empirical independence in random circuit sampling shows that there are no unexpected nasty aspects of quantum noise.”

No.

Fidelity measures (including XEB) primarily estimate the probability that no error occurred. They are not sensitive to correlated structure among the error events that do occur.

The conjecture concerns correlations between qubits conditioned on noise events, not the global fidelity of outputs.

9) “If the error rate is p, the conjecture implies \Omega(n^2 p) errors per round. (We expect \Omega(np) errors per round.)”

This observation captures an important feature of the conjecture.

In trace-distance terms, the total noise per round may scale linearly.
But in terms of qubit-error counts, error synchronization can produce quadratic scaling \Omega(n^2 p) in the number of correlated qubit hits.

This quadratic synchronization is precisely the phenomenon the conjecture predicts.

10) The conjecture is too vague; it does not explicitly describe the noise channel. It also does not describe the physical source of the noise and its exact modeling.

This is partially true. The conjecture is structural and not mechanistic (see further explanation below).

Testing the conjecture experimentally would require identifying in experimental data specific correlated fully-depolarizing components. Supporting it theoretically would require fine-grained modeling of concrete physical systems.

11) As long as t > 0 is unspecified, the conjecture might always stay open.

I conjecture that t is in the ballpark of the rate of 2-gate fully depolarizing error.

In contrast (to the best of my memory), for fault-tolerant quantum computers with n physical qubits, for most pairs of qubits i,j, the correlation between the events “qubit i is hit by a depolarizing error” and “qubit j is hit by a depolarizing error” tends to zero as n tends to infinity.

12) The correlation conjecture (and my earlier line of research from 2005) has no direct bearing on topological quantum computing.

Right.

13) The conjecture represents the “physical-noise-is-conspiring position”.  However, Nature is not malicious.

We will see about that 🙂 .

Remarks:

(to items 3, 4) The possibility of a systematic relation between the noiseless state and the noise was raised and discussed in my 2005 paper and, through the years, has led to interesting heated discussions.

(to item 5) Noise models that resemble the behavior of random unitary operators were offered by early skeptical views. They do not violate the postulates of quantum mechanics but are considered unphysical: they violate how quantum physics is believed to be mapped into quantum mechanics.

(to item 6) Preskill’s 2012 paper (partially triggered by our 2011 Caltech discussion) presented general conditions for the threshold theorem to hold and cites earlier papers in this direction. Robert Alicki opined that the conditions proposed by Preskill are violated for open quantum systems, and I explained in this comment a specific feature of Preskill’s very general noise models that I find physically questionable.

Gated qubits and “purifying” gate errors

The assertion of the conjecture for gated qubits is a (rather vanilla) part of the standard model of noise for noisy quantum circuits, and it is also clearly manifested by experimental data. The negation of the conjecture would mean the ability to create entangled pairs of transmons (or other physical qubits) without any fully depolarizing errors. In effect, this would amount to “purifying” the two-qubit gate used to generate entanglement: starting with two-qubit gates that include fully depolarizing noise at rate t, one could nevertheless produce physical entangled pairs with no fully depolarizing component on the pair itself. The conjecture asserts that such purification is not physically possible.

The conjecture is structural and not mechanistic

Some of the objections listed above (especially 2, 3, 4, and 10) treat the conjecture as if it were proposing a specific mechanistic claim of the form: “Here is the physical source of the noise — e.g., microscopic Hamiltonian couplings, thermal photons, leakage, cross-talk — and here is the dynamical derivation showing why fully depolarizing correlations appear.” Such a claim would specify the environment, the interaction model, the time evolution, and the exact channel arising from tracing out the bath. My conjecture is a structural claim of the form:

Whenever two physical qubits can be prepared (or projected) into a Bell state, there is an intrinsic order-t fully depolarizing component acting jointly on them.

This is a statement about the form of the effective channel, not about the physical process generating it. Of course, mechanistic explanations (that may differ for different implementations) that support the conjecture could be valuable.

A Simple Probabilistic Model: Tail Bounds from Pairwise Marginals

To illustrate how strong pairwise correlations can force even large-scale error synchronization, consider the following simple probabilistic model. There is a probability distribution W on \{0,1\}^n. We generate a random bitstring w according to the distribution W. If w_k=1 we fully depolarize qubit k.

Let X=(x_1,\dots,x_n)\in\{0,1\}^n be a random 0–1 vector of length n, and write S=\sum_{i=1}^n x_i. We assume that the joint distribution of every pair (x_i,x_j) (for i\neq j) is fixed. The goal is to minimize the upper tail probability \Pr(S\ge \lceil sn\rceil).


Theorem (symmetric “one-parameter” pair law): Assume that for every i\neq j,

\displaystyle \Pr(x_i=0,x_j=0)=1-t,\qquad \Pr(x_i=0,x_j=1)=\Pr(x_i=1,x_j=0)=\Pr(x_i=1,x_j=1)=\frac{t}{3},

where t\in[0,1]. Let S=\sum_{i=1}^n x_i and K=\lceil sn\rceil. Then the minimum possible value of \Pr(S\ge K) over all such distributions equals

\displaystyle \min \Pr(S\ge K)= \begin{cases} \frac{4t}{3}\cdot\frac{n}{n+1}, & \text{if } K \le \frac{n+1}{2},\\[6pt] 0, & \text{if } K > \frac{n+1}{2}. \end{cases}

(In particular, for large n this is asymptotically \frac{4t}{3} for s<\tfrac12 and 0 for s>\tfrac12.)

A proof and discussion of this implication will be given separately.
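For small n the minimum can also be checked numerically: minimizing \Pr(S\ge K) over all distributions on \{0,1\}^n with the prescribed pairwise marginals is a linear program. Below is a rough sketch (my own, assuming scipy is available; the values of n, t, s are illustrative and chosen so that K \le (n+1)/2), which should reproduce the value \frac{4t}{3}\cdot\frac{n}{n+1}.

import itertools
import numpy as np
from scipy.optimize import linprog

n, t, s = 5, 0.06, 0.5                   # illustrative parameters
K = int(np.ceil(s * n))                  # tail threshold K = ceil(s*n) = 3 here
words = list(itertools.product([0, 1], repeat=n))

# Objective: minimize Pr(S >= K), i.e., total mass on bitstrings of weight >= K.
c = np.array([1.0 if sum(w) >= K else 0.0 for w in words])

A_eq, b_eq = [], []
A_eq.append(np.ones(len(words))); b_eq.append(1.0)    # probabilities sum to 1
# Pairwise marginals as in the one-parameter pair law above.
target = {(0, 0): 1 - t, (0, 1): t / 3, (1, 0): t / 3, (1, 1): t / 3}
for i, j in itertools.combinations(range(n), 2):
    for (a, b), val in target.items():
        A_eq.append(np.array([1.0 if (w[i], w[j]) == (a, b) else 0.0 for w in words]))
        b_eq.append(val)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1), method="highs")
print("LP minimum of Pr(S >= K):", res.fun)
print("formula (4t/3) * n/(n+1):", 4 * t / 3 * n / (n + 1))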

Time-parametrization and smooth Lindblad evolutions

My conjecture asserts that for noisy quantum states and evolutions — when the noise level is low — there are systematic relations between the noise and the corresponding “ideal” state. (Such systematic relations were already explored in my 2005 paper.) For example, the noise in a quantum computation may reflect symmetries and statistical features of the underlying signal.

Over the years, I made several attempts to place these ideas into a broader mathematical framework, extending beyond the specific (and hypothetical) setting of quantum computers. One direction I proposed was the study of certain subclasses of Lindblad evolutions, which I referred to as smoothed Lindblad evolutions. Another direction involved introducing an intrinsic time-parametrization for noisy quantum evolutions. These ideas appear as Predictions 6 and 7 in my 2016 Notices of the AMS paper.

Both directions aim at understanding noise not as arbitrary perturbation, but as a structured phenomenon constrained by the evolving quantum state.

My paper and discussion with Greg Kuperberg.

One offshoot of these conjectures — particularly the notion of smoothed Lindblad evolutions — and long email discussions with Greg Kuperberg led to a joint paper: Contagious error sources would need time travel to prevent quantum computation. Following ideas of Emanuel Knill, we proved that fault tolerance is possible even under very general forms of “disease-like” spreading noise.

Greg viewed this paper as a successful response to my skepticism. I did not quite see it that way. I saw our paper as clarifying an important point: if time-smoothing in my proposed class of Lindblad evolutions applies only forward in time, then fault tolerance can still succeed. To obstruct fault tolerance, the smoothing would need to extend to both past and future — effectively requiring a kind of “time-travel” correlation structure.

Recently, Greg jokingly proposed a joint mathematical collaboration (not necessarily on quantum topics) involving the two of us together with Gal Kronenberg and Guy Kindler. I think this is a wonderful idea.

My argument against QC (2013-2020), and other related directions.

Between 2013 and 2020, I pursued a different skeptical direction regarding quantum computation. Together with Guy Kindler (initially in the context of Boson Sampling), I developed an argument based on computational complexity and the notion of noise sensitivity. This line of work suggests that NISQ devices cannot achieve “quantum supremacy.”

Unlike the Fully Depolarizing Noise Conjecture — which posits a structurally “nasty” type of correlated noise beyond standard modeling — this later argument relies entirely on standard noise assumptions. It explains why the extremely low error rates required for quantum supremacy and scalable fault tolerance may be beyond reach.

Naturally, the quantum-supremacy claims of the past six years directly challenge this position. The central issue is careful evaluation of experimental details. As far as I know, those claims have no direct bearing on the Fully Depolarizing Noise Conjecture.

Both skeptical directions generate near-term experimental predictions. Both are discussed in my 2016 paper “The quantum computer Puzzle“. While experimental validation is the primary testing ground, another possible avenue is to derive broader physical consequences beyond the specific framework of quantum computing.

Let me also note that the question of whether “NISQ computers” could deliver “quantum supremacy” arose in my debate with Aram Harrow — and in other discussions — before the terms “NISQ” and “quantum supremacy” were coined.

Statistical testing

Since 2019, I have also been working (with Yosi Rinott, Tomer Shoham, and Carsten Voelkmann) on developing statistical tools for analyzing samples produced by NISQ experiments. Engaging directly with experimental data and statistical methodology may prove useful for testing the Fully Depolarizing Noise Conjecture as well.

One possible approach is to begin with a standard noise model, augment it with the hypothetical two-qubit depolarizing component (DEPOLARIZE2) predicted by the conjecture at some rate p, and then determine the value of p that best matches empirical data. Here DEPOLARIZE2 refers to a two-qubit depolarizing channel acting jointly on the pair.
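As a very rough illustration (using the Stim simulator, which provides a DEPOLARIZE2 instruction; the circuit, rate, and summary statistic below are assumptions of mine, and the actual fitting against experimental data is not shown), one could augment a Bell-state preparation with an explicit correlated depolarizing term and study how measurable statistics respond to its rate:

import stim

p = 0.005  # hypothetical correlated fully-depolarizing rate, to be fit against data
circuit = stim.Circuit(f"""
    H 0
    CX 0 1
    DEPOLARIZE2({p}) 0 1
    M 0 1
""")
samples = circuit.compile_sampler().sample(100_000)
# In the ideal Bell state the two Z-basis outcomes always agree; the rate of
# disagreement is one simple statistic that is sensitive to the value of p.
disagree = (samples[:, 0] != samples[:, 1]).mean()
print("fraction of anti-correlated outcomes:", disagree)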

I recently had useful discussions with Craig Gidney about this type of simulation. Craig is optimistic that the skeptical position will be decisively refuted in the coming years. Statistical fitting of experimental samples to classes of W-noise models (discussed earlier) may also provide empirical tests of the conjecture.

Conclusion

The Fully Depolarizing Noise Conjecture was proposed twenty years ago as a structural stress test for quantum computing — a condition that scalable quantum devices must overcome. It does not attempt to describe a specific microscopic mechanism. Rather, it imposes a constraint on the effective noise channel: whenever physical qubits can generate entanglement — or localizable entanglement — correlated fully depolarizing noise must appear at linear order.

Whether this structural constraint reflects a fundamental limitation of quantum devices, or whether it will ultimately be refuted by experiment, remains an open question. The answer lies mainly in precise experimental scrutiny together with careful theoretical modeling and analysis.

 

A few pictures from the safe room (closet).

Late remark: There is a lovely interview of Scott Aaronson with Yuval Boger of QuEra. Yuval asked: “I’ve heard you refer many times in your talks to an argument with Gil Kalai about whether large-scale quantum computing is even possible. Do you think he still has a path to being vindicated, or is it over?” Scott gave a detailed and nice answer presenting his view of my position. (See also this SO post.) The discussion nicely illustrates how this debate continues to evolve twenty years after the Fully Depolarizing Noise Conjecture was first proposed.

 

By Gil Kalai

Mysticeti: Revolutionizing Consensus on Sui

from Decentralized Thoughts

TL;DR Mysticeti is a novel Byzantine consensus protocol deployed on the Sui blockchain that revolutionizes transaction processing. Eliminates explicit certification, reducing good-case latency to the theoretical minimum. Crash-fault masking, avoiding head-of-line blocking for pipelined rounds. 40% CPU usage reduction for consensus in production. 80% latency reduction compared to Bullshark on Sui Mainnet (from ~1.9s to ~400ms). Mysticeti is a novel Byzantine consensus protocol recently deployed on the Sui blockchain. It...

By Lefteris Kokoris Kogias, Alberto Sonnino, George Danezis

Recurrent Graph Neural Networks and Arithmetic Circuits

from arXiv: Computational Complexity

Authors: Timon Barlag, Vivian Holzapfel, Laura Strieker, Jonni Virtema, Heribert Vollmer

We characterise the computational power of recurrent graph neural networks (GNNs) in terms of arithmetic circuits over the real numbers. Our networks are not restricted to aggregate-combine GNNs or other particular types. Generalizing similar notions from the literature, we introduce the model of recurrent arithmetic circuits, which can be seen as arithmetic analogues of sequential or logical circuits. These circuits utilise so-called memory gates which are used to store data between iterations of the recurrent circuit. While (recurrent) GNNs work on labelled graphs, we construct arithmetic circuits that obtain encoded labelled graphs as real valued tuples and then compute the same function. For the other direction we construct recurrent GNNs which are able to simulate the computations of recurrent circuits. These GNNs are given the circuit-input as initial feature vectors and then, after the GNN-computation, have the circuit-output among the feature vectors of its nodes. In this way we establish an exact correspondence between the expressivity of recurrent GNNs and recurrent arithmetic circuits operating over real numbers.

Attacking the Polynomials in the Maze of Finite Fields problem

from arXiv: Computational Complexity

Authors: Àngela Barbero, Ragnar Freij-Hollanti, Camilla Hollanti, Håvard Raddum, Øyvind Ytrehus, Morten Øygarden

In April 2025 GMV announced a competition for finding the best method to solve a particular polynomial system over a finite field. In this paper we provide a method for solving the given equation system significantly faster than what is possible by brute-force or standard Gröbner basis approaches. The method exploits the structured sparsity of the polynomial system to compute a univariate polynomial in the associated ideal through successive computations of resultants. A solution to the system can then be efficiently recovered from this univariate polynomial. Pseudocode is given for the proposed ResultantSolver algorithm, along with experiments and comparisons to rival methods. We also discuss further potential improvements, such as parallelizing parts of the computations.

Garment numbers of bi-colored point sets in the plane

from arXiv: Computational Geometry

Authors: Oswin Aichholzer, Helena Bergold, Simon D. Fink, Maarten Löffler, Patrick Schnider, Josef Tkadlec

We consider colored variants of a class of geometric-combinatorial questions on $k$-gons and empty $k$-gons that have been started around 1935 by Erdős and Szekeres. In our setting we have $n$ points in general position in the plane, each one colored either red or blue. A structure on $k$ points is a geometric graph where the edges are spanned by (some of) these points and is called monochromatic if all $k$ points have the same color. Already for $k=4$ there exist interesting open problems. Most prominently, it is still open whether for any sufficiently large bichromatic set there always exists a convex empty, monochromatic quadrilateral. In order to shed more light on the underlying geometry we study the existence of five different monochromatic structures that all use exactly 4 points of a bichromatic point set. We provide several improved lower and upper bounds on the smallest $n$ such that every bichromatic set of at least $n$ points contains (some of) those monochromatic structures.

Drone Air Traffic Control: Tracking a Set of Moving Objects with Minimal Power

from arXiv: Computational Geometry

Authors: Chek-Manh Loi, Michael Perk, Malte Hoffmann, Sándor Fekete

A common sensing problem is to use a set of stationary tracking locations to monitor a collection of moving devices: Given $n$ objects that need to be tracked, each following its own trajectory, and $m$ stationary traffic control stations, each with a sensing region of adjustable range; how should we adjust the individual sensor ranges in order to optimize energy consumption? We provide both negative theoretical and positive practical results for this important and natural challenge. On the theoretical side, we show that even if all objects move at constant speed along straight lines, no polynomial-time algorithm can guarantee optimal coverage for a given starting solution. On the practical side, we present an algorithm based on geometric insights that is able to find optimal solutions for the $\min \max$ variant of the problem, which aims at minimizing peak power consumption. Runtimes for instances with 500 moving objects and 25 stations are in the order of seconds for scenarios that take minutes to play out in the real world, demonstrating real-time capability of our methods.

What induces plane structures in complete graph drawings?

from arXiv: Computational Geometry

Authors: Alexandra Weinberger, Ji Zeng

This paper considers the task of connecting points on a piece of paper by drawing a curve between each pair of them. Under mild assumptions, we prove that many pairwise disjoint curves are unavoidable if either of the following rules is obeyed: any two adjacent curves do not cross, or any two non-adjacent curves cross at most once. Here, two curves are called adjacent if they share an endpoint. On the other hand, we demonstrate how to draw all curves such that any two adjacent curves cross exactly once, any two non-adjacent curves cross at least once and at most twice, and thus no two curves are disjoint. Furthermore, we analyze the emergence of disjoint curves without these mild assumptions, and characterize the plane structures in complete graph drawings guaranteed by each of the rules above.

Structural Properties of Shortest Flip Sequences Between Plane Spanning Trees

from arXiv: Computational Geometry

Authors: Oswin Aichholzer, Joseph Dorfer, Peter Kramer, Christian Rieck, Birgit Vogtenhuber

We study the reconfiguration of plane spanning trees on point sets in the plane in convex position, where a reconfiguration step (flip) replaces one edge with another, yielding again a plane spanning tree. The flip distance between two trees is then the minimum number of flips needed to transform one tree into the other. We study structural properties of shortest flip sequences. The folklore happy edge conjecture suggests that any edge shared by both the initial and target tree is never flipped in a shortest flip sequence. The more recent parking edge conjecture, which would have implied the happy edge conjecture, states that there exist shortest flip sequences which use only edges of the start and target tree, and edges in the convex hull of the point set. Finally, another conjecture that is implicit in the literature is the reparking conjecture which states that no edge is flipped more than twice. Essentially all recent flip algorithms respect these three conjectures and the properties they imply. We study cases in which the latter two conjectures hold and disprove them for the general setting. (Shortened abstract due to arXiv restrictions.)

Reconfiguration of Squares Using a Constant Number of Moves Each

from arXiv: Computational Geometry

Authors: Thijs van der Horst, Maarten Löffler, Tim Ophelders, Tom Peters

Multi-robot motion planning is a hard problem. We investigate restricted variants of the problem where square robots are allowed to slide over an arbitrary curve to a new position only a constant number of times each. We show that the problem remains NP-hard in most cases, except when the squares have unit size and when the problem is unlabeled, i.e., the location of each square in the target configuration is left unspecified.

An Optimal Algorithm for Computing Many Faces in Line Arrangements

from arXiv: Computational Geometry

Authors: Haitao Wang

Given a set of $m$ points and a set of $n$ lines in the plane, we consider the problem of computing the faces of the arrangement of the lines that contain at least one point. In this paper, we present an $O(m^{2/3}n^{2/3}+(n+m)\log n)$ time algorithm for the problem. We also show that this matches the lower bound under the algebraic decision tree model and thus our algorithm is optimal. In particular, when $m=n$, the runtime is $O(n^{4/3})$, which matches the worst case combinatorial complexity $\Omega(n^{4/3})$ of all output faces. This is the first optimal algorithm since the problem was first studied more than three decades ago [Edelsbrunner, Guibas, and Sharir, SoCG 1988].

Quadratic polarity and polar Fenchel-Young divergences from the canonical Legendre polarity

from arXiv: Computational Geometry

Authors: Frank Nielsen, Basile Plus-Gourdon, Mahito Sugiyama

Polarity is a fundamental reciprocal duality of $n$-dimensional projective geometry which associates to points polar hyperplanes, and more generally $k$-dimensional convex bodies to polar $(n-1-k)$-dimensional convex bodies. It is well-known that the Legendre-Fenchel transformation of functions can be interpreted from the polarity viewpoint of their graphs using an extra dimension. In this paper, we first show that generic polarities induced by quadratic polarity functionals can be expressed either as deformed Legendre polarity or as the Legendre polarity of deformed convex bodies, and be efficiently manipulated using linear algebra on $(n+2)\times (n+2)$ matrices operating on homogeneous coordinates. Second, we define polar divergences using the Legendre polarity and show that they generalize the Fenchel-Young divergence or equivalent Bregman divergence. This polarity study brings new understanding of the core reference duality in information geometry. Last, we show that the total Bregman divergences can be considered as a total polar Fenchel-Young divergence from which we newly exhibit the reference duality using dual polar conformal factors.

Hypercube drawings with no long plane paths

from arXiv: Computational Geometry

Authors: Todor Antić, Niloufar Fuladi, Anna Margarethe Limbach, Pavel Valtr

We study the existence of plane substructures in drawings of the $d$-dimensional hypercube graph $Q_d$. We construct drawings of $Q_d$ which contain no plane subgraph with more than $2d-2$ edges, no plane path with more than $2d-3$ edges, and no plane matching of size more than $2d-4$. On the other hand, we prove that every rectilinear drawing of $Q_d$ with vertices in convex position contains a plane path of length $d$ (if $d$ is odd) or $d-1$ (if $d$ is even). We also prove that if a graph $G$ is a plane subgraph of every drawing of $Q_d$ for a sufficiently large $d$, then $G$ is necessarily a forest of caterpillars. Lastly, we give a short proof of a generalization of a result by Alpert et al. [Cong. Numerantium, 2009] on the maximum rectilinear crossing number of $Q_d$.

Estimation of Persistence Diagrams via the Three Gap Theorem

from arXiv: Computational Geometry

Authors: Luis Suarez Salas, Jose A. Perea

The time delay (or Sliding Window) embedding is a technique from dynamical systems to reconstruct attractors from time series data. Recently, descriptors from Topological Data Analysis (TDA) -- specifically, persistence diagrams -- have been used to measure the shape of said reconstructed attractors in applications including periodicity and quasiperiodicity quantification. Despite their utility, the fast computation of persistence diagrams of sliding window embeddings is still poorly understood. In this work, we present theoretical and computational schemes to approximate the persistence diagrams of sliding window embeddings from quasiperiodic functions. We do so by combining the Three Gap Theorem from number theory with the Persistent Künneth formula from TDA, and derive fast and provably correct persistent homology approximations. The input to our procedure is the spectrum of the signal, and we provide numerical as well as theoretical evidence of its utility to capture the shape of toroidal attractors.

ETH-Tight Complexity of Optimal Morse Matching on Bounded-Treewidth Complexes

from arXiv: Data Structures and Algorithms

Authors: Geevarghese Philip, Erlend Raa Vågset

The Optimal Morse Matching (OMM) problem asks for a discrete gradient vector field on a simplicial complex that minimizes the number of critical simplices. It is NP-hard and has been studied extensively in heuristic, approximation, and parameterized complexity settings. Parameterized by treewidth $k$, OMM has long been known to be solvable on triangulations of $3$-manifolds in $2^{O(k^2)} n^{O(1)}$ time and in FPT time for triangulations of arbitrary manifolds, but the exact dependence on $k$ has remained an open question. We resolve this by giving a new $2^{O(k \log k)} n$-time algorithm for any finite regular CW complex, and show that no $2^{o(k \log k)} n^{O(1)}$-time algorithm exists unless the Exponential Time Hypothesis (ETH) fails.

Quantum Algorithms for Network Signal Coordination

from arXiv: Data Structures and Algorithms

Authors: Vinayak Dixit, Richard Pech

There has been increasing interest in developing efficient quantum algorithms for hard classical problems. The Network Signal Coordination (NSC) problem is one such problem known to be NP complete. We implement Grover's search algorithm to solve the NSC problem to provide quadratic speedup. We further extend the algorithm to a Robust NSC formulation and analyse its complexity under both constant and polynomial-precision robustness parameters. The Robust NSC problem determines whether there exists a fraction (alpha) of solutions space that will lead to system delays less than a maximum threshold (K). The key contributions of this work are (1) development of a quantum algorithm for the NSC problem, and (2) a quantum algorithm for the Robust NSC problem whose iteration count is O(1/sqrt(alpha)), independent of the search space size, and (3) an extension to polynomial-precision robustness where alpha = alpha_o/p(N) decays polynomially with network size, retaining a quadratic quantum speedup. We demonstrate its implementation through simulation and on an actual quantum computer.

Generalizing Fair Top-$k$ Selection: An Integrative Approach

from arXiv: Data Structures and Algorithms

Authors: Guangya Cai

Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure$\unicode{x2014}$utility loss$\unicode{x2014}$that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.

Revisiting Graph Modification via Disk Scaling: From One Radius to Interval-Based Radii

from arXiv: Data Structures and Algorithms

Authors: Thomas Depian, Frank Sommer

For a fixed graph class $Π$, the goal of $Π$-Modification is to transform an input graph $G$ into a graph $H\inΠ$ using at most $k$ modifications. Vertex and edge deletions are common operations, and their (parameterized) complexity for various $Π$ is well-studied. Classic graph modification operations such as edge deletion do not consider the geometric nature of geometric graphs such as (unit) disk graphs. This led Fomin et al. [ITCS' 25] to initiate the study of disk scaling as a geometric graph modification operation for unit disk graphs: For a given radius $r$, each modified disk will be rescaled to radius $r$. In this paper, we generalize their model by allowing rescaled disks to choose a radius within a given interval $[r_{\min}, r_{\max}]$ and study the (parameterized) complexity (with respect to $k$) of the corresponding problem $Π$-Scaling. We show that $Π$-Scaling is in XP for every graph class $Π$ that can be recognized in polynomial time. Furthermore, we show that $Π$-Scaling: (1) is NP-hard and FPT for cluster graphs, (2) can be solved in polynomial time for complete graphs, and (3) is W[1]-hard for connected graphs. In particular, (1) and (2) answer open questions of Fomin et al. and (3) generalizes the hardness result for their variant where the set of scalable disks is restricted.

Finding Short Paths on Simple Polytopes

from arXiv: Data Structures and Algorithms

Authors: Alexander E. Black, Raphael Steiner

We prove that computing a shortest monotone path to the optimum of a linear program over a simple polytope is NP-hard, thus resolving a 2022 open question of De Loera, Kafer, and Sanità. As a consequence, finding a shortest sequence of pivots to an optimal basis with the simplex method is NP-hard. In fact, we show this is NP-hard already for fractional knapsack polytopes. By applying an additional polyhedral construction, we show that computing the diameter of a simple polytope is NP-hard, resolving a 2003 open problem by Kaibel and Pfetsch. Finally, on the positive side we show that every polytope has a small, simple extended formulation for which a linear length path may be found between any pair of vertices in polynomial time building upon a result of Kaibel and Kukharenko.

Generalized matching decoders for 2D topological translationally-invariant codes

from arXiv: Data Structures and Algorithms

Authors: Shi Jie Samuel Tan, Ian Gill, Eric Huang, Pengyu Liu, Chen Zhao, Hossein Dehghani, Aleksander Kubica, Hengyun Zhou, Arpit Dua

Two-dimensional topological translationally-invariant (TTI) quantum codes, such as the toric code (TC) and bivariate bicycle (BB) codes, are promising candidates for fault-tolerant quantum computation. For such codes to be practically relevant, their decoders must successfully correct the most likely errors while remaining computationally efficient. For the TC, graph-matching decoders satisfy both requirements and, additionally, admit provable performance guarantees. Given the equivalence between TTI codes and (multiple copies of) the TC, one may then ask whether TTI codes also admit analogous graph-matching decoders. In this work, we develop a graph-matching approach to decoding general TTI codes. Intuitively, our approach coarse-grains the TTI code to obtain an effective description of the syndrome in terms of TC excitations, which can then be removed using graph-matching techniques. We prove that our decoders correct errors of weight up to a constant fraction of the code distance and achieve non-zero code-capacity thresholds. We further numerically study a variant optimized for practically relevant BB codes and observe performance comparable to that of the belief propagation with ordered statistics decoder. Our results indicate that graph-matching decoders are a viable approach to decoding BB codes and other TTI codes.

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

from arXiv: Data Structures and Algorithms

Authors: Maryam Aliakbarpour, Alireza Azizi, Ria Stevens

Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $ε$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of \textit{augmented distribution testing}. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction's quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.

Thursday, March 05

Moar Updatez

from Scott Aaronson

To start on a somber note: those of us at UT Austin are in mourning this week for Savitha Shan, an undergrad double major here in economics and information systems, who was murdered over the weekend by an Islamist terrorist who started randomly shooting people on Sixth Street, apparently angry about the war in Iran. Two other innocents were also killed.

As it happens, these murders happened just a few hours after the end of my daughter’s bat mitzvah, and in walking distance from the venue. The bat mitzvah itself was an incredibly joyful and successful event that consumed most of my time lately, and which I might or might not say more about—the nastier the online trolls get, the more I need to think about my family’s privacy.


Of all the many quantum computing podcasts/interviews I’ve done recently, I’m probably happiest with this one, with Yuval Boger of QuEra. It covers all the main points about where the hardware currently is, the threat to public-key cryptography, my decades-long battle against quantum applications hype, etc. etc., and there’s even an AI-created transcript that eliminates my verbal infelicities!


A month ago, I blogged about “The Time I Didn’t Meet Jeffrey Epstein” (basically, because my mom warned me not to). Now the story has been written up in Science magazine, under the clickbaity headline “Meet Three Scientists Who Said No to Epstein.” (Besides yours truly, the other two scientists are friend-of-the-blog Sean Carroll, whose not-meeting-Epstein story I’d already heard directly from him, and David Agus, whose story I hadn’t heard.)

To be clear: as I explained in my post, I never actually said “no” to Epstein. Instead, based on my mom’s advice, I simply failed to follow up with his emissary, to the point where no meeting ever happened.

Anyway, ever since Science ran this story and it started making the rounds on social media, my mom has been getting congratulatory messages from friends of hers who saw it!


I’ve been a huge fan of the philosopher-novelist Rebecca Newberger Goldstein ever since I read her celebrated debut work, The Mind-Body Problem, back in 2005. Getting to know Rebecca and her husband, Steven Pinker, was a highlight of my last years at MIT. So I’m thrilled that Rebecca will be visiting UT Austin next week to give a talk on Spinoza, related to her latest book The Mattering Instinct (which I’m reading right now), and hosted by me and my colleague Galen Strawson in UT’s philosophy department. More info is in the poster below. If you’re in Austin, I hope to see you there!


The 88-year-old Donald Knuth has published a 5-page document about how Claude was able to solve a tricky graph theory problem that arose while he was working on the latest volume of The Art of Computer Programming—a series that Knuth is still writing after half a century. As you’d expect from Knuth, the document is almost entirely about the graph theory problem itself and Claude’s solution to it, eschewing broader questions about the nature of machine intelligence and how LLMs are changing life on Earth. To anyone who’s been following AI-for-math lately, the fact that Claude now can help with this sort of problem won’t come as a great shock. The virality is presumably because Knuth is such a legend that to watch him interact productively with an LLM is sort of like watching Leibniz, Babbage, or Turing do the same.


John Baez is a brilliant mathematical physicist and writer, who was blogging about science before the concept of “blogging” even existed, and from whom I’ve learned an enormous amount. But regarding John’s quest for the past 15 years — namely, to use category theory to help solve the climate crisis (!) — I always felt like the Cookie Monster would, with equal intellectual justification, say that the key to arresting climate change was for him to eat more Oreos. Then I read this Quanta article on the details of Baez’s project, and … uh … I confess it failed to change my view. Maybe someday I’ll understand why it’s better to say using category theory what I would’ve said in a 100x simpler way without category theory, but I fear that day is not today.

By Scott

Learning From the Mess

from Ben Recht

What the vitamin story tells us about reproducibility, discovery, and human nature.

This is Part 6 (of 7!) of a blogged essay “Steampunk Data Science.” A table of contents is here.

Having followed the tumultuous thirty-year journey from Eijkman’s chickens to Davis’ rats, let’s return to where we started: the question of reproducible, rigorous research. If there was an era of gold standard reproducible research, the early 20th century wasn’t it. I described multiple examples of important work that not only weren’t reproducible but were flat-out wrong.

McCollum’s first paper on nutrition couldn’t have been more wrong. Led astray by the work of Pavlov, McCollum was convinced that “the psychic influence of palatability is one of the most important factors in nutrition.” McCollum considered the possibility of vitamins, which he described as “certain organic complexes in the food given, which the body was not able to supply through its synthetic power from the materials at hand,” to be completely ruled out by his experiments. He would completely change his mind in the course of only a couple of years.

Was this paper published by McCollum bad for the scientific discovery of vitamins? The reality is the opposite. The fact that McCollum was dead wrong inspired further investigations, and the breakthroughs occurred in figuring out why he was wrong. Mendel’s team at Yale was inspired by McCollum’s synthetic diets and intrigued by his findings on palatability. In their replication attempts, they not only disproved McCollum’s hypothesis but also strengthened the case for the existence of essential amino acids. This work by Mendel’s team subsequently inspired McCollum and Davis’ investigations into the extracts of milk and egg yolks, resulting in their discovery of Vitamin A.

At the opposite end of the spectrum was the German physician Wilhelm Stepp, who had conducted experiments in which he removed the ether-soluble contents of bread and fed the extracted bread to mice. He claimed that on this depleted diet the mice quickly perished, and that when the ether-soluble materials were added back as a supplement, the mice thrived. This sounds a lot like McCollum and Davis’ experimental setup, but Stepp’s results were deemed “far from conclusive” by Mendel and Osborne. His data was fishy. George Wolf and Kenneth Carpenter reanalyzed Stepp’s experiments in light of our contemporary understanding and found that Stepp’s mice died far too quickly for the cause to be Vitamin A deficiency. What exactly Stepp had done remained unclear, and his work was not reproducible. But he had the right answer! He was clearly on the right track to finding Vitamin A.

We can and do learn from a lack of reproduction. Failure to reproduce tells us something about why our earlier assumptions were wrong, and digging into reproduction failures leads us to new discoveries. Nothing in the evidence points to malicious fraud or scientific misbehavior by those involved in the search for vitamins. It’s not clear how many of these errors would have been corrected by better statistical or scientific methodology. We should not expect science to be perfect and should be open to learning from mess.

And what about rigorous tools and research practices? Might these have accelerated our understanding of nutrition? Here, the evidence again points to no. The discovery of vitamins required a remarkably diverse set of investigative tools. As is always the case, well-controlled experiments designed to deliberately refute hypotheses were only one of many methods used to generate evidence. Natural experiments, such as Eijkman’s work with chickens and Vorderman’s prison observations, provided the initial clues that brown rice contained essential vitamins. The case studies initiated by Mendel, Osborne, and Ferry were experiments on single animals. They applied varied interventions over the course of the rat’s life, probing varied inputs into its diet, scouring for clues as they compared to a baseline. The individual case series of McCollum and Davis provided the definitive evidence that a simple organic compound could start and stop growth. Each of these methods provided a piece of the puzzle, but the researchers were learning how to do nutrition research as they went.

And though it was clear to Christiaan Eijkman that white and brown rice were different from each other, it took thirty years for that difference to be given a name. Sure, we can point back to single experiments that are as clear as day in hindsight, but the vitamins weren’t “discovered” until they were named. It took Funk’s bold survey article to name the problem (deficiency diseases) and the cure (vitamines). Only after Funk did everyone converge on the answer. The clean articulation of the problem and solution, of the cause and the effect, marked the actual discovery.

Perhaps the only pattern I can extract from the scientific processes here is that everyone involved was driven by a definitive purpose. The discovery of vitamins arose out of a deliberate, interventionist mindset. The researchers in Wisconsin wanted to identify the best diet for raising cattle. The researchers in the Dutch colonies sought a cure for beriberi. Nutrition research wasn’t aimed at breaking down the world for understanding, but rather at identifying interventions. They were trying to figure out cause and effect so that they could do something. The entire purpose was intervening, whether to save farmers money or cure terrible diseases. In finding what worked for their problems, they also discovered new chemistry and biology.

Would vitamins have been discovered sooner if the nutrition scientists had a more rigorous set of scientific tools? Could we imagine a counterfactual acceleration had they had access to computers loaded with spreadsheets and statistical software? We could also ask this question differently without assuming the present was wiser than the past. Was this discovery made possible by a rigorous practice that we can learn from?

The vitamin saga suggests the answer to all of these questions is probably no! If anything, the confines of scientific rigor of the day, like Koch’s rigid postulates for determining disease etiology, would have left us stuck with germ theory. I worry that contemporary quests for standardization and formalization of research practice lose sight of the value of creative experimentation and investigation. We can’t let reproducibility checklists stifle creative exploration.

What I also love about this story is how there’s no single hero. I understand the motivational power of the simple great-man science stories, like John Snow and his Broad Street Pump, or Alexander Fleming and his Petri dishes. But historians of science have been scolding us for decades that most of these scientific beatifications are far oversimplified. The muddled mess of vitamin discovery is more the rule than the exception. Though there were multiple Nobel Prizes, it’s really hard to extract a single hero.

On the other hand, it’s easy to find lots of petty fights. Babcock chided Atwater about his protein theories. McCollum was humiliated by Mendel’s work, which proved his palatability hypothesis erroneous. In his autobiography, he admits embarrassment and revenge were among his motivations for studying milk extracts. Harry Steenbock felt like he should have been on McCollum and Davis’ Vitamin A paper and held a grudge for years. He even wrote a letter to Science magazine in 1918, accusing McCollum of academic misconduct when he moved his lab from Wisconsin to Johns Hopkins.

The process of searching for vitamins was a mess. But we learned from the mess. And when we did, we found undeniable effects that completely transformed our understanding of food and our ability to treat disease.

By Ben Recht

Closed-form dynamics beyond quadratics

from Francis Bach

Quadratic functions are the workhorse of the analysis of iterative algorithms such as gradient-based optimization. They lead, in discrete and continuous time, to closed-form dynamics that treat all eigensubspaces of the Hessian matrix independently. This leads to simple math for understanding convergence behaviors (maximal step-size, condition number, acceleration, scaling laws for gradient descent or its stochastic version, etc.).

An important success of the optimization field is to have shown that the same behaviors hold for convex functions, where the analysis now relies on Lyapunov functions rather than closed-form computations (see, e.g., this post on automated proofs), with some extensions to non-convex functions, but only regarding convergence to stationary points.

Going (globally) non-convex is of major interest in different fields, including machine learning for neural networks or matrix factorizations. In this post, I try to answer the following question: Which dynamics lead to a closed form, or to enough of one to extract relevant insights?

Over the years, several have emerged, allowing us to give some nice and explicit intuitive results on the global behavior of gradient descent in certain non-convex cases. This blog post is about them. Note that this complements the analysis of diagonal networks from [1, 2] that we described in an earlier post.

One-hidden layer linear network.

We focus on linear neural networks, with a prediction function \(x \mapsto A_2^\top A_1^\top x\), where \(x \in \mathbb{R}^d\) is the input, \(A_1 \in \mathbb{R}^{d \times m}\) is the matrix of input weights, and \(A_2 \in \mathbb{R}^{m \times k}\) is the matrix of output weights, with an overall weight matrix \(\Theta = A_1 A_2 \in \mathbb{R}^{d \times k}\). This is a linear model corresponding to multi-task / multi-category classification / multi-label supervised problems, or an auto-encoder for unsupervised learning. Clearly, using the parameterization \(\Theta = A_1 A_2\) is not the most efficient for optimization (as we could simply use convex optimization directly on \(\Theta\)), but we use it for understanding non-convex dynamics.

Given data matrices \(Y \in \mathbb{R}^{n \times k}\) and \(X \in \mathbb{R}^{n \times d}\), the empirical risk is $$ L(A_1,A_2) = \frac{1}{2n} \big\| Y - X A_1 A_2 \big\|_{\rm F}^2,$$ where \(\| M \|_{\rm F} = \sqrt{ {\rm tr} (M^\top M)}\) is the Frobenius norm of \(M\).

The gradient flow dynamics are then $$ \left\{ \begin{array}{lcrcr} \dot{A}_1 & = & - \nabla_{A_1} L(A_1,A_2) & = & \frac{1}{n} X^\top ( Y - X A_1 A_2) A_2^\top \\ \dot{A}_2 & = & - \nabla_{A_2} L(A_1,A_2) & = & \frac{1}{n} A_1^\top X^\top ( Y - X A_1 A_2). \\ \end{array} \right.$$

We denote by \(B_1 \in \mathbb{R}^{d \times m}, B_2 \in \mathbb{R}^{m \times k}\) the initialization of the dynamics. As first outlined by [3] and used in most subsequent analyses (e.g., [4, 6, 7]), there is an invariant of the dynamics, as $$\frac{d}{dt} ( A_1^\top A_1 ) = 2 {\rm Sym} \big( A_1^\top \frac{1}{n} X^\top ( Y - X A_1 A_2) A_2^\top \big) = \frac{d}{dt} ( A_2 A_2^\top ), $$ where \({\rm Sym}(M) = \frac{1}{2} ( M + M^\top)\) is the symmetrization of the square matrix \(M\). This implies that \(A_1^\top A_1 - A_2 A_2^\top \in \mathbb{R}^{m \times m}\) is constant along the dynamics. This will allow considerable simplifications (the presence of such invariants is not limited to linear networks, see [16] for a nice detailed account on these conservation laws).
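
To make this concrete, here is a minimal NumPy sketch (mine, not from the post) that integrates the gradient flow with a small explicit Euler step and checks that the invariant \(A_1^\top A_1 - A_2 A_2^\top\) stays numerically constant; the dimensions, step size, and synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 200, 6, 4, 3

# Synthetic data (illustrative): X and a planted linear map plus noise.
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, k)) + 0.1 * rng.standard_normal((n, k))

A1 = 0.1 * rng.standard_normal((d, m))
A2 = 0.1 * rng.standard_normal((m, k))

def grads(A1, A2):
    R = Y - X @ A1 @ A2                 # residual
    G1 = -(X.T @ R @ A2.T) / n          # dL/dA1
    G2 = -(A1.T @ X.T @ R) / n          # dL/dA2
    return G1, G2

invariant0 = A1.T @ A1 - A2 @ A2.T
dt = 1e-2
for _ in range(20000):                  # explicit Euler on the gradient flow
    G1, G2 = grads(A1, A2)
    A1 -= dt * G1
    A2 -= dt * G2

invariant = A1.T @ A1 - A2 @ A2.T
loss = 0.5 * np.linalg.norm(Y - X @ A1 @ A2) ** 2 / n
print("final loss:", loss)
print("drift of A1^T A1 - A2 A2^T:", np.linalg.norm(invariant - invariant0))
```

Under the exact flow the drift is exactly zero; the small residual reported here is only the Euler discretization error.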

There are several lines of work to go further. I will use a non-chronological presentation and start with the simplest case where singular vectors are aligned, before moving to the real magic first detailed by Kenji Fukumizu in [3].

Aligned initialization and commuting moments

The first set of assumptions that leads to a closed form comes from [4] (with \(X^\top X \) proportional to the identity) and from [5] for the present version:

  1. Commuting moments: \(X^\top X\) and \(X^\top YY^\top X\) commute, which is equivalent to aligned singular value decompositions \(C = \frac{1}{n} X^\top Y = U {\rm Diag}(c) V^\top\) and \(\Sigma = \frac{1}{n} X^\top X = U {\rm Diag}(s) U^\top\) for matrices \(U \in \mathbb{R}^{d \times r},V \in \mathbb{R}^{k \times r}\) with orthonormal columns (that is, \(U^\top U = V^\top V = I\)), and \(c,s \in \mathbb{R}_+^r\) non-negative vectors of singular values (this is thus the “economy-size” version of the SVD where \(U\) and \(V\) have the same number of columns). We only need to take \(r\) as big as \(\max \{ {\rm rank}(C), {\rm rank}(\Sigma) \}\). This occurs, for example, when \(X^\top X \propto I\) (whitened data), or for auto-encoders, where \(Y = X\).
  2. Aligned initialization: \(B_1 = U {\rm Diag}(b_1) W^\top\) and \(B_2 = W {\rm Diag}(b_2) V^\top\) are the singular value decompositions, with \(W \in \mathbb{R}^{m \times r}\) with orthonormal columns (which implies that we need to assume \(m \geqslant r\)), with singular vectors of \(B_1 B_2 = U {\rm Diag}(b_1 b_2) V^\top\) aligned with those of \(X^\top Y\).

Note that throughout the blog post, as done just above, multiplications and divisions of vectors of the same size (such as vectors of singular values) are taken component-wise.

With these assumptions, throughout the dynamics, it turns out that we keep aligned singular vectors, and thus, if \(a_1, a_2\) are the vectors of singular values of \(A_1, A_2\), we get the coupled ODEs: $$ \left\{ \begin{array}{lcr} \dot{a}_1 & = & ( c \, - s a_1 a_2 ) a_2 \\ \dot{a}_2 & = & ( c \, - s a_1 a_2 ) a_1 . \\ \end{array} \right.$$ The invariant mentioned above leads to $$ a_1 a_1 - a_2 a_2 = {\rm cst}.$$ Overall we get the dynamics for \(\theta = a_1 a_2\), $$ \dot{\theta}= ( c \, - s a_1 a_2 ) ( a_1 a_1 + a_2 a_2).$$

If \(b_1 = b_2 = \alpha \) at initialization (for all singular values) with \(\alpha\) a positive scalar, then this remains true throughout the dynamics, and we get $$ \dot{\theta} = 2 ( c \, - s \theta ) \theta, $$ with a closed-form solution obtained by integration: $$\theta = \frac{c}{s} \Big( \frac{e^{2ct} {\alpha^2 }}{\frac{c}{s} - \alpha^2+e^{2ct} {\alpha^2}}\Big) = \frac{1}{ \frac{1}{\alpha^2} e^{-2ct} + \frac{s}{c} ( 1 - e^{-2ct} )}.$$

Insights. We see the sequential learning of each singular value when the initialization scale \(\alpha\) tends to zero, as then we have $$ \theta \sim \frac{1}{ \frac{1}{\alpha^2} e^{-2ct} + \frac{s}{c}},$$ with a sharp jump from \(0\) to \(\frac{c}{s}\) around the time \(\frac{1}{c} \log \frac{1}{\alpha}\).

To prove it, we rescale time as \(t= u \log \frac{1}{\alpha}\) and we get $$\theta = \frac{1}{ \alpha^{2(cu-1)} + \frac{s}{c}},$$ which, with \(u\) fixed, has its \(i\)-th component that converges to \(0\) for \(u < \frac{1}{c_i}\) and to \(c_i / s_i\) for \(u > \frac{1}{c_i}\).
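
As a quick numerical illustration of this jump (a sketch with made-up constants, not code from the post), one can evaluate the closed form above for a few singular values and check that each component crosses half of its limit \(c/s\) near the rescaled time \(u = 1/c\):

```python
import numpy as np

c = np.array([3.0, 1.5, 0.5])   # illustrative singular values of X^T Y / n
s = np.array([1.0, 1.0, 1.0])   # singular values of X^T X / n (whitened data)
alpha = 1e-6                    # small aligned-initialization scale

def theta(t):
    # Closed-form solution of theta' = 2 (c - s theta) theta, theta(0) = alpha^2.
    return 1.0 / (np.exp(-2 * c * t) / alpha**2 + (s / c) * (1 - np.exp(-2 * c * t)))

# Rescaled time t = u * log(1/alpha); each component should jump near u = 1/c_i.
u_grid = np.linspace(0.0, 3.0, 3001)
t_grid = u_grid * np.log(1 / alpha)
vals = np.stack([theta(t) for t in t_grid])          # shape (len(t_grid), 3)

half = 0.5 * c / s
jump_u = [u_grid[np.argmax(vals[:, i] > half[i])] for i in range(len(c))]
print("predicted jump times 1/c :", 1 / c)
print("observed half-crossings u:", np.round(jump_u, 3))
```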

Linear networks with aligned initialization. Plotting the singular values \(\theta_t\) as a function of (rescaled) time \(t\), for various initialization scales \(\alpha\).

Extension to deeper models. Following [4], we can consider more layers and minimize $$ \frac{1}{2n} \| Y - X A_1 \cdots A_L\|_{\rm F}^2.$$ If, at initialization, all singular vectors of all \(A_i\)’s align, then we get dynamics directly on the singular values, and starting from equal values, a few lines of computation give $$\dot{\theta} = L (c-s\theta) \theta^{2 - 2/L},$$ which can also be integrated and analyzed. See [4] for details, and [18] for deep linear networks beyond aligned initialization.

Other dynamics. In this simple case, we can easily compare the learning dynamics of three common algorithms, which all lead to dynamics on the singular values \(\theta\) in this aligned case (a small numerical comparison is sketched after the figure caption below):

  • Gradient descent on \((A_1,A_2)\): As seen above \(\displaystyle \theta = \frac{1}{ \frac{1}{\alpha^2} e^{-2ct} + \frac{s}{c} ( 1 - e^{-2ct} )}\). The singular values \(c\) of \(\frac{1}{n} X^\top Y\) characterize the dynamics, with each singular value appearing one by one when \(\alpha\) is small.
  • Gradient descent on \(\Theta = A_1 A_2\) starting from zero (e.g., \(\alpha=0\)): \(\theta = \frac{c}{s} ( 1 - e^{-st})\). The singular values \(s\) of \(\frac{1}{n} X^\top X\) characterize the dynamics.
  • Muon: this is the new cool algorithm (“stylé”, i.e., stylish, as my kids would say). It corresponds to orthogonalizing the matrix updates, and thus to sign descent with vanishing step-size on the singular values, that is, \(\dot{a}_1 = \dot{a}_2 = 1\) until reaching the solution. This leads to \(\theta = \min \{ t^2, c/s\}\), with a conditioning that does not depend on \(s\) or \(c\). No singular value impacts the dynamics, hence the “automatic” reconditioning, which extends what can be seen for sign descent (see previous post). Note that if we implement Muon directly on \(\Theta\), we would obtain instead \(\theta = \min \{ t, c/s\}\).
Comparison of three dynamics. Top: singular values \(\theta_t\), bottom: loss functions \(R(\theta_t)\) as a function of time \(t\).
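
For concreteness, here is a tiny script comparing the three closed-form trajectories above and reporting when each reaches 90% of the limit \(c/s\); the constants are illustrative, and the "Muon" curve is the idealized \(\dot{a}_1 = \dot{a}_2 = 1\) dynamics described in the list, not the practical optimizer.

```python
import numpy as np

c, s, alpha = 2.0, 4.0, 1e-3     # one singular direction, illustrative constants
t = np.linspace(0.0, 5.0, 501)

theta_factored = 1.0 / (np.exp(-2 * c * t) / alpha**2 + (s / c) * (1 - np.exp(-2 * c * t)))
theta_direct   = (c / s) * (1 - np.exp(-s * t))      # gradient flow on Theta from 0
theta_muon     = np.minimum(t**2, c / s)             # idealized Muon: a1' = a2' = 1

for name, th in [("factored", theta_factored), ("direct", theta_direct), ("muon", theta_muon)]:
    # time to reach 90% of the limit c/s
    idx = np.argmax(th >= 0.9 * c / s)
    print(f"{name:9s} reaches 90% of c/s at t = {t[idx]:.2f}")
```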

Matrix Riccati equation

The second set of assumptions is:

  1. White moment matrix: \(\Sigma = \frac{1}{n} X^\top X = I\). Thus, the only data matrix left is \(C = \frac{1}{n} X^\top Y\).
  2. Balanced initialization: \(B_1^\top B_1 = B_2 B_2^\top\), which implies that throughout the dynamics, we have \(A_1^\top A_1 = A_2 A_2^\top\) (because of the invariant).

Note that there are no assumptions regarding the alignments of singular vectors at initialization. We also assume that \(m \leqslant r = {\rm rank}(C) \leqslant \min\{d,k\}\). See [17] for closed-form expressions with larger width \(m\).

Then something magical happens, first outlined by Kenji Fukumizu [3], and then further used in several works (e.g., [7, 17, 18]). Defining \(R = { A_1 \choose A_2^\top} \in \mathbb{R}^{(d+k) \times m}\), we have $$R^\top R = A_1^\top A_1 + A_2 A_2^\top = 2A_1^\top A_1 = 2 A_2 A_2^\top,$$ which leads to the rewriting of the dynamics as: $$\dot{R} = \Delta R \, - \frac{1}{2} RR^\top R \mbox{ with } \Delta = \left( \begin{array}{cc} 0 & \!\frac{1}{n}X^\top Y \!\\ \! \frac{1}{n} Y^\top X \! & 0 \end{array} \right)= \left( \begin{array}{cc} 0 & \! C \!\\ \! C^\top \! & 0 \end{array} \right),$$ initialized with \(R_0 = { B_1 \choose B_2^\top} \in \mathbb{R}^{(d+k) \times m}\).

This is now a matrix Riccati equation [8, 9], with a closed form and nice links with several branches of mathematics, such as control (linear-quadratic regulator), probabilistic modelling (Kalman filtering), and machine learning (Oja flow for principal component analysis [10]). In this present post, we will only scratch the surface of this fascinating topic, but a future post will most likely do it justice.

Denoting \(P = RR^\top\), we get $$\dot{P} = \Delta P + P \Delta\, - P^2,$$ and we can then write $$P = NM^{-1}$$ with \(M\) initialized at \(I\) and \(N\) initialized at \(R_0 R_0^\top = { B_1 \choose B_2^\top}{ B_1 \choose B_2^\top}^\top\), and the linear dynamics $$\frac{d}{dt} { M \choose N} = \left( \begin{array}{cc} \! - \Delta \! & \! I \! \\ \! 0 \! & \! \Delta \! \end{array} \right){ M \choose N}.$$ The magical proof (see [10]) is really simple: $$\dot{P} = \dot{N} M^{-1} - N M^{-1} \dot{M} M^{-1} = \Delta N M^{-1} - N M^{-1} ( -\Delta M + N) M^{-1} = \Delta P + P \Delta\, - P^2.$$ This dynamics can be rewritten as $$ \left\{ \begin{array}{rcl} \frac{d}{dt} N & = & \Delta N \\ \frac{d}{dt} ( N\, - 2 \Delta M) & = & - \Delta ( N \, - 2 \Delta M), \end{array} \right.$$ and thus leads to the closed form $$ P = e^{\Delta t} R_0 R_0^\top \Big( e^{-\Delta t} + \frac{e^{\Delta t} - e^{-\Delta t}}{2\Delta } R_0 R_0^\top \Big)^{-1} = e^{\Delta t} R_0 \Big( I + R_0^\top \frac{e^{2\Delta t}-I}{2\Delta } R_0 \Big)^{-1} R_0^\top e^{\Delta t}.$$
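
Here is a small self-contained check (a sketch under the post's assumptions \(\Sigma = I\) and balanced initialization; the dimensions and random data are my own illustrative choices) that the closed-form \(P(t)\) above agrees with a direct Euler integration of \(\dot{R} = \Delta R - \frac{1}{2} R R^\top R\):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, m, t = 5, 4, 3, 2.0
C = rng.standard_normal((d, k))

# Symmetric Jordan-Wielandt matrix built from C (here Sigma = X^T X / n = I).
Delta = np.block([[np.zeros((d, d)), C], [C.T, np.zeros((k, k))]])

# Balanced initialization: B1^T B1 = B2 B2^T, at a small scale.
B1 = 0.05 * rng.standard_normal((d, m))
Q, _ = np.linalg.qr(rng.standard_normal((k, m)))        # k x m, orthonormal columns
B2 = np.linalg.cholesky(B1.T @ B1) @ Q.T                # so that B2 B2^T = B1^T B1
R0 = np.vstack([B1, B2.T])                              # (d+k) x m

# Closed form: P = e^{Delta t} R0 (I + R0^T g(Delta) R0)^{-1} R0^T e^{Delta t},
# with g(x) = (e^{2xt} - 1) / (2x), extended by continuity g(0) = t.
w, V = np.linalg.eigh(Delta)
w_safe = np.where(np.abs(w) > 1e-12, w, 1.0)
g = np.where(np.abs(w) > 1e-12, (np.exp(2 * w * t) - 1) / (2 * w_safe), t)
E = V @ np.diag(np.exp(w * t)) @ V.T                    # matrix exponential e^{Delta t}
G = V @ np.diag(g) @ V.T
P_closed = E @ R0 @ np.linalg.solve(np.eye(m) + R0.T @ G @ R0, R0.T @ E)

# Direct explicit-Euler integration of R' = Delta R - (1/2) R R^T R.
R, dt = R0.copy(), 1e-4
for _ in range(int(t / dt)):
    R = R + dt * (Delta @ R - 0.5 * R @ (R.T @ R))
P_euler = R @ R.T

print("relative error:", np.linalg.norm(P_closed - P_euler) / np.linalg.norm(P_euler))
```

The reported relative error is of the order of the Euler discretization error, which is the kind of agreement one should expect from this check.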

We can now analyze this closed-form expression precisely with the following eigenvalue decomposition of \(\Delta\) $$\Delta = \frac{1}{2} { U \choose V} {\rm Diag}(c) { U \choose V}^\top +\frac{1}{2} { U \choose -V} {\rm Diag}(-c) { U \choose -V}^\top,$$ classically obtained for the Jordan-Wielandt matrix \(\Delta\) associated with \(C\) (see an earlier post) with singular value decomposition \(C = U {\rm Diag}(c) V^\top\).

Recovering aligned initialization. We start with aligned initialization (as before, and with the same notations), that is, \(b_1= b_2 =a\). By taking \(B_1 = U {\rm Diag}(a) W^\top\) and \(B_2 = W {\rm Diag}(a) V^\top\), we get \(R_0 = { U \choose V} {\rm Diag}(a) W^\top\), and for any function \(f: \mathbb{R} \to \mathbb{R}\), we have \(f(\Delta) R_0 = { U \choose V} {\rm Diag}(a f(c)) W^\top\), where \(c\) is the vector of singular values of \(C =\frac{1}{n} X^\top Y\), leading to $$P = { U \choose V} {\rm Diag} \Big( \frac{a^2 e^{2 c t}}{ 1 + a^2 \frac{e^{2c t}-1}{c}} \Big){ U \choose V}^\top,$$ which leads to $$ A_1 A_2 = U {\rm Diag} \Big( \frac{a^2 e^{2 c t}}{ 1 + a^2 \frac{e^{2c t}-1}{c}} \Big) V^\top,$$ which is exactly the same as the “aligned” result (OUF!). Note that the aligned initialization only requires that \(C\) and \(\Sigma\) commute, which is weaker than the assumption \(\Sigma =I\) that is made here (but we can go much further below).

Beyond aligned initialization. When \(R_0\) is initialized randomly (and thus with full rank on all subspaces) with balanced initialization and norm proportional to \(\alpha\) (so that \(R_0 R_0^\top\) scales as \(\alpha^2\)), then, with \(t = u \log \frac{1}{\alpha}\), we can also prove sequential learning of all singular subspaces (a small numerical illustration is sketched after the figure caption below). More precisely, if \(c_1 > \cdots > c_r\), then, when \(\alpha\) tends to zero (see proof below the reference section):

  • If \(u<\frac{1}{c_1}\), \(P \sim 0\). This is the initial phase with no apparent movements, but an alignment of singular vectors, nicely cast as silent alignment by [7].
  • When \(u \in ( \frac{1}{c_i}, \frac{1}{c_{i+1}})\) for \(i \in \{1,\dots,m-1\}\), we get \(P \sim C_i\), the optimal rank \(i\) approximation of \(C\).
  • When \(u > \frac{1}{c_m}\), \(P \sim C_m\), which is the global minimizer of the cost function.

Note that when \(m = r\), we get \(C_m = C\). Overall, this is exactly sequential learning, with both the singular values and the singular vectors learned from scratch. In the plot below, we see how singular vectors align early in the dynamics when \(\alpha\) is small, while singular values are learned step-by-step.

Dynamics of linear network learning as a function of rescaled time: components of \(\Theta\) (left, one curve per component) with their limit as crosses, singular values of \(\Theta\) (middle), absolute projections of singular vectors of \(\Theta\) onto the ones of \(C\) (right). \(\alpha\) is going from \(10^{-1}\) to \(10^{-9}\).
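
As a final sketch (again with illustrative dimensions and an assumed \(\Sigma = I\)), one can integrate the flow \(\dot{A}_1 = (C - A_1 A_2) A_2^\top\), \(\dot{A}_2 = A_1^\top (C - A_1 A_2)\) from a small random balanced initialization and check that the singular values of \(A_1 A_2\) appear roughly at the predicted times \(\frac{1}{c_i} \log \frac{1}{\alpha}\); with a finite \(\alpha\) the match is only approximate.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, m, alpha = 6, 5, 3, 1e-4
C = rng.standard_normal((d, k))
c = np.linalg.svd(C, compute_uv=False)          # target singular values c_1 > c_2 > ...

# Small random balanced initialization (B1^T B1 = B2 B2^T) of norm ~ alpha.
B1 = alpha * rng.standard_normal((d, m))
Q, _ = np.linalg.qr(rng.standard_normal((k, m)))
A1, A2 = B1.copy(), np.linalg.cholesky(B1.T @ B1) @ Q.T

# Integrate A1' = (C - A1 A2) A2^T, A2' = A1^T (C - A1 A2)   (the Sigma = I case).
dt = 1e-3
T = 2.5 * np.log(1 / alpha) / c[m - 1]
crossing = np.full(m, np.nan)
for step in range(int(T / dt)):
    res = C - A1 @ A2
    A1, A2 = A1 + dt * res @ A2.T, A2 + dt * A1.T @ res
    sv = np.linalg.svd(A1 @ A2, compute_uv=False)[:m]
    newly = np.isnan(crossing) & (sv > 0.5 * c[:m])
    crossing[newly] = step * dt

print("observed times to reach c_i / 2:", np.round(crossing, 2))
print("predicted  log(1/alpha) / c_i  :", np.round(np.log(1 / alpha) / c[:m], 2))
```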

Conclusion

Overall, we obtain a precise behavior for data with an identity covariance matrix, showing that features (singular vector directions) are learned early on, while their weights (singular values) are learned sequentially. This was done using a balanced initialization (see [11] for an extension beyond balanced initialization and large layer width).

However, this study only applies to linear neural networks (without any non-linear activation function). Extensions to even the simplest non-linear architectures remain an active area of research (see, e.g., [12] and references therein), with in particular similar early alignment behaviors [13, 14]. In addition, matrix product parameterizations also arise in attention architectures through the product of the key and query matrices. As a result, sequential learning phenomena can also be observed in this setting; see [15] for an analysis of sequential learning in linear attention.

Acknowledgements. I would like to thank Raphaël Berthier, Pierre Marion, Loucas Pillaud-Vivien, and Aditya Varre for proofreading this blog post and making good, clarifying suggestions.

References

[1] Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro. Kernel and rich regimes in overparametrized models. Conference on Learning Theory, 2020.
[2] Scott Pesme, Loucas Pillaud-Vivien, and Nicolas Flammarion. Implicit bias of SGD for diagonal linear networks: a provable benefit of stochasticity. Advances in Neural Information Processing Systems, 2021.
[3] Kenji Fukumizu. Effect of batch learning in multilayer neural networks. International Conference on Neural Information Processing, 1998.
[4] Andrew M. Saxe, James L. McClelland, Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. International Conference on Learning Representations, 2014.
[5] Gauthier Gidel, Francis Bach, and Simon Lacoste-Julien. Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks. Advances in Neural Information Processing Systems, 2019.
[6] Simon S. Du, Wei Hu, and Jason D. Lee. Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced. Advances in Neural Information Processing Systems, 2018.
[7] Alexander Atanasov, Blake Bordelon, and Cengiz Pehlevan. Neural Networks as Kernel Learners: The Silent Alignment Effect. International Conference on Learning Representations, 2022.
[8] Vladimír Kučera. A review of the matrix Riccati equation. Kybernetika, 9.1:42-61, 1973.
[9] Richard S. Bucy. Global theory of the Riccati equation. Journal of Computer and System Sciences, 1.4:349-361, 1967.
[10] Wei-Yong Yan, Uwe Helmke, and John B. Moore. Global analysis of Oja’s flow for neural networks. IEEE Transactions on Neural Networks, 5.5:674-683, 1994.
[11] Aditya Vardhan Varre, Maria-Luiza Vladarean, Loucas Pillaud-Vivien, Nicolas Flammarion. On the spectral bias of two-layer linear networks. Advances in Neural Information Processing Systems, 2023.
[12] Alberto Bietti, Joan Bruna, and Loucas Pillaud-Vivien. On learning Gaussian multi-index models with gradient flow. Technical report arXiv:2310.19793, 2023.
[13] Hartmut Maennel, Olivier Bousquet, and Sylvain Gelly. Gradient descent quantizes ReLU network features. Technical report arXiv:1803.08367, 2018.
[14] Etienne Boursier and Nicolas Flammarion. Early alignment in two-layer networks training is a two-edged sword. Journal of Machine Learning Research, 26:1-75, 2025.
[15] Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe. Training Dynamics of In-Context Learning in Linear Attention. International Conference on Machine Learning, 2025.
[16] Sibylle Marcotte, Rémi Gribonval, and Gabriel Peyré. Abide by the law and follow the flow: Conservation laws for gradient flows. Advances in Neural Information Processing Systems, 2023.
[17] Lukas Braun, Clémentine C. J. Dominé, James E. Fitzgerald, Andrew M. Saxe. Exact learning dynamics of deep linear networks with prior knowledge. Advances in Neural Information Processing Systems, 2022.
[18] Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo. Implicit regularization in deep matrix factorization. Advances in Neural Information Processing Systems, 2019.


Proofs

We can rewrite the dynamics as (note that \(\Delta\) is not positive semidefinite) $$ P = \Big( \frac{ 2 \Delta}{1-e^{-2\Delta t}}\Big)^{1/2} H ( H + I)^{-1} \Big( \frac{ 2 \Delta}{1-e^{-2\Delta t}}\Big)^{1/2} ,$$ with $$H = \Big( \frac{e^{2\Delta t}-I}{2\Delta } \Big)^{1/2} R_0 R_0^\top \Big( \frac{e^{2\Delta t}-I}{2\Delta } \Big)^{1/2}.$$ We note that the eigenvalues of \(\Delta\) are singular values of \(C\) as well as their opposite, together with zeros.

The term \(\big( \frac{ 2 \Delta}{1-e^{-2\Delta t}}\big)^{1/2}\) is a spectral function of \(\Delta\) which, as \(t\) grows, sends all negative eigenvalues to zero and each positive eigenvalue \(c\) to \(( 2 c)^{1/2}\). We can therefore restrict attention to the positive eigenvalues and treat these outer terms as fixed. Zero eigenvalues of \(\Delta\) also lead to decay, this time in \(1/t\), and can be discarded as well in the asymptotic analysis.

The term \(\big( \frac{e^{2\Delta t}-I}{2\Delta } \big)^{1/2}\), restricted to the strictly positive eigenvalues, is equivalent to \(e^{\Delta t} ( 2 \Delta)^{-1/2}\), and we are faced with studying the dynamics of \(H(I+H)^{-1}\) for \(H = e^{\Delta t} \alpha^2 M e^{\Delta t}\), where \(M \propto \Delta^{-1/2} (R_0/\alpha) (R_0/\alpha)^\top \Delta^{-1/2}\) is a rank-\(m\) matrix on the \(r\)-dimensional positive eigenspace of \(\Delta\), when \(\alpha\) tends to zero. We rewrite \(t = u \log \frac{1}{\alpha}\), and we then have, once in the basis of singular vectors, $$H = {\rm Diag}(\alpha^{1-cu}) M {\rm Diag}(\alpha^{1-cu}).$$ We assume \(c_1 > \cdots > c_r\) for simplicity and consider three cases.

Case 1. If \(1- c_i u > 0\) for all \(i\), that is, \(u < 1/c_1\), then the matrix \({\rm Diag}(\alpha^{1-cu})\) goes to zero, and thus \(P\) goes to zero.

Case 2. For \(j \in \{1,\dots,m-1\}\) and \(u \in (\frac{1}{c_j}, \frac{1}{c_{j+1}})\), we have $$H = \left( \begin{array}{cc} D_\infty A D_\infty & D_\infty B D_0 \\ D_0 B^\top D_\infty & D_0 C D_0 \end{array} \right),$$ for a diagonal matrix \(D_\infty\) (of size \(j\)) that goes to infinity and a diagonal matrix \(D_0\) that goes to zero. Using \(H (H+I)^{-1} = I - (H+I)^{-1}\) and using Schur complements, we have $$ (H+I)^{-1} = \left( \begin{array}{cc} \bar{A} & \bar{B} \\ \bar{B}^\top & \bar{C} \end{array} \right),$$ with $$\bar{A}^{-1} = I + D_\infty A D_\infty - D_\infty B D_0 ( D_0 C D_0 +I )^{-1}D_0 B^\top D_\infty \sim I + D_\infty A D_\infty,$$ so that \(\bar{A}\) tends to zero (since \(A\) is invertible). We also have $$\bar{C}^{-1} = I + D_0 C D_0 - D_0 B^\top D_\infty ( D_\infty A D_\infty +I )^{-1}D_\infty B D_0 \sim I,$$ with a similar limit for the off-diagonal block. Therefore \((H+I)^{-1}\) tends to \(\left( \begin{array}{cc} 0 & 0 \\ 0 &I \end{array} \right)\), which leads to the desired result.

Case 3. For \(u > 1/c_m\), using the same formula for \((H+I)^{-1}\), with now \(D_0\) not tending to zero, we can use the fact that \(C = B^\top A^{-1} B\) to obtain the desired limit.

By Francis Bach

Strong Chain Quality

from Decentralized Thoughts

Chain Quality (CQ) is a core blockchain property. Roughly speaking, it says: Owning $3\%$ of the stake gives you control over $3\%$ of the blockspace over time. For Nakamoto style chains, this is called Ideal CQ (see here). Chain quality was sufficient for early generations of blockchains that had low throughput, but modern blockchains have much higher bandwidth and can commit many transactions within a single block. This motivates a...

By Ittai Abraham, Pranav Garimidi, Joachim Neu

Learning Read-Once Determinants and the Principal Minor Assignment Problem

from arXiv: Computational Complexity

Authors: Abhiram Aravind, Abhranil Chatterjee, Sumanta Ghosh, Rohit Gurjar, Roshan Raj, Chandan Saha

A symbolic determinant under rank-one restriction computes a polynomial of the form $\det(A_0+A_1y_1+\ldots+A_ny_n)$, where $A_0,A_1,\ldots,A_n$ are square matrices over a field $\mathbb{F}$ and $rank(A_i)=1$ for each $i\in[n]$. This class of polynomials has been studied extensively, since the work of Edmonds (1967), in the context of linear matroids, matching, matrix completion and polynomial identity testing. We study the following learning problem for this class: Given black-box access to an $n$-variate polynomial $f=\det(A_0+A_1y_1+ \ldots+A_ny_n)$, where $A_0,A_1,\ldots,A_n$ are unknown square matrices over $\mathbb{F}$ and rank$(A_i)=1$ for each $i\in[n]$, find a square matrix $B_0$ and rank-one square matrices $B_1,\ldots,B_n$ over $\mathbb{F}$ such that $f=\det(B_0+B_1y_1+\ldots+B_ny_n)$. In this work, we give a randomized poly(n) time algorithm to solve this problem. As the above-mentioned class is known to be equivalent to the class of read-once determinants (RODs), we will refer to the problem as learning RODs. The algorithm for learning RODs is obtained by connecting with a well-known open problem in linear algebra, namely the Principal Minor Assignment Problem (PMAP), which asks to find (if possible) a matrix having prescribed principal minors. PMAP has also been studied in machine learning to learn the kernel matrix of a determinantal point process. Here, we study a natural black-box version of PMAP: Given black-box access to an $n$-variate polynomial $f = \det(A + Y)$, where $A \in \mathbb{F}^{n \times n}$ is unknown and $Y = diag(y_1,\ldots,y_n)$, find a $B\in\mathbb{F}^{n\times n}$ such that $f=det(B+Y)$. We show that black-box PMAP can be solved in randomized poly(n) time, and further, it is randomized polynomial-time equivalent to learning RODs. We resolve black-box PMAP by investigating a property of dense matrices that we call the rank-one extension property.

Truth Predicate of Inductive Definitions and Logical Complexity of Infinite-Descent Proofs

from arXiv: Computational Complexity

Authors: Sohei Ito, Makoto Tatsuta

Formal reasoning about inductively defined relations and structures is widely recognized not only for its mathematical interest but also for its importance in computer science, and has applications in verifying properties of programs and algorithms. Recently, several proof systems for inductively defined predicates based on sequent calculus, including the cyclic proof system CLKID-omega and the infinite-descent proof system LKID-omega, have attracted much attention. Although the relation among their provabilities has been clarified, the logical complexity of these systems has not been much studied. The infinite-descent proof system LKID-omega is an infinitary proof system for inductive definitions that allows infinite paths in proof figures; it serves as a basis for the cyclic proof system. This paper shows that the logical complexity of provability in LKID-omega is $\Pi^1_1$-complete. To show this, it is first shown that validity for inductive definitions in standard models is equivalent to validity for inductive definitions in standard term models. Next, using this equivalence, this paper extends the truth predicate of omega-languages, as given in Girard's textbook, to inductive definitions by employing an arithmetical coding of inductive definitions. This shows that the validity of inductive definitions in standard models is a $\Pi^1_1$ relation. Then, using the completeness of LKID-omega for standard models, it is shown that the logical complexity of provability in LKID-omega is $\Pi^1_1$-complete.


Advances in List Decoding of Polynomial Codes

from arXiv: Computational Complexity

Authors: Mrinal Kumar, Noga Ron-Zewi

Error-correcting codes are a method for representing data, so that one can recover the original information even if some parts of it were corrupted. The basic idea, which dates back to the revolutionary work of Shannon and Hamming about a century ago, is to encode the data into a redundant form, so that the original information can be decoded from the redundant encoding even in the presence of some noise or corruption. One prominent family of error-correcting codes are Reed-Solomon Codes which encode the data using evaluations of low-degree polynomials. Nearly six decades after they were introduced, Reed-Solomon Codes, as well as some related families of polynomial-based codes, continue to be widely studied, both from a theoretical perspective and from the point of view of applications. Besides their obvious use in communication, error-correcting codes such as Reed-Solomon Codes are also useful for various applications in theoretical computer science. These applications often require the ability to cope with many errors, much more than what is possible information-theoretically. List-decodable codes are a special class of error-correcting codes that enable correction from more errors than is traditionally possible by allowing a small list of candidate decodings. These codes have turned out to be extremely useful in various applications across theoretical computer science and coding theory. In recent years, there have been significant advances in list decoding of Reed-Solomon Codes and related families of polynomial-based codes. This includes efficient list decoding of such codes up to the information-theoretic capacity, with optimal list-size, and using fast nearly-linear time, and even sublinear-time, algorithms. In this book, we survey these developments.
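
As a concrete and deliberately tiny illustration of the encoding step described above, here is a toy Reed-Solomon encoder over a small prime field. The field size, length, and dimension are arbitrary choices for the example, and the snippet says nothing about the decoding algorithms surveyed in the book.

```python
# Toy Reed-Solomon encoder over F_p (p = 97 here, chosen arbitrarily): the message
# is read as the coefficients of a polynomial of degree < k, and the codeword is
# its evaluations at n distinct field elements. Two distinct messages agree on
# fewer than k evaluation points, which is what makes decoding possible at all.
p = 97          # field size (example choice)
n, k = 12, 4    # code length and dimension (example choices)

def rs_encode(message, points):
    """Evaluate the message polynomial at each point, with all arithmetic mod p."""
    assert len(message) == k and len(points) == n
    def evaluate(x):
        acc = 0
        for coeff in reversed(message):   # Horner's rule
            acc = (acc * x + coeff) % p
        return acc
    return [evaluate(x) for x in points]

codeword = rs_encode([3, 1, 4, 1], list(range(n)))
print(codeword)
# Codewords of distinct messages differ in at least n - k + 1 = 9 positions.
```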


When Relaxation Does Not Help: RLDCs with Small Soundness Yield LDCs

from arXiv: Computational Complexity

Authors: Kuan Cheng, Xin Li, Songtao Mao

Locally decodable codes (LDCs) are error correction codes that allow recovery of any single message symbol by probing only a small number of positions from the (possibly corrupted) codeword. Relaxed locally decodable codes (RLDCs) further allow the decoder to output a special failure symbol $\bot$ on a corrupted codeword. While known constructions of RLDCs achieve much better parameters than standard LDCs, it is intriguing to understand the relationship between LDCs and RLDCs. Separation results (i.e., the existence of $q$-query RLDCs that are not $q$-query LDCs) are known for $q=3$ (Gur, Minzer, Weissenberg, and Zheng, arXiv:2512.12960, 2025) and $q \geq 15$ (Grigorescu, Kumar, Manohar, and Mon, arXiv:2511.02633, 2025), while any $2$-query RLDC also gives a $2$-query LDC (Block, Blocki, Cheng, Grigorescu, Li, Zheng, and Zhu, CCC 2023). In this work, we generalize and strengthen the main result in Grigorescu, Kumar, Manohar, and Mon (arXiv:2511.02633, 2025), by removing the requirement of linear codes. Specifically, we show that any $q$-query RLDC with soundness error below some threshold $s(q)$ also yields a $q$-query LDC with comparable parameters. This holds even if the RLDC has imperfect completeness but with a non-adaptive decoder. Our results also extend to the setting of locally correctable codes (LCCs) and relaxed locally correctable codes (RLCCs). Using our results, we further derive improved lower bounds for arbitrary RLDCs and RLCCs, as well as probabilistically checkable proofs of proximity (PCPPs).
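
For readers new to the definitions, the classical 2-query Hadamard code is the textbook example of a locally decodable code, and a small simulation makes the query pattern vivid. The sketch below is only that textbook example, not a construction or reduction from the paper above.

```python
# Classical 2-query Hadamard LDC (textbook example, unrelated to this paper):
# x in {0,1}^k is encoded as all parities <x, a> over a in {0,1}^k, and x_i is
# decoded by querying two positions whose XOR isolates coordinate i.
import random

k = 8
x = [random.randrange(2) for _ in range(k)]

def inner(a, b):
    return sum(ai & bi for ai, bi in zip(a, b)) % 2

# Codeword indexed by all a in {0,1}^k; stored densely here for simplicity.
code = {}
for j in range(2 ** k):
    a = tuple((j >> t) & 1 for t in range(k))
    code[a] = inner(a, x)

# Corrupt a small fraction of positions.
for pos in random.sample(list(code), 2 ** k // 20):
    code[pos] ^= 1

def decode_bit(i):
    """Query the codeword at a and at a XOR e_i; the XOR of the answers is x_i if both are clean."""
    a = tuple(random.randrange(2) for _ in range(k))
    a_flip = tuple(b ^ (1 if t == i else 0) for t, b in enumerate(a))
    return code[a] ^ code[a_flip]

print([decode_bit(i) for i in range(k)])
print(x)
```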


Planar Graph Orientation Frameworks, Applied to KPlumber and Polyomino Tiling

from arXiv: Computational Complexity

Authors: MIT Hardness Group, Zachary Abel, Erik D. Demaine, Jenny Diomidova, Jeffery Li, Zixiang Zhou

Given a graph, when can we orient the edges to satisfy local constraints at the vertices, where each vertex specifies which local orientations of its incident edges are allowed? This family of graph orientation problems is a special kind of SAT problem, where each variable (edge orientation) appears in exactly two clauses (vertex constraints) -- once positively and once negatively. We analyze the complexity of many natural vertex types (patterns of allowed vertex neighborhoods), most notably all sets of symmetric vertex types which depend on only the number of incoming edges. In many scenarios, including Planar and Non-Planar Symmetric Graph Orientation with constants, we give a full dichotomy characterizing P vs. NP-complete problem classes. We apply our results to obtain new polynomial-time algorithms, resolving a 20-year-old open problem about KPlumber; to simplify existing NP-hardness proofs for tiling with trominoes; and to prove new NP-completeness results for tiling with tetrominoes.
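
To make the problem statement concrete, the brute-force checker below enumerates all orientations of the edges and tests a symmetric, in-degree-based constraint at each vertex. It is purely illustrative of the problem being classified, not of the paper's algorithms or reductions, and the example instance is made up.

```python
# Brute-force reference solver for the symmetric variant of the orientation
# problem: each vertex v has a set allowed[v] of permitted in-degrees, and we
# search over all 2^m orientations. Illustrative only; the paper's point is that
# many such classes are either polynomial-time solvable or NP-complete.
from itertools import product

def orientable(num_vertices, edges, allowed):
    """Return one satisfying orientation (a list of directed edges) or None."""
    for choice in product((0, 1), repeat=len(edges)):
        indeg = [0] * num_vertices
        oriented = []
        for (u, v), direction in zip(edges, choice):
            head = v if direction == 0 else u   # direction 0 means u -> v, else v -> u
            indeg[head] += 1
            oriented.append((u, v) if direction == 0 else (v, u))
        if all(indeg[v] in allowed[v] for v in range(num_vertices)):
            return oriented
    return None

# Triangle where every vertex must have exactly one incoming edge:
# any cyclic orientation satisfies the constraints.
print(orientable(3, [(0, 1), (1, 2), (2, 0)], {0: {1}, 1: {1}, 2: {1}}))
```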


Ultrabubble enumeration via a lowest common ancestor approach

from arXiv: Data Structures and Algorithms

Authors: Athanasios E. Zisis, Pål Sætrom

Pangenomics uses graph-based models to represent and study the genetic variation between individuals of the same species or between different species. In such variation graphs, a path through the graph represents one individual genome. Subgraphs that encode locally distinct paths are therefore genomic regions with distinct genetic variation, and detecting such subgraphs is integral to studying genetic variation. Biedged graphs are a type of variation graph that uses two types of edges, black and grey, to represent genomic sequences and adjacencies between sequences, respectively. Ultrabubbles in biedged graphs are minimal subgraphs that represent a finite set of sequence variants that all start and end with two distinct sequences; that is, ultrabubbles are acyclic and all paths in an ultrabubble enter and exit through two distinct black edges. Ultrabubbles are therefore a special case of snarls, which are minimal subgraphs connected to the rest of the graph by two black edges. Here, we show that any bidirected graph can be transformed into a bipartite biedged graph in which lowest common ancestor queries can determine whether a snarl is an ultrabubble. This leads to an O(Kn) algorithm for finding all ultrabubbles in a set of K snarls, improving on the prior naive approach of O(K(n + m)) in a biedged graph with n nodes and m edges. Accordingly, our benchmark experiments on real and synthetic variation graphs show improved run times on graphs with few cycles and dead-end paths, and on dense graphs with many edges.


Stringology-Based Motif Discovery from EEG Signals: an ADHD Case Study

from arXiv: Data Structures and Algorithms

Authors: Anat Dahan, Samah Ghazawi

We propose a novel computational framework for analyzing electroencephalography (EEG) time series using methods from stringology, the study of efficient algorithms for string processing, to systematically identify and characterize recurrent temporal patterns in neural signals. The primary aim is to introduce quantitative measures to understand neural signal dynamics, with the present findings serving as a proof-of-concept. The framework adapts order-preserving matching (OPM) and Cartesian tree matching (CTM) to detect temporal motifs that preserve relative ordering and hierarchical structure while remaining invariant to amplitude scaling. This approach provides a temporally precise representation of EEG dynamics that complements traditional spectral and global complexity analyses. To evaluate its utility, we applied the framework to multichannel EEG recordings from individuals with attention-deficit/hyperactivity disorder (ADHD) and matched controls using a publicly available dataset. Highly recurrent, group-specific motifs were extracted and quantified using both OPM and CTM. The ADHD group exhibited significantly higher motif frequencies, suggesting increased repetitiveness in neural activity. OPM analysis revealed shorter motif lengths and greater gradient instability in ADHD, reflected in larger mean and maximal inter-sample amplitude changes. CTM analysis further demonstrated reduced hierarchical complexity in ADHD, characterized by shallower tree structures and fewer hierarchical levels despite comparable motif lengths. These findings suggest that ADHD-related EEG alterations involve systematic differences in the structure, stability, and hierarchical organization of recurrent temporal patterns. The proposed stringology-based motif framework provides a complementary computational tool with potential applications for objective biomarker development in neurodevelopmental disorders.
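
Order-preserving matching itself is easy to state in a few lines: two equal-length windows match if their samples have the same relative ordering. The sketch below is a naive illustration of that notion on a made-up signal, not the authors' EEG pipeline, and it ignores ties for simplicity.

```python
# Naive order-preserving matching (OPM) demo: two windows "match" if their
# samples have the same rank pattern, which makes the comparison invariant to
# amplitude scaling and offset. Values are assumed distinct (ties ignored).
def rank_pattern(window):
    """Map each sample to its rank within the window, e.g. [5.0, 1.2, 3.3] -> (2, 0, 1)."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for r, i in enumerate(order):
        ranks[i] = r
    return tuple(ranks)

def count_opm_occurrences(signal, pattern):
    """Count sliding windows of the signal whose rank pattern equals the pattern's."""
    m = len(pattern)
    target = rank_pattern(pattern)
    return sum(rank_pattern(signal[i:i + m]) == target
               for i in range(len(signal) - m + 1))

signal = [0.1, 0.9, 0.4, 1.3, 0.2, 2.0, 0.7, 1.1]   # made-up toy signal
print(count_opm_occurrences(signal, [1.0, 3.0, 2.0]))   # "rise then partial drop" motif
```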


Wednesday, March 04

Feedback requested for the complexity book “Mathematics of the impossible”

from Emanuele Viola


I have just completed a draft of the book. It is the March 4 version on my homepage. CLICK HERE TO DOWNLOAD. (If you downloaded already, check you don’t have the earlier version cached.) I would very much appreciate any feedback on the book, including suggestions on things to include, especially if they fit well and are not overly technical. Please feel completely free to suggest your own results.

After collecting feedback, I plan to do one more pass and then it’s off to the press, so speak now ;-)

Here is the table of contents in case you want to know what the book is about before downloading:

0 Introduction
0.1 Teasers
0.1.1 Computing with three bits of memory
0.1.2 Randomness and derandomization
0.1.3 Proofs and delegating computation
0.1.4 Changing base losing no time and no space
0.2 Notation
0.3 Contents and comparisons
0.4 How to use this book
0.5 Acknowledgments

1 Time
1.1 Word programs
1.2 Complexity classes
1.3 You don’t need much to have it all
1.4 Composing programs
1.5 Universal programs
1.6 The fastest algorithm for Factoring
1.7 On the word size
1.7.1 Factoring with large words
1.8 The grand challenge
1.8.1 Undecidability and diagonalization
1.8.2 The hierarchy of Time
1.9 Problems
1.10 Notes

2 Circuits
2.1 The grand challenge for circuits
2.2 Circuits vs. Time
2.3 Cosmological, non-asymptotic impossibility
2.4 Problems
2.5 Notes

3 Randomness
3.1 Error reduction for one-sided algorithm
3.2 Error reduction for BPTime
3.3 The power of randomness
3.3.1 Verifying matrix multiplication
3.3.2 Checking if a circuit represents zero
3.4 Does randomness really buy time?
3.5 The hierarchy of BPTime
3.6 Problems
3.7 Notes

4 Reductions
4.1 Types of reductions
4.2 Multiplication
4.3 3Sum
4.4 Satisfiability
4.4.1 3Sat to Clique
4.4.2 3Sat to Subset-Sum
4.4.3 3Sat to 3Color
4.4.4 More
4.5 Power hardness from SETH
4.6 Search problems
4.7 Gap-Sat: The PCP theorem
4.8 Problems
4.9 Notes

5 Nondeterminism
5.1 Nondeterministic computation
5.2 Completeness
5.3 From programs to 3Sat in quasi-linear time
5.3.1 Efficient sorting circuits
5.4 Power from completeness
5.4.1 Max-3Sat
5.4.2 NP is as easy as detecting unique solutions
5.5 Alternation
5.5.1 Does the hierarchy collapse?
5.6 Problems
5.7 Notes

6 Space
6.1 Branching programs
6.2 The power of L
6.2.1 Arithmetic
6.2.2 Graphs
6.2.3 Linear algebra
6.3 Checkpoints
6.4 The grand challenge for space
6.5 Randomness
6.6 Reductions
6.6.1 P vs. PSpace
6.6.2 L vs. P
6.7 Nondeterministic space
6.8 An impossibility result for 3Sat
6.9 TiSp
6.10 Computing with a full memory: Catalytic space
6.11 Problems
6.12 Notes

7 Depth
7.1 Depth vs space
7.2 The power of NC2: Linear algebra
7.3 Formulae
7.3.1 The grand challenge for formulae
7.4 The power of NC1: Arithmetic
7.5 Computing with 3 bits of memory
7.6 Group programs
7.7 The power of NC0: Cryptography
7.8 Word circuits
7.8.1 Simulating circuits with square-root space
7.9 Uniformity
7.10 Problems
7.11 Notes

8 Majority
8.1 The power of TC0: Arithmetic
8.2 Neural networks
8.3 Amplifying lower bounds by means of self-reducibility
8.4 The power of Majority: Boosting correlation
8.5 Uniformity
8.6 Problems
8.7 Notes

9 Alternation
9.1 The polynomial method over F2
9.1.1 AC0 correlates with low-degree polynomials modulo 2
9.1.2 Using the correlation to show that Majority is hard
9.2 The polynomial method over R
9.2.1 AC0 correlates with low-degree real polynomials
9.2.2 Sign-Approximating Maj-AC0
9.2.3 Using the correlation to show impossibility for Maj-AC0
9.2.4 AC0 has small correlation with parity
9.3 Switching lemmas
9.3.1 Switching I
9.3.2 Switching II
9.3.3 Proof of switching II
9.4 AC0 vs L, NC1, and TC0
9.4.1 L
9.4.2 Linear-size log-depth
9.4.3 TC0
9.5 The power of AC0: Gap majority
9.5.1 Back to the PH
9.6 Mod 6
9.6.1 The power of ACC0
9.7 Impossibility results for ACC0
9.8 The power of AC0: sampling
9.9 Problems
9.10 Notes

10 Proofs
10.1 Static proofs
10.2 Zero-knowledge proofs
10.3 Interactive proofs
10.4 Delegating computation: Interactive proofs for muggles
10.4.1 Warm-up: Counting triangles
10.4.2 Delegating NC
10.5 Problems
10.6 Notes

11 Pseudorandomness
11.1 Basic PRGs
11.1.1 Local tests
11.1.2 Low-degree polynomials
11.1.3 Local tests, II
11.1.4 Local small bias
11.2 PH is a random low-degree polynomial
11.3 Pseudorandom generators from hard functions
11.3.1 Stretching correlation bounds: The bounded-intersection generator
11.3.2 Turning hardness into correlation bounds
11.3.3 Hard-core sets
11.3.4 Derandomizing the XOR lemma
11.3.5 Encoding the whole truth-table
11.3.6 Monotone amplification within NP
11.4 Finding the hard core
11.5 Cryptographic pseudorandom generators
11.5.1 AC0
11.5.2 Circuits
11.6 Problems
11.7 Notes

12 Expansion
12.1 Edge expansion
12.2 Spectral expansion
12.3 Undirected reachability in L
12.4 What do expanders fool?
12.5 On the proof of the PCP theorem
12.6 Problems
12.7 Notes

13 Communication
13.1 Two parties
13.1.1 The rectangle method and the Equality function
13.1.2 Rounds: Pointer chasing
13.1.3 Randomness
13.1.4 Arbitrary partition
13.2 Number-on-forehead
13.2.1 Generalized inner product
13.2.2 The power of logarithmic parties
13.2.3 Pointer chasing
13.3 Problems
13.4 Notes

14 Arithmetic
14.1 Linear transformations
14.1.1 The power of depth-2 xor circuits
14.2 Integers
14.3 Univariate polynomials
14.4 Multivariate polynomials
14.5 Depth reduction and completeness
14.6 Alternation
14.6.1 The power of depth 3
14.6.2 Impossibility results
14.7 Problems
14.8 Notes

15 Structures
15.1 Static
15.1.1 Succinct
15.1.2 Succincter
15.1.3 Impossibility results by ruling out samplers
15.2 Dynamic
15.3 Problems
15.4 Notes

16 Tapes
16.1 On the alphabet
16.1.1 The universal TM
16.2 Multi-tape machines
16.2.1 Time vs. TM-Time
16.3 TMs vs circuits
16.4 The grand challenge for TMs
16.4.1 Communication bottleneck: crossing sequences
16.5 TM-Time hierarchy
16.6 Randomness
16.7 Sub-logarithmic space
16.7.1 The power of sub-logarithmic space
16.8 Problems
16.9 Notes

17 Barriers
17.1 Black-box
17.2 Natural proofs
17.2.1 Examples of proofs that are natural
17.2.2 Ruling out natural proofs under popular conjectures
17.3 Problems
17.4 Notes

18 Speculation
18.1 Critique of common arguments for P ≠ NP

A Miscellanea
A.1 Logic
A.2 Integers
A.3 Sums
A.4 Inequalities
A.5 Probability theory
A.5.1 Deviation bounds for the sum of random variables
A.6 Squaring tricks
A.7 Duality
A.8 Groups
A.9 Fields
A.10 Linear algebra
A.11 Polynomials
A.12 Notes

A Landscape

By Manu

The Vital Amines

from Ben Recht

Assembling the consensus on the necessity of vitamins.

This is Part 5 of 7 of a blogged essay “Steampunk Data Science.” A table of contents is here.

At the same time that Stephen Babcock was moving from New York to Wisconsin, Christian Eijkman departed the Netherlands for a new position in the Dutch Colony of Batavia. Eijkman would be credited with discovering Vitamin B, but his path to discovery was markedly different than the one taken in Wisconsin.

Eijkman was sent to Batavia, which we now know as Jakarta, to investigate the disease beriberi. Plaguing much of Asia, beriberi caused nervous system disorders, including loss of the ability to walk and loss of reflexes. As was the style at the time, Eijkman was convinced beriberi was an infectious disease, and he set out upon a series of experiments to isolate the responsible germ.

Eijkman’s first attempts involved transfusing human blood into animals in the hope of inducing beriberi. He first tried to infect monkeys, to no avail. He next experimented with rabbits but also failed to induce beriberi. His third animal model was chickens. Even in the 1800s, a non-mammalian test subject was outlandish, but chickens could succumb to a beriberi-like illness called polyneuritis, which caused a similar pattern of neural dysfunction. Moreover, Eijkman had grown frustrated with his failures.

To his surprise and delight, Eijkman found that chickens often developed polyneuritis after receiving his beriberi transfusions. However, there was one perplexing glitch with his experiment: He observed an equal incidence of polyneuritis in the treatment and control groups. How was this possible? Eijkman tried and failed to cultivate the infecting substance from the afflicted chickens. None of the germ theory he had learned studying with Robert Koch could explain the outbreak of polyneuritis in his experimental fowl. Miraculously, in one of those mythologized moments of scientific luck, Eijkman happened upon an impossible answer by accident.

To save precious research funds, the lab assistant assigned to care for the chickens had been feeding them the leftover rice from the military mess hall adjacent to Eijkman’s facility. But the food supply was cut off after a new barracks chef, in Eijkman’s words, “refused to allow military rice to be taken for civilian chickens.” The lab keeper had to resort to purchasing bargain-basement brown rice for the chickens. Once the rice was switched, the chickens were cured of polyneuritis.

What was different about the barracks rice and the new civilian rice? The military rice was white. White rice is polished brown rice. Eijkman was convinced that something about the polishing turned the rice into a beriberi vector.

Unfortunately, he had no idea what that something was. He had a difficult time shaking the germ theory instilled in him by Koch. Perhaps polishing the rice allowed some bug to flourish in the white rice. Perhaps the brown rice contained something that killed an unknown pathogen. Determining what precisely was special about the brown rice would take decades.

Multiple investigations confirmed Eijkman’s observations over the subsequent years. For example, in 1897, Adolphe Vorderman observed that prisons that fed prisoners white rice had a considerably higher incidence of beriberi than those that fed brown rice. You might say that Eijkman’s observation in chickens had been “replicated” in humans in this observational study. But researchers at the time were as wary of confounding in non-interventional explanations as they are today.

It wasn’t until 1901 that an experiment seemed to definitively confirm that there was something valuable in the rice bran itself. Eijkman’s successor in Batavia, Gerrit Grijns, demonstrated that chickens would not develop polyneuritis on a diet of meat and rice bran. Grijns confidently (and correctly) declared there was something essential to the diet in the rice bran. It took another decade for scientific consensus to accept Grijns was right.


The case for vitamins would finally be closed by Casimir Funk, who wrote an audacious meta-analysis in 1912. In his report, he boldly claimed that beriberi, scurvy, and pellagra were all caused by the lack of some food substance in a person’s diet. Part of Funk’s evidence was epidemiological. He noted these afflictions only occurred in countries where people ate unvarying, monotonous diets, like those based on polished rice. However, not all monotonous diets were necessarily perilous. Russians who lived on a diet of cabbage, potatoes, and bacon seemed to avoid this cluster of illnesses. Some diets were missing yet-to-be-identified nutrients, and people couldn’t live without them. He called the missing components “vital amines” or “vitamines” for short.

Funk, a biochemist at the Lister Institute of Preventive Medicine in London, had been investigating the properties of rice polishings, his interest piqued by the findings from Indonesia. He devised multiple chemical methods to extract the curative component from the discarded rice shavings. He found that it would take tremendous quantities of rice to yield tiny amounts of this substance, but only a small amount was required to cure pigeons of polyneuritis.

These experiments and the evidence from Asia convinced Funk. Germ theory, toxic theory, and hormonal theory were each insufficient to explain the evidence. Instead, Funk invented a new classification: deficiency diseases. He proposed that small changes in diet could reliably cure these diseases.

Wisconsin investigators McCollum and Davis soon confirmed Funk’s vitamin theory. Their fat-soluble compound alone could not sustain rats fed only polished rice. However, adding the water-soluble compound found in the rice polishings allowed the rats to thrive. As they put it:

There are necessary for normal nutrition during growth two classes of unknown accessory substances, one soluble in fats and accompanying these in the process of isolation of fats from certain foodstuffs, and the other soluble in water, but apparently not in fats.

Thus, McCollum and Davis showed that multiple compounds were necessary to sustain the life of their rats, and these compounds were not proteins, fats, or carbohydrates. Having grown more adept with rat experiments, Davis now included a much more extensive collection of rodent growth curves, demonstrating the existence of at least two vitamins.

The fat-soluble vitamin was A. The water-soluble vitamin was B.1

Funk also called out scurvy as a deficiency disease. Though the British Navy knew that scurvy could be treated with lemons, they did not understand its etiology at all. Axel Holst and Theodor Frolich had induced scurvy in guinea pigs with a deficient diet and then cured the condition using lemon juice or cabbage. Funk concluded that the vitamin associated with scurvy was distinct from that of beriberi, as it was much more sensitive to boiling. Though most were convinced that this evidence showed scurvy was a deficiency disease, Vitamin C would not be chemically isolated until 1928.

Funk went even further out on a limb. He asserted that pellagra, a disease endemic to northern Italy, was also a deficiency disease. Pellagra was associated with starch-heavy diets based on corn. It wouldn’t be until 1937, a quarter century later, that the associated vitamin would be discovered. Pellagra is caused by a deficiency of niacin, also known as Vitamin B3.

In passing, in the penultimate paragraph of his manuscript, Funk hypothesized that rickets, characterized by weakened bones and deformed legs, was also likely a deficiency disease. This hypothesis would also prove correct. Rickets is cured by Vitamin D, a vitamin discovered in 1922 by McCollum. A year later, Harry Steenbock, another collaborator on the Single-Grain experiment at Wisconsin, found that Vitamin D could be synthesized by irradiating food with UV light and that this irradiated food cured rickets in mice.2

Though many now point to the serendipitous diet change in Eijkman’s chickens as the “discovery” of vitamins, Funk’s assembly of 30 years of evidence and his naming of a cause had more immediate impact. Funk’s theorizing flew in the face of germ theory and asserted something new. He revolutionized how we think about illness as completely as germ theory itself did. The evidence had been piling up. People knew that treatments for these diseases involved dietary changes. The rodent studies assuredly pointed to particular necessary factors in food, but Funk was the first to put it all together and state it outright.

Once Funk formulated the concepts of vitamins and deficiency diseases, the rest of the scientific community jumped on board. They had a whole new way to think about what causes and cures illness. With sufficient dietary diversity, humankind was empowered to cure deficiency diseases plaguing large parts of the world.

Continue on to Part 6.


This history of the discovery of Vitamin B in the Dutch colonies synthesizes Eijkman’s Nobel Prize address and the accounts of Carpenter, Combs and McClung, and Vandenbroucke. Combs and McClung’s text on vitamins also discusses Funk’s legacy in vitamins and his connection to McCollum and Davis. Griminger provides additional details about Funk’s biography.

1. Specifically, the vitamin whose absence causes beriberi is B1. Though they didn’t know it at the time, nutritionists soon realized there were many water-soluble vitamins. We now have 12 named B vitamins, and all of them are water-soluble.

2. Steenbock would subsequently patent the method of irradiating milk to fortify it with Vitamin D. This patent would generate untold sums of money for the University of Wisconsin. It’s one of the earliest and most successful examples of universities funding themselves through licensing their intellectual property.

By Ben Recht

The Purpose of Proofs

from Computational Complexity

In discussions of AI and mathematics, the conversation often turns to mathematical proofs, such as the First Proof challenge. So let's look at the role of proofs in mathematics.

Without a proof, you don't even know whether a theorem is true or false. It's not even a theorem until you have a proof, just a conjecture or hypothesis. You might have some intuition, but you don't know the hardness of a proof until you find it. Even then, that only gives you an upper bound on hardness, as someone might find a simpler alternative proof in the future.

Proofs give an understanding of why a theorem is true. A proof can look like a piece of art, especially if it works in a new or clever way, maybe even a proof from the book like Kelly's proof of the Sylvester–Gallai theorem. A mathematician often has their most satisfying experiences finding a proof of a theorem that they or others have struggled to prove before.

Conjectures drive the need for proofs, but often it goes in the other direction. A failed proof leads to a new conjecture and maybe a different theorem altogether. My time-space tradeoffs came out of a failed attempt to show NP different from L. Too often I see papers whose theorems are really just whatever the best proof the authors could find achieves.

Proofs are supposedly objective: either it is a valid proof or it isn't. That's the biggest reason AI tackles proofs, since we can grade them as right or wrong. But most academic papers have high-level proofs, and a referee needs to make a subjective decision about whether a proof has enough detail to count as legitimate. And if a proof has minor fixable errors, then strictly speaking it isn't a proof, but we usually count it as one.

Now we have proof systems like Lean where we can make proofs truly objective. There's a large mathlib project to formalize much of mathematics in Lean, and talk of a cslib as well. 
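
For readers who have not seen a proof assistant, even a two-line Lean 4 sketch shows what it means for a proof to be objective in this sense: the file is accepted only if every proof term type-checks. (This is just an illustration; Nat.add_comm here is a lemma from Lean's core library, not from mathlib or any cslib.)

```lean
-- Lean accepts these only because the proof terms type-check; no referee judgment is involved.
theorem two_plus_two : 2 + 2 = 4 := rfl

theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```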

Lean excites me less. I'm not sure what we gain by formalizing proofs besides certainty. Will Fermat's Last Theorem be any more true once we formalize it? One could argue Lean also helps AI reason about mathematics in a grounded way, though could AI just get lost in the logical weeds?

I worry that the heavy task of converting proofs to Lean, even with AI help, takes away from time spent finding and proving new theorems, and that we lose the beauty of the proofs. If there were a fully Lean-verified proof of P ≠ NP that you couldn't understand, would you find that satisfying?

By Lance Fortnow


Mass surveillance, red lines, and a crazy weekend

from Windows on Theory


[These are my own opinions, and not representing OpenAI. Crossposted on Lesswrong]

AI has so many applications, and AI companies have limited resources and attention span. Hence if it was up to me, I’d prefer we focus on applications that are purely beneficial— science, healthcare, education — or even commercial, before working on anything related to weapons or spying. If someone has to do it, I’d prefer it not to be my own company. Alas, we can’t always get what we want.

This is a long-ish post, but the TL;DR is:

  1. I believe that harm to democracy is one of the most important risks of AI, and one that is not sufficiently highlighted.
  2. While I wish it would have proceeded under different circumstances, the conditions in the deal signed between OpenAI and the DoW, as well as the publicity that accompanied it, provide a chance to move forward on this issue, and make using AI to collect, analyze, or de-anonymize people’s data at mass scale a risk that we track in a similar manner to other risks such as cybersecurity and bioweapons.
  3. It is also too soon to “declare victory.” The true test of this deal is not the contract language. It will be in the safeguards we build, and in how we monitor the deployment and ensure that the model is safe for our troops and does not enable mass surveillance of our people.

[Also; The possibility of Anthropic’s designation as a supply-chain risk is terrible. I hope it will be resolved asap.]

Country of IRS agents in a datacenter

How can AI destroy democracy? Throughout history, authoritarian regimes required a large obedient bureaucracy to spy and control their citizens. In East Germany, in addition to the full-time Stasi staff, one percent of the population served as informants. The KGB famously had multiple “purges” to ensure loyalty.

AI can potentially lead to a government bureaucracy loyal to whoever controls the models’ training or prompting, ensuring an army of agents that will not leak, whistleblow, or disobey an illegal order. Moreover, since the government has the monopoly on violence, we don’t need advances in the “world of atoms” to implement that, nor do we need a “Nobel laureate” level of intelligence.

As an example, imagine that the IRS was replaced with an AI workforce. Arguably, current models are already at or near the capability needed to automate much of those functions. In such a case, the leaders of the agency could commence large-scale tax investigations of their political enemies. Furthermore, even if each AI agent were individually aligned, it might not be possible for it to know that the person it received an order to audit was selected for political reasons. A human being goes home, reads the news, and can understand the broader context. A language model is born at the beginning of a task and dies at its end.

Historically, mass surveillance of a country’s own citizens was key for authoritarian governments. This is why so much of U.S. history is about preventing it, including the Fourth Amendment. AI opens new possibilities for analysis and de-anonymization of people’s data at a larger scale than ever before. For example, just recently, Lermen et al. showed that LLMs can be used to perform large-scale automated de-anonymization on unstructured data.

While all surveillance is problematic, given the unique power that governments have over their own citizens and residents, restricting domestic surveillance by governments is of particular importance. This is why I personally view it as even more crucial to prevent than privacy violations by foreign governments or corporations. But the latter are important too, especially since governments sometimes “launder” surveillance by purchasing commercially available information.

It is not a lost cause – we can implement and regulate approaches for preventing this. AI can scale oversight and monitoring just as it can scale surveillance. We can also build in privacy and cryptographic protections to AI to empower individuals. But we urgently need to do this work.

Just like with the encryption debates, there will always be people that propose trading our freedoms for protection against our adversaries. But I hope we have learned our lesson from the PATRIOT act and the Snowden revelations.  While I don’t agree with its most expansive interpretations, I think the second amendment is also a good illustration that we Americans have always been willing to trade some safety to protect our freedom. Even in the world of advanced AI, we still have two oceans, thousands of nukes, and a military with a budget larger than China’s and Russia’s combined. We don’t need to give up our freedoms and privacy to protect ourselves.

OpenAI’s deal with the Department of War

While the potential for AI abuse in government is always present, it is amplified in the classified settings, since by their nature, this could make abuse much harder to detect. (E.g., we might have never heard of the NSA overreach if it wasn’t for Snowden.) For this reason, I am glad for the heightened scrutiny our deal with the DoW received (even if that scrutiny has not been so easy for me personally).

I feel that too much of the focus has been on the “legalese”, with people parsing every word of the contract excerpts we posted. I do not dispute the importance of the contract, but as Thomas Jefferson said “The execution of the laws is more important than the making of them.” The importance of a contract is a shared understanding between OpenAI and the DoW on what the models will and will not be used to do. I am happy that we are explicit in our understanding that our models will not be used for domestic mass surveillance, including via analysis of commercially available information of U.S. people. I am even happier that for the time being we will not be working with the intelligence agencies of the DoW, such as NSA, DIA, etc. Our leadership committed to announcing publicly if this changes, and of course this contract has nothing to do with domestic agencies such as DHS, ICE, or FBI. The intelligence agencies have the most sensitive workloads, and so I completely agree it is best to start in the easier cases. This also somewhat mitigates my worry about not ruling out mass surveillance of international citizens. (In addition to the fact that spying on one’s own people is inherently more problematic.)

I am also happy the contract prohibits using our models to direct lethal autonomous weapons, though realistically I do not think powering a killer drone via a cloud-based large model was ever a real possibility. A general purpose frontier model is an extremely poor fit for autonomously directing a weapon; also, the main selling point of autonomous drones is to evade jamming, which requires an on-device model. Given our current state of safety and alignment, lethal autonomous weapons are a very bad idea. But regardless, it would not have happened through this deal.

That said, there is a possibility that eventually our models will be used to help humans in target selection, as is reportedly happening in Iran right now. This is a very heavy burden, and it is up to us to ensure that we do not scale to this use case without very extensive testing of safety and reliability.

The contract enables the necessary conditions for success but it is too soon to know if they are sufficient. It allows us to build in our safety stack to ensure the safe operation of the model and our red lines, as well as have our own forward deployed engineers (FDEs) in place. No safety stack can be perfect, but given the “mass” nature of mass surveillance, it does not need to be perfect to prevent it. That said, this is going to be a challenging enterprise: building safety for applications we are less familiar with, with the added complexities of clearance. Sam has said that we will deploy gradually, starting in the least risky and most familiar domains first.  I think this is essential.

Can we make lemonade out of this lemon?

The previous defense contract of the DoW and Anthropic attracted relatively little attention. I hope that the increased salience of this issue can be used to elevate our standards as an industry. Just like we do with other risks such as bioweapons and cybersecurity, we need to build best practices for avoiding the risk of AI-enabled takeover of democracy, including mass domestic surveillance and high-stakes automated decisions (for example, selective prosecution or “social credit”). These risks are no less catastrophic than bioweapons, and should be tracked and reported as such. While, due to the classified nature of the domain, not everything can be reported, we can and should at least be public about the process.

If there is one thing that AI researchers are good at, it is measuring and optimizing quantities. If we can build the evaluations and turn tracking these risks into a science, we have a much better chance at combatting them. I am confident that it can be done given sufficient time. I am less confident that time will be sufficient.

By Boaz Barak

Hardness of the Binary Covering Radius Problem in Large $\ell_p$ Norms

from arXiv: Computational Complexity

Authors: Huck Bennett, Peter Ly

We study the hardness of the $\gamma$-approximate decisional Covering Radius Problem on lattices in the $\ell_p$ norm ($\gamma$-$\text{GapCRP}_p$). Specifically, we prove that there is an explicit function $\gamma(p)$, with $\gamma(p) > 1$ for $p > p_0 \approx 35.31$ and $\lim_{p \to \infty} \gamma(p) = 9/8$, such that for any constant $\varepsilon > 0$, $(\gamma(p) - \varepsilon)$-$\text{GapCRP}_p$ is $\mathsf{NP}$-hard. This shows the first hardness of $\text{GapCRP}_p$ for explicit $p < \infty$. Work of Haviv and Regev (CCC, 2006 and CJTCS, 2012) previously showed $\Pi_2$-hardness of approximation for $\text{GapCRP}_p$ for all sufficiently large (but non-explicit) finite $p$ and for $p = \infty$. In fact, our hardness results hold for a variant of $\text{GapCRP}$ called the Binary Covering Radius Problem ($\text{BinGapCRP}$), which trivially reduces to both $\text{GapCRP}$ and the decisional Linear Discrepancy Problem ($\text{LinDisc}$) in any norm in an approximation-preserving way. We also show $\Pi_2$-hardness of $(9/8 - \varepsilon)$-$\text{BinGapCRP}$ in the $\ell_{\infty}$ norm for any constant $\varepsilon > 0$. Our work extends and heavily uses the work of Manurangsi (IPL, 2021), which showed $\Pi_2$-hardness of $(9/8 - \varepsilon)$-$\text{LinDisc}$ in the $\ell_{\infty}$ norm.
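
For context, the following standard definitions are not restated in the abstract: the covering radius of a full-rank lattice $\mathcal{L} \subseteq \mathbb{R}^n$ in the $\ell_p$ norm is $\mu_p(\mathcal{L}) = \max_{t \in \mathbb{R}^n} \min_{v \in \mathcal{L}} \lVert t - v \rVert_p$, and in the promise problem $\gamma$-$\text{GapCRP}_p$ one is given a basis of $\mathcal{L}$ and a radius $r > 0$ and must accept if $\mu_p(\mathcal{L}) \le r$ and reject if $\mu_p(\mathcal{L}) > \gamma r$.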


Required-edge Cycle Cover Problem: an ASP-Completeness Framework for Graph Problems and Puzzles

from arXiv: Computational Complexity

Authors: Kosuke Susukita, Junichi Teruyama

Proving the NP-completeness of pencil-and-paper puzzles typically relies on reductions from combinatorial problems such as the satisfiability problem (SAT). Although the properties of these problems are well studied, their purely combinatorial nature often does not align well with the geometric constraints of puzzles. In this paper, we introduce the Required-edge Cycle Cover Problem (RCCP) -- a variant of the Cycle Cover Problem (CCP) on mixed graphs. CCP on mixed graphs was studied by Seta (2002) to establish the ASP-completeness (i.e., NP-completeness under parsimonious reductions) of the puzzle Kakuro (a.k.a.~Cross Sum), and is known to be ASP-complete under certain conditions. We prove the ASP-completeness of RCCP under certain conditions, and strengthen known ASP-completeness results of CCP on mixed graphs as a corollary. Using these results, we resolve the ASP-completeness of Constraint Graph Satisfiability (CGS) in a certain case, addressing an open problem posed by the MIT Hardness Group (2024). We also show that Kakuro remains ASP-complete even when the available digit set is $\{1, 2, 3\}$, consequently completing its complexity classification regarding the maximum available digit and the maximum lengths of contiguous blank cells. It strengthens previously known results of Seta (2002) and Ruepp and Holzer (2010). Furthermore, we introduce a flow model equivalent to the constrained RCCP; this model allows gadgets to be tiled densely on a rectangular grid, which enables us to reduce RCCP to various pencil-and-paper puzzles in a parsimonious manner. By applying this framework, we prove the ASP-completeness of several puzzles, including Chocona, Four Cells, Hinge, and Shimaguni, and strengthen existing NP-completeness results for Choco Banana and Five Cells to ASP-completeness.


Optimal play in Yu-Gi-Oh! TCG is hard

from arXiv: Computational Complexity

Authors: Orazio Nicolosi, Federico Pisciotta, Lorenzo Bresolin

Motivated by the results for Magic: The Gathering presented in [CBH20] and [Bid20], we study a computability problem related to optimal play in Yu-Gi-Oh! Trading Card Game, a popular card game developed and published by Konami. We show that the problem of establishing whether, from a given game state, a computable strategy is winning is undecidable. In particular, not only do we prove that the Halting Problem can be reduced to this problem, but also that the same holds for Kleene's $\mathcal{O}$, thereby demonstrating that this problem is actually $\Pi^1_1$-complete. We extend this last result to all strategies via a reduction from the set of countable well orders, a classic $\boldsymbol{\Pi}^1_1$-complete set. For these reductions we present two legal decks (according to the current Forbidden & Limited List of Yu-Gi-Oh! Trading Card Game) that can be used by the player who goes first to perform them.

Quantum Algorithms for Approximate Graph Isomorphism Testing

from arXiv: Computational Complexity

Authors: Prateek P. Kulkarni

The graph isomorphism problem asks whether two graphs are identical up to vertex relabeling. While the exact problem admits quasi-polynomial-time classical algorithms, many applications in molecular comparison, noisy network analysis, and pattern recognition require a flexible notion of structural similarity. We study the quantum query complexity of approximate graph isomorphism testing, where two graphs on $n$ vertices drawn from the Erdős--Rényi distribution $\mathcal{G}(n,1/2)$ are considered approximately isomorphic if they can be made isomorphic by at most $k$ edge edits. We present a quantum algorithm based on MNRS quantum walk search over the product graph $\Gamma(G,H)$ of the two input graphs. When the graphs are approximately isomorphic, the quantum walk search detects vertex pairs belonging to a dense near-isomorphic matching set; candidate pairings are then reconstructed via local consistency propagation and verified via a Grover-accelerated consistency check. We prove that this approach achieves query complexity $\mathcal{O}(n^{3/2} \log n/\varepsilon)$, where $\varepsilon$ parameterizes the approximation threshold. We complement this with an $\Omega(n^2)$ classical lower bound for constant approximation, establishing a genuine polynomial quantum speedup in the query model. We extend the framework to spectral similarity measures based on graph Laplacian eigenvalues, as well as weighted and attributed graphs. Small-scale simulation results on quantum simulators for graphs with up to twenty vertices demonstrate compatibility with near-term quantum devices.
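
As a point of reference for the notion of approximate isomorphism used here, the following brute-force baseline (my own illustrative code, not the paper's quantum algorithm) computes the minimum number of edge edits over all vertex bijections; two graphs are approximately isomorphic when this value is at most $k$. It is exponential in $n$ and only meant to pin down the definition on tiny graphs.

```python
from itertools import permutations

def edit_distance_to_isomorphism(n, edges_g, edges_h):
    """Minimum number of edge insertions/deletions needed to make G isomorphic
    to H, minimized over all vertex bijections (brute force, tiny graphs only)."""
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    best = float("inf")
    for perm in permutations(range(n)):
        mapped = {frozenset(perm[v] for v in e) for e in g}
        best = min(best, len(mapped ^ h))  # symmetric difference = edits needed
    return best

# A 4-cycle vs. a 4-cycle with one chord: a single edge edit suffices.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
chordal = cycle + [(0, 2)]
print(edit_distance_to_isomorphism(4, cycle, chordal))  # 1
```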

Matrices with displacement structure: a deterministic approach for linear systems and nullspace bases

from arXiv: Computational Complexity

Authors: Sara Khichane, Vincent Neiger

The fastest known algorithms for dealing with structured matrices, in the sense of the displacement rank measure, are randomized. For handling classical displacement structures, they achieve the complexity bounds $\tilde{O}(\alpha^{\omega-1} n)$ for solving linear systems and $\tilde{O}(\alpha^2 n)$ for computing the nullspace. Here $n \times n$ is the size of the square matrix, $\alpha$ is its displacement rank, $\omega > 2$ is a feasible exponent for matrix multiplication, and the notation $\tilde{O}(\cdot)$ counts arithmetic operations in the base field while hiding logarithmic factors. These algorithms rely on an adaptation of Strassen's divide-and-conquer Gaussian elimination to the context of structured matrices. This approach requires the input matrix to have generic rank profile; this constraint is lifted via pre- and post-multiplications by special matrices generated from random coefficients chosen in a sufficiently large subset of the base field. This work introduces a fast and deterministic approach, which solves both problems within $\tilde{O}(\alpha^{\omega-1} (m+n))$ operations in the base field for an arbitrary rectangular $m \times n$ input matrix. We provide explicit algorithms that instantiate this approach for Toeplitz-like, Vandermonde-like, and Cauchy-like structures. The starting point of the approach is to reformulate a structured linear system as a modular equation on univariate polynomials. Then, a description of all solutions to this equation is found in three steps, all using fast and deterministic operations on polynomial matrices. Specifically, one first computes a basis of solutions to a vector M-Padé approximation problem; then one performs linear system solving over the polynomials to isolate away unwanted unknowns and restrict to those that are actually sought; and finally the latter are found by simultaneous M-Padé approximation.
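
For readers unfamiliar with the displacement-rank measure, the snippet below checks the Toeplitz-like case numerically with a standard Sylvester-type displacement operator (my choice of operator for illustration; the paper treats Toeplitz-, Vandermonde-, and Cauchy-like structures in its own framework): the displacement of an $n \times n$ Toeplitz matrix has rank at most 2, so $\alpha = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
col, row = rng.standard_normal(n), rng.standard_normal(n)
# A generic Toeplitz matrix: entries constant along diagonals, T[i, j] depends only on i - j.
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)] for i in range(n)])

Z = np.diag(np.ones(n - 1), k=-1)   # lower shift matrix
D = T - Z @ T @ Z.T                 # Sylvester-type displacement of T
print(np.linalg.matrix_rank(D))     # 2: the displacement rank alpha of a Toeplitz matrix
```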

Grounded String Representations of Series-Parallel Graphs without Transitive Edges

from arXiv: Computational Geometry

Authors: Sabine Cornelsen, Jan Kratochvíl, Miriam Münch, Giacomo Ortali, Alexandra Weinberger, Alexander Wolff

In a {\em grounded string representation} of a graph there is a horizontal line $\ell$ and each vertex is represented as a simple curve below $\ell$ with one end point on $\ell$ such that two curves intersect if and only if the respective vertices are adjacent. A grounded string representation is a {\em grounded L-reverseL-representation} if each vertex is represented by a 1-bend orthogonal polyline. It is a {\em grounded L-representation} if in addition all curves are L-shaped. We show that every biconnected series-parallel graph without edges between the two vertices of a separation pair (i.e., {\em transitive edges}) admits a grounded L-reverseL-representation if and only if it admits a grounded string representation. Moreover, we can test in linear time whether such a representation exists. We also construct a biconnected series-parallel graph without transitive edges that admits a grounded L-reverseL-representation, but no grounded L-representation.

Tilt Automata: Gathering Particles With Uniform External Control

from arXiv: Computational Geometry

Authors: Sándor P. Fekete, Jonas Friemel, Peter Kramer, Jan-Marc Reinhardt, Christian Rieck, Christian Scheffer

Motivated by targeted drug delivery, we investigate the gathering of particles in the full tilt model of externally controlled motion planning: A set of particles is located at the tiles of a polyomino with all particles reacting uniformly to an external force by moving as far as possible in one of the four axis-parallel directions until they hit the boundary. The goal is to choose a sequence of directions that moves all particles to a common position. Our results include a polynomial-time algorithm for gathering in a completely filled polyomino as well as hardness reductions for approximating shortest gathering sequences and for determining whether the particles in a partially filled polyomino can be gathered. We pay special attention to the impact of restricted geometry, particularly polyominoes without holes. As corollaries, we make progress on an open question from [Balanza-Martinez et al., SODA 2020] by showing that deciding whether a given position can be occupied remains NP-hard in polyominoes without holes and provide initial results on the parameterized complexity of tilt problems. Our results build on a connection we establish between tilt models and the theory of synchronizing automata.
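
The single-move dynamics of the full tilt model are easy to simulate. The sketch below is my own code, with the assumption that particles block one another rather than merge (conventions vary across tilt models); it slides every particle maximally in one axis-parallel direction inside a polyomino given as a set of open cells.

```python
def tilt(open_cells, particles, direction):
    """One full-tilt move: every particle slides as far as possible in the given
    direction until it leaves the open cells or hits an already-settled particle.
    `open_cells`: set of (row, col) cells; `particles`: set of occupied cells;
    `direction`: one of 'N', 'S', 'E', 'W'."""
    drow, dcol = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}[direction]
    # Process particles front-most first, so blocking is resolved correctly.
    order = sorted(particles, key=lambda p: p[0] * drow + p[1] * dcol, reverse=True)
    settled = set()
    for (r, c) in order:
        while (r + drow, c + dcol) in open_cells and (r + drow, c + dcol) not in settled:
            r, c = r + drow, c + dcol
        settled.add((r, c))
    return settled

# A 1x4 corridor with particles at both ends: tilting east packs them to the right.
cells = {(0, 0), (0, 1), (0, 2), (0, 3)}
print(tilt(cells, {(0, 0), (0, 3)}, "E"))  # {(0, 2), (0, 3)}
```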

Geometric structures and deviations on James' symmetric positive-definite matrix bicone domain

from arXiv: Computational Geometry

Authors: Jacek Karwowski, Frank Nielsen

Symmetric positive-definite (SPD) matrix datasets play a central role across numerous scientific disciplines, including signal processing, statistics, finance, computer vision, information theory, and machine learning, among others. The set of SPD matrices forms a cone which can be viewed as a global coordinate chart of the underlying SPD manifold. Rich differential-geometric structures may be defined on the SPD cone manifold. Among the most widely used geometric frameworks on this manifold are the affine-invariant Riemannian structure and the dual information-geometric log-determinant barrier structure, each associated with dissimilarity measures (distance and divergence, respectively). In this work, we introduce two new structures, a Finslerian structure and a dual information-geometric structure, both derived from James' bicone reparameterization of the SPD domain. These structures ensure that geodesics correspond to straight lines in appropriate coordinate systems. The closed bicone domain includes the spectraplex (the set of positive semi-definite diagonal matrices with unit trace) as an affine subspace, and the Hilbert VPM distance is proven to generalize the Hilbert simplex distance which found many applications in machine learning. Finally, we discuss several applications of these Finsler/dual Hessian structures and provide various inequalities between the new and traditional dissimilarities.
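
For context, the affine-invariant Riemannian distance mentioned above has the closed form $d(A,B) = \lVert \log(A^{-1/2} B A^{-1/2}) \rVert_F$. The snippet below (illustrative code for that classical structure only, not for the paper's new bicone-based ones) computes it and checks affine invariance numerically.

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def affine_invariant_distance(A, B):
    """Classical affine-invariant Riemannian distance on the SPD cone:
    d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F."""
    A_inv_sqrt = np.linalg.inv(sqrtm(A))
    return np.linalg.norm(logm(A_inv_sqrt @ B @ A_inv_sqrt), "fro")

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)); A = X @ X.T + 3 * np.eye(3)
Y = rng.standard_normal((3, 3)); B = Y @ Y.T + 3 * np.eye(3)
print(affine_invariant_distance(A, B))
# Affine invariance: conjugating both matrices by an invertible C leaves the
# distance unchanged (up to numerical error).
C = rng.standard_normal((3, 3)) + 3 * np.eye(3)
print(affine_invariant_distance(C @ A @ C.T, C @ B @ C.T))
```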

Lozenge Tiling by Computing Distances

from arXiv: Computational Geometry

Authors: Jean-Marie Favreau, Yan Gerard, Pascal Lafourcade, Léo Robert

The Calisson puzzle is a tiling puzzle in which one must tile a triangular grid inside a hexagon with lozenges, under the constraint that certain prescribed edges remain tile boundaries and that adjacent lozenges along these edges have different orientations. We present the first polynomial-time algorithm for this problem, with cubic running time. This algorithm, called the advancing surface algorithm, can be executed in a simple and intuitive way, even by hand with a pencil and an eraser. Its apparent simplicity conceals a deeper algorithmic reinterpretation of the classical ideas of John Conway and William Thurston, revisited here from a theoretical computer science perspective. We introduce a graph-theoretic overlay based on directed cuts and systems of difference constraints that complements Thurston's theory of lozenge tilings and makes its algorithmic structure explicit. In Thurston's approach, lozenge tilings are lifted to monotone stepped surfaces in the three-dimensional cubic lattice and projected back to the plane using height functions, reducing tilability to the computation of heights. We show that selecting a monotone surface corresponds to selecting a directed cut in a periodic directed graph, while height functions arise as solutions of a system of difference constraints. In this formulation, a region is tilable if and only if the associated weighted directed graph contains no cycle of strictly negative weight. This additional graph layer shows that the Bellman-Ford algorithm suffices to decide feasibility and compute solutions. In particular, our framework allows one to decide whether the infinite triangular grid can be tiled while respecting a finite set of prescribed local constraints, even in the absence of boundary conditions.
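
The last reduction step described above is a classical one: a system of difference constraints $x_j - x_i \le w$ is feasible exactly when its constraint graph has no negative-weight cycle, which Bellman-Ford detects. A minimal sketch of that generic feasibility test (my own code, not the paper's advancing surface algorithm):

```python
def difference_constraints_feasible(num_vars, constraints):
    """Feasibility of a system of difference constraints x_j - x_i <= w via
    Bellman-Ford: build an edge i -> j of weight w per constraint; the system
    is feasible iff the graph has no negative-weight cycle."""
    INF = float("inf")
    # Virtual source `num_vars` with zero-weight edges to every variable.
    edges = [(num_vars, v, 0) for v in range(num_vars)]
    edges += [(i, j, w) for (i, j, w) in constraints]
    dist = [INF] * num_vars + [0]
    for _ in range(num_vars):
        for (u, v, w) in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One extra relaxation round: any further improvement witnesses a negative cycle.
    feasible = all(dist[u] + w >= dist[v] for (u, v, w) in edges)
    return feasible, dist[:num_vars]

# x1 - x0 <= 1, x2 - x1 <= 1, x0 - x2 <= -3 sums to 0 <= -1: infeasible.
print(difference_constraints_feasible(3, [(0, 1, 1), (1, 2, 1), (2, 0, -3)]))
```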

Low-Degree Method Fails to Predict Robust Subspace Recovery

from arXiv: Data Structures and Algorithms

Authors: He Jia, Aravindan Vijayaraghavan

The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This success has led to the low-degree conjecture, which posits that this method captures the power and limitations of efficient algorithms for a wide class of high-dimensional statistical problems. We identify a natural and basic hypothesis testing problem in $\mathbb{R}^n$ which is polynomial-time solvable, but for which the low-degree polynomial method fails to predict its computational tractability even up to degree $k=n^{\Omega(1)}$. Moreover, the low-degree moments match exactly up to degree $k=O(\sqrt{\log n/\log\log n})$. Our problem is a special case of the well-studied robust subspace recovery problem. The lower bounds suggest that there is no polynomial-time algorithm for this problem. In contrast, we give a simple and robust polynomial-time algorithm that solves the problem (and noisy variants of it), leveraging anti-concentration properties of the distribution. Our results suggest that the low-degree method and low-degree moments fail to capture algorithms based on anti-concentration, challenging their universality as a predictor of computational barriers.

An Improved Combinatorial Algorithm for Edge-Colored Clustering in Hypergraphs

from arXiv: Data Structures and Algorithms

Authors: Seongjune Han, Nate Veldt

Many complex systems and datasets are characterized by multiway interactions of different categories, and can be modeled as edge-colored hypergraphs. We focus on clustering such datasets using the NP-hard edge-colored clustering problem, where the goal is to assign colors to nodes in such a way that node colors tend to match edge colors. A key focus in prior work has been to develop approximation algorithms for the problem that are combinatorial and easier to scale. In this paper, we present the first combinatorial approximation algorithm with an approximation factor better than 2.
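
To fix ideas, here is the objective in code together with a naive plurality baseline (my own illustration; it is not the paper's combinatorial approximation algorithm and carries no guarantee): a hyperedge is satisfied when all of its nodes receive that edge's color.

```python
from collections import Counter

def satisfied_edges(hyperedges, node_color):
    """Edge-colored clustering objective: count hyperedges whose nodes all
    receive the edge's color. Hyperedges are (set_of_nodes, color) pairs."""
    return sum(all(node_color[v] == c for v in nodes) for (nodes, c) in hyperedges)

def plurality_baseline(hyperedges):
    """Naive baseline: each node takes the most frequent color among its
    incident hyperedges."""
    votes = {}
    for nodes, c in hyperedges:
        for v in nodes:
            votes.setdefault(v, Counter())[c] += 1
    return {v: cnt.most_common(1)[0][0] for v, cnt in votes.items()}

H = [({1, 2, 3}, "red"), ({2, 3}, "red"), ({3, 4}, "blue"), ({4, 5}, "blue")]
coloring = plurality_baseline(H)
print(coloring, satisfied_edges(H, coloring))  # 3 of 4 hyperedges satisfied
```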

Efficient Dynamic Algorithms to Predict Short Races

from arXiv: Data Structures and Algorithms

Authors: Minjian Zhang, Mahesh Viswanathan

We introduce and study the problem of detecting short races in an observed trace. Specifically, for a race type $R$, given a trace $\sigma$ and window size $w$, the task is to determine whether there exists an $R$-race $(e_1, e_2)$ in $\sigma$ such that the subtrace starting with $e_1$ and ending with $e_2$ contains at most $w$ events. We present a monitoring framework for short-race prediction and instantiate the framework for happens-before and sync-preserving races, yielding efficient detection algorithms. Our happens-before algorithm runs in the same time as FastTrack but uses space that scales with $\log w$ as opposed to $\log |\sigma|$. For sync-preserving races, our algorithm runs faster and consumes significantly less space than SyncP. Our experiments validate the effectiveness of these short-race detection algorithms: they run more efficiently, use less memory, and detect significantly more races under the same budget, offering a reasonable balance between resource usage and predictive power.
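
The "short" restriction itself is easy to state in code. The sketch below (my own simplification) scans a trace with a sliding window of $w$ events and reports conflicting access pairs from different threads; a real predictor such as the ones described above would additionally check ordering constraints (happens-before or sync-preservation), which this sketch deliberately omits.

```python
from collections import deque

def short_conflict_pairs(trace, w):
    """Report pairs of events at most w events apart (inclusive) that touch the
    same variable from different threads with at least one write.
    Each event is a tuple (thread, op, var) with op in {'r', 'w'}."""
    window = deque()          # recent events kept as (index, event)
    pairs = []
    for i, (t2, op2, x2) in enumerate(trace):
        while window and i - window[0][0] >= w:
            window.popleft()  # e1 too far back: subtrace would exceed w events
        for j, (t1, op1, x1) in window:
            if x1 == x2 and t1 != t2 and "w" in (op1, op2):
                pairs.append((j, i))
        window.append((i, (t2, op2, x2)))
    return pairs

trace = [("T1", "w", "x"), ("T1", "r", "y"), ("T2", "r", "x"), ("T2", "w", "y")]
print(short_conflict_pairs(trace, 3))  # [(0, 2), (1, 3)]
```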

A simple Path-based LP Relaxation for Directed Steiner Tree

from arXiv: Data Structures and Algorithms

Authors: Kanstantsin Pashkovich, Marta Pozzi, Laura Sanità

We study the Directed Steiner Tree (DST) problem in layered graphs through a simple path-based linear programming relaxation. This relaxation achieves an integrality gap of O(l log k), where k is the number of terminals and l is the number of layers, which matches the best known bounds for DST previously obtained via lift-and-project hierarchies. Our formulation bypasses hierarchy machinery, offering a more transparent route to the state-of-the-art bound, and it can be exploited to provide an alternative simpler proof that O(l) rounds of the Sherali-Adams hierarchy suffice for reducing the integrality gap on layered instances of DST.
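
As a concrete (if tiny) illustration of a path-based relaxation, the snippet below sets up the textbook formulation on a 2-layer instance with scipy: minimize $\sum_e c_e x_e$ subject to each terminal receiving one unit of path mass and the per-terminal mass on any edge being at most $x_e$. Whether this coincides exactly with the relaxation studied in the paper is an assumption on my part; the instance and variable names are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog

edges = [("r", "a"), ("r", "b"), ("a", "t1"), ("b", "t1"), ("a", "t2"), ("b", "t2")]
cost = np.ones(len(edges))
paths = {  # all root-to-terminal paths, as lists of edge indices
    "t1": [[0, 2], [1, 3]],
    "t2": [[0, 4], [1, 5]],
}

ne = len(edges)
path_vars = [(t, p) for t, ps in paths.items() for p in ps]
nv = ne + len(path_vars)                      # variables: x_e, then y_{t,P}
c = np.concatenate([cost, np.zeros(len(path_vars))])

A_ub, b_ub = [], []
for t in paths:                               # coverage: sum_P y_{t,P} >= 1
    row = np.zeros(nv)
    for k, (tt, _) in enumerate(path_vars):
        if tt == t:
            row[ne + k] = -1.0
    A_ub.append(row); b_ub.append(-1.0)
for e in range(ne):                           # capacity: sum_{P containing e} y_{t,P} <= x_e
    for t in paths:
        row = np.zeros(nv)
        row[e] = -1.0
        for k, (tt, p) in enumerate(path_vars):
            if tt == t and e in p:
                row[ne + k] = 1.0
        A_ub.append(row); b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(0, None)] * nv)
print(res.fun)   # ~3.0: one edge out of the root plus one edge into each terminal
```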

Deterministic Edge Coloring with few Colors in CONGEST

from arXiv: Data Structures and Algorithms

Authors: Joakim Blikstad, Yannic Maus, Tijn de Vos

As the main contribution of this work we present deterministic edge coloring algorithms in the CONGEST model. In particular, we present an algorithm that edge colors any $n$-node graph with maximum degree $\Delta$ with $(1+\varepsilon)\Delta+O(\sqrt{\log n})$ colors in $\tilde{O}(\log^{2.5} n+\log^2 \Delta \log n)$ rounds. This brings the upper bound polynomially close to the lower bound of $\Omega(\log n/\log\log n)$ rounds that also holds in the more powerful LOCAL model [Chang, He, Li, Pettie, Uitto; SODA'18]. As long as $\Delta \geq c\sqrt{\log n}$ our algorithm uses fewer than $2\Delta-1$ colors and to the best of our knowledge is the first polylogarithmic-round CONGEST algorithm achieving this for any range of $\Delta$. As a corollary we also improve the complexity of edge coloring with $2\Delta-1$ colors for all ranges of $\Delta$ to $\tilde{O}(\log^{2.5} n+\log^2 \Delta \log n)$. This improves upon the previous $O(\log^8 n)$-round algorithm from [Fischer, Ghaffari, Kuhn; FOCS'17]. Our approach builds on a refined analysis and extension of the online edge-coloring algorithm of Blikstad, Svensson, Vintan, and Wajc [FOCS'25], and more broadly on new connections between online and distributed graph algorithms. We show that their algorithm exhibits very low locality and, if it can additionally have limited local access to future edges (as distributed algorithms can), it can be derandomized for smaller degrees. Under this additional power, we are able to bypass classical online lower bounds and translate the results to efficient distributed algorithms. This leads to our CONGEST algorithm for $(1+\varepsilon)\Delta+O(\sqrt{\log n})$-edge coloring. Since the modified online algorithm can be implemented more efficiently in the LOCAL model, we also obtain (marginally) improved complexity bounds in that model.
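
For orientation, the $2\Delta-1$ benchmark comes from the trivial sequential greedy algorithm: every edge is adjacent to at most $2(\Delta-1)$ other edges, so the smallest free color is always at most $2\Delta-1$. A minimal sketch of that classical baseline (my own code; the paper's contribution is doing far better than this, deterministically and in the distributed CONGEST setting):

```python
def greedy_edge_coloring(edges):
    """Color edges greedily with the smallest color unused at either endpoint;
    this uses at most 2*Delta - 1 colors."""
    used = {}                                  # vertex -> set of colors already at that vertex
    color = {}
    for (u, v) in edges:
        c = 0
        while c in used.get(u, set()) or c in used.get(v, set()):
            c += 1
        color[(u, v)] = c
        used.setdefault(u, set()).add(c)
        used.setdefault(v, set()).add(c)
    return color

# A 5-cycle (Delta = 2): greedy uses 3 = 2*Delta - 1 colors, which is also optimal here.
print(greedy_edge_coloring([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))
```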
