Theory of Computing Report

Tuesday, April 01

ICE-TCS seminar by Benjamin Moore on "Smoothed analysis for graph isomorphism"

from Luca Aceto

Today, the ICE-TCS seminar series at Reykjavik University hosted a talk by Benjamin Moore (Institute of Science and Technology Austria), who is visiting our postdoctoral researcher Nicolaos Matsakis.

Benjamin presented the main results in his paper "Smoothed analysis for graph isomorphism", coauthored with his ISTA colleagues Michael Anastos and Matthew Kwan. (In passing, I just saw that Matthew Kwan received the main prize of the Austrian Mathematical Society last year. Congratulations!) 

To my mind, Benjamin did an excellent job in presenting the context for their exciting (but very technical) contribution and the main ideas that underlie it. Kudos! The work by Benjamin and his collaborators provides another explanation of the effectiveness of the colour refinement algorithm (also known as the one-dimensional Weisfeiler-Leman algorithm) in checking whether two graphs are isomorphic. I encourage you to read at least the introduction of their paper, which will be presented at STOC 2025, and the ISTA news article here, which does a much better job at putting their work in context than an interested, but ignorant, observer like me ever could. FWIW, I find results like theirs, which offer some explanation as to why theoretically hard problems are seemingly easy in practice, fascinating, and I feel that the paper might be a strong candidate for a best paper award.
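For readers who have not seen it, colour refinement itself is only a few lines of code. Below is a minimal Python sketch (my own illustration of the textbook algorithm, not code from the paper): it repeatedly recolours each vertex by the multiset of its neighbours' colours until the colouring stabilises, and comparing the stable colour histograms of two graphs gives a fast necessary (though not sufficient) test for isomorphism.

    def colour_refinement(adj):
        # 1-dimensional Weisfeiler-Leman: refine vertex colours until stable.
        # adj maps each vertex to an iterable of its neighbours.
        colour = {v: 0 for v in adj}  # start with a uniform colouring
        while True:
            # New colour = (old colour, multiset of neighbours' old colours).
            sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
                   for v in adj}
            names = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            refined = {v: names[sig[v]] for v in adj}
            if refined == colour:
                return colour
            colour = refined

    def maybe_isomorphic(adj1, adj2):
        # Refine the disjoint union; unequal histograms certify non-isomorphism,
        # while equal histograms are inconclusive (the algorithm's known weakness).
        union = {("a", v): [("a", u) for u in adj1[v]] for v in adj1}
        union.update({("b", v): [("b", u) for u in adj2[v]] for v in adj2})
        colour = colour_refinement(union)
        return (sorted(c for (g, _), c in colour.items() if g == "a") ==
                sorted(c for (g, _), c in colour.items() if g == "b"))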

It was also fitting to see recent work on smoothed analysis being presented at our seminar series since Daniel Spielman and Shang-Hua Teng received the 2008 Gödel Prize at ICALP 2008, which was held at Reykjavik University. Time flies, but great work is timeless. 


By Luca Aceto


PDQ Shor (?-2025)

from Computational Complexity

PDQ Shor, Peter Shor's smarter brother, passed away last week. PDQ was a Physicist/Computer Scientist/Mathematician/Astrologer/Psychic at the University of Northern South Dakota in Wakpala.

Dr. Phineas Dominic Quincy Shor III, PhD, MBA, BLT, received his education at Europa U. during one of his many alien abductions. He ended up in South Dakota after having fled every other state.

He was most famous for the concept of unnatural proofs, collected in his anthology Proofs from the Other Book, which includes his classic "interpretive dance proof" of the Pythagorean theorem. Critics complain the proof only handles the case where the right angle is on the left.

His follow-up book, Proofs from the Crypt, contains his masterpiece, a 1237-page proof that conclusively shows that the empty set contains no irrational numbers.

Like his brother, he's moved to the quantum space, reverse-engineering Peter's work by giving a doubly exponential time quantum algorithm for multiplying numbers. He created the innovative dipping bird quantum error collection machine that constantly monitors a quantum machine, collapsing all entanglement. Apple bought the device for $327 million, which immediately destroyed their plans for a QiPhone.
PDQ used the proceeds to create the perpetual Turing machine, guaranteed to never halt. Until it did.

Sadly PDQ passed away from paranormal causes last week. Or was it last year? No one is quite sure. He donated his body to pseudoscience, currently lying in state in an undisclosed state. We hardly knew you.

With apologies to Peter Schickele. This April Fools post was inspired by the complexity class PDQMA.

By Lance Fortnow


Lifting for Arbitrary Gadgets

from arXiv: Computational Complexity

Authors: Siddharth Iyer

We prove a sensitivity-to-communication lifting theorem for arbitrary gadgets. Given functions $f: \{0,1\}^n\to \{0,1\}$ and $g : \mathcal X\times \mathcal Y\to \{0,1\}$, denote $f\circ g(x,y) := f(g(x_1,y_1),\ldots,g(x_n,y_n))$. We show that for any $f$ with sensitivity $s$ and any $g$, \[D(f\circ g) \geq s\cdot \bigg(\frac{\Omega(D(g))}{\log\mathsf{rk}(g)} - \log\mathsf{rk}(g)\bigg),\] where $D(\cdot)$ denotes the deterministic communication complexity and $\mathsf{rk}(g)$ is the rank of the matrix associated with $g$. As a corollary, we get that if $D(g)$ is a sufficiently large constant, $D(f\circ g) = \Omega(\min\{s,d\}\cdot \sqrt{D(g)})$, where $s$ and $d$ denote the sensitivity and degree of $f$. In particular, computing the OR of $n$ copies of $g$ requires $\Omega(n\cdot\sqrt{D(g)})$ bits.
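To make the composed object concrete, here is a small Python sketch (mine, using only the definitions above) of the sensitivity $s$ and of $f\circ g$; taking $f = \mathrm{OR}$ gives sensitivity $n$, the case behind the $\Omega(n\cdot\sqrt{D(g)})$ corollary.

    from itertools import product

    def sensitivity(f, n):
        # max over inputs x of the number of single-bit flips that change f(x)
        return max(
            sum(1 for i in range(n)
                if f(tuple(b ^ (j == i) for j, b in enumerate(x))) != f(x))
            for x in product([0, 1], repeat=n))

    def compose(f, g):
        # f∘g(x, y) = f(g(x_1, y_1), ..., g(x_n, y_n))
        return lambda xs, ys: f(tuple(g(x, y) for x, y in zip(xs, ys)))

    OR = lambda z: int(any(z))
    EQ = lambda x, y: int(x == y)            # a toy gadget g
    print(sensitivity(OR, 3))                # 3: OR on n bits has sensitivity n
    print(compose(OR, EQ)((0, 1), (1, 1)))   # 1, since g(x_2, y_2) = 1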

$\mathsf{P}$-completeness of Graph Local Complementation

from arXiv: Computational Complexity

Authors: Pablo Concha-Vega

Local complementation of a graph $G$ on vertex $v$ is an operation that results in a new graph $G*v$, where the neighborhood of $v$ is complemented. This operation has been widely studied in graph theory and quantum computing. This article introduces the Local Complementation Problem, a decision problem that captures the complexity of applying a sequence of local complementations. Given a graph $G$, a sequence of vertices $s$, and a pair of vertices $u,v$, the problem asks whether the edge $(u,v)$ is present in the graph obtained after applying local complementations according to $s$. The main contribution of this work is proving that this problem is $\mathsf{P}$-complete, implying that computing a sequence of local complementations is unlikely to be efficiently parallelizable. The proof is based on a reduction from the Circuit Value Problem, a well-known $\mathsf{P}$-complete problem, by simulating circuits through local complementations. Additionally, the complexity of this problem is analyzed under different restrictions. In particular, it is shown that for complete and star graphs, the problem belongs to $\mathsf{LOGSPACE}$. Finally, it is conjectured that the problem remains $\mathsf{P}$-complete for the class of circle graphs.
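The operation itself is easy to implement; the theorem says that long sequences of it are hard to parallelize. A minimal Python sketch (mine, following the definitions above) of $G*v$ and of the direct sequential simulation that witnesses membership in $\mathsf{P}$:

    def local_complement(adj, v):
        # Return G*v: complement the subgraph induced on the neighbourhood of v.
        # adj maps each vertex to a set of neighbours (undirected, no loops);
        # vertex labels are assumed comparable (e.g., integers).
        nbrs = adj[v]
        new_adj = {u: set(ns) for u, ns in adj.items()}
        for a in nbrs:
            for b in nbrs:
                if a < b:
                    if b in new_adj[a]:                    # edge: remove it
                        new_adj[a].discard(b); new_adj[b].discard(a)
                    else:                                  # non-edge: add it
                        new_adj[a].add(b); new_adj[b].add(a)
        return new_adj

    def edge_after_sequence(adj, seq, u, v):
        # The Local Complementation Problem, decided by direct simulation:
        # polynomial time, but inherently sequential.
        for w in seq:
            adj = local_complement(adj, w)
        return v in adj[u]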

Simple general magnification of circuit lower bounds

from arXiv: Computational Complexity

Authors: Albert Atserias, Moritz Müller

We construct so-called distinguishers, sparse matrices that retain some properties of error correcting codes. They provide a technically and conceptually simple approach to magnification. We generalize and strengthen known general (not problem specific) magnification results and in particular achieve magnification thresholds below known lower bounds. For example, we show that fixed polynomial formula size lower bounds for NP are implied by slightly superlinear formula size lower bounds for approximating any sufficiently sparse problem in NP. We also show that the thresholds achieved are sharp. Additionally, our approach yields a uniform magnification result for the minimum circuit size problem. This seems to sidestep the localization barrier.

On the Quantum Chromatic Gap

from arXiv: Computational Complexity

Authors: Lorenzo Ciardo

The largest known gap between quantum and classical chromatic number of graphs, obtained via quantum protocols for colouring Hadamard graphs based on the Deutsch-Jozsa algorithm and the quantum Fourier transform, is exponential. We put forth a quantum pseudo-telepathy version of Khot's $d$-to-$1$ Games Conjecture and prove that, conditional on its validity, the gap is unbounded: There exist graphs whose quantum chromatic number is $3$ and whose classical chromatic number is arbitrarily large. Furthermore, we show that the existence of a certain form of pseudo-telepathic XOR games would imply the conjecture and, thus, the unboundedness of the quantum chromatic gap. As two technical steps of our proof that might be of independent interest, we establish a quantum adjunction theorem for Pultr functors between categories of relational structures, and we prove that the Dinur-Khot-Kindler-Minzer-Safra reduction, recently used for proving the $2$-to-$2$ Games Theorem, is quantum complete.

Classical Simulation of Quantum CSP Strategies

from arXiv: Computational Complexity

Authors: Demian Banakh, Lorenzo Ciardo, Marcin Kozik, Jan Tułowiecki

We prove that any perfect quantum strategy for the two-prover game encoding a constraint satisfaction problem (CSP) can be simulated via a perfect classical strategy with an extra classical communication channel, whose size depends only on $(i)$ the size of the shared quantum system used in the quantum strategy, and $(ii)$ structural parameters of the CSP template. The result is obtained via a combinatorial characterisation of perfect classical strategies with extra communication channels and a geometric rounding procedure for the projection-valued measurements involved in quantum strategies. A key intermediate step of our proof is to establish that the gap between the classical chromatic number of graphs and its quantum variant is bounded when the quantum strategy involves shared quantum information of bounded size.

On the difficulty of order constrained pattern matching with applications to feature matching based malware detection

from arXiv: Computational Complexity

Authors: Adiesha Liyanage, Braeden Sopp, Binhai Zhu

We formulate low-level malware detection using algorithms based on feature matching as Order-based Malware Detection with Critical Instructions (General-OMDCI): given a pattern in the form of a sequence \(M\) of colored blocks, where each block contains a critical character (representing a unique sequence of critical instructions potentially associated with malware but without certainty), and a program \(A\), represented as a sequence of \(n\) colored blocks with critical characters, the goal is to find two subsequences, \(M'\) of \(M\) and \(A'\) of \(A\), with blocks matching in color and whose critical characters form a permutation of each other. When $M$ is a permutation in both colors and critical characters, the problem is called OMDCI. If we additionally require $M'=M$, then the problem is called OMDCI+; if in this case $d=|M|$ is used as a parameter, then the OMDCI+ problem is easily shown to be FPT. Our main (negative) results are on the cases when $|M|$ is arbitrary and are summarized as follows: OMDCI+ is NP-complete, which implies OMDCI is also NP-complete. For the special case of OMDCI, deciding if the optimal solution has length $0$ (i.e., deciding if no part of \(M\) appears in \(A\)) is co-NP-hard. As a result, the OMDCI problem does not admit an FPT algorithm unless P=co-NP. In summary, our results imply that using algorithms based on feature matching to identify malware and to determine the absence of malware in a given low-level program are both hard problems.
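To make the definitions concrete, here is a tiny brute-force checker (mine; exponential time, for illustration only, with blocks encoded as (colour, critical character) pairs, which is my reading of the definition above):

    from itertools import combinations

    def omdci_opt(M, A):
        # Longest k such that some subsequences M' of M and A' of A of length k
        # match in colour position by position and the critical characters of
        # M' are a permutation of those of A'.
        for k in range(min(len(M), len(A)), 0, -1):
            for mi in combinations(range(len(M)), k):
                for ai in combinations(range(len(A)), k):
                    Mp = [M[i] for i in mi]
                    Ap = [A[j] for j in ai]
                    if (all(m[0] == a[0] for m, a in zip(Mp, Ap)) and
                            sorted(m[1] for m in Mp) == sorted(a[1] for a in Ap)):
                        return k
        return 0

    M = [("red", "x"), ("green", "y")]
    A = [("green", "y"), ("red", "x"), ("green", "z")]
    print(omdci_opt(M, A))  # 1: colours must match pairwise *in order*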

Disjunctive Complexity

from arXiv: Computational Complexity

Authors: Nikita Ivanov, Alexander Rubtsov, Michael Vyalyi

A recently introduced measure of Boolean function complexity, disjunctive complexity (DC), is compared with other complexity measures: the space complexity of streaming algorithms and the complexity of nondeterministic branching programs (NBP). We show that DC is incomparable with NBP. Specifically, we present a function that has low NBP but subexponential DC. Conversely, we provide arguments based on computational complexity conjectures to show that DC can superpolynomially exceed NBP in certain cases. Additionally, we prove that the monotone version of NBP complexity is strictly weaker than DC. We prove that the space complexity of one-pass streaming algorithms is strictly weaker than DC. Furthermore, we introduce a generalization of streaming algorithms that captures the full power of DC. This generalization can be expressed in terms of nondeterministic algorithms that irreversibly write 1s to entries of a Boolean vector (i.e., changes from 1 to 0 are not allowed). Finally, we discuss an unusual phenomenon in disjunctive complexity: the existence of uniformly hard functions. These functions exhibit the property that their disjunctive complexity is maximized, and this property extends to all functions dominated by them.

Integer multiplication is at least as hard as matrix transposition

from arXiv: Computational Complexity

Authors: David Harvey, Joris van der Hoeven

Working in the multitape Turing model, we show how to reduce the problem of matrix transposition to the problem of integer multiplication. If transposing an $n \times n$ binary matrix requires $\Omega(n^2 \log n)$ steps on a Turing machine, then our reduction implies that multiplying $n$-bit integers requires $\Omega(n \log n)$ steps. In other words, if matrix transposition is as hard as expected, then integer multiplication is also as hard as expected.

The Stamp Folding Problem From a Mountain-Valley Perspective: Enumerations and Bounds

from arXiv: Computational Geometry

Authors: Thomas C. Hull, Adham Ibrahim, Jacob Paltrowitz, Natalya Ter-Saakov, Grace Wang

A strip of square stamps can be folded in many ways such that all of the stamps are stacked in a single pile in the folded state. The stamp folding problem asks for the number of such foldings and has previously been studied extensively. We consider this problem with the additional restriction of fixing the mountain-valley assignment of each crease in the stamp pattern. We provide a closed form for counting the number of legal foldings on specific patterns of mountain-valley assignments, including a surprising appearance of the Catalan numbers. We construct upper and lower bounds for the number of ways to fold a given mountain-valley assignment on the strip of stamps. Lastly, we provide experimental evidence towards more general results.
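For small $n$ the count is easy to reproduce by brute force: record a folding by the level of each stamp in the pile, and use the standard criterion that a folding is legal iff no two creases at the same end of the strip cross. The sketch below (mine) recovers the stamp-folding numbers 1, 2, 6, 16, 50 (OEIS A000136); grouping legal foldings by their induced mountain-valley string is a small extension of the same loop.

    from itertools import combinations, permutations

    def crosses(I, J):
        # Two same-end creases cross iff their level intervals interleave.
        (a, b), (c, d) = sorted(I), sorted(J)
        return a < c < b < d or c < a < d < b

    def count_foldings(n):
        # pos[i] = level of stamp i in the pile; crease i joins stamps i, i+1
        # and sits at one end of the strip for even i, the other for odd i.
        total = 0
        for pos in permutations(range(n)):
            creases = [(i % 2, (pos[i], pos[i + 1])) for i in range(n - 1)]
            total += all(not crosses(I, J)
                         for (e1, I), (e2, J) in combinations(creases, 2)
                         if e1 == e2)
        return total

    print([count_foldings(n) for n in range(1, 6)])  # [1, 2, 6, 16, 50]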

Simplification of Trajectory Streams

from arXiv: Computational Geometry

Authors: Siu-Wing Cheng, Haoqiang Huang, Le Jiang

While there are software systems that simplify trajectory streams on the fly, few curve simplification algorithms with quality guarantees fit the streaming requirements. We present streaming algorithms for two such problems under the Fréchet distance $d_F$ in $\mathbb{R}^d$ for some constant $d \geq 2$. Consider a polygonal curve $\tau$ in $\mathbb{R}^d$ in a stream. We present a streaming algorithm that, for any $\varepsilon\in (0,1)$ and $\delta > 0$, produces a curve $\sigma$ such that $d_F(\sigma,\tau[v_1,v_i])\le (1+\varepsilon)\delta$ and $|\sigma|\le 2\,\mathrm{opt}-2$, where $\tau[v_1,v_i]$ is the prefix in the stream so far, and $\mathrm{opt} = \min\{|\sigma'|: d_F(\sigma',\tau[v_1,v_i])\le \delta\}$. Let $\alpha = 2(d-1){\lfloor d/2 \rfloor}^2 + d$. The working storage is $O(\varepsilon^{-\alpha})$. Each vertex is processed in $O(\varepsilon^{-\alpha}\log\frac{1}{\varepsilon})$ time for $d \in \{2,3\}$ and $O(\varepsilon^{-\alpha})$ time for $d \geq 4$. Thus, the whole $\tau$ can be simplified in $O(\varepsilon^{-\alpha}|\tau|\log\frac{1}{\varepsilon})$ time. Ignoring polynomial factors in $1/\varepsilon$, this running time is a factor $|\tau|$ faster than the best static algorithm that offers the same guarantees. We present another streaming algorithm that, for any integer $k \geq 2$ and any $\varepsilon \in (0,\frac{1}{17})$, maintains a curve $\sigma$ such that $|\sigma| \leq 2k-2$ and $d_F(\sigma,\tau[v_1,v_i])\le (1+\varepsilon) \cdot \min\{d_F(\sigma',\tau[v_1,v_i]): |\sigma'| \leq k\}$, where $\tau[v_1,v_i]$ is the prefix in the stream so far. The working storage is $O((k\varepsilon^{-1}+\varepsilon^{-(\alpha+1)})\log \frac{1}{\varepsilon})$. Each vertex is processed in $O(k\varepsilon^{-(\alpha+1)}\log^2\frac{1}{\varepsilon})$ time for $d \in \{2,3\}$ and $O(k\varepsilon^{-(\alpha+1)}\log\frac{1}{\varepsilon})$ time for $d \geq 4$.
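The paper works with the continuous Fréchet distance; as background, the discrete variant below (the classic Eiter-Mannila dynamic program over monotone couplings of the two vertex sequences) is a common stand-in and conveys the metric's flavour. The sketch is mine and is unrelated to the paper's streaming machinery.

    from functools import lru_cache
    from math import dist

    def discrete_frechet(P, Q):
        # O(|P||Q|) dynamic program; P and Q are lists of points.
        @lru_cache(maxsize=None)
        def c(i, j):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                return d
            if i == 0:
                return max(c(0, j - 1), d)
            if j == 0:
                return max(c(i - 1, 0), d)
            return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return c(len(P) - 1, len(Q) - 1)

    print(discrete_frechet([(0, 0), (1, 0), (2, 0)],
                           [(0, 1), (1, 1), (2, 1)]))  # 1.0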

Multi-Pass Streaming Lower Bounds for Approximating Max-Cut

from arXiv: Data Structures and Algorithms

Authors: Yumou Fei, Dor Minzer, Shuo Wang

In the Max-Cut problem in the streaming model, an algorithm is given the edges of an unknown graph $G = (V,E)$ in some fixed order, and its goal is to approximate the size of the largest cut in $G$. Improving upon an earlier result of Kapralov, Khanna and Sudan, it was shown by Kapralov and Krachun that for all $\varepsilon>0$, no $o(n)$ memory streaming algorithm can achieve a $(1/2+\varepsilon)$-approximation for Max-Cut. Their result holds for single-pass streams, i.e., the setting in which the algorithm only views the stream once, and it was open whether multi-pass access may help. The state-of-the-art result along these lines, due to Assadi and N, rules out arbitrarily good approximation algorithms with constantly many passes and $n^{1-\delta}$ space for any $\delta>0$. We improve upon this state-of-the-art result, showing that any non-trivial approximation algorithm for Max-Cut requires either polynomially many passes or polynomially large space. More specifically, we show that for all $\varepsilon>0$, a $k$-pass streaming $(1/2+\varepsilon)$-approximation algorithm for Max-Cut requires $\Omega_{\varepsilon}\left(n^{1/3}/k\right)$ space. This result leads to a similar lower bound for the Maximum Directed Cut problem, showing the near optimality of the algorithm of [Saxena, Singer, Sudan, Velusamy, SODA 2025]. Our lower bounds proceed by showing a communication complexity lower bound for the Distributional Implicit Hidden Partition (DIHP) Problem, introduced by Kapralov and Krachun. While a naive application of the discrepancy method fails, we identify a property of protocols called "globalness", and show that (1) any protocol for DIHP can be turned into a global protocol, (2) the discrepancy of a global protocol must be small. The second step is the more technically involved step in the argument, and therein we use global hypercontractive inequalities.
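For contrast, the benchmark these lower bounds are measured against is the trivial one-pass algorithm: every graph has a cut containing at least half its edges, and no cut exceeds $m$, so counting edges and reporting $m/2$ is a $1/2$-approximation in $O(\log n)$ bits (sketch mine):

    def maxcut_half_approx(edge_stream):
        # One pass, O(log n) bits of state. The maximum cut always lies in
        # [m/2, m], so m/2 is within a factor 2 of it.
        m = sum(1 for _ in edge_stream)
        return m / 2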

Accelerated Approximate Optimization of Multi-Commodity Flows on Directed Graphs

from arXiv: Data Structures and Algorithms

Authors: Li Chen, Andrei Graur, Aaron Sidford

We provide $m^{1+o(1)}k\epsilon^{-1}$-time algorithms for computing multiplicative $(1 - \epsilon)$-approximate solutions to multi-commodity flow problems with $k$-commodities on $m$-edge directed graphs, including concurrent multi-commodity flow and maximum multi-commodity flow. To obtain our results, we provide new optimization tools of potential independent interest. First, we provide an improved optimization method for solving $\ell_{q, p}$-regression problems to high accuracy. This method makes $\tilde{O}_{q, p}(k)$ queries to a high accuracy convex minimization oracle for an individual block, where $\tilde{O}_{q, p}(\cdot)$ hides factors depending only on $q$, $p$, or $\mathrm{poly}(\log m)$, improving upon the $\tilde{O}_{q, p}(k^2)$ bound of [Chen-Ye, ICALP 2024]. As a result, we obtain the first almost-linear time algorithm that solves $\ell_{q, p}$ flows on directed graphs to high accuracy. Second, we present optimization tools to reduce approximately solving composite $\ell_{1, \infty}$-regression problems to solving $m^{o(1)}\epsilon^{-1}$ instances of composite $\ell_{q, p}$-regression problem. The method builds upon recent advances in solving box-simplex games [Jambulapati-Tian, NeurIPS 2023] and the area convex regularizer introduced in [Sherman, STOC 2017] to obtain faster rates for constrained versions of the problem. Carefully combining these techniques yields our directed multi-commodity flow algorithm.

On Speedups for Convex Optimization via Quantum Dynamics

from arXiv: Data Structures and Algorithms

Authors: Shouvanik Chakrabarti, Dylan Herman, Jacob Watkins, Enrico Fontana, Brandon Augustino, Junhyung Lyle Kim, Marco Pistoia

We explore the potential for quantum speedups in convex optimization using discrete simulations of the Quantum Hamiltonian Descent (QHD) framework, as proposed by Leng et al., and establish the first rigorous query complexity bounds. We develop enhanced analyses for quantum simulation of Schrödinger operators with black-box potential via the pseudo-spectral method, providing explicit resource estimates independent of wavefunction assumptions. These bounds are applied to assess the complexity of optimization through QHD. Our findings pertain to unconstrained convex optimization in $d$ dimensions. In continuous time, we demonstrate that QHD, with suitable parameters, can achieve arbitrarily fast convergence rates. The optimization speed limit arises solely from the discretization of the dynamics, mirroring a property of the classical dynamics underlying QHD. Considering this cost, we show that a $G$-Lipschitz convex function can be optimized to an error of $\epsilon$ with $\widetilde{\mathcal{O}}(d^{1.5}G^2 R^2/\epsilon^2)$ queries. Moreover, under reasonable assumptions on the complexity of Hamiltonian simulation, $\widetilde{\Omega}(d/\epsilon^2)$ queries are necessary. Thus, QHD does not offer a speedup over classical zeroth order methods with exact oracles. However, we demonstrate that the QHD algorithm tolerates $\widetilde{\mathcal{O}}(\epsilon^3/d^{1.5}G^2 R^2)$ noise in function evaluation. We show that QHD offers a super-quadratic query advantage over all known classical algorithms tolerating this level of evaluation noise in the high-dimension regime. Additionally, we design a quantum algorithm for stochastic convex optimization that provides a super-quadratic speedup over all known classical algorithms in the high-dimension regime. To our knowledge, these results represent the first rigorous quantum speedups for convex optimization achieved through a dynamical algorithm.

Sample-Optimal Private Regression in Polynomial Time

from arXiv: Data Structures and Algorithms

Authors: Prashanti Anderson, Ainesh Bakshi, Mahbod Majid, Stefan Tiegel

We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaling with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators to be sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.

Word Break on SLP-Compressed Texts

from arXiv: Data Structures and Algorithms

Authors: Rajat De, Dominik Kempa

Word Break is a prototypical factorization problem in string processing: Given a word $w$ of length $N$ and a dictionary $\mathcal{D} = \{d_1, d_2, \ldots, d_{K}\}$ of $K$ strings, determine whether we can partition $w$ into words from $\mathcal{D}$. We propose the first algorithm that solves the Word Break problem over the SLP-compressed input text $w$. Specifically, we show that, given the string $w$ represented using an SLP of size $g$, we can solve the Word Break problem in $\mathcal{O}(g \cdot m^{\omega} + M)$ time, where $m = \max_{i=1}^{K} |d_i|$, $M = \sum_{i=1}^{K} |d_i|$, and $\omega \geq 2$ is the matrix multiplication exponent. We obtain our algorithm as a simple corollary of a more general result: We show that in $\mathcal{O}(g \cdot m^{\omega} + M)$ time, we can index the input text $w$ so that solving the Word Break problem for any of its substrings takes $\mathcal{O}(m^2 \log N)$ time (independent of the substring length). Our second contribution is a lower bound: We prove that, unless the Combinatorial $k$-Clique Conjecture fails, there is no combinatorial algorithm for Word Break on SLP-compressed strings running in $\mathcal{O}(g \cdot m^{2-\epsilon} + M)$ time for any $\epsilon > 0$.
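On an uncompressed text, Word Break is the textbook dynamic program below (sketch mine): $O(Nm)$ time with a hash set of dictionary words, which is exactly the baseline the paper's SLP-based algorithm avoids paying per character of $w$.

    def word_break(w, dictionary):
        # can[i] is True iff the prefix w[:i] splits into dictionary words.
        words = set(dictionary)
        m = max(map(len, words), default=0)   # longest dictionary word
        can = [True] + [False] * len(w)
        for i in range(1, len(w) + 1):
            can[i] = any(can[j] and w[j:i] in words
                         for j in range(max(0, i - m), i))
        return can[len(w)]

    print(word_break("abcabc", {"a", "bc", "abc"}))  # True: a|bc|abc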

Space of Data through the Lens of Multilevel Graph

from arXiv: Data Structures and Algorithms

Authors: Marco Caputo, Michele Russo, Emanuela Merelli

This work seeks to tackle the inherent complexity of dataspaces by introducing a novel data structure that can represent datasets across multiple levels of abstraction, ranging from local to global. We propose the concept of a multilevel graph, which is equipped with two fundamental operations: contraction and expansion of its topology. This multilevel graph is specifically designed to fulfil the requirements for incremental abstraction and flexibility, as outlined in existing definitions of dataspaces. Furthermore, we provide a comprehensive suite of methods for manipulating this graph structure, establishing a robust framework for data analysis. While its effectiveness has been empirically validated for unstructured data, its application to structured data is also inherently viable. Preliminary results are presented through a real-world scenario based on a collection of dream reports.
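As a toy illustration of the two operations (under my own simplifying assumptions, since the paper's formal definitions are not reproduced here: contraction merges a vertex set into a supernode while remembering the hidden subgraph, and expansion undoes the most recent contraction):

    class MultilevelGraph:
        # Illustrative sketch only: a stack of contractions yields views of
        # the data at successively coarser levels of abstraction.
        def __init__(self, edges):
            self.edges = {frozenset(e) for e in edges}
            self.history = []

        def contract(self, group, supernode):
            group = set(group)
            hidden = {e for e in self.edges if e <= group}
            boundary = {e for e in self.edges if len(e & group) == 1}
            self.edges -= hidden | boundary
            self.edges |= {frozenset({supernode, next(iter(e - group))})
                           for e in boundary}   # reattach to the supernode
            self.history.append((supernode, hidden, boundary))

        def expand(self):
            # Undo the most recent contraction (LIFO discipline assumed).
            supernode, hidden, boundary = self.history.pop()
            self.edges = {e for e in self.edges if supernode not in e}
            self.edges |= hidden | boundary

    g = MultilevelGraph([("a", "b"), ("b", "c"), ("c", "d")])
    g.contract({"a", "b"}, "AB")   # coarser view: AB-c, c-d
    g.expand()                     # back to the original graph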

Network Unreliability in Almost-Linear Time

from arXiv: Data Structures and Algorithms

Authors: Ruoxu Cen, Jason Li, Debmalya Panigrahi

The network unreliability problem asks for the probability that a given undirected graph gets disconnected when every edge independently fails with a given probability $p$. Valiant (1979) showed that this problem is #P-hard; therefore, the best we can hope for are approximation algorithms. In a classic result, Karger (1995) obtained the first FPTAS for this problem by leveraging the fact that when a graph disconnects, it almost always does so at a near-minimum cut, and there are only a small (polynomial) number of near-minimum cuts. Since then, a series of results have obtained progressively faster algorithms to the current bound of $m^{1+o(1)} + \tilde{O}(n^{3/2})$ (Cen, He, Li, and Panigrahi, 2024). In this paper, we obtain an $m^{1+o(1)}$-time algorithm for the network unreliability problem. This is essentially optimal, since we need $O(m)$ time to read the input graph. Our main new ingredient is relating network unreliability to an "ideal" tree packing of spanning trees (Thorup, 2001).
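Operationally, the quantity is easy to estimate by naive sampling when $p$ is moderate; the regime that makes an FPTAS necessary is precisely when the disconnection probability is tiny and naive Monte Carlo sees no failures. A plain sampling sketch (mine) for intuition:

    import random

    def unreliability_mc(n, edges, p, trials=100_000):
        # Estimate Pr[G disconnects] when each edge fails independently
        # with probability p, via union-find on the surviving edges.
        def disconnected():
            parent = list(range(n))
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            for u, v in edges:
                if random.random() >= p:        # edge survives
                    parent[find(u)] = find(v)
            return len({find(v) for v in range(n)}) > 1
        return sum(disconnected() for _ in range(trials)) / trials

    # 4-cycle: disconnects iff at least 2 edges fail; exact value ~ 0.0523.
    print(unreliability_mc(4, [(0, 1), (1, 2), (2, 3), (3, 0)], p=0.1))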

Generalized Capacity Planning for the Hospital-Residents Problem

from arXiv: Data Structures and Algorithms

Authors: Haricharan Balasundaram, Girija Limaye, Meghana Nasre, Abhinav Raja

The Hospital Residents setting models important problems such as school choice and the assignment of undergraduate students to degree programs, among many others. In this setting, fixed quotas are associated with the programs that limit the number of agents that can be assigned to them. Motivated by scenarios where all agents must be matched, we propose and study a generalized capacity planning problem, which allows cost-controlled flexibility with respect to quotas. Our setting is an extension of the Hospital Residents setting where programs have the usual quota as well as an associated cost, indicating the cost of matching an agent beyond the initial quotas. We seek to compute a matching that matches all agents, is optimal with respect to preferences, and minimizes either a local or a global objective on cost. We show that there is a sharp contrast: minimizing the local objective is polynomial-time solvable, whereas minimizing the global objective is NP-hard. On the positive side, we present approximation algorithms for the global objective in the general case and a particular hard case. We achieve the approximation guarantee for the special hard case via a linear programming based algorithm. We strengthen the NP-hardness by showing a matching lower bound to our algorithmic result.
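For background, the mechanism underlying the Hospital Residents setting is resident-proposing deferred acceptance with fixed quotas; the sketch below (mine, standard Gale-Shapley with complete preference lists) computes the resident-optimal stable matching and does not model the paper's per-seat costs.

    def deferred_acceptance(res_pref, hos_pref, quota):
        # res_pref[r] / hos_pref[h]: preference lists; quota[h]: capacity.
        rank = {h: {r: i for i, r in enumerate(p)} for h, p in hos_pref.items()}
        held = {h: [] for h in hos_pref}     # tentatively accepted residents
        nxt = {r: 0 for r in res_pref}       # next hospital r proposes to
        free = list(res_pref)
        while free:
            r = free.pop()
            if nxt[r] == len(res_pref[r]):
                continue                     # r exhausted their list: unmatched
            h = res_pref[r][nxt[r]]
            nxt[r] += 1
            held[h].append(r)
            held[h].sort(key=lambda x: rank[h][x])
            if len(held[h]) > quota[h]:
                free.append(held[h].pop())   # reject the worst-ranked resident
        return {r: h for h, rs in held.items() for r in rs}

    res_pref = {"r1": ["h1", "h2"], "r2": ["h1", "h2"], "r3": ["h1", "h2"]}
    hos_pref = {"h1": ["r1", "r2", "r3"], "h2": ["r2", "r1", "r3"]}
    print(deferred_acceptance(res_pref, hos_pref, {"h1": 1, "h2": 2}))
    # r1 gets h1; r2 and r3 fill h2's two seats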

Improved algorithms for single machine serial-batch scheduling to minimize makespan and maximum cost

from arXiv: Data Structures and Algorithms

Authors: Shuguang Li, Zhenxin Wen, Jing Wei

This paper studies the bicriteria problem of scheduling $n$ jobs on a serial-batch machine to minimize makespan and maximum cost simultaneously. A serial-batch machine can process up to $b$ jobs as a batch, where $b$ is known as the batch capacity. When a new batch starts, a constant setup time is required for the machine. Within each batch, the jobs are processed sequentially, and thus the processing time of a batch equals the sum of the processing times of its jobs. All the jobs in a batch have the same completion time, namely, the completion time of the batch. The main result is an $O(n^3)$-time algorithm which can generate all Pareto optimal points for the bounded model ($b < n$).
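Both objective values are simple to evaluate once a batch sequence is fixed, which is the bookkeeping behind any such algorithm; a short sketch of the machine model (mine, straight from the definitions above):

    def serial_batch_objectives(batches, setup, cost):
        # batches: list of batches, each a list of (job, processing_time).
        # Each batch needs a setup; its jobs run sequentially; every job
        # completes when its batch does. cost(job, C) is job's cost at time C.
        t = 0.0
        max_cost = float("-inf")
        for batch in batches:
            t += setup + sum(p for _, p in batch)   # batch completion time
            for job, _ in batch:
                max_cost = max(max_cost, cost(job, t))
        return t, max_cost                          # (makespan, maximum cost)

    due = {"a": 4, "b": 6, "c": 9}
    batches = [[("a", 2), ("b", 3)], [("c", 2)]]
    print(serial_batch_objectives(batches, 1, lambda j, C: C - due[j]))
    # (9.0, 2.0): batches complete at times 6 and 9; job "a" is 2 units late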

Wagner's Algorithm Provably Runs in Subexponential Time for SIS$^\infty$

from arXiv: Data Structures and Algorithms

Authors: Léo Ducas, Lynn Engelberts, Johanna Loyer

At CRYPTO 2015, Kirchner and Fouque claimed that a carefully tuned variant of the Blum-Kalai-Wasserman (BKW) algorithm (JACM 2003) should solve the Learning with Errors problem (LWE) in slightly subexponential time for modulus $q=\mathrm{poly}(n)$ and narrow error distribution, when given enough LWE samples. Taking a modular view, one may regard BKW as a combination of Wagner's algorithm (CRYPTO 2002), run over the corresponding dual problem, and the Aharonov-Regev distinguisher (JACM 2005). Hence the subexponential Wagner step alone should be of interest for solving this dual problem - namely, the Short Integer Solution problem (SIS) - but this appears to be undocumented so far. We re-interpret this Wagner step as walking backward through a chain of projected lattices, zigzagging through some auxiliary superlattices. We further randomize the bucketing step using Gaussian randomized rounding to exploit the powerful discrete Gaussian machinery. This approach avoids sample amplification and turns Wagner's algorithm into an approximate discrete Gaussian sampler for $q$-ary lattices. For an SIS lattice with $n$ equations modulo $q$, this algorithm runs in subexponential time $\exp(O(n/\log \log n))$ to reach a Gaussian width parameter $s = q/\mathrm{polylog}(n)$ only requiring $m = n + \omega(n/\log \log n)$ many SIS variables. This directly provides a provable algorithm for solving the Short Integer Solution problem in the infinity norm ($\mathrm{SIS}^\infty$) for norm bounds $\beta = q/\mathrm{polylog}(n)$. This variant of SIS underlies the security of the NIST post-quantum cryptography standard Dilithium. Despite its subexponential complexity, Wagner's algorithm does not appear to threaten Dilithium's concrete security.
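As background, the combinatorial engine here is Wagner's k-list (generalized birthday) technique. Below is a toy 4-list version over random bit-strings (mine; the paper's actual contribution replaces this with projected lattices and discrete-Gaussian rounding): merge pairs of lists so the low bits of the XOR vanish, then look for a full collision among the merged sums.

    import random
    from collections import defaultdict

    def wagner_4list(n, ell, size):
        # Find x1..x4, one per list, with x1 ^ x2 ^ x3 ^ x4 == 0.
        # With size ~ 2^(n/3) and ell ~ n/3, a solution exists w.h.p.
        L = [[random.getrandbits(n) for _ in range(size)] for _ in range(4)]
        mask = (1 << ell) - 1

        def merge(A, B):
            by_low = defaultdict(list)
            for a in A:
                by_low[a & mask].append(a)
            # pairs whose XOR is 0 on the low ell bits
            return [(a, b) for b in B for a in by_low[b & mask]]

        L12, L34 = merge(L[0], L[1]), merge(L[2], L[3])
        by_sum = {c ^ d: (c, d) for c, d in L34}
        for a, b in L12:
            if a ^ b in by_sum:
                c, d = by_sum[a ^ b]
                return a, b, c, d          # a ^ b ^ c ^ d == 0
        return None                        # unlucky sample; rerun

    print(wagner_4list(n=24, ell=8, size=1 << 9))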

Length-Constrained Directed Expander Decomposition and Length-Constrained Vertex-Capacitated Flow Shortcuts

from arXiv: Data Structures and Algorithms

Authors: Bernhard Haeupler, Yaowei Long, Thatchaphol Saranurak, Shengzhe Wang

We show the existence of length-constrained expander decomposition in directed graphs and undirected vertex-capacitated graphs. Previously, its existence was shown only in undirected edge-capacitated graphs [Haeupler-R\"acke-Ghaffari, STOC 2022; Haeupler-Hershkowitz-Tan, FOCS 2024]. Along the way, we prove the multi-commodity maxflow-mincut theorems for length-constrained expansion in both directed and undirected vertex-capacitated graphs. Based on our decomposition, we build a length-constrained flow shortcut for undirected vertex-capacitated graphs, which roughly speaking is a set of edges and vertices added to the graph so that every multi-commodity flow demand can be routed with approximately the same vertex-congestion and length, but all flow paths only contain few edges. This generalizes the shortcut for undirected edge-capacitated graphs from [Haeupler-Hershkowitz-Li-Roeyskoe-Saranurak, STOC 2024]. Length-constrained expander decomposition and flow shortcuts have been crucial in the recent algorithms in undirected edge-capacitated graphs [Haeupler-Hershkowitz-Li-Roeyskoe-Saranurak, STOC 2024; Haeupler-Long-Saranurak, FOCS 2024]. Our work thus serves as a foundation to generalize these concepts to directed and vertex-capacitated graphs.

Monday, March 31

Linkage

from David Eppstein

Matthias Merzenich finds a true-period-14 glider gun in Conway’s Game of Life (\(\mathbb{M}\)).

By David Eppstein

TR25-037 | Uniform Bounds on Product Sylvester-Gallai Configurations | Abhibhav Garg, Rafael Mendes de Oliveira, Akash Sengupta

from ECCC Papers

In this work, we explore a non-linear extension of the classical Sylvester-Gallai configuration. Let $\mathbb{K}$ be an algebraically closed field of characteristic zero, and let $\mathcal{F} = \{F_1, \ldots, F_m\} \subset \mathbb{K}[x_1, \ldots, x_N]$ denote a collection of irreducible homogeneous polynomials of degree at most $d$, where each $F_i$ is not a scalar multiple of any other $F_j$ for $i \neq j$. We define $\mathcal{F}$ to be a product Sylvester-Gallai configuration if, for any two distinct polynomials $F_i, F_j \in \mathcal{F}$, the following condition is satisfied: $$\prod_{k \neq i, j} F_k \in \textrm{rad} \left( F_i, F_j \right).$$ We prove that product Sylvester-Gallai configurations are inherently low dimensional. Specifically, we show that there exists a function $\lambda : \mathbb{N} \rightarrow \mathbb{N}$, independent of $\mathbb{K}$, $N$, and $m$, such that any product Sylvester-Gallai configuration must satisfy: $$ \dim (\text{span}_{\mathbb{K}}(\mathcal{F})) \leq \lambda(d). $$ This result generalizes the main theorems from [S19,PS20a, OS23], and gets us one step closer to a full derandomization of the polynomial identity testing problem for the class of depth 4 circuits with bounded top and bottom fan-in.

Tragedy in one shitty act

from Scott Aaronson

Far-Left Students and Faculty: We’d sooner burn universities to the ground than allow them to remain safe for the hated Zionist Jews, the baby-killing demons of the earth. We’ll disrupt their classes, bar them from student activities, smash their Hillel centers, take over campus buildings and quads, and chant for Hezbollah and the Al-Aqsa Martyrs Brigades to eradicate them like vermin. We’ll do all this because we’ve so thoroughly learned the lessons of the Holocaust.

Trump Administration [cackling]: Burn universities to the ground, you say? What a coincidence! We’d love nothing more than to do exactly that. Happy to oblige you.

Far-Left Students and Faculty: You fascist scum. We didn’t mean “call our bluff”! Was it the campus Zionists who ratted us out to you? It was, wasn’t it? You can’t do this without due process; we have rights!

Trump Administration: We don’t answer to you and we don’t care about “due process” or your supposed “rights.” We’re cutting all your funding, effective immediately. Actually, since you leftists don’t have much funding to speak of, let’s just cut any university funding whatsoever that we can reach. Cancer studies. Overhead on NIH grants. Student aid. Fellowships. Whatever universities use to keep the lights on. The more essential it is, the longer it took to build, the more we’ll enjoy the elitist professors’ screams of anguish as we destroy it all in a matter of weeks.

Far-Left Students and Faculty: This is the end, then. But if our whole little world must go up in flames, at least we’ll die having never compromised our most fundamental moral principle: the eradication of the State of Israel and the death of its inhabitants.

Sane Majorities at Universities, Including Almost Everyone in STEM: [don’t get a speaking part in this play. They’ve already bled out on the street, killed in the crossfire]

By Scott

Explicit non-free tensors

from arXiv: Computational Complexity

Authors: Maxim van den Berg, Matthias Christandl, Vladimir Lysikov, Harold Nieuwboer, Michael Walter, Jeroen Zuiddam

Free tensors are tensors which, after a change of bases, have free support: any two distinct elements of its support differ in at least two coordinates. They play a distinguished role in the theory of bilinear complexity, in particular in Strassen's duality theory for asymptotic rank. Within the context of quantum information theory, where tensors are interpreted as multiparticle quantum states, freeness corresponds to a type of multiparticle Schmidt decomposition. In particular, if a state is free in a given basis, the reduced density matrices are diagonal. Although generic tensors in $\mathbb{C}^n \otimes \mathbb{C}^n \otimes \mathbb{C}^n$ are non-free for $n \geq 4$ by parameter counting, no explicit non-free tensors were known until now. We solve this hay in a haystack problem by constructing explicit tensors that are non-free for every $n \geq 3$. In particular, this establishes that non-free tensors exist in $\mathbb{C}^n \otimes \mathbb{C}^n \otimes \mathbb{C}^n$, where they are not generic. To establish non-freeness, we use results from geometric invariant theory and the theory of moment polytopes. In particular, we show that if a tensor $T$ is free, then there is a tensor $S$ in the GL-orbit closure of $T$, whose support is free and whose moment map image is the minimum-norm point of the moment polytope of $T$. This implies a reduction for checking non-freeness from arbitrary basis changes of $T$ to unitary basis changes of $S$. The unitary equivariance of the moment map can then be combined with the fact that tensors with free support have diagonal moment map image, in order to further restrict the set of relevant basis changes.

The moment polytope of matrix multiplication is not maximal

from arXiv: Computational Complexity

Authors: Maxim van den Berg, Matthias Christandl, Vladimir Lysikov, Harold Nieuwboer, Michael Walter, Jeroen Zuiddam

Moment polytopes of tensors, the study of which is deeply rooted in invariant theory, representation theory and symplectic geometry, have found relevance in numerous places, from quantum information (entanglement polytopes) and algebraic complexity theory (GCT program and the complexity of matrix multiplication) to optimization (scaling algorithms). Towards an open problem in algebraic complexity theory, we prove separations between the moment polytopes of matrix multiplication tensors and unit tensors. As a consequence, we find that matrix multiplication moment polytopes are not maximal, i.e. are strictly contained in the corresponding Kronecker polytope. As another consequence, we obtain a no-go result for a natural operational characterization of moment polytope inclusion in terms of asymptotic restriction. We generalize the separation and non-maximality to moment polytopes of iterated matrix multiplication tensors. Our result implies that tensor networks where multipartite entanglement structures beyond two-party entanglement are allowed can go beyond projected entangled-pair states (PEPS) in terms of expressivity. Our proof characterizes membership of uniform points in moment polytopes of tensors, and establishes a connection to polynomial multiplication tensors via the minrank of matrix subspaces. As a result of independent interest, we extend these techniques to obtain a new proof of the optimal border subrank bound for matrix multiplication.

Average-Case Hardness of Parity Problems: Orthogonal Vectors, k-SUM and More

from arXiv: Computational Complexity

Authors: Mina Dalirrooyfard, Andrea Lincoln, Barna Saha, Virginia Vassilevska Williams

This work establishes conditional lower bounds for average-case {\em parity}-counting versions of the problems $k$-XOR, $k$-SUM, and $k$-OV. The main contribution is a set of self-reductions for the problems, providing the first specific distributions for which: $\mathsf{parity}\text{-}k\text{-}OV$ is $n^{\Omega(\sqrt{k})}$ average-case hard, under the $k$-OV hypothesis (and hence under SETH), $\mathsf{parity}\text{-}k\text{-}SUM$ is $n^{\Omega(\sqrt{k})}$ average-case hard, under the $k$-SUM hypothesis, and $\mathsf{parity}\text{-}k\text{-}XOR$ is $n^{\Omega(\sqrt{k})}$ average-case hard, under the $k$-XOR hypothesis. Under the very believable hypothesis that at least one of the $k$-OV, $k$-SUM, $k$-XOR or $k$-Clique hypotheses is true, we show that parity-$k$-XOR, parity-$k$-SUM, and parity-$k$-OV all require at least $n^{\Omega(k^{1/3})}$ (and sometimes even more) time on average (for specific distributions). To achieve these results, we present a novel and improved framework for worst-case to average-case fine-grained reductions, building on the work of Dalirooyfard, Lincoln, and Vassilevska Williams, FOCS 2020.
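
As a reminder of the object being computed, here is a brute-force Python sketch of parity-$k$-SUM (the parity of the number of zero-sum $k$-tuples); the paper concerns the average-case hardness of computing this bit quickly, not this $O(n^k)$ baseline, and the input lists are made up.

```python
# Hedged brute-force illustration of the parity-k-SUM objective: the parity
# of the number of k-tuples, one entry per list, summing to the target.
from itertools import product

def parity_k_sum(lists, target=0):
    parity = 0
    for tup in product(*lists):       # all n^k choices of one item per list
        if sum(tup) == target:
            parity ^= 1               # track only the parity, not the count
    return parity

# (1, 3, -4) and (1, -1, 0) are the two zero-sum triples, so the parity is 0.
print(parity_k_sum([[1, 2], [3, -1], [-4, 0]]))
```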

Polychromatic Coloring of Tuples in Hypergraphs

from arXiv: Computational Geometry

Authors: Ahmad Biniaz, Jean-Lou De Carufel, Anil Maheshwari, Michiel Smid, Shakhar Smorodinsky, Miloš Stojaković

A hypergraph $H$ consists of a set $V$ of vertices and a set $E$ of hyperedges that are subsets of $V$. A $t$-tuple of $H$ is a subset of $t$ vertices of $V$. A $t$-tuple $k$-coloring of $H$ is a mapping of its $t$-tuples into $k$ colors. A coloring is called $(t,k,f)$-polychromatic if each hyperedge of $E$ that has at least $f$ vertices contains tuples of all the $k$ colors. Let $f_H(t,k)$ be the minimum $f$ such that $H$ has a $(t,k,f)$-polychromatic coloring. For a family of hypergraphs $\cal{H}$ let $f_{\cal{H}}(t,k)$ be the maximum $f_H(t,k)$ over all hypergraphs $H$ in $\cal{H}$. We present several bounds on $f_{\cal{H}}(t,k)$ for $t\ge 2$. - Let $\cal{H}$ be the family of hypergraphs $H$ that is obtained by taking any set $P$ of points in $\Re^2$, setting $V:=P$ and $E:=\{d\cap P\colon d\text{ is a disk in }\Re^2\}$. We prove that $f_\cal{H}(2,k)\le 3.7^k$, that is, the pairs of points (2-tuples) can be $k$-colored such that any disk containing at least $3.7^k$ points has pairs of all colors. - For the family $\mathcal{H}$ of shrinkable hypergraphs of VC-dimension at most $d$ we prove that $ f_\cal{H}(d{+}1,k) \leq c^k$ for some constant $c=c(d)$. We also prove that every hypergraph with $n$ vertices and with VC-dimension at most $d$ has a $(d{+}1)$-tuple $T$ of depth at least $\frac{n}{c}$, i.e., any hyperedge that contains $T$ also contains $\frac{n}{c}$ other vertices. - For the relationship between $t$-tuple coloring and vertex coloring in any hypergraph $H$ we establish the inequality $\frac{1}{e}\cdot tk^{\frac{1}{t}}\le f_H(t,k)\le f_H(1,tk^{\frac{1}{t}})$. For the special case of $k=2$, we prove that $t+1\le f_H(t,2)\le\max\{f_H(1,2), t+1\}$; this improves upon the previous best known upper bound. - We generalize some of our results to higher dimensions, other shapes, pseudo-disks, and also study the relationship between tuple coloring and epsilon nets.
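
To unpack the layered definition, here is a small Python checker for the $(t,k,f)$-polychromatic property on explicit data; the hypergraph and the coloring below are illustrative assumptions, not constructions from the paper.

```python
# Hedged checker for the (t,k,f)-polychromatic definition: every hyperedge
# with at least f vertices must contain t-tuples of all k colors.
from itertools import combinations

def is_polychromatic(hyperedges, coloring, t, k, f):
    for e in hyperedges:
        if len(e) < f:
            continue                              # small hyperedges are exempt
        seen = {coloring[tup] for tup in combinations(sorted(e), t)}
        if len(seen) < k:
            return False
    return True

V = range(4)
# 2-color the pairs (2-tuples) by parity of the endpoint sum
coloring = {tup: (tup[0] + tup[1]) % 2 for tup in combinations(V, 2)}
print(is_polychromatic([{0, 1, 2}, {1, 2, 3}], coloring, t=2, k=2, f=3))  # True
```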

Randomized $\tilde{O}(m\sqrt{n})$ Bellman-Ford from Fineman and the Boilermakers

from arXiv: Data Structures and Algorithms

Authors: Satish Rao

A classical algorithm by Bellman and Ford from the 1950s computes shortest paths in weighted graphs on $n$ vertices and $m$ edges with possibly negative weights in $O(mn)$ time. Indeed, this algorithm is taught regularly in undergraduate algorithms courses. In 2023, after nearly 70 years, Fineman [fineman2024single] developed an $\tilde{O}(m n^{8/9})$ expected-time algorithm for this problem. Huang, Jin and Quanrud improved on Fineman's startling breakthrough by providing an $\tilde{O}(m n^{4/5})$ time algorithm. This paper builds on ideas from those results to produce an $\tilde{O}(m\sqrt{n})$ expected-time algorithm. The simple observation that distances can be updated with respect to the reduced costs for a price function in linear time is key to the improvement. This almost immediately improves the previous work. To produce the final bound, this paper provides recursive versions of Fineman's structures.
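
The reduced-cost observation is the classic Johnson-style reweighting; below is a hedged Python sketch showing that, for any price function, the reweighted graph has the same shortest paths and the true distances are recovered in linear time. The example graph and price function are made up, and Fineman's structures are not reproduced.

```python
# Hedged sketch of the reduced-cost observation: reweighting
# w(u,v) -> w(u,v) + phi[u] - phi[v] preserves shortest paths, and true
# distances are recovered from the reweighted ones in linear time.

def reduced(edges, phi):
    return [(u, v, w + phi[u] - phi[v]) for (u, v, w) in edges]

def bellman_ford(n, edges, s):
    d = [float("inf")] * n
    d[s] = 0
    for _ in range(n - 1):                       # classic O(mn) relaxation
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    return d

n, s = 4, 0
edges = [(0, 1, 4), (0, 2, 2), (2, 1, -1), (1, 3, 2), (2, 3, 5)]
phi = [0, 2, 1, 3]                               # any price function works
d_phi = bellman_ford(n, reduced(edges, phi), s)
d = [d_phi[v] - phi[s] + phi[v] for v in range(n)]   # linear-time recovery
print(d, bellman_ford(n, edges, s))              # both print [0, 1, 2, 3]
```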

Light Tree Covers, Routing, and Path-Reporting Oracles via Spanning Tree Covers in Doubling Graphs

from arXiv: Data Structures and Algorithms

Authors: Hsien-Chih Chang, Jonathan Conroy, Hung Le, Shay Solomon, Cuong Than

A $(1+\varepsilon)$-stretch tree cover of an edge-weighted $n$-vertex graph $G$ is a collection of trees, where every pair of vertices has a $(1+\varepsilon)$-stretch path in one of the trees. The celebrated Dumbbell Theorem by Arya et al. [STOC'95] states that any set of $n$ points in $d$-dimensional Euclidean space admits a $(1+\varepsilon)$-stretch tree cover with a constant number of trees, where the constant depends on $\varepsilon$ and the dimension $d$. This result was generalized for arbitrary doubling metrics by Bartal et al. [ICALP'19]. While the total number of edges in the tree covers of Arya et al. and Bartal et al. is $O(n)$, all known tree cover constructions incur a total lightness of $\Omega(\log n)$; whether one can get a tree cover of constant lightness has remained a longstanding open question, even for 2-dimensional point sets. In this work we resolve this fundamental question in the affirmative, as a direct corollary of a new construction of a $(1+\varepsilon)$-stretch spanning tree cover for doubling graphs; in a spanning tree cover, every tree may only use edges of the input graph rather than the corresponding metric. To the best of our knowledge, this is the first constant-stretch spanning tree cover construction (let alone for $(1+\varepsilon)$-stretch) with a constant number of trees, for any nontrivial family of graphs. Concrete applications of our spanning tree cover include a $(1+\varepsilon)$-stretch light tree cover, a compact $(1+\varepsilon)$-stretch routing scheme in the labeled model, and a $(1+\varepsilon)$-stretch path-reporting distance oracle, for doubling graphs. [...]

Distributed Freeze Tag: a Sustainable Solution to Discover and Wake-up a Robot Swarm

from arXiv: Data Structures and Algorithms

Authors: Cyril Gavoille, Nicolas Hanusse, Gabriel Le Bouder, Taïssir Marcé

The Freeze Tag Problem consists in waking up a swarm of robots, starting with one initially awake robot. Whereas there is a wide literature on the centralized setting, where the locations of the robots are known in advance, we focus on the distributed version, where the locations of the robots $\P$ are unknown and awake robots only detect other robots up to distance $1$. Assuming that moving a distance $\delta$ takes time $\delta$, we show that waking up the whole swarm takes time $O(\rho+\ell^2\log(\rho/\ell))$, where $\rho$ stands for the largest distance from the initial robot to any point of $\P$, and $\ell$ is the connectivity threshold of $\P$. Moreover, the result is complemented by a matching lower bound in both parameters $\rho$ and $\ell$. We also provide other distributed algorithms, complemented with lower bounds, whenever each robot has a bounded amount of energy.

On Finding All Connected Maximum-Sized Common Subgraphs in Multiple Labeled Graphs

from arXiv: Data Structures and Algorithms

Authors: Johannes B. S. Petersen, Akbar Davoodi, Thomas Gärtner, Marc Hellmuth, Daniel Merkle

We present an exact algorithm for computing the connected Maximum Common Subgraph (MCS) across multiple graphs, where edges or vertices may additionally be labeled to account for possible atom types or bond types, a classical labeling used in molecular graphs. Our approach leverages modular product graphs and a modified Bron-Kerbosch algorithm to enumerate maximal cliques, ensuring all intermediate solutions are retained. A pruning heuristic efficiently reduces the modular product size, improving computational feasibility. Additionally, we introduce a graph ordering strategy based on graph-kernel similarity measures to optimize the search process. Our method is particularly relevant for bioinformatics and cheminformatics, where identifying conserved structural motifs in molecular graphs is crucial. Empirical results on molecular datasets demonstrate that our approach is exact, scalable and fast.
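
Here is a hedged Python sketch of the modular-product-plus-Bron-Kerbosch pipeline on two toy labeled graphs; it uses the basic Bron-Kerbosch recursion without the paper's pruning heuristic, graph-kernel ordering, or connectivity filtering, and the graphs and labels are illustrative.

```python
# Hedged sketch: build the modular product of two vertex-labeled graphs, then
# enumerate maximal cliques with plain Bron-Kerbosch. Cliques in the modular
# product correspond to common-subgraph mappings (connectivity not enforced).
from itertools import combinations

def modular_product(g1, g2, lab1, lab2):
    # graphs as adjacency sets; vertices pair up only if their labels match
    nodes = [(u, v) for u in g1 for v in g2 if lab1[u] == lab2[v]]
    adj = {p: set() for p in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue
        if (u2 in g1[u1]) == (v2 in g2[v1]):   # edge iff edge, non-edge iff non-edge
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj

def bron_kerbosch(adj, R=frozenset(), P=None, X=frozenset()):
    P = frozenset(adj) if P is None else P
    if not P and not X:
        yield R
        return
    for v in list(P):
        yield from bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v])
        P = P - {v}
        X = X | {v}

g1 = {0: {1}, 1: {0, 2}, 2: {1}}               # path with labels C-C-O
g2 = {0: {1}, 1: {0, 2}, 2: {1}}
lab1 = {0: "C", 1: "C", 2: "O"}
lab2 = {0: "C", 1: "C", 2: "O"}
adj = modular_product(g1, g2, lab1, lab2)
best = max(bron_kerbosch(adj), key=len)
print(sorted(best))                            # [(0, 0), (1, 1), (2, 2)]
```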

Saturday, March 29

Research Scholar at MATS Program (apply by April 18, 2025)

from CCI: jobs

MATS is an independent research and educational seminar program that connects talented scholars with top mentors in the fields of AI alignment, security, and governance. For 10 weeks, MATS scholars conduct research while also attending talks, workshops, and networking events with other members of the SF Bay Area AI alignment research community.

Apply by Apr 18, 2025!

Website: matsprogram.org/apply
Email: communications@matsprogram.org

By shacharlovett

Surveys are done stupidly/A stupid question from a survey

from Computational Complexity

I have often begun taking a survey and quit in the middle. Why?

1) It goes on too long. When I told one surveyor that he might get more people quitting for that reason, so he should make it shorter, he said, rather rudely, that he is an expert on surveys and they need to ask this many questions to calibrate things properly. I tried to engage him in an intelligent conversation about the tradeoff: the longer the survey, the better the information, but the fewer people fill it out, so what is the optimal point? He told me I was an idiot. Well... that's not going to encourage me to fill out his survey.

2) It asks questions that are too personal. 

3) It asks questions that seem irrelevant to me for their purpose (to be fair, perhaps I do not know the real purpose).

4) They ask a really stupid question. Here is the stupidest I've seen:


Challenge: Have you ever seen a stupider question? 

As always, I ask non-rhetorically.

By gasarch

TR25-036 | Lifting for Arbitrary Gadgets | Siddharth Iyer

from ECCC Papers

We prove a sensitivity-to-communication lifting theorem for arbitrary gadgets. Given functions $f: \{0,1\}^n\to \{0,1\}$ and $g : \mathcal{X} \times \mathcal{Y}\to \{0,1\}$, denote $f\circ g(x,y) := f(g(x_1,y_1),\ldots,g(x_n,y_n))$. We show that for any $f$ with sensitivity $s$ and any $g$, \[D(f\circ g) \geq s\cdot \bigg(\frac{\Omega(D(g))}{\log rk(g)} - \log rk(g)\bigg),\] where $D(\cdot)$ denotes the deterministic communication complexity and $rk(g)$ is the rank of the matrix associated with $g$. As a corollary, we get that if $D(g)$ is a sufficiently large constant, $D(f\circ g) = \Omega(\min\{s,d\}\cdot \sqrt{D(g)})$, where $s$ and $d$ denote the sensitivity and degree of $f$. In particular, computing the OR of $n$ copies of $g$ requires $\Omega(n\cdot\sqrt{D(g)})$ bits.

Friday, March 28

baby, it's cold inside

from Ben Recht

celebrating the reissue of an ambient deep cut

In 2006, I got my PhD, got married, and moved to California to start a post-doc. I started exercising a lot more. I started eating better. I didn’t miss the eastern seaboard autumn or Massachusetts weather. I had a network of friends who had settled out there before me. I loved the LA lifestyle.

But then, why was I so depressed? It gets dark at 4 in Los Angeles December, and the temperature plummets after sunset. The uninsulated ranch houses in the sprawl don’t protect you from the chill.

A few days before heading out for Christmas with in-laws in Virginia, I recorded a new song in my makeshift home studio. I called it baby, it’s cold inside, and emailed my bandmate Isaac, “I've got to work on making things creepy and not just melancholy.” That wouldn’t happen. But that session’s name would inspire us to record one of our more successful albums. Melancholy would remain our sound, and to our surprise, it resonated with a lot of people, who were probably also brooding in their late twenties.

And this album still resonates. We’re beyond thrilled to announce that Berlin’s Keplar label is re-releasing baby, it’s cold inside remastered on vinyl. It comes out today. Go get your copy!

Originally released in 2008, this would be our second album with the curatorial wizards at Barge Recordings. Our first album with Barge, life-sized psychoses, came out in April 2007, and the Barge guys convinced us to do a little tour to promote the record, booking shows in Boston, Providence, and Brooklyn. I flew out to Cambridge for a few weeks to prepare.

It was the first album we tried to record in Isaac's apartment, a run-down second-story walk-up in Cambridgeport. Isaac and his partner Ari had separate bedrooms, but Isaac's room was more of a multipurpose oversized closet fondly dubbed "the bedroom tomb." It had a loft bed and was packed to the gills with records, music gear, a screen-printing setup, film equipment, and stuff gathered off the street. When I moved to California, I left one of my baritone guitars, a direct box, and an overdrive pedal with Isaac. They were buried in the bedroom tomb too.

The bedroom tomb was decidedly more lo-fi than my home studios, but this was part of its charm. Bad cabling or unpredictable electronics or just awkward sitting brought a lot of weird serendipity. We hadn’t made music together in months, but we sat down and things started flowing. Over three sessions in one week, we captured our next record.

On the first Friday, we recorded fucking milwaukee’s been hesher forever, a dreamy accident of one of the built-in delays of Ableton Live. On Sunday, autoshow day of the dead emerged from an exercise of tension around a piano loop. And then the rest of the record came together the following Friday the 13th. We spent an evening leaning into the noisier end of the spectrum, experimenting with more distortion and more grit. The result was a little more warm shoegazer distortion than cold post-rock arpeggios.

We’d go on to play some of these songs during the tour, trying to recreate the vibes. This was the first time we tried to capture the initial improvisations in a live experience. Over the next few months, we threw the many different iterations into a single session, moving pieces around and sending each other clips via YouSendIt and email. Eventually, we patched together baby, it’s cold inside.

Sixteen years later, Keplar emailed us out of the blue. Since it seemed like most music had moved to short-form social media, Isaac and I had been pretty delinquent at checking that email account. Though they reached out in June, it wasn’t until August, when I logged in to get a recovery code for a music distribution website, that we saw the email. We thought we had missed out on a great opportunity. Fortunately, the folks at Keplar were enthusiastic to push this forward, and you can grab a physical copy today.

We hope you like it! The fun years still has an infinite back catalog that we’d like to release eventually. Our motto has always been “from quantity comes quality.” We’ve never convinced ourselves to embrace the social media mindset of constantly posting, but maybe it’s time we do. In that spirit, we’re releasing the original, wistful solo piece on our bandcamp. I’m not sure if anyone other than me and Isaac have even heard this before. We hope you dig it. And who knows, 2025 could be the year we dust off our archives to see what’s still cold inside.

By Ben Recht

PhD at Inria Lille (apply by April 23, 2025)

from CCI: jobs

The LINKS team at Inria Lille is seeking PhD candidates to work on the topic of “Efficient enumeration via edits”. The PhD will be carried out in the LINKS team in Lille, France: it offers a dynamic environment and a friendly atmosphere. The PhD will be co-supervised by Antoine Amarilli (https://a3nm.net/) and Mikaël Monet (https://mikael-monet.net/), and starts in fall 2025.

Website: https://a3nm.net/work/research/offers/phd_enumeration_changes.pdf
Email: a3nm@a3nm.net

By shacharlovett

TCS+ talk: Wednesday, April 9 — Or Zamir, Tel Aviv University

from TCS+ Seminar Series

The third TCS+ talk of the season will take place on Wednesday, April 9th at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 17:00 UTC: check your time zone here). Or Zamir from Tel Aviv University will speak about “Optimality of Frequency Moment Estimation” (abstract below).

(Note that the talk is on April 9th, not April 2nd, to accommodate the FOCS deadline.)

You can reserve a spot as an individual or a group to join us live by signing up on the online form. Registration is not required to attend the interactive talk, and the link will be posted on the website the day prior to the talk; however, by registering in the form, you will receive a reminder, along with the link. (The recorded talk will also be posted on our website afterwards.) As usual, for more information about the TCS+ online seminar series and the upcoming talks, or to suggest a possible topic or speaker, please see the website.

Abstract: Estimating the second frequency moment of a stream up to (1±ε) multiplicative error requires at most O(log n / ε²) bits of space, due to a seminal result of Alon, Matias, and Szegedy. It is also known that at least Ω(log n + 1/ε²) space is needed. We prove a tight lower bound of Ω(log(nε²) / ε²) for all ε = Ω(1/√n). Notably, when ε > n^(-1/2 + c), where c > 0, our lower bound matches the classic upper bound of AMS. For smaller values of ε, we also introduce a revised algorithm that improves the classic AMS bound and matches our lower bound. Our lower bound also applies to the more general problem of p-th frequency moment estimation for the range of p in (1, 2], providing a tight bound in the only remaining range to settle the optimal space complexity of estimating frequency moments.

Based on a joint work with Mark Braverman.
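
For context, the upper bound being matched is the classic AMS estimator; below is a hedged Python sketch of its textbook form (average the squares of random ±1 projections of the frequency vector), using full randomness instead of the 4-wise independent hashing of the real algorithm. The improved small-ε algorithm from the talk is not reproduced, and the stream is made up.

```python
# Hedged sketch of the classic AMS second-moment (F2) estimator: maintain
# Z = sum_i sign(i) * f_i per repetition, then average Z^2, since E[Z^2] = F2.
import random

def ams_f2(stream, n, reps=500, seed=1):
    rng = random.Random(seed)
    # one random +-1 sign per (repetition, item); the real algorithm uses
    # 4-wise independent signs, full randomness here is just for simplicity
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]
    z = [0] * reps
    for x in stream:                      # single pass, one counter per repetition
        for r in range(reps):
            z[r] += signs[r][x]
    return sum(v * v for v in z) / reps

stream = [0] * 5 + [1] * 3 + [2] * 2      # frequencies (5, 3, 2): F2 = 25+9+4 = 38
print(ams_f2(stream, n=3))                # concentrates around 38
```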

By plustcs

Time hierarchies for sublogarithmic-space quantum computation

from arXiv: Computational Complexity

Authors: A. C. Cem Say

We present new results on the landscape of problems that can be solved by quantum Turing machines (QTMs) employing severely limited amounts of memory. In this context, we demonstrate two infinite time hierarchies of complexity classes within the ``small space'' regime: For all $i\geq 0$, there is a language that can be recognized by a constant-space machine in $2^{O(n^{1/2^i})}$ time, but not by any sublogarithmic-space QTM in $2^{O(n^{1/2^{i+1}})}$ time. For quantum machines operating within $o(\log \log n)$ space, there exists another hierarchy, each level of which corresponds to an expected runtime of $2^{O((\log n)^i)}$ for a different positive integer $i$. We also improve a quantum advantage result, demonstrating a language that can be recognized by a polynomial-time constant-space QTM, but not by any classical machine using $o(\log \log n)$ space, regardless of the time budget. The implications of our findings for quantum space-time tradeoffs are discussed.

Matchgate signatures under variable permutations

from arXiv: Computational Complexity

Authors: Boning Meng, Yicheng Pan

In this article, we give a necessary and sufficient condition for determining whether a matchgate signature retains its property under a given variable permutation, which can be checked in polynomial time. We also define the concept of permutable matchgate signatures, and use it to erase the gap between Pl-\#CSP and \#CSP on planar graphs left by previous work. We also provide a detailed characterization of permutable matchgate signatures, by presenting their relation to symmetric matchgate signatures. In addition, we prove a dichotomy for Pl-$\#R_D$-CSP where $D\ge 3$ is an integer.

Efficient Computation of the Directional Extremal Boundary of a Union of Equal-Radius Circles

from arXiv: Computational Geometry

Authors: Alexander Gribov

This paper focuses on computing the directional extremal boundary of a union of equal-radius circles. We introduce an efficient algorithm that accurately determines this boundary by analyzing the intersections and dominance relationships among the circles; the algorithm runs in O(n log n) time.

Surface guided analysis of breast changes during post-operative radiotherapy by using a functional map framework

from arXiv: Computational Geometry

Authors: Pierre Galmiche, Hyewon Seo, Yvan Pin, Philippe Meyer, Georges Noël, Michel de Mathelin

The treatment of breast cancer using radiotherapy involves uncertainties regarding breast positioning. As studies progress, more is known about the expected breast positioning errors, which are taken into account in the Planning Target Volume (PTV) in the form of a margin around the clinical target volume. However, little is known about the non-rigid deformations of the breast in the course of radiotherapy, which are a non-negligible factor in the treatment. Purpose: Taking such inter-fractional breast deformations into account would help develop promising future directions, such as patient-specific adjustable irradiation plans. Methods: In this study, we develop a geometric approach to analyze inter-fractional breast deformation throughout the radiotherapy treatment. Our data consists of 3D surface scans of patients acquired during radiotherapy sessions using a handheld scanner. We adapt the functional map framework to compute inter- and intra-patient non-rigid correspondences, which are then used to analyze intra-patient changes and inter-patient variability. Results: The qualitative shape collection analysis highlights deformations in the contralateral breast and armpit areas, along with positioning shifts in the head or abdominal regions. We also perform extrinsic analysis, where we align surface acquisitions of the treated breast with the CT-derived skin surface to assess displacements and volume changes in the treated area. On average, displacements within the treated breast exhibit amplitudes of 1-2 mm across sessions, with higher values observed at the time of the 25th irradiation session. Volume changes, inferred from surface variations, reached up to 10%, with values ranging between 2% and 5% over the course of treatment. Conclusions: We propose a comprehensive workflow for analyzing and modeling breast deformations during radiotherapy using surface acquisitions, incorporating a novel inter-collection shape matching approach to model shape variability within a shared space across multiple patient shape collections. We validate our method using 3D surface data acquired from patients during External Beam Radiotherapy (EBRT) sessions, demonstrating its effectiveness. The clinical trial data used in this paper is registered under the ClinicalTrials.gov ID NCT03801850.

Fully dynamic biconnectivity in $\tilde{\mathcal{O}}(\log^2 n)$ time

from arXiv: Data Structures and Algorithms

Authors: Jacob Holm, Wojciech Nadara, Eva Rotenberg, Marek Sokołowski

We present a deterministic fully-dynamic data structure for maintaining information about the cut-vertices in a graph; i.e. the vertices whose removal would disconnect the graph. Our data structure supports insertion and deletion of edges, as well as queries asking whether a pair of connected vertices is biconnected or can be separated by a cut-vertex, and in the latter case we support access to separating cut-vertices. All update operations are supported in amortized $O(\log^2 n \log^2 \log n)$ time, and queries take worst-case $O(\log n \log^2 \log n)$ time. Note that these time bounds match the current best for deterministic dynamic connectivity up to $\log \log n$ factors. We obtain our improved running time by a series of reductions from the original problem into well-defined data structure problems. While we do apply the well-known techniques for improving the running time of two-edge connectivity [STOC'00, SODA'18], these techniques alone do not lead to an update time of $\tilde{O}(\log^3 n)$, let alone the $\tilde{O}(\log^2 n)$ we give as a final result. Our contributions include a formally defined transient expose operation, which can be thought of as a cheaper read-only expose operation on a top tree. For each vertex in the graph, we maintain a data structure over its neighbors, and in this data structure we apply biasing (twice) to save two $\tilde{O}(\log n)$ factors. One of these biasing techniques is a new biased disjoint sets data structure, which may be of independent interest. Moreover, in this neighborhood data structure, we facilitate that the vertex can select two VIP neighbors that get special treatment, corresponding to its potentially two neighbors on an exposed path, improving a $\log n$-time operation down to constant time. It is this combination of VIP neighbors with the transient expose that saves an $\tilde{O}(\log n)$-factor from another bottleneck.
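
As a static point of comparison (not the paper's data structure), here is a Python sketch of Tarjan's linear-time DFS for cut-vertices, i.e., the information the dynamic structure maintains under edge updates; recomputing it from scratch after every update would cost $O(n+m)$ per operation, which is what the paper's polylogarithmic bounds avoid. The example graph is made up.

```python
# Hedged static counterpart: Tarjan's O(n + m) DFS for cut-vertices
# (articulation points) via discovery times and low-link values.
import sys

def cut_vertices(adj):
    sys.setrecursionlimit(10_000)
    n = len(adj)
    disc, low = [0] * n, [0] * n
    visited, cuts = [False] * n, set()
    timer = [1]

    def dfs(u, parent):
        visited[u] = True
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if visited[v]:
                low[u] = min(low[u], disc[v])      # back edge
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent != -1 and low[v] >= disc[u]:
                    cuts.add(u)                    # v's subtree needs u to reach above
        if parent == -1 and children > 1:
            cuts.add(u)                            # root with >= 2 DFS children

    for s in range(n):
        if not visited[s]:
            dfs(s, -1)
    return cuts

# two triangles sharing vertex 2: vertex 2 is the unique cut-vertex
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
print(cut_vertices(adj))                           # {2}
```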

Output-sensitive approximate counting via a measure-bounded hyperedge oracle, or: How asymmetry helps estimate $k$-clique counts faster

from arXiv: Data Structures and Algorithms

Authors: Keren Censor-Hillel, Tomer Even, Virginia Vassilevska Williams

Dell, Lapinskas and Meeks [DLM, SICOMP 2022] presented a general reduction from approximate counting to decision for a class of fine-grained problems that can be viewed as hyperedge counting or detection problems in an implicit hypergraph, thus obtaining tight equivalences between approximate counting and decision for many key problems such as $k$-clique, $k$-sum, and more. Their result is a reduction from approximately counting the number of hyperedges in an implicit $k$-partite hypergraph to a polylogarithmic number of calls to a hyperedge oracle that returns whether a given subhypergraph contains an edge. The main result of this paper is a generalization of the DLM result to output-sensitive approximate counting, where the running time of the desired counting algorithm is inversely proportional to the number of witnesses. Our theorem is a reduction from approximately counting the (unknown) number of hyperedges in an implicit $k$-partite hypergraph to a polylogarithmic number of calls to a hyperedge oracle that is called only on subhypergraphs with a small "measure". If a subhypergraph has $u_i$ nodes in the $i$th node partition of the $k$-partite hypergraph, then its measure is $\prod_i u_i$. Using the new general reduction and efficiently implementing measure-bounded colorful independence oracles, we obtain new, improved output-sensitive approximate counting algorithms for $k$-clique, $k$-dominating set, and $k$-sum. In graphs with $n^t$ $k$-cliques, for instance, our algorithm $(1\pm \epsilon)$-approximates the $k$-clique count in time $$\tilde{O}_\epsilon(n^{\omega(\frac{k-t-1}{3},\frac{k-t}{3},\frac{k-t+2}{3})}+n^2),$$ where $\omega(a,b,c)$ is the exponent of multiplying an $n^a\times n^b$ matrix by an $n^b\times n^c$ matrix. For large $k$ and $t>2$, this is a substantial improvement over prior work, even if $\omega=2$.
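
(To make the oracle interface concrete: for $k$-clique, the implicit $k$-partite hypergraph has a hyperedge on a $k$-tuple of vertices iff they form a clique. Below is a hedged brute-force sketch of the oracle and the measure, with names and representation ours; the paper implements measure-bounded oracles far more cleverly, via fast matrix multiplication.)

    from itertools import combinations, product

    def measure(parts):
        """Measure of a subhypergraph given by one vertex subset per part:
        the product of the subset sizes."""
        m = 1
        for p in parts:
            m *= len(p)
        return m

    def clique_hyperedge_oracle(adj, parts):
        """Brute-force stand-in for the hyperedge oracle: does the given
        subhypergraph contain a hyperedge, i.e., is there a k-clique with
        one vertex in each subset? adj is an adjacency matrix."""
        for combo in product(*parts):
            if all(adj[u][v] for u, v in combinations(combo, 2)):
                return True
        return False

    # Example: triangle 0-1-2 viewed as a 3-partite instance.
    adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(measure([[0], [1], [2]]))                     # 1
    print(clique_hyperedge_oracle(adj, [[0], [1], [2]]))  # True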

A Tolerant Independent Set Tester

from arXiv: Data Structures and Algorithms

Authors: Cameron Seth

We give nearly optimal bounds on the sample complexity of $(\widetilde{\Omega}(\epsilon),\epsilon)$-tolerant testing of the $\rho$-independent set property in the dense graph setting. In particular, we give an algorithm that inspects a random subgraph on $\widetilde{O}(\rho^3/\epsilon^2)$ vertices and, for some constant $c$, distinguishes graphs that have an induced subgraph of size $\rho n$ with fewer than $\frac{\epsilon}{c \log^4(1/\epsilon)} n^2$ edges from graphs in which every induced subgraph of size $\rho n$ has at least $\epsilon n^2$ edges. Our sample complexity bound matches, up to logarithmic factors, the recent upper bound of Blais and Seth (2023) for the non-tolerant testing problem, which is known to be optimal by a lower bound of Feige, Langberg and Schechtman (2004). Our main technique is a new graph container lemma for sparse subgraphs rather than independent sets. We also show that our new lemma can be used to generalize one of the classic applications of the container method, counting independent sets in regular graphs, to counting sparse subgraphs in regular graphs.
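
(A hedged toy sketch of the sampling interface, with constants elided and all names ours; the subset minimization below is brute force and exponential, purely to make the distinguishing task concrete on tiny samples.)

    import random
    from itertools import combinations

    def sparsest_subset_edges(G, sample, rho):
        """Minimum edge count over induced subsets of size rho*|sample|.
        Brute force, exponential; illustration only. G maps each vertex
        to its set of neighbors."""
        k = max(1, int(rho * len(sample)))
        best = float("inf")
        for S in combinations(sample, k):
            edges = sum(1 for u, v in combinations(S, 2) if v in G[u])
            best = min(best, edges)
        return best

    def tolerant_ris_test(G, n, rho, eps, threshold):
        """Sample O~(rho^3/eps^2) vertices and threshold the sparsest
        induced subset; 'threshold' hides the paper's constants."""
        s = int(rho**3 / eps**2) + 1      # sample size, log factors elided
        sample = random.sample(range(n), min(s, n))
        return sparsest_subset_edges(G, sample, rho) <= threshold * len(sample)**2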

A Theoretical Framework for Distribution-Aware Dataset Search

from arXiv: Data Structures and Algorithms

Authors: Aryan Esmailpour, Sainyam Galhotra, Rahul Raychaudhury, Stavros Sintos

Effective data discovery is a cornerstone of modern data-driven decision-making. Yet identifying datasets with specific distributional characteristics, such as percentiles or preferences, remains challenging. While recent proposals have enabled users to search based on percentile predicates, much of the research in data discovery relies on heuristics. This paper presents the first theoretically grounded framework that unifies data discovery under centralized and decentralized (federated) settings. Let $\mathcal{P}=\{P_1,\dots,P_N\}$ be a repository of $N$ datasets, where $P_i\subset \mathbb{R}^d$ for $d=O(1)$. We study the percentile indexing (Ptile) problem and the preference indexing (Pref) problem in the centralized and the federated setting. In the centralized setting we assume direct access to the datasets; in the federated setting we assume access only to a synopsis of each dataset. The goal of Ptile is to construct a data structure such that, given a predicate (a rectangle $R$ and an interval $\theta$), it reports the set of indexes $J$ such that $j\in J$ iff $|P_j\cap R|/|P_j|\in\theta$. The goal of Pref is to construct a data structure such that, given a predicate (a vector $v$ and an interval $\theta$), it reports the set of indexes $J$ such that $j\in J$ iff $\omega(P_j,v)\in \theta$, where $\omega(P_j,v)$ is the $k$-th largest projection (inner product) of a point of $P_j$ onto $v$. We first show that we cannot hope for near-linear data structures with polylogarithmic query time in the centralized setting. Next, we give $\tilde{O}(N)$-space data structures that answer Ptile and Pref queries in $\tilde{O}(1+OUT)$ time, where $OUT$ is the output size. Each data structure returns a set of indexes $J$ such that i) for every $P_i$ that satisfies the predicate, $i\in J$, and ii) if $j\in J$ then $P_j$ satisfies the predicate up to an additive error $\varepsilon+2\delta$, where $\varepsilon\in(0,1)$ and $\delta$ is the error of the synopses.
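
(For concreteness, a hedged brute-force baseline for the Ptile predicate, with names ours: a linear scan over all $N$ datasets, which is exactly what the paper's $\tilde{O}(N)$-space index avoids at query time.)

    def ptile_query(datasets, rect, theta):
        """Return indexes j with |P_j ∩ R| / |P_j| in theta.
        datasets: list of lists of d-dimensional points;
        rect: (lo, hi) corner pair; theta: (a, b) interval."""
        lo, hi = rect
        a, b = theta
        out = []
        for j, P in enumerate(datasets):
            inside = sum(
                1 for p in P
                if all(lo[i] <= p[i] <= hi[i] for i in range(len(p)))
            )
            if a <= inside / len(P) <= b:
                out.append(j)
        return out

    # Example: two 2D datasets, rectangle [0,1]x[0,1], fraction in [0.5, 1].
    print(ptile_query(
        [[(0.2, 0.3), (2.0, 2.0)], [(5, 5), (6, 6)]],
        ((0, 0), (1, 1)),
        (0.5, 1.0),
    ))  # [0]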

A Quantum Constraint Generation Framework for Binary Linear Programs

from arXiv: Data Structures and Algorithms

Authors: András Czégel, Boglárka G. -Tóth

We propose a new approach to utilizing quantum computers for binary linear programming (BLP), which extends to general integer linear programs (ILPs). Quantum optimization algorithms, hybrid or quantum-only, are currently general-purpose, standalone solvers for ILP. However, to consider them practically useful, we expect them to outperform the current state-of-the-art classical solvers. That expectation is unfair to quantum algorithms: in classical ILP solvers, after many decades of evolution, many different algorithms work together as a robust machine to get the best result. This is the approach we would like to follow now with our quantum 'solver' solutions. In this study we wrap any suitable quantum optimization algorithm into a quantum-informed classical constraint generation framework. First, we relax the problem by dropping all constraints and encode it into an Ising Hamiltonian for the quantum optimization subroutine. Then, by sampling from the solution state of the subroutine, we obtain information about constraint violations in the initial problem, from which we decide which coupling terms to introduce into the Hamiltonian. The coupling terms correspond to the constraints of the initial binary linear program. We then optimize over the new Hamiltonian, repeating until we reach a feasible solution or another stopping condition holds. Since one can decide how many constraints to add to the Hamiltonian in a single step, our algorithm is at least as efficient as the (hybrid) quantum optimization algorithm it wraps. We support our claim with results on small-scale minimum-cost exact cover instances.
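
(A hedged sketch of the wrapper loop, with the quantum subroutine abstracted as a black-box QUBO sampler; the penalty encoding and all names are ours, and only equality constraints are handled, since inequalities would need slack variables.)

    import numpy as np

    def constraint_generation(c, A, b, sample_qubo, lam=10.0, max_rounds=20):
        """Minimize c@x s.t. A@x == b, x binary, via lazily added quadratic
        penalties. sample_qubo(Q) is any (quantum or classical) subroutine
        returning a low-energy bitstring x for the energy x @ Q @ x."""
        Q = np.diag(c).astype(float)   # unconstrained relaxation: objective only
        active = set()
        for _ in range(max_rounds):
            x = sample_qubo(Q)
            violated = [i for i in range(len(b)) if A[i] @ x != b[i]]
            if not violated:
                return x               # feasible solution found
            for i in violated[:1]:     # one constraint per round; the framework
                if i in active:        # allows adding several at once
                    continue
                active.add(i)
                a = A[i].astype(float)
                # penalty lam * (a@x - b[i])^2, expanded into QUBO terms
                Q += lam * np.outer(a, a)
                Q -= 2 * lam * b[i] * np.diag(a)
        return None                    # no feasible solution within budget

The penalty uses $(a^\top x - b)^2 = x^\top (aa^\top)x - 2b\,a^\top x + b^2$; the linear part sits on the QUBO diagonal since $x_j^2 = x_j$ for binary variables, and the constant $b^2$ is dropped.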

On the Hardness Hierarchy for the $O(n \sqrt{\log n})$ Complexity in the Word RAM

from arXiv: Data Structures and Algorithms

Authors: Dominik Kempa, Tomasz Kociumaka

In this work, we study the relative hardness of fundamental problems whose state-of-the-art word RAM algorithms take $O(n\sqrt{\log n})$ time on instances described in $\Theta(n)$ machine words ($\Theta(n\log n)$ bits). This complexity class, one of six hardness levels identified by Chan and Pătrașcu [SODA 2010], includes diverse problems from several domains: Counting Inversions, string processing problems (BWT Construction, LZ77 Factorization, Longest Common Substring, Batched Longest Previous Factor Queries, Batched Inverse Suffix Array Queries), and computational geometry tasks (Orthogonal Range Counting, Orthogonal Segment Intersection). We offer two main contributions. First, we establish new links between the above string problems and Dictionary Matching, a classic task solvable using the Aho-Corasick automaton. We restrict Dictionary Matching to instances with $O(n)$ binary patterns of length $m = O(\log n)$ each, and we prove that, unless these instances can be solved in $o(n\sqrt{\log n})$ time, the aforementioned string problems cannot be solved faster either. Second, via further reductions, we extend this hardness to Counting Inversions (a fundamental component in geometric algorithms) and thus to Orthogonal Range Counting and Orthogonal Segment Intersection. This hinges on String Nesting, a new problem that is equivalent to Dictionary Matching and can be reduced to Counting Inversions in three steps. Together, our results unveil a single problem, with two equivalent formulations, that underlies the hardness of nearly all major problems currently occupying the $O(n\sqrt{\log n})$ hardness level. These results drastically narrow the focus of further efforts to improve the complexity of near-linear problems. As an auxiliary outcome of our framework, we also prove that the alphabet in several central string problems can be efficiently reduced to binary.
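
(As a reference point for this hardness level, here is the textbook $O(n\log n)$ comparison-based inversion counter; the $O(n\sqrt{\log n})$ word-RAM algorithms beat it by packing many short elements into machine words, not by better comparisons. Python sketch, names ours.)

    def count_inversions(a):
        """Count pairs i < j with a[i] > a[j] in O(n log n) by merge sort."""
        def sort(xs):
            if len(xs) <= 1:
                return xs, 0
            mid = len(xs) // 2
            left, li = sort(xs[:mid])
            right, ri = sort(xs[mid:])
            merged, inv = [], li + ri
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    # every remaining element of left exceeds right[j]
                    inv += len(left) - i
                    merged.append(right[j]); j += 1
            merged += left[i:] + right[j:]
            return merged, inv
        return sort(list(a))[1]

    print(count_inversions([3, 1, 2]))  # 2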
