
Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.


Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Wednesday, December 11

Postdoc at University of Toronto (apply by December 15, 2024)

from CCI: jobs

The theory group is also accepting applications for another postdoctoral position beginning in September 2025 to work with Prof. Sushant Sachdeva on topics including (but not limited to): design of fast algorithms, optimization, and mathematical machine learning. If you would like to apply specifically to this position, please cc sachdeva@cs.toronto.edu when submitting your application.

Website: https://www.cs.toronto.edu/theory/positions.html
Email: sachdeva@cs.toronto.edu

By shacharlovett

It's Time to Stop Using Grades

from Computational Complexity

We use grades to evaluate students and motivate them to learn. That works as long as grades remain a reasonably good measure of how well a student understands the material in a class. But even this most basic of academic measurements cannot escape Goodhart's law: "When a measure becomes a target, it ceases to be a good measure." Grades become irrelevant, or worse, counterproductive, as chasing grades may undermine a student's ability to master the material. So perhaps it is time to retire the measure.

Grading became a weaker measure due to grade inflation and academic dishonesty. Let's do a short dive into both of these areas.

The average grade has increased about a full grade level since I went to college in the '80s, and now more than half of all grades given are A's. As college tuition increased, students started thinking of college more transactionally, expecting more from their college experience while putting less effort into classes. Administrators put more weight on student course surveys for faculty evaluation, and the easiest way to improve scores is to give higher grades. And repeat.

If everyone gets an A, no one gets an A. It just becomes harder to distinguish the strong students from the merely good.

Academic dishonesty goes back to the beginning of academics but has advanced dramatically with technology. In my fraternity, we had filing cabinets full of old homework and exams ostensibly to use as study guides. However, if a professor reused questions from year to year, one could gain an unfair advantage.

With the growth of the Internet, Chegg, and more recently large language models, those looking for an edge have never had it so good. ChatGPT-4o1 can answer nearly any undergraduate exam question in any field—it even got an easy A when I tested it with one of my undergraduate theory of computing finals.

AI becomes like steroids: those who don't use it find themselves at a disadvantage. If a pretty good student sees their peers using LLMs, they'll start using them as well, initially just as a learning aid. But there's a very fine line between using AI as a study guide and using AI to give you the answers. Many fall down a slippery slope, and this starts to undermine the mastery that comes with tackling problems on your own.

We can try to counter all this by returning to harsher grading and more heavily weighting in-person, no-tech exams, but these approaches cause other problems. Already we see companies and graduate schools devalue grades and focus on projects and research instead.

So let's acknowledge this endgame and just eliminate grades, maybe keeping only Pass and Fail for those who don't even show up. The ones who want to master the material can focus on doing so. Others can concentrate on working on projects. Still others can earn their way to a degree with little effort but also with little reward.

By Lance Fortnow


News for November 2024

from Property Testing Review

Alas, another late post! But we compensate for that. With a rich hoard of ten papers this month, covering quantum property testing, distribution testing, matchings, Steiner trees, linearity testing, and dealing with corruptions. And some papers on multiple topics simultaneously.

Uniformity testing when you have the source code by Clément L. Canonne, Robin Kothari, and Ryan O’Donnell (arXiv). Consider the classic problem of uniformity testing, where the domain is \([d]\). We all know by now that the complexity of uniformity testing is \(\Theta(\sqrt{d}/\varepsilon^2)\), where \(\varepsilon\) is the proximity parameter. Suppose the distribution is the output of an algorithm (for example, a hash function) and we have access to the source code. (The source code is thought of as a circuit that outputs a single domain element.) Can we beat this bound? Surprisingly, the answer is yes, when the tester is allowed to be quantum. A single “sample” is basically a run of the code. The main result is an upper bound of \(O(d^{1/3}/\varepsilon^{4/3})\) (with some additional terms in the low-\(\varepsilon\) setting). Tight lower bounds for this setting are still unknown, so this research direction of “distribution testing with the source code” may lead to many more results.
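
For contrast with the quantum result, the classical \(\Theta(\sqrt{d}/\varepsilon^2)\)-sample baseline can be sketched in a few lines. This is a minimal illustration of the standard collision-based tester, not the paper's algorithm; the acceptance threshold below is chosen for illustration only.

```python
import random
from itertools import combinations

def collision_uniformity_test(samples, d, eps):
    """Collision-based uniformity tester (classical sketch).

    Under the uniform distribution the pairwise collision rate
    concentrates around 1/d; a distribution eps-far from uniform in
    total variation has collision probability at least (1 + 4*eps**2)/d.
    We threshold at a point between the two cases.
    """
    m = len(samples)
    pairs = m * (m - 1) / 2
    collisions = sum(1 for x, y in combinations(samples, 2) if x == y)
    return collisions / pairs <= (1 + 2 * eps**2) / d

# Uniform samples over [d]; on the order of sqrt(d)/eps^2 samples suffice.
random.seed(0)
d = 100
samples = [random.randrange(d) for _ in range(2000)]
print(collision_uniformity_test(samples, d, eps=0.5))   # True
```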

Coherence in Property Testing: Quantum-Classical Collapses and Separations by Fernando Jeronimo, Nir Magrafta, Joseph Slote, and Pei Wu (ECCC). The distribution testing problem again, where the domain is \(\{0,1\}^n\) and thought of as exponentially large. To distinguish (say) a uniform distribution with support size \(2^{n/8}\) vs support size \(2^{n/4}\) would require exponentially many (\(2^{n/16}\)) samples. From a quantum standpoint, we can restrict the input distribution to be coherent. Intuitively, this is analogous to being restricted to a subcube. Even then, the paper shows that the testing problem requires exponentially many samples. To show a separation between classical and quantum algorithms, the paper gives a model of interactive protocols for distribution testing (which has been studied before in the classical setting, like Chiesa-Gur). In this setting, the paper gives a quantum algorithm that runs in \(\textrm{poly}(n)\) time with polynomially many proof strings, and can approximate the support size of coherent distributions. Classical algorithms, on the other hand, still require exponentially many queries, even with access to proof strings.

Testing classical properties from quantum data by Matthias C. Caro, Preksha Naik, and Joseph Slote (arXiv). Property testing gains its power because of query access, since the tester can “zero in” on the relevant portion of the data. Sample based testers often have far worse complexities. Such testers only get access to random samples of the data/function. Consider access to a function \(f:\{0,1\}^n \rightarrow \{0,1\}\). The classic property of monotonicity can be tested in \(\sqrt{n}\) queries (ignoring the proximity parameter dependencies), but requires \(2^{\Theta(\sqrt{n})}\) samples. The paper studies sample based property testing, except with quantum “samples”. In the quantum world, the function \(f\) is stored/represented as a quantum superposition of states. Quantum samples can obtain richer information, like sampling the Fourier coefficients of \(f\). (Quantum queries give even stronger access.) This allows for many classical query-based property testers to become quantum sample-based testers. This paper gives results for many fundamental properties like monotonicity, symmetry, and triangle freeness.

Tolerant Testing of Stabilizer States with Mixed State Inputs by Vishnu Iyer and Daniel Liang (arXiv). Another quantum testing paper, but here, the property itself is quantum. The input is a quantum state, and the aim is to test the property of being a stabilizer state. All previous work on property testing of stabilizer states assumed that the input is a pure state. (One should think of a pure state as a sort of “basis” state, while a mixed state is a linear combination of these states.) Given noise, one would typically expect the input to be mixed. This paper gives the first tolerant tester for stabilizer states, where the input is allowed to be a mixed state.

Stochastic Matching via In-n-Out Local Computation Algorithms by Amir Azarmehr, Soheil Behnezhad, Alma Ghafari, and Ronitt Rubinfeld (arXiv). This is not a sublinear result by itself, but has connections that should interest our readers. The main problem is stochastic matching. We are given a graph \(G\). An input \(G_p\) is generated by keeping each edge independently with probability \(p\). The aim is to construct a sparse graph \(H\) such that the expected matching size of \(H \cap G_p\) is close to the expected matching size of \(G_p\). After a long line of work, this paper shows that there exists a subgraph \(H\) of \(\textrm{poly}(1/p)\) degree whose expected matching size is a \((1-\varepsilon)\)-approximation of the overall expectation. The main analysis technique is to study a local computation algorithm (LCA) for computing the maximum matching. An LCA essentially gives local access to a large output (say, a matching or coloring) such that each matching edge (say) of the output can be computed with sublinear lookups to the input. The standard complexity measure is the number of lookups of the input to compute a matching edge. But this paper looks at the “in complexity”, which is the number of queries that lookup a given edge. When both complexities are small, one can show a “decorrelation” of the overall output, which is used in the analysis.
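
To make the setup concrete, here is a minimal sketch of the stochastic matching objective (not the paper's LCA-based construction): realize \(G_p\) by keeping each edge independently with probability \(p\), and estimate the expected matching size by Monte Carlo, using a greedy maximal matching as a simple stand-in for maximum matching.

```python
import random

def realize(edges, p, rng):
    """Sample G_p: keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]

def greedy_matching_size(edges):
    """Size of a greedy maximal matching (at least half the maximum)."""
    matched, size = set(), 0
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            size += 1
    return size

def expected_matching_size(edges, p, trials=1000, seed=0):
    """Monte Carlo estimate of the expected matching size of G_p."""
    rng = random.Random(seed)
    return sum(greedy_matching_size(realize(edges, p, rng))
               for _ in range(trials)) / trials

# A 4-cycle: when every edge survives (p = 1), greedy always finds 2 edges.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(expected_matching_size(cycle, p=1.0))   # 2.0
```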

Nearly Tight Bounds on Testing of Metric Properties by Yiqiao Bao, Sampath Kannan, and Erik Waingarten (arXiv). Consider an \(n \times n\) matrix of “distances” between \(n\) points. The aim is to test the fundamental property that the distances form a metric. Despite a long line of work studying property testing of distances, points, etc., there has surprisingly been no result on this problem. The earliest work in this line is by Parnas-Ron, studying the properties of being a tree metric or an ultrametric (which satisfy a stronger version of the triangle inequality). This paper gives an \(O(n^{2/3}/\varepsilon^{4/3})\)-query algorithm for testing metrics. There is also a nearly matching lower bound, assuming that the dependence on \(\varepsilon\) is polynomial. For the problems of testing tree metrics (and ultrametrics), the paper gives an \(O(1/\varepsilon^2)\)-query algorithm, an improvement over the previous bound of \(O(1/\varepsilon^6)\).
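
For intuition, the property itself is easy to state in code. Below is a naive triple-sampling check of the triangle inequality; it illustrates what is being tested, but this naive approach does not achieve the paper's query bound (a violation can hide in very few triples).

```python
import random

def violates_triangle(D, i, j, k):
    """Does the triple (i, j, k) violate the triangle inequality?"""
    return (D[i][j] > D[i][k] + D[k][j] or
            D[i][k] > D[i][j] + D[j][k] or
            D[j][k] > D[j][i] + D[i][k])

def naive_metric_test(D, trials=10000, seed=0):
    """Reject if any sampled triple violates the triangle inequality."""
    n = len(D)
    rng = random.Random(seed)
    for _ in range(trials):
        i, j, k = rng.sample(range(n), 3)
        if violates_triangle(D, i, j, k):
            return False
    return True

# Line metric on 4 points: distances |i - j| satisfy the triangle inequality.
line = [[abs(i - j) for j in range(4)] for i in range(4)]
print(naive_metric_test(line))   # True

broken = [row[:] for row in line]
broken[0][3] = broken[3][0] = 100   # far exceeds any two-hop path
print(naive_metric_test(broken))   # False
```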

Sublinear Metric Steiner Tree via Improved Bounds for Set Cover by Sepideh Mahabadi, Mohammad Roghani, Jakub Tarnawski, and Ali Vakilian (arXiv). This paper is on the classic metric Steiner tree problem. The input is an \(n \times n\) matrix of distances in a metric and a subset \(S\) of points. The aim is to estimate the minimum weight of a Steiner tree (a tree that connects all the terminals in \(S\)). Getting a \(2\)-approximation is quite easy, since one can just consider a spanning tree on \(S\). A result of Chen-Khanna-Tan gives a \((2-\eta)\)-approximation algorithm that runs in \(O(n^{13/7})\) queries (for some positive constant \(\eta\)). This paper gets the same approximation with \(O(n^{5/3})\) queries. The main technical work goes into designing a sublinear algorithm for a version of the set cover problem, where the input set system is given as a matrix.
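
The easy \(2\)-approximation mentioned above can be sketched directly: a minimum spanning tree on the terminals, computed from the distance matrix with Prim's algorithm, weighs at most twice the optimal Steiner tree.

```python
import math

def terminal_mst_weight(D, S):
    """Weight of a minimum spanning tree over the terminals S, using
    the metric distances in D (Prim's algorithm).  For a metric, this
    weight is at most twice the optimal Steiner tree weight."""
    S = list(S)
    best = {v: D[S[0]][v] for v in S[1:]}   # cheapest link into the tree
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], D[v][u])
    return total

# Unit-square metric on 4 corner points; the MST uses three unit edges.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
D = [[math.dist(p, q) for q in pts] for p in pts]
print(terminal_mst_weight(D, [0, 1, 2, 3]))   # 3.0
```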

On Optimal Testing of Linearity by Vipul Arora, Esty Kelman, and Uri Meir (arXiv). In property testing, it’s hard to get more fundamental than linearity testing. What is there new to say about this well-studied problem? This paper studies linearity testing under the online manipulations model of Kalemaj-Raskhodnikova-Varma. In this model, after every query of the tester, an adversary corrupts up to \(t\) entries of the input. Previous results show that linearity testing can be done in \(O(\log t + 1/\varepsilon)\) queries. But, the paper notes, all previous results require \(t \leq 2^{n/c}\) for some \(c \geq 2\). How far can we push \(t\)? Remarkably, the main result shows that \(t\) can be as large as \(2^n/\varepsilon^2\) and linearity testing is still feasible under online manipulations. The next result of the paper studies the property of being a low-degree polynomial over the reals. The paper gives an optimal \(O(1/\varepsilon)\)-query tester for the property of being a degree-\(d\) polynomial.
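
For background, the classical BLR linearity test (with no manipulations) queries \(f\) at random \(x\), \(y\), and \(x + y\) and checks additivity. A minimal sketch over \(\mathbb{F}_2^n\), with inputs encoded as \(n\)-bit integers so that addition is bitwise XOR:

```python
import random

def blr_linearity_test(f, n, trials=200, seed=0):
    """BLR test: f is linear over GF(2)^n iff f(x) + f(y) = f(x + y)
    always holds; checking random pairs rejects a far-from-linear f
    with constant probability per trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True

n = 16
parity = lambda x: bin(x & 0b1011).count("1") % 2   # XOR of a fixed bit subset: linear
print(blr_linearity_test(parity, n))   # True

majority = lambda x: 1 if bin(x).count("1") > n // 2 else 0   # far from linear
print(blr_linearity_test(majority, n))   # False
```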

Online versus Offline Adversaries in Property Testing by Esty Kelman, Ephraim Linder, and Sofya Raskhodnikova (arXiv). This paper is related to the online manipulations model discussed above. There is also an offline “erasure” model, wherein some coordinates/bits of the input are erased by an adversary. The original paper of Kalemaj-Raskhodnikova-Varma showed that sortedness of arrays can be tested with offline erasures, but cannot be tested with online erasures even when \(t=1\). This paper proves a complementary theorem. There are properties that can be tested with \(O(1)\) queries for \(t = O(n/\log n)\) in the online manipulations model, but require \(\Omega(\log n)\) queries in the offline setting with \(\Theta(n/\log\log n)\) erasures. So the online and offline models are incomparable in power. There is a similar result for a setting where \(t\) is constant. The lower bounds are shown using a property regarding repeated characters in strings.

By Seshadhri

Random regular graph states are complex at almost any depth

from arXiv: Computational Complexity

Authors: Soumik Ghosh, Dominik Hangleiter, Jonas Helsen

Graph states are fundamental objects in the theory of quantum information due to their simple classical description and rich entanglement structure. They are also intimately related to IQP circuits, which have applications in quantum pseudorandomness and quantum advantage. For us, they are a toy model to understand the relation between circuit connectivity, entanglement structure and computational complexity. In the worst case, a strict dichotomy in the computational universality of such graph states appears as a function of the degree $d$ of a regular graph state [GDH+23]. In this paper, we initiate the study of the average-case complexity of simulating random graph states of varying degree when measured in random product bases and give distinct evidence that a similar complexity-theoretic dichotomy exists in the average case. Specifically, we consider random $d$-regular graph states and prove three distinct results: First, we exhibit two families of IQP circuits of depth $d$ and show that they anticoncentrate for any $2 < d = o(n)$ when measured in a random $X$-$Y$-plane product basis. This implies anticoncentration for random constant-regular graph states. Second, in the regime $d = \Theta(n^c)$ with $c \in (0,1)$, we prove that random $d$-regular graph states contain polynomially large grid graphs as induced subgraphs with high probability. This implies that they are universal resource states for measurement-based computation. Third, in the regime of high degree ($d\sim n/2$), we show that random graph states are not sufficiently entangled to be trivially classically simulable, unlike Haar random states. Proving the three results requires different techniques: the analysis of a classical statistical-mechanics model using Krawtchouk polynomials, graph-theoretic analysis using the switching method, and analysis of the ranks of submatrices of random adjacency matrices, respectively.

NP-hardness and a PTAS for the Euclidean Steiner Line Problem

from arXiv: Data Structures and Algorithms

Authors: Simon Bartlmae, Paul J. Jünger, Elmar Langetepe

The Euclidean Steiner Tree Problem (EST) seeks a minimum-cost tree interconnecting a given set of terminal points in the Euclidean plane, allowing the use of additional intersection points. In this paper, we consider two variants that include an additional straight line $\gamma$ with zero cost, which must be incorporated into the tree. In the Euclidean Steiner fixed Line Problem (ESfL), this line is given as input and can be treated as a terminal. In contrast, the Euclidean Steiner Line Problem (ESL) requires determining the optimal location of $\gamma$. Despite recent advances, including heuristics and a 1.214-approximation algorithm for both problems, a formal proof of NP-hardness has remained open. In this work, we close this gap by proving that both the ESL and ESfL are NP-hard. Additionally, we prove that both problems admit a polynomial-time approximation scheme (PTAS), by demonstrating that approximation algorithms for the EST can be adapted to the ESL and ESfL with appropriate modifications. Specifically, we show ESfL$\leq_{\text{PTAS}}$EST and ESL$\leq_{\text{PTAS}}$EST, i.e., provide a PTAS reduction to the EST.

Automated Discovery of Branching Rules with Optimal Complexity for the Maximum Independent Set Problem

from arXiv: Data Structures and Algorithms

Authors: Xuan-Zhao Gao, Yi-Jia Wang, Pan Zhang, Jin-Guo Liu

The branching algorithm is a fundamental technique for designing fast exponential-time algorithms to solve combinatorial optimization problems exactly. It divides the entire solution space into independent search branches using predetermined branching rules, and ignores the search on suboptimal branches to reduce the time complexity. The complexity of a branching algorithm is primarily determined by the branching rules it employs, which are often designed by human experts. In this paper, we show how to automate this process with a focus on the maximum independent set problem. The main contribution is an algorithm that efficiently generates optimal branching rules for a given subgraph with tens of vertices. Its efficiency enables us to generate branching rules on the fly, which are provably optimal and significantly reduce the number of branches compared to existing methods that rely on expert-designed branching rules. Numerical experiments on 3-regular graphs show that an average complexity of O(1.0441^n) can be achieved, better than any previous method.
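
The branch-on-a-vertex scheme that such rules refine can be sketched in a few lines. This is the textbook baseline (either exclude a maximum-degree vertex v, or include v and delete its closed neighborhood), not the automatically discovered rules from the paper.

```python
def mis_size(adj):
    """Maximum independent set size by simple branching (sketch):
    pick a maximum-degree vertex v and branch on excluding v or
    including v (which removes all of N[v])."""
    if not adj:
        return 0
    v = max(adj, key=lambda u: len(adj[u]))
    if len(adj[v]) <= 1:
        # Max degree <= 1: the graph is a matching plus isolated
        # vertices, so take one vertex per component.
        taken, seen = 0, set()
        for u in adj:
            if u not in seen:
                taken += 1
                seen.add(u)
                seen.update(adj[u])
        return taken
    def remove(vs):
        vs = set(vs)
        return {u: [w for w in nbrs if w not in vs]
                for u, nbrs in adj.items() if u not in vs}
    # Branch 1: v excluded.  Branch 2: v included, N[v] removed.
    return max(mis_size(remove([v])),
               1 + mis_size(remove([v] + adj[v])))

# 5-cycle: the maximum independent set has size 2.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(mis_size(c5))   # 2
```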

Parallel simulation for sampling under isoperimetry and score-based diffusion models

from arXiv: Data Structures and Algorithms

Authors: Huanjian Zhou, Masashi Sugiyama

In recent years, there has been a surge of interest in proving discretization bounds for sampling under isoperimetry and for diffusion models. As data size grows, reducing the iteration cost becomes an important goal. Inspired by the great success of the parallel simulation of the initial value problem in scientific computation, we propose parallel Picard methods for sampling tasks. Rigorous theoretical analysis reveals that our algorithm achieves better dependence on dimension $d$ than prior works in iteration complexity (i.e., reduced from $\widetilde{O}(\log^2 d)$ to $\widetilde{O}(\log d)$), which is even optimal for sampling under isoperimetry with specific iteration complexity. Our work highlights the potential advantages of simulation methods in scientific computation for dynamics-based sampling and diffusion models.

Covered Forest: Fine-grained generalization analysis of graph neural networks

from arXiv: Data Structures and Algorithms

Authors: Antonis Vasileiou, Ben Finkelshtein, Floris Geerts, Ron Levie, Christopher Morris

The expressive power of message-passing graph neural networks (MPNNs) is reasonably well understood, primarily through combinatorial techniques from graph isomorphism testing. However, MPNNs' generalization abilities -- making meaningful predictions beyond the training set -- remain less explored. Current generalization analyses often overlook graph structure, limit the focus to specific aggregation functions, and assume the impractical, hard-to-optimize $0$-$1$ loss function. Here, we extend recent advances in graph similarity theory to assess the influence of graph structure, aggregation, and loss functions on MPNNs' generalization abilities. Our empirical study supports our theoretical insights, improving our understanding of MPNNs' generalization properties.

Streaming Private Continual Counting via Binning

from arXiv: Data Structures and Algorithms

Authors: Joel Daniel Andersson, Rasmus Pagh

In differential privacy, $\textit{continual observation}$ refers to problems in which we wish to continuously release a function of a dataset that is revealed one element at a time. The challenge is to maintain a good approximation while keeping the combined output over all time steps differentially private. In the special case of $\textit{continual counting}$ we seek to approximate a sum of binary input elements. This problem has received considerable attention lately, in part due to its relevance in implementations of differentially private stochastic gradient descent. $\textit{Factorization mechanisms}$ are the leading approach to continual counting, but the best such mechanisms do not work well in $\textit{streaming}$ settings since they require space proportional to the size of the input. In this paper, we present a simple approach to approximating factorization mechanisms in low space via $\textit{binning}$, where adjacent matrix entries with similar values are changed to be identical in such a way that a matrix-vector product can be maintained in sublinear space. Our approach has provable sublinear space guarantees for a class of lower triangular matrices whose entries are monotonically decreasing away from the diagonal. We show empirically that even with very low space usage we are able to closely match, and sometimes surpass, the performance of asymptotically optimal factorization mechanisms. Recently, and independently of our work, Dvijotham et al. have also suggested an approach to implementing factorization mechanisms in a streaming setting. Their work differs from ours in several respects: It only addresses factorization into $\textit{Toeplitz}$ matrices, only considers $\textit{maximum}$ error, and uses a different technique based on rational function approximation that seems less versatile than our binning approach.
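
The binning idea can be illustrated offline: snap the decreasing Toeplitz coefficients to geometric bins so only O(log n) distinct values remain, which lets the weighted sum of past values be grouped into O(log n) buckets. The paper's contribution is maintaining the resulting matrix-vector product in sublinear space as the stream arrives; the sketch below shows only the rounding step and its effect on one row.

```python
import math
import random

def binned_coeffs(c):
    """Snap coefficient k to the value at the start of its geometric
    bin [2^j, 2^(j+1)), leaving only O(log n) distinct values."""
    out = []
    for k in range(len(c)):
        start = 0 if k == 0 else 1 << int(math.log2(k))
        out.append(c[start])
    return out

n = 1024
c = [1.0 / math.sqrt(k + 1) for k in range(n)]   # decreasing away from the diagonal
cb = binned_coeffs(c)
print(len(set(cb)))   # 11 distinct coefficient values instead of 1024

# Error of the binned weighted sum on one Toeplitz row.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(n)]
exact = sum(ck * zk for ck, zk in zip(c, reversed(z)))
approx = sum(ck * zk for ck, zk in zip(cb, reversed(z)))
print(abs(exact - approx))
```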

Massively Parallel Algorithms for Approximate Shortest Paths

from arXiv: Data Structures and Algorithms

Authors: Michal Dory, Shaked Matar

We present fast algorithms for approximate shortest paths in the massively parallel computation (MPC) model. We provide randomized algorithms that take $poly(\log{\log{n}})$ rounds in the near-linear memory MPC model. Our results are for unweighted undirected graphs with $n$ vertices and $m$ edges. Our first contribution is a $(1+\epsilon)$-approximation algorithm for Single-Source Shortest Paths (SSSP) that takes $poly(\log{\log{n}})$ rounds in the near-linear MPC model, where the memory per machine is $\tilde{O}(n)$ and the total memory is $\tilde{O}(mn^{\rho})$, where $\rho$ is a small constant. Our second contribution is a distance oracle that allows approximating the distance between any pair of vertices. The distance oracle is constructed in $poly(\log{\log{n}})$ rounds and allows querying a $(1+\epsilon)(2k-1)$-approximate distance between any pair of vertices $u$ and $v$ in $O(1)$ additional rounds. The algorithm is for the near-linear memory MPC model with total memory of size $\tilde{O}((m+n^{1+\rho})n^{1/k})$, where $\rho$ is a small constant. While our algorithms are for the near-linear MPC model, they in fact use only one machine with $\tilde{O}(n)$ memory, while the rest of the machines can have sublinear memory of size $O(n^{\gamma})$ for a small constant $\gamma < 1$. All previous algorithms for approximate shortest paths in the near-linear MPC model either required $\Omega(\log{n})$ rounds or had an $\Omega(\log{n})$ approximation. Our approach is based on fast construction of near-additive emulators, limited-scale hopsets, and limited-scale distance sketches that are tailored for the MPC model. While our end results are for the near-linear MPC model, many of the tools we construct, such as hopsets and emulators, are built in the more restricted sublinear MPC model.

Tuesday, December 10

The Google Willow thing

from Scott Aaronson

Yesterday I arrived in Santa Clara for the Q2B (Quantum 2 Business) conference, which starts this morning, and where I’ll be speaking Thursday on “Quantum Algorithms in 2024: How Should We Feel?” and also closing the conference via an Ask-Us-Anything session with John Preskill. (If you’re at Q2B, reader, come and say hi!)

And to coincide with Q2B, yesterday Google’s Quantum group officially announced “Willow,” its new 105-qubit superconducting chip with which it’s demonstrated an error-corrected surface code qubit as well as a new, bigger quantum supremacy experiment based on Random Circuit Sampling. I was lucky to be able to attend Google’s announcement ceremony yesterday afternoon at the Computer History Museum in Mountain View, where friend-of-the-blog-for-decades Dave Bacon and other Google quantum people explained exactly what was done and took questions (the technical level was surprisingly high for this sort of event). I was also lucky to get a personal briefing last week from Google’s Sergio Boixo on what happened.

Meanwhile, yesterday Sundar Pichai tweeted about Willow, and Elon Musk replied “Wow.” It cannot be denied that those are both things that happened.

Anyway, all yesterday, I then read comments on Twitter, Hacker News, etc. complaining that, since there wasn’t yet a post on Shtetl-Optimized, how could anyone possibly know what to think of this?? For 20 years I’ve been trying to teach the world how to fish in Hilbert space, but (sigh) I suppose I’ll just hand out some more fish. So, here are my comments:

  1. Yes, this is great. Yes, it’s a real milestone for the field. To be clear: for anyone who’s been following experimental quantum computing these past five years (say, since Google’s original quantum supremacy milestone in 2019), there’s no particular shock here. Since 2019, Google has roughly doubled the number of qubits on its chip and, more importantly, increased the qubits’ coherence time by a factor of 5. Meanwhile, their 2-qubit gate fidelity is now roughly 99.7% (for controlled-Z gates) or 99.85% (for “iswap” gates), compared to ~99.5% in 2019. They then did the more impressive demonstrations that predictably become possible with more and better qubits. And yet, even if the progress is broadly in line with what most of us expected, it’s still of course immensely gratifying to see everything actually work! Huge congratulations to everyone on the Google team for a well-deserved success.
  2. I already blogged about this!!! Specifically, I blogged about Google’s fault-tolerance milestone when its preprint appeared on the arXiv back in August. To clarify, what we’re all talking about now is the same basic technical advance that Google already reported in August, except now with the PR blitz from Sundar Pichai on down, a Nature paper, an official name for the chip (“Willow”), and a bunch of additional details about it.
  3. Scientifically, the headline result is that, as they increase the size of their surface code, from 3×3 to 5×5 to 7×7, Google finds that their encoded logical qubit stays alive for longer rather than shorter. So, this is a very important threshold that’s now been crossed. As Dave Bacon put it to me, “eddies are now forming”—or, to switch metaphors, after 30 years we’re now finally tickling the tail of the dragon of quantum fault-tolerance, the dragon that (once fully awoken) will let logical qubits be preserved and acted on for basically arbitrary amounts of time, allowing scalable quantum computation.
  4. Having said that, Sergio Boixo tells me that Google will only consider itself to have created a “true” fault-tolerant qubit, once it can do fault-tolerant two-qubit gates with an error of ~10^-6 (and thus, on the order of a million fault-tolerant operations before suffering a single error). We’re still some ways from that milestone: after all, in this experiment Google created only a single encoded qubit, and didn’t even try to do encoded operations on it, let alone on multiple encoded qubits. But all in good time. Please don’t ask me to predict how long, though empirically, the time from one major experimental QC milestone to the next now seems to be measured in years, which are longer than weeks but shorter than decades.
  5. Google has also announced a new quantum supremacy experiment on its 105-qubit chip, based on Random Circuit Sampling with 40 layers of gates. Notably, they say that, if you use the best currently-known simulation algorithms (based on Johnnie Gray’s optimized tensor network contraction), as well as an exascale supercomputer, their new experiment would take ~300 million years to simulate classically if memory is not an issue, or ~10^25 years if memory is an issue (note that a mere ~10^10 years have elapsed since the Big Bang). Probably some people have come here expecting me to debunk those numbers, but as far as I know they’re entirely correct, with the caveats stated. Naturally it’s possible that better classical simulation methods will be discovered, but meanwhile the experiments themselves will also rapidly improve.
  6. Having said that, the biggest caveat to the “10^25 years” result is one to which I fear Google drew insufficient attention. Namely, for the exact same reason why (as far as anyone knows) this quantum computation would take ~10^25 years for a classical computer to simulate, it would also take ~10^25 years for a classical computer to directly verify the quantum computer’s results!! (For example, by computing the “Linear Cross-Entropy” score of the outputs.) For this reason, all validation of Google’s new supremacy experiment is indirect, based on extrapolations from smaller circuits, ones for which a classical computer can feasibly check the results. To be clear, I personally see no reason to doubt those extrapolations. But for anyone who wonders why I’ve been obsessing for years about the need to design efficiently verifiable near-term quantum supremacy experiments: well, this is why! We’re now deeply into the unverifiable regime that I warned about.
  7. In his remarks yesterday, Google Quantum AI leader Hartmut Neven talked about David Deutsch’s argument, way back in the 1990s, that quantum computers should force us to accept the reality of the Everettian multiverse, since “where else could the computation have happened, if it wasn’t being farmed out to parallel universes?” And naturally there was lots of debate about that on Hacker News and so forth. Let me confine myself here to saying that, in my view, the new experiment doesn’t add anything new to this old debate. It’s yet another confirmation of the predictions of quantum mechanics. What those predictions mean for our understanding of reality can continue to be argued as it’s been since the 1920s.
  8. Cade Metz did a piece about Google’s announcement for the New York Times. Alas, when Cade reached out to me for comment, I decided that it would be too awkward, after what Cade did to my friend Scott Alexander almost four years ago. I talked to several other journalists, such as Adrian Cho for Science.
  9. No doubt people will ask me what this means for superconducting qubits versus trapped-ion or neutral-atom or photonic qubits, or for Google versus its many competitors in experimental QC. And, I mean, it’s not bad for Google or for superconducting QC! These past couple years I’d sometimes commented that, since Google’s 2019 announcement of quantum supremacy via superconducting qubits, the trapped-ion and neutral-atom approaches had seemed to be pulling ahead, with spectacular results from Quantinuum (trapped-ion) and QuEra (neutral atoms) among others. One could think of Willow as Google’s reply, putting the ball in its competitors’ courts to likewise demonstrate better logical qubit lifetime with increasing code size (or, better yet, full operations on logical qubits exceeding that threshold, without resorting to postselection). The great advantage of trapped-ion qubits continues to be that you can move the qubits around (and also, the two-qubit gate fidelities seem somewhat ahead of superconducting). But to compensate, superconducting qubits have the advantage that the gates are a thousand times faster, which makes it feasible to do experiments that require collecting millions of samples.
  10. Of course the big question, the one on everyone’s lips, was always how quantum computing skeptic Gil Kalai was going to respond. But we need not wonder! On his blog, Gil writes: “We did not study yet these particular claims by Google Quantum AI but my general conclusion apply to them ‘Google Quantum AI’s claims (including published ones) should be approached with caution, particularly those of an extraordinary nature. These claims may stem from significant methodological errors and, as such, may reflect the researchers’ expectations more than objective scientific reality.’ ”  Most of Gil’s post is devoted to re-analyzing data from Google’s 2019 quantum supremacy experiment, which Gil continues to believe can’t possibly have done what was claimed. Gil’s problem is that the 2019 experiment was long ago superseded anyway: besides the new and more inarguable Google result, IBM, Quantinuum, QuEra, and USTC have now all also reported Random Circuit Sampling experiments with good results. I predict that Gil, and others who take it as axiomatic that scalable quantum computing is impossible, will continue to have their work cut out for them in this new world.
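
The Linear Cross-Entropy score from point 6 can be illustrated with a toy stand-in (an exponential, Porter-Thomas-like distribution in place of a real circuit's output distribution; the dimensions and sample counts below are illustrative, not Willow's):

```python
import random

random.seed(0)
N = 1 << 10          # dimension of a hypothetical 10-qubit output space
M = 20000            # number of samples

# Toy stand-in for an ideal random-circuit output distribution:
# probabilities drawn from an exponential law, then normalized.
weights = [random.expovariate(1.0) for _ in range(N)]
total = sum(weights)
p = [w / total for w in weights]

def linear_xeb(samples, p, N):
    """Linear cross-entropy score: N times the mean ideal probability of
    the observed samples, minus 1. It is ~1 when samples come from the
    ideal distribution and ~0 for uniform noise."""
    return N * sum(p[x] for x in samples) / len(samples) - 1

ideal_samples = random.choices(range(N), weights=p, k=M)
uniform_samples = [random.randrange(N) for _ in range(M)]

f_ideal = linear_xeb(ideal_samples, p, N)      # close to 1
f_uniform = linear_xeb(uniform_samples, p, N)  # close to 0
```

The catch Scott describes is visible in `linear_xeb` itself: scoring requires the ideal probabilities `p`, and for a hard circuit computing those is exactly the ~10^25-year classical simulation.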

Update: Here’s Sabine Hossenfelder’s take. I don’t think she and I disagree about any of the actual facts; she just decided to frame things much more negatively. Ironically, I guess 20 years of covering hyped, dishonestly-presented non-milestones in quantum computing has inclined me to be pretty positive when a group puts in this much work, demonstrates a real milestone, and talks about it without obvious falsehoods!

By Scott

Qualitative quantitative models

from Ben Recht

A systems view of calcium homeostasis and its implications

Of the many examples of homeostatic systems in Cannon’s Wisdom of the Body, one of the cleanest and most beautiful is the regulation of calcium. In 1932, it was evident that calcium facilitated bone growth, muscle control, and blood coagulation. Moreover, the concentration of calcium in the blood needed to be constant. People with low blood calcium levels would suffer from muscle stiffness and spasms. High calcium concentration would result in increased blood viscosity. By what mechanism did the body maintain the sweet spot of ideal calcium concentration?

Though there wasn’t a complete picture, the systems involved in calcium homeostasis were clear to Cannon. The parathyroid gland played a central role, as removing or damaging it caused dysregulation. Rodent experiments showed that calcium levels could be restored by surgically adding parathyroid tissue. Human studies demonstrated parathyroid extract was an effective treatment of hypocalcemia.1 The bones were also involved, as they held larger stores of calcium when animals were fed high calcium diets. Vitamin D was known to help facilitate absorption of calcium from the intestines, and vitamin D supplementation had effects similar to those of parathyroid extract.

How do these systems work in tandem? The first clean systems-level description I’ve found didn’t appear until 2002. In the Journal of Theoretical Biology, Hana El-Samad, Jesse Goff, and Mustafa Khammash derived a relatively simple physiological model I’ll describe here. Their model highlights a key feature of homeostatic systems.

The systems involved in the model are precisely those identified by Cannon. The calcium level in the bloodstream changes based on three signals. First, an external demand can remove calcium. Second, calcium can move from the bones to the blood. Third, calcium can move from the intestine to the blood.

The amount of calcium extracted from the bones is proportional to the level of parathyroid hormone (PTH) in the blood. The amount extracted from the intestines is proportional to the level of active vitamin D in the blood. And the amount of active vitamin D added to the blood at any time is proportional to the amount of PTH. PTH stimulates the kidney to activate vitamin D.

Now, we close the loop: the source of the PTH in the blood is the parathyroid gland. Experimental evidence shows the amount of PTH in the blood scales linearly with the deviation of the calcium concentration from its homeostatic setpoint. That is, the PTH in the bloodstream serves as an error signal.

Put this picture together: some external demand depletes calcium in the blood. In response, the parathyroid gland releases PTH. The rate of calcium resorption from the bone then increases linearly, and the rate of the rate of calcium extraction from the intestines increases linearly. This “double rate” system is the critical one. At steady state, the body must add a constant rate of calcium to the blood. For this rate to be constant, the change of the rate has to be zero. This second-order effect, governed by the vitamin D production of the kidneys, is directly proportional to the setpoint error in the calcium level. Thus, we must have that the setpoint error in calcium concentration is driven to a constant level.

The cascading effect mediated by the kidney was critical. If only the bones were involved, we would know that the steady-state calcium level would be a constant, but we wouldn’t know which constant. With the two-stage system, a constant calcium level implies that the value equals the setpoint.

There’s another way to see that the system has to converge to a state where the setpoint error is zero. The amount of activated vitamin D in the blood increases whenever the setpoint error is not zero. The only way for that active vitamin D level to not increase forever is for the set point error to go to zero.

The two-stage structure with the kidney results in the rate of intestinal calcium extraction being proportional to the integral of the PTH level over time. In fact, the vitamin D level is the integral of the setpoint error over time. Hence, the change in the calcium level in response to external demand is equal to a linear combination of the setpoint error signal and its integral. In control theory, we call such a regulator a proportional-integral controller.
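
The loop can also be checked numerically. Here is a toy Euler simulation of the two-stage system as a proportional-integral controller; every parameter value is hypothetical, chosen only to illustrate that the setpoint is recovered no matter what the gains are:

```python
# Toy Euler simulation of the calcium loop as a PI controller.
# All parameter values are hypothetical, for illustration only.
setpoint = 1.0     # target blood calcium concentration (arbitrary units)
k_bone = 1.0       # bone resorption gain: the proportional term
k_kidney = 0.5     # kidney vitamin-D activation gain: the integral term
demand = 0.3       # constant external calcium demand (the disturbance)

dt, steps = 0.01, 10_000
ca, vit_d = setpoint, 0.0
for _ in range(steps):
    pth = setpoint - ca            # PTH acts as the setpoint-error signal
    vit_d += dt * k_kidney * pth   # kidney: vitamin D integrates the error
    ca += dt * (-demand + k_bone * pth + vit_d)
```

At steady state `ca` returns to `setpoint` while `vit_d` settles at whatever value absorbs the demand, which is exactly the adaptation property described above: the internal states change so the regulated variable does not.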

This picture, from Khammash’s 2021 survey on perfect adaptation, illustrates the feedback loop.

I purposefully wrote this blog without equations because, in some sense, they don’t matter. A high-level systems view illustrates how calcium homeostasis is maintained. A network of biochemical reactions regulates the calcium level, and we can view the regulatory cascade as a proportional-integral controller. This is a qualitative quantitative result. We are making a few assumptions about linearity, but these are fairly mild and consistent with experiments.

Moreover, the model’s equations are underspecified. They depend on at least four parameters whose values I don’t know. The remarkable fact is that no matter what the values of these parameters are, the system must converge to the correct setpoint if it converges at all. If there’s a setpoint, integral control finds it. If calcium demand unexpectedly changes, the body adapts, recovering the same calcium concentration as before the shock. The steady-state values of calcium storage in bones and activated vitamin D levels in the bloodstream change in the new environment to ensure the calcium concentration does not. This adaptation to unknown futures is a remarkable property of integral control, and I’ll discuss it in the next post.

Subscribe now

1. I love medical terminology where you just concatenate prefixes and suffixes together to make it sound like you understand something. Low calcium = hypocalcemia.

By Ben Recht

Postdoc at The Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard (apply by December 20, 2024)

from CCI: jobs

The Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard is seeking postdocs who focus on developing novel computational and algorithmic methods, with a strong bent towards developing the mathematical foundations of which algorithms are most suitable to biological applications and why.

Experience in machine learning is critical; prior biology/medicine knowledge is a plus.

Website: https://academicjobsonline.org/ajo/jobs/28630
Email: ewsc-postdocs@broadinstitute.org

By shacharlovett

Multi-world Validated Asynchronous Byzantine Agreement

from Decentralized Thoughts

In this post, we show how to solve Validated Asynchronous Byzantine Agreement via the multi-world VABA approach of Abraham, Malkhi, and Spiegelman 2018. What is Validated Asynchronous Byzantine Agreement? Given $n$ parties, and at most $f<n/3$ of them can be malicious (this is the optimal ratio for consensus with asynchronous...

Living with asynchrony and eventually reaching agreement by combining binding and randomness

from Decentralized Thoughts

The prevailing view on fault-tolerant agreement is that it is impossible in asynchrony for deterministic protocols, but adding randomization solves the problem. This statement is confusing in two aspects: The FLP result says that any protocol, even a randomized protocol, must have a non-terminating execution where every configuration is bivalent...

Direct Sums for Parity Decision Trees

from arXiv: Computational Complexity

Authors: Tyler Besselman, Mika Göös, Siyao Guo, Gilbert Maystre, Weiqiang Yuan

Direct sum theorems state that the cost of solving $k$ instances of a problem is at least $\Omega(k)$ times the cost of solving a single instance. We prove the first such results in the randomised parity decision tree model. We show that a direct sum theorem holds whenever (1) the lower bound for parity decision trees is proved using the discrepancy method; or (2) the lower bound is proved relative to a product distribution.

How many continuous measurements are needed to learn a vector?

from arXiv: Computational Complexity

Authors: David Krieg, Erich Novak, Mario Ullrich

One can recover vectors from $\mathbb{R}^m$ with arbitrary precision, using only $\lceil \log_2(m+1)\rceil +1$ continuous measurements that are chosen adaptively. This surprising result is explained and discussed, and we present applications to infinite-dimensional approximation problems.

Query Complexity with Unknowns

from arXiv: Computational Complexity

Authors: Nikhil S. Mande, Karteek Sreenivasaiah

We initiate the study of a new model of query complexity of Boolean functions where, in addition to 0 and 1, the oracle can answer queries with ``unknown''. The query algorithm is expected to output the function value if it can be conclusively determined by the partial information gathered, and it must output ``unknown'' if not. We formalize this model by using Kleene's strong logic of indeterminacy on three variables to capture unknowns. We call this model the `u-query model'. We relate the query complexity of functions in the new u-query model with their analogs in the standard query model. We show an explicit function that is exponentially harder in the u-query model than in the usual query model. We give sufficient conditions for a function to have u-query complexity asymptotically the same as its query complexity. Using u-query analogs of the combinatorial measures of sensitivity, block sensitivity, and certificate complexity, we show that deterministic, randomized, and quantum u-query complexities of all total Boolean functions are polynomially related to each other, just as in the usual query models.
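
As a toy rendering of these semantics (the encoding of "unknown" as `None` and the left-to-right querying strategy are illustrative choices, not the paper's formalism), here is a u-query evaluation of OR under Kleene's strong three-valued logic:

```python
U = None  # the third truth value, "unknown"

def k_or(a, b):
    """Kleene strong OR: True dominates; otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is U or b is U:
        return U
    return False

def u_query_or(bits):
    """Query bits left to right, stopping as soon as the value of OR is
    conclusively determined (a True is seen); otherwise return the
    Kleene OR of everything queried."""
    acc = False
    for b in bits:
        acc = k_or(acc, b)
        if acc is True:
            return True
    return acc
```

For instance, `u_query_or([False, None, False])` must return "unknown": the hidden bit could be a 1, so the partial information cannot conclusively determine the OR.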

Plagiarism Detection Using Machine Learning

from arXiv: Computational Complexity

Authors: Omraj Kamat, Tridib Ghosh, Kalaivani J, Angayarkanni V, Rama P

Plagiarism is the act of using someone else's work without proper acknowledgment, an offense that cuts across many arenas, including academia and publishing. Traditional methods of plagiarism detection, based on keyword matching and human review, usually fail to cope with the increasingly sophisticated techniques used to mask copy-pasted content. This paper introduces a machine-learning approach to plagiarism detection that utilizes natural language processing and classification algorithms for efficient detection of similarities between documents. The developed model can accurately detect both exact and paraphrased plagiarism using advanced feature extraction techniques with supervised learning algorithms. We adapted and tested our model on an extensive text sample dataset, and we demonstrated promising results in precision, recall, and detection accuracy. These outcomes show that machine learning techniques can significantly enhance plagiarism detection systems, improving on traditional ones with robust scalability. Future work includes enlarging the dataset and fine-tuning the model for more complicated cases of disguised plagiarism.
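
For contrast with the supervised pipeline the abstract describes, here is a minimal bag-of-words cosine-similarity check, the kind of keyword-matching baseline such systems aim to improve upon; the sample strings are invented for illustration:

```python
import re
from collections import Counter
from math import sqrt

def tokens(text):
    """Lowercase word tokens (apostrophes kept inside words)."""
    return re.findall(r"[a-z']+", text.lower())

def cosine_sim(a, b):
    """Cosine similarity of raw word-count vectors: 1.0 for identical
    texts, 0.0 for texts sharing no words."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

src = "Plagiarism is the act of using someone else's work without credit."
para = "Plagiarism is the act of using another person's work without credit."
unrelated = "Quantum error correction protects logical qubits from noise."
```

A light paraphrase still scores high while unrelated text scores near zero; what this baseline cannot do, and what motivates the learned features in the paper, is catch heavy paraphrases that share few surface words.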

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

from arXiv: Computational Complexity

Authors: Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

In this paper, we analyze the computational limitations of Mamba and State-space Models (SSMs) by using the circuit complexity framework. Despite Mamba's stateful design and recent attention as a strong candidate to outperform Transformers, we demonstrate that both Mamba and SSMs with $\mathrm{poly}(n)$-precision and constant-depth layers reside within the $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ complexity class. This result indicates that Mamba theoretically has the same computational capabilities as Transformers, and it cannot solve problems like arithmetic formula problems, boolean formula value problems, and permutation composition problems if $\mathsf{TC}^0 \neq \mathsf{NC}^1$. Therefore, it challenges the assumption that Mamba is more computationally expressive than Transformers. Our contributions include rigorous proofs showing that Selective SSM and Mamba architectures can be simulated by $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ circuits, and they cannot solve problems outside $\mathsf{TC}^0$.

PAC codes with Bounded-Complexity Sequential Decoding: Pareto Distribution and Code Design

from arXiv: Computational Complexity

Authors: Mohsen Moradi, Hessam Mahdavifar

Recently, a novel variation of polar codes known as polarization-adjusted convolutional (PAC) codes has been introduced by Arıkan. These codes significantly outperform conventional polar and convolutional codes, particularly at short codeword lengths, and are shown to operate very close to the optimal bounds. It has also been shown that if the rate profile of a PAC code does not adhere to certain polarized cutoff rate constraints, the computational complexity of its sequential decoding grows exponentially. In this paper, we address the converse problem, demonstrating that if the rate profile of a PAC code follows the polarized cutoff rate constraints, the computation required for its sequential decoding can be bounded by a Pareto distribution. This serves as a guideline for the rate-profile design of PAC codes. For a high-rate PAC $(1024,899)$ code, simulation results show that the PAC code with a Fano decoder, when constructed based on the polarized cutoff rate constraints, achieves a coding gain of more than $0.75$ dB at a frame error rate (FER) of $10^{-5}$ compared to the state-of-the-art 5G polar and LDPC codes.

On the Expressive Power of Modern Hopfield Networks

from arXiv: Computational Complexity

Authors: Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Modern Hopfield networks (MHNs) have emerged as powerful tools in deep learning, capable of replacing components such as pooling layers, LSTMs, and attention mechanisms. Recent advancements have enhanced their storage capacity, retrieval speed, and error rates. However, the fundamental limits of their computational expressiveness remain unexplored. Understanding the expressive power of MHNs is crucial for optimizing their integration into deep learning architectures. In this work, we establish rigorous theoretical bounds on the computational capabilities of MHNs using circuit complexity theory. Our key contribution is showing that MHNs are $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$. Hence, unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathrm{poly}(n)$-precision modern Hopfield network with a constant number of layers and $O(n)$ hidden dimension cannot solve $\mathsf{NC}^1$-hard problems such as the undirected graph connectivity problem and the tree isomorphism problem. We also extend our results to Kernelized Hopfield Networks. These results demonstrate the limited expressive power of modern Hopfield networks. Moreover, our theoretical analysis provides insights to guide the development of new Hopfield-based architectures.

Lossless Model Compression via Joint Low-Rank Factorization Optimization

from arXiv: Computational Complexity

Authors: Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangmin Liu, Jiake Tian

Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between approximated and original weight matrices. Despite achieving performance close to the original models when $\delta$ is optimized, a performance discrepancy remains due to the separate optimization processes for low-rank factorization and model performance, resulting in unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between low-rank factorization and model optimization objectives, establishing a precise perturbation range for the effect of matrix factorization errors on model performance. The challenge is then reformulated as a numerical rank-deficiency problem with inequality constraints, and we develop a joint objective that simultaneously addresses factorization error and model performance. Based on this analysis, we propose two optimization algorithms: \textbf{a lossless optimization algorithm} that maximizes model accuracy while ensuring compression, and \textbf{a compact optimization algorithm} that minimizes model size while preserving performance. These algorithms do not require fine-tuning and can directly compress numerous deep models to achieve lossless results. Our methods demonstrate robust efficacy across various vision and language tasks. For example, a ResNext50 model compressed by 70\% outperforms the original. Our code will be made public.
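
The paper's joint objective is not reproduced here; the baseline it improves on, rank-$k$ factorization minimizing $\delta = \|W - UV\|_F$ via truncated SVD (optimal by Eckart-Young), can be sketched as follows. The matrix sizes and rank are illustrative.

```python
import numpy as np

def low_rank_factorize(W, k):
    """Rank-k factorization W ~= U @ V via truncated SVD.
    By Eckart-Young, this minimizes ||W - U @ V||_F over rank-k matrices."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]          # absorb singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k

rng = np.random.default_rng(0)
# A "weight matrix" that is nearly rank-2 plus small noise.
W = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
W += 1e-3 * rng.standard_normal(W.shape)

U2, V2 = low_rank_factorize(W, k=2)
delta = np.linalg.norm(W - U2 @ V2)          # factorization error delta
compression = (U2.size + V2.size) / W.size   # stored parameters vs. original
print(delta < 0.1, compression < 0.1)
```

The paper's point is that minimizing $\delta$ alone, as above, ignores the downstream loss; their joint objective couples the two.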

On Zarankiewicz's Problem for Intersection Hypergraphs of Geometric Objects

from arXiv: Computational Geometry

Authors: Timothy M. Chan, Chaya Keller, Shakhar Smorodinsky

The hypergraph Zarankiewicz's problem, introduced by Erdős in 1964, asks for the maximum number of hyperedges in an $r$-partite hypergraph with $n$ vertices in each part that does not contain a copy of $K_{t,t,\ldots,t}$. Erdős obtained a near-optimal bound of $O(n^{r-1/t^{r-1}})$ for general hypergraphs. In recent years, several works obtained improved bounds under various algebraic assumptions -- e.g., if the hypergraph is semialgebraic. In this paper we study the problem in a geometric setting -- for $r$-partite intersection hypergraphs of families of geometric objects. Our main results are essentially sharp bounds for families of axis-parallel boxes in $\mathbb{R}^d$ and families of pseudo-discs. For axis-parallel boxes, we obtain the sharp bound $O_{d,t}(n^{r-1}(\frac{\log n}{\log \log n})^{d-1})$. The best previous bound was larger by a factor of about $(\log n)^{d(2^{r-1}-2)}$. For pseudo-discs, we obtain the bound $O_t(n^{r-1}(\log n)^{r-2})$, which is sharp up to logarithmic factors. As this hypergraph has no algebraic structure, no improvement of Erdős' 60-year-old $O(n^{r-1/t^{r-1}})$ bound was known for this setting. Furthermore, even in the special case of discs for which the semialgebraic structure can be used, our result improves the best known result by a factor of $\tilde{\Omega}(n^{\frac{2r-2}{3r-2}})$. To obtain our results, we use the recently improved results for the graph Zarankiewicz's problem in the corresponding settings, along with a variety of combinatorial and geometric techniques, including shallow cuttings, biclique covers, transversals, and planarity.

Geometric spanners of bounded tree-width

from arXiv: Computational Geometry

Authors: Kevin Buchin, Carolin Rehs, Torben Scheele

Given a point set $P$ in the Euclidean space, a geometric $t$-spanner $G$ is a graph on $P$ such that for every pair of points, the shortest path in $G$ between those points is at most a factor $t$ longer than the Euclidean distance between those points. The value $t\geq 1$ is called the dilation of $G$. Commonly, the aim is to construct a $t$-spanner with additional desirable properties. In graph theory, a powerful tool to admit efficient algorithms is bounded tree-width. We therefore investigate the problem of computing geometric spanners with bounded tree-width and small dilation $t$. Let $d$ be a fixed integer and $P \subset \mathbb{R}^d$ be a point set with $n$ points. We give a first algorithm to compute an $\mathcal{O}(n/k^{d/(d-1)})$-spanner on $P$ with tree-width at most $k$. The dilation obtained by the algorithm is asymptotically worst-case optimal for graphs with tree-width $k$: We show that there is a set of $n$ points such that every spanner of tree-width $k$ has dilation $\mathcal{O}(n/k^{d/(d-1)})$. We further prove a tight dependency between tree-width and the number of edges in sparse connected planar graphs, which admits, for point sets in $\mathbb{R}^2$, a plane spanner with tree-width at most $k$ and small maximum vertex degree. Finally, we show an almost tight bound on the minimum dilation of a spanning tree of $n$ equally spaced points on a circle, answering an open question asked in previous work.
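
For intuition about the quantity being optimized, a naive dilation computation (graph shortest-path distance over Euclidean distance, maximized over all pairs) might look like the following sketch; the point set and graph are illustrative, not from the paper.

```python
import math
from itertools import combinations

def dilation(points, edges):
    """Dilation of the geometric graph (points, edges): the maximum over all
    pairs of (graph shortest-path distance) / (Euclidean distance).
    Uses Floyd-Warshall, so it is only suitable for small n."""
    n = len(points)
    def dist(i, j):
        return math.dist(points[i], points[j])
    INF = float("inf")
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for i, j in edges:
        d[i][j] = d[j][i] = dist(i, j)   # edge weight = Euclidean length
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(d[i][j] / dist(i, j) for i, j in combinations(range(n), 2))

# Four corners of a unit square, connected in a cycle.
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
t = dilation(pts, cycle)
print(round(t, 3))  # diagonal pairs need two edges: 2 / sqrt(2) = 1.414...
```

A $t$-spanner construction then asks for a sparse (here: bounded tree-width) edge set keeping this ratio small.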

Sparsification of the Generalized Persistence Diagrams for Scalability through Gradient Descent

from arXiv: Computational Geometry

Authors: Mathieu Carrière, Seunghyun Kim, Woojin Kim

The generalized persistence diagram (GPD) is a natural extension of the classical persistence barcode to the setting of multi-parameter persistence and beyond. The GPD is defined as an integer-valued function whose domain is the set of intervals in the indexing poset of a persistence module, and is known to be able to capture richer topological information than its single-parameter counterpart. However, computing the GPD is computationally prohibitive due to the sheer size of the interval set. Restricting the GPD to a subset of intervals provides a way to manage this complexity, compromising discriminating power to some extent. However, identifying and computing an effective restriction of the domain that minimizes the loss of discriminating power remains an open challenge. In this work, we introduce a novel method for optimizing the domain of the GPD through gradient descent optimization. To achieve this, we introduce a loss function tailored to optimize the selection of intervals, balancing computational efficiency and discriminative accuracy. The design of the loss function is based on the known erosion stability property of the GPD. We showcase the efficiency of our sparsification method for dataset classification in supervised machine learning. Experimental results demonstrate that our sparsification method significantly reduces the time required for computing the GPDs associated to several datasets, while maintaining classification accuracies comparable to those achieved using full GPDs. Our method thus opens the way for the use of GPD-based methods to applications at an unprecedented scale.

Recursive Computation of Path Homology for Stratified Digraphs

from arXiv: Computational Geometry

Authors: Zhengtong Zhu, Zhiyi Chi

Stratified digraphs are popular models for feedforward neural networks. However, computation of their path homologies has been limited to low dimensions due to high computational complexity. A recursive algorithm is proposed to compute certain high-dimensional (reduced) path homologies of stratified digraphs. By recursion on matrix representations of homologies of subgraphs, the algorithm efficiently computes the full-depth path homology of a stratified digraph, i.e., homology with dimension equal to the depth of the graph. The algorithm can be used to compute full-depth persistent homologies and, for acyclic digraphs, the maximal path homology, i.e., path homology with dimension equal to the maximum path length of the graph. Numerical experiments show that the algorithm has a significant advantage over the general algorithm in computation time as the depth of the stratified digraph increases.

A subgradient splitting algorithm for optimization on nonpositively curved metric spaces

from arXiv: Data Structures and Algorithms

Authors: Ariel Goodwin, Adrian S. Lewis, Genaro López-Acedo, Adriana Nicolae

Many of the primal ingredients of convex optimization extend naturally from Euclidean to Hadamard spaces $\unicode{x2014}$ nonpositively curved metric spaces like Euclidean, Hilbert, and hyperbolic spaces, metric trees, and more general CAT(0) cubical complexes. Linear structure, however, and the duality theory it supports are absent. Nonetheless, we introduce a new type of subgradient for convex functions on Hadamard spaces, based on Busemann functions. This notion supports a splitting subgradient method with guaranteed complexity bounds. In particular, the algorithm solves $p$-mean problems in general Hadamard spaces: we illustrate by computing medians in BHV tree space.

Sliding Squares in Parallel

from arXiv: Data Structures and Algorithms

Authors: Hugo A. Akitaya, Sándor P. Fekete, Peter Kramer, Saba Molaei, Christian Rieck, Frederick Stock, Tobias Wallner

We consider algorithmic problems motivated by modular robotic reconfiguration, for which we are given $n$ square-shaped modules (or robots) in a (labeled or unlabeled) start configuration and need to find a schedule of sliding moves to transform it into a desired goal configuration, maintaining connectivity of the configuration at all times. Recent work from Computational Geometry has aimed at minimizing the total number of moves, resulting in schedules that can perform reconfigurations in $\mathcal{O}(n^2)$ moves, or $\mathcal{O}(nP)$ for an arrangement of bounding box perimeter size $P$, but are fully sequential. Here we provide first results in the sliding square model that exploit parallel robot motion, resulting in an optimal speedup to achieve reconfiguration in worst-case optimal makespan of $\mathcal{O}(P)$. We also provide tight bounds on the complexity of the problem by showing that even deciding the possibility of reconfiguration within makespan $1$ is NP-complete in the unlabeled case; for the labeled case, deciding reconfiguration within makespan $2$ is NP-complete, while makespan $1$ can be decided in polynomial time.

On the Bidirected Cut Relaxation for Steiner Forest

from arXiv: Data Structures and Algorithms

Authors: Jarosław Byrka, Fabrizio Grandoni, Vera Traub

The Steiner Forest problem is an important generalization of the Steiner Tree problem. We are given an undirected graph with nonnegative edge costs and a collection of pairs of vertices. The task is to compute a cheapest forest with the property that the elements of each pair belong to the same connected component of the forest. The current best approximation factor for Steiner Forest is 2, which is achieved by the classical primal-dual algorithm; improving on this factor is a big open problem in the area. Motivated by this open problem, we study an LP relaxation for Steiner Forest that generalizes the well-studied Bidirected Cut Relaxation for Steiner Tree. We prove that this relaxation has several promising properties. Among them, it is possible to round any half-integral LP solution to a Steiner Forest instance while increasing the cost by at most a factor 16/9. To prove this result we introduce a novel recursive densest-subgraph contraction algorithm.

weberknecht -- a One-Sided Crossing Minimization solver

from arXiv: Data Structures and Algorithms

Authors: Johannes Rauch

We describe the implementation of the exact solver weberknecht and the heuristic solver weberknecht_h for the One-Sided Crossing Minimization problem.
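
The abstract does not describe the solvers' algorithms; as background, the classical barycenter heuristic for One-Sided Crossing Minimization (a standard baseline, not necessarily what weberknecht_h implements) can be sketched as:

```python
def barycenter_order(free_neighbors):
    """Barycenter heuristic for One-Sided Crossing Minimization: order the
    free layer by the average position of each vertex's neighbors in the
    fixed layer (vertices with no neighbors go first)."""
    def key(v):
        nbrs = free_neighbors[v]
        return sum(nbrs) / len(nbrs) if nbrs else -1.0
    return sorted(free_neighbors, key=key)

def count_crossings(order, free_neighbors):
    """Count pairwise edge crossings for a given order of the free layer:
    edges (u, a) and (v, b) with u before v cross exactly when a > b."""
    crossings = 0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            for a in free_neighbors[order[i]]:
                for b in free_neighbors[order[j]]:
                    if a > b:
                        crossings += 1
    return crossings

# Free vertices u, v, w attached to fixed-layer positions 0..2.
nbrs = {"u": [2], "v": [0], "w": [1]}
before = count_crossings(["u", "v", "w"], nbrs)
after = count_crossings(barycenter_order(nbrs), nbrs)
print(before, after)  # the heuristic untangles this instance completely
```

An exact solver like weberknecht must instead certify optimality, e.g. via branch-and-bound or an ILP over the pairwise ordering variables.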

Verifying Shortest Paths in Linear Time

from arXiv: Data Structures and Algorithms

Authors: Ahmed Shokry, Amr Elmasry, Ayman Khalafallah, Amr Aly

In this paper we propose a linear-time certifying algorithm for the single-source shortest-path problem capable of verifying graphs with positive, negative, and zero arc weights. Previously proposed linear-time approaches only work for graphs with positive arc weights.
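
The paper's linear-time construction is not reproduced here; as an illustration of what a certifying shortest-path verifier checks, below is a standard (but naive, worst-case quadratic) certificate check. All names are illustrative.

```python
def verify_sssp(n, arcs, s, d, parent):
    """Certifying check that d[] holds single-source shortest-path distances
    from s, for positive, zero, or negative arc weights (no negative cycles).
    (1) d[s] == 0.
    (2) d[v] <= d[u] + w for every arc (u, v, w), so d[v] is at most the
        length of every s-v path.
    (3) parent[] traces tight arcs (d[v] == d[parent[v]] + w) from each
        reachable v back to s: a real path of length exactly d[v].
    Together, (1)-(3) force d[v] == dist(s, v)."""
    weight = {(u, v): w for u, v, w in arcs}
    if d[s] != 0:
        return False
    for u, v, w in arcs:                       # (2): no arc is relaxable
        if d[u] + w < d[v]:
            return False
    for v in range(n):
        if v == s or d[v] == float("inf"):
            continue
        x, steps = v, 0
        while x != s:                          # (3): walk the parent chain
            u = parent[x]
            w = weight.get((u, x))
            if w is None or d[u] + w != d[x]:  # parent arc must be tight
                return False
            x, steps = u, steps + 1
            if steps > n:                      # parent pointers cycle
                return False
    return True

# Arcs with positive, zero, and negative weights.
arcs = [(0, 1, 2), (1, 2, -1), (0, 2, 5), (2, 3, 0)]
d_good = [0, 2, 1, 1]
parent = [None, 0, 1, 2]
d_bad = [0, 2, 4, 4]    # wrongly claims arc (0, 2, 5) is the shortest route
print(verify_sssp(4, arcs, 0, d_good, parent),
      verify_sssp(4, arcs, 0, d_bad, parent))
```

The repeated parent walks make this sketch quadratic; achieving the paper's linear time requires checking the parent structure more carefully (e.g. by verifying it is a tree in one pass).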

On Socially Fair Low-Rank Approximation and Column Subset Selection

from arXiv: Data Structures and Algorithms

Authors: Zhao Song, Ali Vakilian, David P. Woodruff, Samson Zhou

Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications. In this paper, we study the question of socially fair low-rank approximation and socially fair column subset selection, where the goal is to minimize the loss over all sub-populations of the data. We show that surprisingly, even constant-factor approximation to fair low-rank approximation requires exponential time under certain standard complexity hypotheses. On the positive side, we give an algorithm for fair low-rank approximation that, for a constant number of groups and constant-factor accuracy, runs in $2^{\text{poly}(k)}$ time rather than the naïve $n^{\text{poly}(k)}$, which is a substantial improvement when the dataset has a large number $n$ of observations. We then show that there exist bicriteria approximation algorithms for fair low-rank approximation and fair column subset selection that run in polynomial time.

A Dynamic Tree Structure for Hierarchical On-Chain Asset Management

from arXiv: Data Structures and Algorithms

Authors: Mojtaba Eshghie, Gustav Andersson Kasche

In this paper, we introduce Sarv, a novel non-monolithic blockchain-based data structure designed to represent hierarchical relationships between digitally representable components. Sarv serves as an underlying infrastructure for a wide range of applications requiring hierarchical data management, such as supply chain tracking, asset management, and circular economy implementations. Our approach leverages a tree-based data structure to accurately reflect products and their sub-components, enabling functionalities such as modification, disassembly, borrowing, and refurbishment, mirroring real-world operations. The hierarchy within Sarv is embedded in the on-chain data structure through a smart contract-based design, utilizing Algorand Standard Assets (ASAs). The uniqueness of Sarv lies in its compact and non-monolithic architecture, its mutability, and a two-layer action authorization scheme that enhances security and delegation of asset management. We demonstrate that Sarv addresses real-world requirements by providing a scalable, mutable, and secure solution for managing hierarchical data on the blockchain.
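
Sarv's on-chain encoding via Algorand Standard Assets and smart contracts is not reproduced here; an off-chain sketch of the tree operations the abstract describes (assembly, disassembly, flattening a product into its components) might look like the following. All class and method names are hypothetical.

```python
class AssetNode:
    """Off-chain sketch of a hierarchical asset: each node is a component,
    children are its sub-components (Sarv encodes this relation on-chain)."""
    def __init__(self, asset_id):
        self.asset_id = asset_id
        self.parent = None
        self.children = {}

    def attach(self, child):
        """Assemble: make `child` a direct sub-component of this asset."""
        if child.parent is not None:
            raise ValueError("component already attached elsewhere")
        child.parent = self
        self.children[child.asset_id] = child

    def detach(self, asset_id):
        """Disassemble: remove and return a direct sub-component."""
        child = self.children.pop(asset_id)
        child.parent = None
        return child

    def flatten(self):
        """All asset ids in this subtree (the full bill of materials)."""
        ids = [self.asset_id]
        for c in self.children.values():
            ids.extend(c.flatten())
        return ids

bike = AssetNode("bike")
wheel = AssetNode("wheel")
tire = AssetNode("tire")
bike.attach(wheel)
wheel.attach(tire)
print(bike.flatten())          # the assembled product and its components
old = bike.detach("wheel")     # refurbish: swap the wheel assembly out
print(bike.flatten())
```

On-chain, each `attach`/`detach` would additionally pass the two-layer authorization check the abstract mentions.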

Adversarially Robust Dense-Sparse Tradeoffs via Heavy-Hitters

from arXiv: Data Structures and Algorithms

Authors: David P. Woodruff, Samson Zhou

In the adversarial streaming model, the input is a sequence of adaptive updates that defines an underlying dataset and the goal is to approximate, collect, or compute some statistic while using space sublinear in the size of the dataset. In 2022, Ben-Eliezer, Eden, and Onak showed a dense-sparse trade-off technique that elegantly combined sparse recovery with known techniques using differential privacy and sketch switching to achieve adversarially robust algorithms for $L_p$ estimation and other algorithms on turnstile streams. In this work, we first give an improved algorithm for adversarially robust $L_p$-heavy hitters, utilizing deterministic turnstile heavy-hitter algorithms with better tradeoffs. We then utilize our heavy-hitter algorithm to reduce the problem to estimating the frequency moment of the tail vector. We give a new algorithm for this problem in the classical streaming setting, which achieves additive error and uses space independent of the size of the tail. We then leverage these ingredients to give an improved algorithm for adversarially robust $L_p$ estimation on turnstile streams.
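As background, the heavy-hitter primitive can be illustrated with a Count-Min sketch, a standard randomized structure for streams with non-negative counts (not the deterministic turnstile algorithms the paper builds on); the hashing scheme and parameters below are simplified for illustration:

```python
import numpy as np

class CountMinSketch:
    """Count-Min sketch for a stream with non-negative counts: frequency
    estimates using space independent of the stream length."""
    def __init__(self, width=256, depth=5, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)
        # One odd multiplier per row for simple multiplicative hashing.
        self.salts = [int(s) * 2 + 1 for s in rng.integers(1, 2**31, size=depth)]

    def _cols(self, item):
        h = hash(item)
        return [((salt * h) >> 16) % self.width for salt in self.salts]

    def update(self, item, delta):
        for row, col in enumerate(self._cols(item)):
            self.table[row, col] += delta

    def estimate(self, item):
        # Never underestimates on non-negative streams; taking the min over
        # rows limits the overestimate caused by hash collisions.
        return int(min(self.table[row, col]
                       for row, col in enumerate(self._cols(item))))

cm = CountMinSketch()
for i in range(1000):        # light tail: 1000 distinct items, one update each
    cm.update(i, 1)
cm.update(10**6, 500)        # a single heavy hitter

est = cm.estimate(10**6)     # at least 500, and typically close to it
```

The sketch's error scales with the mass of the tail, which is why combining a heavy-hitter structure with a separate estimate of the tail's frequency moment, as the paper does, is a natural decomposition.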

TRAPP: An Efficient Point-to-Point Path Planning Algorithm for Road Networks with Restrictions

from arXiv: Data Structures and Algorithms

Authors: Hanzhang Chen, Xiangzhi Zhang, Shufeng Gong, Feng Yao, Song Yu, Yanfeng Zhang, Ge Yu

Path planning is a fundamental problem in road networks, with the goal of finding a path that optimizes objectives such as shortest distance or minimal travel time. Existing methods typically use graph indexing to ensure the efficiency of path planning. However, in real-world road networks, road segments may impose restrictions in terms of height, width, and weight. Most existing works ignore these road restrictions when building indices, which results in returning infeasible paths for vehicles. To address this, a naive approach is to build separate indices for each combination of different types of restrictions. However, this approach leads to a substantial number of indices, as the number of combinations grows explosively with the increase in different restrictions on road segments. In this paper, we propose a novel path planning method, TRAPP (Traffic Restrictions Adaptive Path Planning), which utilizes traffic flow data from the road network to filter out rarely used road-restriction combinations, retain frequently used ones, and build indices for them. Additionally, we introduce two optimizations aimed at reducing redundant path information storage within the indices and enhancing the speed of index matching. Our experimental results on real-world road networks demonstrate that TRAPP can effectively reduce the computational and memory overhead associated with building indices while ensuring the efficiency of path planning.
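As a baseline illustrating why restrictions matter, the sketch below (not TRAPP; the tiny network and vehicle profiles are invented) runs Dijkstra while skipping edges whose limits the vehicle violates, so the returned path is always feasible:

```python
import heapq

def feasible_shortest_path(graph, source, target, vehicle):
    """Dijkstra that ignores edges whose restrictions the vehicle violates.
    graph[u] = list of (v, length, restrictions); restrictions is a dict of
    upper bounds, e.g. {"height": 3.0} for a low bridge."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length, restr in graph.get(u, []):
            if any(vehicle.get(k, 0.0) > bound for k, bound in restr.items()):
                continue  # edge infeasible for this vehicle
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None  # no feasible path

# Tiny network: the short A-B-D route has a low bridge on B-D.
graph = {
    "A": [("B", 1.0, {}), ("C", 2.0, {})],
    "B": [("D", 1.0, {"height": 3.0})],
    "C": [("D", 2.0, {})],
}
truck = {"height": 4.0}
car = {"height": 1.5}

d_truck = feasible_shortest_path(graph, "A", "D", truck)  # 4.0, forced via C
d_car = feasible_shortest_path(graph, "A", "D", car)      # 2.0, short route OK
```

This per-query filtering is correct but gives up the speed of precomputed indices; TRAPP's contribution is to keep indexing while only materializing the restriction combinations that traffic data shows are actually used.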

Acceleration by Random Stepsizes: Hedging, Equalization, and the Arcsine Stepsize Schedule

from arXiv: Data Structures and Algorithms

Authors: Jason M. Altschuler, Pablo A. Parrilo

We show that for separable convex optimization, random stepsizes fully accelerate Gradient Descent. Specifically, using inverse stepsizes i.i.d. from the Arcsine distribution improves the iteration complexity from $O(k)$ to $O(k^{1/2})$, where $k$ is the condition number. No momentum or other algorithmic modifications are required. This result is incomparable to the (deterministic) Silver Stepsize Schedule which does not require separability but only achieves partial acceleration $O(k^{\log_{1+\sqrt{2}} 2}) \approx O(k^{0.78})$. Our starting point is a conceptual connection to potential theory: the variational characterization for the distribution of stepsizes with fastest convergence rate mirrors the variational characterization for the distribution of charged particles with minimal logarithmic potential energy. The Arcsine distribution solves both variational characterizations due to a remarkable "equalization property" which in the physical context amounts to a constant potential over space, and in the optimization context amounts to an identical convergence rate over all quadratic functions. A key technical insight is that martingale arguments extend this phenomenon to all separable convex functions. We interpret this equalization as an extreme form of hedging: by using this random distribution over stepsizes, Gradient Descent converges at exactly the same rate for all functions in the function class.
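The recipe can be sketched numerically. Assuming the standard parametrization (inverse stepsizes drawn i.i.d. from the arcsine distribution on $[\mu, L]$, sampled as $\mu + (L-\mu)\sin^2(\pi U/2)$ for uniform $U$; the paper's exact setup may differ), plain gradient descent on a separable quadratic converges rapidly with no momentum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Separable quadratic f(x) = 0.5 * sum_i lam_i * x_i^2 with
# eigenvalues spread over [mu, L]; condition number k = L/mu = 16.
mu, L = 1.0, 16.0
lam = np.linspace(mu, L, 32)
x = np.ones(32)
loss0 = 0.5 * np.sum(lam * x**2)

def arcsine_inverse_stepsize():
    # Arcsine distribution on [mu, L]: density proportional to
    # 1 / sqrt((t - mu) * (L - t)), with mass piled up at both endpoints.
    u = rng.uniform()
    return mu + (L - mu) * np.sin(np.pi * u / 2.0) ** 2

# Plain gradient descent; the only change is the random stepsize.
for _ in range(300):
    eta = 1.0 / arcsine_inverse_stepsize()
    x = x - eta * lam * x        # gradient of f is lam * x (separable)

loss = 0.5 * np.sum(lam * x**2)  # far below loss0 after 300 steps
```

Individual steps can temporarily increase the error (a stepsize near $1/\mu$ amplifies the stiff coordinates), but in expectation the log-contraction rate is the same for every eigenvalue in $[\mu, L]$, which is the equalization property the abstract describes.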

No-Free-Lunch Theories for Tensor-Network Machine Learning Models

from arXiv: Data Structures and Algorithms

Authors: Jing-Chuan Wu, Qi Ye, Dong-Ling Deng, Li-Wei Yu

Tensor network machine learning models have shown remarkable versatility in tackling complex data-driven tasks, ranging from quantum many-body problems to classical pattern recognition. Despite their promising performance, a comprehensive understanding of the underlying assumptions and limitations of these models is still lacking. In this work, we focus on the rigorous formulation of their no-free-lunch theorem -- essential yet notoriously challenging to formalize for specific tensor network machine learning models. In particular, we rigorously analyze the generalization risks of learning target output functions from input data encoded in tensor network states. We first prove a no-free-lunch theorem for machine learning models based on matrix product states, i.e., the one-dimensional tensor network states. Furthermore, we circumvent the challenging issue of calculating the partition function for the two-dimensional Ising model, and prove the no-free-lunch theorem for the case of two-dimensional projected entangled-pair states, by introducing the combinatorial method associated with the "puzzle of polyominoes". Our findings reveal the intrinsic limitations of tensor network-based learning models in a rigorous fashion, and open up an avenue for future analytical exploration of both the strengths and limitations of quantum-inspired machine learning frameworks.

Convergence analysis of wide shallow neural operators within the framework of Neural Tangent Kernel

from arXiv: Data Structures and Algorithms

Authors: Xianliang Xu, Ye Li, Zhongyi Huang

Neural operators aim to approximate operators mapping between Banach spaces of functions, and have achieved much success in the field of scientific computing. In contrast to certain deep learning-based solvers, such as Physics-Informed Neural Networks (PINNs) and the Deep Ritz Method (DRM), neural operators can solve an entire class of Partial Differential Equations (PDEs). Although much work has been done to analyze the approximation and generalization error of neural operators, there is still a lack of analysis on their training error. In this work, we conduct a convergence analysis of gradient descent for wide shallow neural operators within the framework of the Neural Tangent Kernel (NTK). The core idea is that over-parameterization and random initialization together ensure that each weight vector remains near its initialization throughout all iterations, yielding linear convergence of gradient descent. We demonstrate that, under over-parametrization, gradient descent finds the global minimum in both continuous and discrete time.

Multicriteria Spanners -- A New Tool for Network Design

from arXiv: Data Structures and Algorithms

Authors: Elena Grigorescu, Nithish Kumar Kumar, Young-San Lin

Designing sparse directed spanners, which are subgraphs that approximately maintain distance constraints, has attracted sustained interest in TCS, especially due to their wide applicability, as well as the difficulty to obtain tight results. However, a significant drawback of the notion of spanners is that it cannot capture multiple distance-like constraints for the same demand pair. In this paper we initiate the study of directed multicriteria spanners, in which the notion of edge lengths is replaced by the notion of resource consumption vectors, where each entry corresponds to the consumption of the respective resource on that edge. The goal is to find a minimum-cost routing solution that satisfies the multiple constraints. To the best of our knowledge, we obtain the first approximation algorithms for the directed multicriteria spanners problems, under natural assumptions. Our results match the state-of-the-art approximation ratios in special cases of ours. We also establish reductions from other natural network connectivity problems to the directed multicriteria spanners problems, including Group Steiner Distances, introduced in the undirected setting by Bil\`o, Gual\`a, Leucci and Straziota (ESA 2024), and Edge-Avoiding spanners. Our reductions imply approximation algorithms for these problems and illustrate that the notion of directed multicriteria spanners is an appropriate abstraction and generalization of natural special cases from the literature. Our main technical tool is a delicate generalization of the minimum-density junction tree framework of Chekuri, Even, Gupta, and Segev (SODA 2008, TALG 2011) to the notion of minimum-density resource-constrained junction trees, which also extends ideas from Chlamt\'a\v{c}, Dinitz, Kortsarz, and Laekhanukit (SODA 2017, TALG 2020).

Accelerating Proximal Gradient Descent via Silver Stepsizes

from arXiv: Data Structures and Algorithms

Authors: Jinho Bok, Jason M. Altschuler

Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum -- just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2023), namely $O(\varepsilon^{- \log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log \frac{1}{\varepsilon})$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing -- the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes -- with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.

Monday, December 09

The Case Against Google’s Claims of “Quantum Supremacy”: A Very Short Introduction.

from Gil Kalai

The 2019 paper “Quantum supremacy using a programmable superconducting processor” asserted that Google’s Sycamore quantum computer, with 53 qubits and a depth of 20, performed a specific computation in about 200 seconds. According to Google’s estimate, a state-of-the-art classical supercomputer would require approximately 10,000 years to complete the same computation.

The Google experiment had two major components:

  1. The “Fidelity Claims”: Assertions regarding the fidelity of the samples produced by the quantum computer.
  2. The “Supremacy Claims”: Assertions that translated fidelity into a measure of advantage over classical computation.

There are valid reasons to question both of these claims in the context of Google’s 2019 experiment. In my view, these claims may reflect serious methodological mistakes rather than an objective scientific reality. I do not recommend treating Google’s past or future claims as a solid foundation for policy-making decisions.

Below is a brief review of the case against Google’s 2019 claims of quantum supremacy:

A) The “Supremacy” Assertions: Flawed Estimation of Classical Running Time

A.1) The claims regarding classical running times were off by 10 orders of magnitude.

A.2) Moreover, the Google team was aware that better classical algorithms existed. They had developed more sophisticated classical algorithms for one class of circuits and subsequently changed the type of circuits used for the “supremacy demonstration” just weeks before the final experiment.

A.3) The 2019 Google paper states, “Quantum processors have thus reached the regime of quantum supremacy. We expect that their computational power will continue to grow at a double-exponential rate.” It is surprising to encounter such an extraordinary claim in a scientific paper.

B) The “Fidelity” Assertions: Statistically Unreasonable Predictions Indicating Methodological Flaws

The Google paper relies on a very simple a priori prediction of the fidelity, based on the error rates of individual components (Formula (77)).
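Schematically, such a prediction multiplies per-component fidelities, which is well approximated by exponentiating minus the total expected number of errors. The numbers below are invented placeholders, and the actual Formula (77) includes additional structure:

```python
import math

# Hypothetical component error rates (placeholders, not Google's data).
e1 = 0.0016   # average 1-qubit gate error
e2 = 0.0062   # average 2-qubit gate error
er = 0.038    # average readout error per qubit

n1, n2, nq = 1113, 430, 53   # gate and qubit counts, also placeholders

# Predicted circuit fidelity as a product of per-component fidelities,
# well approximated by exp(-expected total number of errors).
F_product = (1 - e1) ** n1 * (1 - e2) ** n2 * (1 - er) ** nq
F_exp = math.exp(-(e1 * n1 + e2 * n2 + er * nq))
```

The point of items B.1)-B.3) below is not the formula itself but how closely the measured fidelities of hundreds of circuits tracked such a simple product.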

B.1) The agreement between the a priori prediction and the actual estimated fidelity is statistically implausible (“too good to be true”): it is unlikely that the fidelities of samples from hundreds of circuits would agree to within 10-20% with a simple formula that multiplies the fidelities of individual components. In my opinion, this suggests a methodologically flawed optimization process, such as the one described in item C.

B.2) The Google team provided a statistical explanation for this agreement based on three premises. The first premise is that the fidelities for the individual components are exact up to  ±20%. The second premise is that this ±20% instability is unbiased. The third premise is that all these fidelities for individual components are statistically independent. These premises are unreasonable and they contradict various other experimental findings.

B.3) As of now, the error rates for individual components have not been released by the Google team. (Most recently, in May 2023, they promised “to push” for this data.) Analysis of the partial data provided for readout errors reinforces these concerns.

C) The Calibration Process: Evidence of Undocumented Global Optimization

According to the Google paper, calibration was performed prior to running the random circuit experiments and was based on the behavior of 1- and 2-qubit circuits. This process involved modifying the definitions of 1-gates and 2-gates to align with how the quantum computer operates.

C.1) Statistical evidence suggests that the calibration process involved a methodologically flawed global optimization process. (This concern applies even to Google’s assertions about the fidelity of the smallest 12-qubit circuits.)

C.2) Non-statistical evidence also supports this claim. For example, contrary to the description provided by the Google team, it was revealed that they supplied an outdated calibration version (for the experimental circuits) to the Jülich Research Center scientists involved in the experiment. This calibration was later further modified after the experiment was conducted. (This discrepancy is also reflected in a video released by Google, particularly between 2:13 and 3:07.)

C.3) The Google team has not disclosed their calibration programs, citing them as a commercial secret. For technical reasons, they were also unable to share the inputs for the calibration program, although they promised to do so in future experiments—a promise that has not yet been fulfilled.

A slide from my 2019 lecture “The Google quantum supremacy demo” (post) highlights that the error rates for two-qubit gates e_g have still not been provided by the Google team as of today (Dec. 2024).

D) Comparing Google with IBM

As far as we know, there is a significant gap (in favor of Google) between what IBM quantum computers—which are in some ways more advanced than Google’s quantum computers—can achieve for random circuit sampling and what Google claims, even for circuits with 7–12 qubits. While one might argue that Google’s devices or team are simply better, in my view, this gap more likely reflects methodological issues in Google’s experiments.

E) (Not) Adopting Suggestions for Better Control

In our discussions with the Google team, they endorsed several of our suggestions for future experiments aimed at improving control over the quality of their experiments. However, in practice, later experiments did not implement any of these suggestions. Moreover, the structure of these later experiments makes them even harder to verify compared to the 2019 experiment. Additionally, unlike the 2019 experiment, the data for a subsequent random circuit sampling experiment does not include the amplitudes computed for the experimental circuits, further complicating efforts to scrutinize the results.

F) My Personal Conclusion

Google Quantum AI’s claims  (including published ones) should be approached with caution, particularly those of an extraordinary nature. These claims may stem from significant methodological errors and, as such, may reflect the researchers’ expectations more than objective scientific reality. I do not recommend treating Google’s past or future claims as a solid basis for policy-making decisions.

G) Remarks

G.1) Google’s supremacy claims (from the 2019 paper) have been refuted in a series of papers by several groups. This began with work by IBM researchers Pednault et al. shortly after Google’s original paper was published and continued with studies by Pan and Zhang; Pan, Chen, and Zhang; Kalachev, Panteleev, and Yung; Gao et al.; Liu et al.; and several other groups. For further details, see this post and the associated comment section, as well as this post.

G.2) Google now acknowledges that using the tensor network contraction method, their 2019 53-qubit result can be computed classically in less than 200 seconds. However, in their more recent 2023/24 paper, “Phase Transitions…” (see Table 1), they claim that with 67 to 70 qubits, classical supercomputers would require many years to generate 1 million such bitstrings, even with tensor network contraction.

G.3) Items B) and C) highlight methodological issues with Google’s fidelity assertions, even for 12-qubit circuits. These concerns persist independently of the broader question of quantum supremacy for larger circuits, where the fidelity assertions are taken at face value.

G.4) For a more comprehensive view of our study of Google’s fidelity claims, refer to the following papers:

These papers describe an ongoing project with Yosi Rinott and Tomer Shoham, supported by Ohad Lev and Carsten Voelkmann. Together with Carsten, we plan to expand our study and apply our tools to other experiments. Additionally, see my earlier paper:

G.5) There is also supporting evidence for Google’s 2019 claims, such as a 2020 replication by a group from the University of Science and Technology of China (USTC) and later verifications of some of Google’s fidelity estimations.

G.6) There are some additional concerns regarding the Google experiment. In particular, there are problematic discrepancies between the experimental data, the Google noise model, and simulations.

G.7) In my opinion, the main current challenge for experimental quantum computing is to improve the quality of two-qubit gates and other components, as well as to carefully study the quality of quantum circuits in the 5–20 qubit regime. Experiments on quantum error correction for larger circuits are also important. 

H) Hype and Bitcoin

I usually don’t mind “hype” as a reflection of scientists’ enthusiasm for their work and the public’s excitement about scientific endeavors. However, in the case of Google, some caution is warranted, as the premature claims in 2019 may have had significant consequences. For example, following the 2019 “supremacy” announcement, the value of Bitcoin dropped (around October 24, 2019, after a period of stability) from roughly $9,500 to roughly $8,500 in just a few days, representing a loss for investors of more than ten billion dollars. (The value today is around $100,000.) Additionally, Google’s assertions may have imposed unrealistic challenges on other quantum computing efforts and encouraged a culture of undesirable scientific methodologies.

Sergio Boixo, Hartmut Neven, and John Preskill in a video “Quantum next leap: Ten septillion years beyond-classical”

I) Update (Dec. 10): The Wind in the Willow

Yesterday, Google Quantum AI announced that their “Willow” quantum computer “performed a standard benchmark computation in under five minutes that would take one of today’s fastest supercomputers 10 septillion (that is, 10^25) years.” As far as I know, there is no paper with the details. The Google AI team also announced the appearance in Nature of their recent paper on distance-5 and distance-7 surface codes. It is asserted that the distance-7 codes exhibit an improvement by a factor of 2.4 compared to the physical qubits. The ratio of improvement Λ from distance-5 to distance-7 is 2.14. (We mentioned it in an August post, following a comment by phan ting.)

We have not yet studied these particular claims by Google Quantum AI, but my general conclusion applies to them: “Google Quantum AI’s claims (including published ones) should be approached with caution, particularly those of an extraordinary nature. These claims may stem from significant methodological errors and, as such, may reflect the researchers’ expectations more than objective scientific reality.” (Our specific points of contention are relevant to Google’s newer supremacy experiments but not directly to the quantum error-correction experiment.)

There is a very positive blog post over at Shtetl-Optimized about the new developments, where Scott wrote: “besides the new and more inarguable Google result, IBM, Quantinuum, QuEra, and USTC have now all also reported Random Circuit Sampling experiments with good results.” For me, the gap between Google and IBM for RCS is a serious additional reason not to take the Google assertions seriously (item D), and if I am wrong I will gladly stand corrected.

By Gil Kalai

Reduction from the partition problem: Dynamic lot sizing problem with polynomial complexity

from arXiv: Computational Complexity

Authors: Chee-Khian Sim

In this note, we reduce an instance of the partition problem to a dynamic lot sizing problem in polynomial time, and show that solving the latter problem solves the former. We further show that, using the reduction, the instance of the partition problem can be solved with a polynomial number of addition, multiplication, and sort operations on the input data.
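For concreteness, the partition problem asks whether a multiset of positive integers can be split into two parts of equal sum. Here is a minimal subset-sum sketch of the decision problem — the textbook pseudo-polynomial baseline, not the paper's lot-sizing reduction:

```python
def can_partition(nums):
    """Decide the partition problem by subset-sum dynamic programming.

    Pseudo-polynomial: runs in time polynomial in sum(nums), not in
    the bit length of the input (the problem is NP-hard in general).
    """
    total = sum(nums)
    if total % 2:
        return False          # odd total: no equal split exists
    target = total // 2
    reachable = {0}           # subset sums achievable so far
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(can_partition([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}
```

The distinction between counting arithmetic operations (as the abstract does) and counting bit operations is exactly what makes claims of "polynomial complexity" for NP-hard problems delicate.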

CMSO-transducing tree-like graph decompositions

from arXiv: Computational Complexity

Authors: Rutger Campbell, Bruno Guillon, Mamadou Moustapha Kanté, Eun Jung Kim, Noleen Köhler

We show that given a graph G we can CMSO-transduce its modular decomposition, its split decomposition and its bi-join decomposition. This improves results by Courcelle [Logical Methods in Computer Science, 2006] who gave such transductions using order-invariant MSO, a strictly more expressive logic than CMSO. Our methods more generally yield C2MSO-transductions of the canonical decomposition of weakly-partitive set systems and weakly-bipartitive systems of bipartitions.
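To make the first of these objects concrete: a module of a graph is a vertex set X such that every vertex outside X is adjacent either to all of X or to none of it. A brute-force enumeration for toy graphs — an illustration of the definition only; the paper concerns logical transductions of the decomposition, not computing it this way:

```python
from itertools import combinations

def modules(adj):
    """Enumerate all modules of a small graph given as {vertex: set_of_neighbors}.

    X is a module iff every vertex v outside X satisfies
    N(v) ⊇ X or N(v) ∩ X = ∅.  Exponential time; toy-sized inputs only.
    """
    V = list(adj)
    found = []
    for r in range(1, len(V) + 1):
        for X in combinations(V, r):
            S = set(X)
            if all(adj[v] >= S or not (adj[v] & S)
                   for v in V if v not in S):
                found.append(S)
    return found

# The path 1-2-3-4 is prime: its only modules are the
# four singletons and the full vertex set.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(modules(adj))
```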

Covering points by hyperplanes and related problems

from arXiv: Computational Geometry

Authors: Zuzana Patáková, Micha Sharir

For a set $P$ of $n$ points in $\mathbb R^d$, for any $d\ge 2$, a hyperplane $h$ is called $k$-rich with respect to $P$ if it contains at least $k$ points of $P$. Answering and generalizing a question asked by Peyman Afshani, we show that if the number of $k$-rich hyperplanes in $\mathbb R^d$, $d \geq 3$, is at least $\Omega(n^d/k^\alpha + n/k)$, with a sufficiently large constant of proportionality and with $d\le \alpha < 2d-1$, then there exists a $(d-2)$-flat that contains $\Omega(k^{(2d-1-\alpha)/(d-1)})$ points of $P$. We also present upper bound constructions that give instances in which the above lower bound is tight. An extension of our analysis yields similar lower bounds for $k$-rich spheres or $k$-rich flats.
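To illustrate the notion of a $k$-rich object in the simplest case $d=2$, where hyperplanes are lines, here is a brute-force count of lines containing at least $k$ points of a small integer point set; the setup and names are ours, not the paper's:

```python
from collections import defaultdict
from itertools import combinations
from math import gcd

def line_key(p, q):
    # Canonical integer triple (a, b, c) with a*x + b*y + c = 0 through p, q.
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    g = gcd(gcd(abs(a), abs(b)), abs(c)) or 1
    a, b, c = a // g, b // g, c // g
    if a < 0 or (a == 0 and b < 0):   # fix the sign to make the key unique
        a, b, c = -a, -b, -c
    return (a, b, c)

def count_k_rich_lines(points, k):
    """Count distinct lines that contain at least k of the given points."""
    on_line = defaultdict(set)
    for p, q in combinations(points, 2):
        on_line[line_key(p, q)].update((p, q))
    return sum(1 for pts in on_line.values() if len(pts) >= k)

grid = [(x, y) for x in range(3) for y in range(3)]
print(count_k_rich_lines(grid, 3))  # 8: three rows, three columns, two diagonals
```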

Manifolds of positive reach, differentiability, tangent variation, and attaining the reach

from arXiv: Computational Geometry

Authors: André Lieutier, Mathijs Wintraecken

This paper contains three main results. Firstly, we give an elementary proof of the following statement: Let $M$ be a (closed, in both the geometrical and topological sense of the word) topological manifold embedded in $\mathbb{R}^d$. If $M$ has positive reach, then $M$ can locally be written as the graph of a $C^{1,1}$ function from the tangent space to the normal space. Conversely, if $M$ can locally be written as the graph of a $C^{1,1}$ function from the tangent space to the normal space, then $M$ has positive reach. The result was hinted at by Federer when he introduced the reach, and proved by Lytchak. Lytchak's proof relies heavily on CAT(k)-theory. The proof presented here uses only basic results on homology. Secondly, we give optimal Lipschitz constants for the derivative; in other words, we give an optimal bound for the angle between tangent spaces in terms of the distance between the points. This improves earlier results, which were either suboptimal or assumed that the manifold was $C^2$. Thirdly, we generalize a result by Aamari et al., which explains how the reach is attained, from the smooth setting to general sets of positive reach.
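For reference, the reach as introduced by Federer (a standard definition, not quoted from the abstract):

```latex
\operatorname{reach}(M) \;=\; \sup\bigl\{\, r \ge 0 \;:\;
  \text{every } x \in \mathbb{R}^d \text{ with } \operatorname{dist}(x, M) < r
  \text{ has a unique nearest point in } M \,\bigr\}
```

For example, a round sphere of radius $R$ in $\mathbb{R}^d$ has reach $R$, and any closed convex set has infinite reach.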

A free lunch: manifolds of positive reach can be smoothed without decreasing the reach

from arXiv: Computational Geometry

Authors: Hana Dal Poz Kouřimská, André Lieutier, Mathijs Wintraecken

Assumptions on the reach are crucial for ensuring the correctness of many geometric and topological algorithms, including triangulation, manifold reconstruction and learning, homotopy reconstruction, and methods for estimating curvature or reach. However, these assumptions are often coupled with the requirement that the manifold be smooth, typically at least $C^2$. In this paper, we prove that any manifold with positive reach can be approximated arbitrarily well by a $C^\infty$ manifold without significantly reducing the reach, by employing techniques from differential topology: partitions of unity and smoothing using convolution kernels. This result implies that nearly all theorems established for $C^2$ manifolds with a certain reach naturally extend to manifolds with the same reach, even if they are not $C^2$, for free!

Super-Polynomial Growth of the Generalized Persistence Diagram

from arXiv: Computational Geometry

Authors: Donghan Kim, Woojin Kim, Wonjun Lee

The Generalized Persistence Diagram (GPD) for multi-parameter persistence naturally extends the classical notion of persistence diagram for one-parameter persistence. However, unlike its classical counterpart, computing the GPD remains a significant challenge. The main hurdle is that, while the GPD is defined as the M\"obius inversion of the Generalized Rank Invariant (GRI), computing the GRI is intractable due to the formidable size of its domain, i.e., the set of all connected and convex subsets in a finite grid in $\mathbb{R}^d$ with $d \geq 2$. This computational intractability suggests seeking alternative approaches to computing the GPD. In order to study the complexity associated to computing the GPD, it is useful to consider its classical one-parameter counterpart, where for a filtration of a simplicial complex with $n$ simplices, its persistence diagram contains at most $n$ points. This observation leads to the question: 'Given a $d$-parameter simplicial filtration, could the cardinality of its GPD (specifically, the support of the GPD) also be bounded by a polynomial in the number of simplices in the filtration?' This is the case for $d=1$, where we compute the persistence diagram directly at the simplicial filtration level. If this were also the case for $d\geq2$, it might be possible to compute the GPD directly and much more efficiently without relying on the GRI. We show that the answer to the question above is negative, demonstrating the inherent difficulty of computing the GPD. More specifically, we construct a sequence of $d$-parameter simplicial filtrations where the cardinalities of their GPDs are not bounded by any polynomial in the number of simplices. Furthermore, we show that several commonly used methods for constructing multi-parameter filtrations can give rise to such "wild" filtrations.

Online Hitting Sets for Disks of Bounded Radii

from arXiv: Computational Geometry

Authors: Minati De, Satyam Singh, Csaba D. Tóth

We present algorithms for the online minimum hitting set problem: Given a set $P$ of $n$ points in the plane and a sequence of geometric objects that arrive one-by-one, we need to maintain a hitting set at all times. For disks of radii in the interval $[1,M]$, we present a $O(\log M \log n)$-competitive algorithm. This result generalizes from disks to positive homothets of any convex body in the plane with scaling factors in the interval $[1,M]$. As a main technical tool, we reduce the problem to the online hitting set problem for integer points and bottomless rectangles. Specifically, we present an $O(\log N)$-competitive algorithm for the variant where $P$ is a set of integer points in an $N\times N$ box, and the geometric objects are bottomless rectangles.
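As a baseline for the problem statement (not the paper's competitive algorithm): the trivial online strategy adds one point of $P$ for each arriving object that is not yet hit. A sketch:

```python
class OnlineHittingSet:
    """Trivial online hitting set: when an arriving object is not already
    hit, add one of its points.  A baseline illustration only, not the
    O(log M log n)-competitive algorithm from the paper."""

    def __init__(self, points):
        self.points = points          # the fixed point set P
        self.hitting_set = set()      # grows irrevocably, as required online

    def process(self, contains):
        # `contains(p)` reports whether the arriving object contains point p.
        if any(contains(p) for p in self.hitting_set):
            return                    # already hit; nothing to do
        for p in self.points:
            if contains(p):
                self.hitting_set.add(p)
                return
        raise ValueError("arriving object contains no point of P")

# Usage with intervals on the integer line as the arriving objects:
ohs = OnlineHittingSet(list(range(11)))
ohs.process(lambda p: 2 <= p <= 4)    # adds point 2
ohs.process(lambda p: 3 <= p <= 5)    # 2 misses, adds point 3
ohs.process(lambda p: 2 <= p <= 6)    # already hit by 2
print(ohs.hitting_set)                # {2, 3}
```

For disks, `contains` would check whether the point lies within the disk's radius of its center; the difficulty the paper addresses is keeping the hitting set small against an adversarial arrival order.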

A Subquadratic Time Approximation Algorithm for Individually Fair k-Center

from arXiv: Data Structures and Algorithms

Authors: Matthijs Ebbens, Nicole Funk, Jan Höckendorff, Christian Sohler, Vera Weil

We study the $k$-center problem in the context of individual fairness. Let $P$ be a set of $n$ points in a metric space and $r_x$ be the distance between $x \in P$ and its $\lceil n/k \rceil$-th nearest neighbor. The problem asks to optimize the $k$-center objective under the constraint that, for every point $x$, there is a center within distance $r_x$. We give bicriteria $(\beta,\gamma)$-approximation algorithms that compute clusterings such that every point $x \in P$ has a center within distance $\beta r_x$ and the clustering cost is at most $\gamma$ times the optimal cost. Our main contributions are a deterministic $O(n^2+ kn \log n)$ time $(2,2)$-approximation algorithm and a randomized $O(nk\log(n/\delta)+k^2/\varepsilon)$ time $(10,2+\varepsilon)$-approximation algorithm, where $\delta$ denotes the failure probability. For the latter, we develop a randomized sampling procedure to compute constant factor approximations for the values $r_x$ for all $x\in P$ in subquadratic time; we believe this procedure to be of independent interest within the context of individual fairness.
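The quantities $r_x$ can be computed exactly in quadratic time by sorting pairwise distances; the paper's point is to approximate them faster. An exact sketch of the definition (assuming, as one convention, that $x$ does not count as its own neighbor):

```python
from math import ceil, dist

def fairness_radii(points, k):
    """Exact O(n^2 log n) computation of r_x: the distance from each point x
    to its ceil(n/k)-th nearest neighbor among the other points.

    Illustrates the definition only; the paper's contribution is a
    subquadratic randomized approximation of these values.
    """
    n = len(points)
    m = ceil(n / k)
    radii = {}
    for x in points:
        distances = sorted(dist(x, y) for y in points if y != x)
        radii[x] = distances[m - 1]
    return radii

# n = 4 points, k = 2, so r_x is the distance to the 2nd nearest neighbor.
print(fairness_radii([(0, 0), (1, 0), (2, 0), (5, 0)], 2))
```

With these radii in hand, the individual-fairness constraint says each $x$ must have a center within $\beta\, r_x$ for the bicriteria guarantee.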
