Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Friday, June 14

Postdoc at Technical University of Denmark (apply by July 1, 2024)

from CCI: jobs

Hi!
Would you come for a two-year postdoc in algorithms at T.U. Denmark near Copenhagen?

Keywords: graphs, algorithms, data structures, dynamic, distributed, computational geometry, combinatorics, planar.

More info in the link below — get in touch with me if you have further questions.

PS: Note that the deadline for applications is in Central European Summer Time.

Website: https://efzu.fa.em2.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1/job/3673
Email: erot@dtu.dk

By shacharlovett

Australasian Summer School: Recent Trends in Algorithms

from CS Theory Events

December 3-6, 2024, Sydney, Australia
https://sites.google.com/view/sydney-tcs-summerschool2024/home
Registration deadline: September 2, 2024

The week before ISAAC’24, on Dec 3-6, Australian researchers are organizing a 4-day Summer (Southern Hemisphere) School on “Recent Trends in Algorithms” in Sydney, with one day dedicated to each of Algorithmic Fairness, Quantum Computing, Data Structures, and Privacy. Participation to the summer …

By shacharlovett

A Credible Junta

from Ben Recht

Meehl's Philosophical Psychology, Interlude 4.

This post contains some of my own thoughts reacting to Paul Meehl’s course “Philosophical Psychology.” I’m taking a brief interlude from my run-down of Lecture 7. Here’s the full table of contents of my blogging through the class.

Can we fix the crud problem with more math? In many ways, that’s what the “credibility revolution” in economics set out to do: build a more sophisticated statistical toolkit that, when properly deployed, accurately teases out cause and effect. As Guido Imbens and Don Rubin put it in the introduction to their 2015 text Causal Inference for Statistics, Social, and Biomedical Sciences,

“In many applications of statistics, a large proportion of the questions of interest are fundamentally questions of causality rather than simply questions of description or association.”

Imbens and Rubin map a path for answering questions about epistemology using statistics:

  • “All causal questions are tied to specific interventions or treatments.”

  • “Causal questions are viewed as comparisons of potential outcomes.”

  • Comparisons of potential outcomes can be computed by careful estimation of average treatment effects.

Hence, all questions of interest in human-facing sciences are reduced to estimating effects in randomized experiments—whether or not a randomized experiment actually occurred. This means that the “gold standard” of causation remains null hypothesis testing. And that means that the entire discipline is based on correlation (a.k.a. description and association) and complex mathematical stories.

You don’t have to take my word for it. If you look at what the causal inference methods do, you will see that everything rests on null hypothesis testing. I mean, most of the estimates are built upon ordinary least-squares, and all least-squares estimates are combinations of correlations. 

Let me give a simple example of an often-used estimator: the Local Average Treatment Effect (LATE). LATE uses “Instrumental Variables” to tease out causal relationships. You care about whether X causes Y, but you worry there are lots of confounding factors in your observational data set. To remove the confounding factors, perhaps you could find some magic variable Z that is correlated with X but uncorrelated with all of the confounders. Maybe you also get lucky and can argue that any effect of Z on Y has to pass through X (to be clear, you spin a story). 

Economists have a bunch of crazy ideas for what should count as instrumental variables. Caveat emptor. My favorite example of an instrumental variable–one of the only ones I believe in–comes from randomized clinical trials. In a medical trial, you can’t force a patient to take the treatment. Hence, the randomized “treatment” is actually the offer of the treatment the trial aims to study. In this case, Z is whether or not a patient is offered treatment, X is whether the patient takes the treatment, and Y is the outcome the trialists care about.

But let me not dwell on instrumental variable examples. I wrote more about it here and here. I actually really like Angrist, Imbens, and Rubin’s original paper on LATE. For today, I want to show why this is still just correlation analysis. The standard instrumental variable estimator that estimates the influence of X on Y is 
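
$$\hat{\beta}_{\mathrm{IV}} \;=\; \frac{\widehat{\mathrm{Cov}}(Z, Y)}{\widehat{\mathrm{Cov}}(Z, X)} \;=\; \frac{\hat{\rho}_{ZY}}{\hat{\rho}_{ZX}} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X},$$

the usual single-instrument Wald ratio, written here in terms of sample covariances and, equivalently, sample correlations and standard deviations.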

It’s a ratio of correlations. The standard way to “test for significance” of this effect is to do a significance test on the numerator. If it passes, you add two stars next to the correlation in the table. In an instrumental variable analysis, we changed the story but still just computed a correlation and declared significance if the number of data points was large enough. 
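
To make the point concrete, here is a minimal simulation (mine, not from the post; the variable names and effect sizes are invented) in which the ratio-of-covariances estimator recovers the effect of X on Y despite a confounder, while the plain regression slope, which is just a rescaled correlation, does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: U confounds X and Y; Z is an
# instrument (it moves X, is independent of U, and has no direct effect on Y).
beta = 0.5                       # true effect of X on Y
Z = rng.normal(size=n)
U = rng.normal(size=n)
X = 1.0 * Z + 1.0 * U + rng.normal(size=n)
Y = beta * X + 2.0 * U + rng.normal(size=n)

# Naive OLS slope of Y on X (a rescaled correlation) is biased by the confounder.
ols = np.cov(X, Y)[0, 1] / np.cov(X, Y)[0, 0]

# IV / Wald estimator: ratio of covariances with the instrument.
iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

print(f"true effect: {beta:.2f}, OLS: {ols:.3f}, IV: {iv:.3f}")
```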

Even though other estimators aren’t as easy to write down, every causal inference method has this flavor. Everything is a combination of correlation and storytelling. “Causal inference,” as it’s built up in statistics and economics departments, is just an algebraically sophisticated language for data visualization.

Some of my best friends work on causal inference, and I respect what they’re after. They’d argue that these storytellings are better than just randomly picking two variables out of a hat. But I don’t see how causal inference methods can do anything to mitigate the effects of crud.

If there’s a latent crud distribution, causal storytelling connecting X and Y is no different than Meehl’s yarns about why certain methodists prefer certain shop classes. Clever people can construct stories about anything. If they gain access to STATA or R or Python, they can produce hundreds of pages of sciency robustness checks that back their story. If we don’t understand the crud distribution, there’s no math we can do to know whether the measured correlation between X and Y is real. If you buy Meehl’s framework (which I do), you can’t corroborate theories solely with the precision measurement of correlations. You need prediction.

Theories in the human-facing sciences need to make stronger predictions. At a bare minimum, the treatment effect estimates from one study should align across replication attempts. We seem to have issues even crossing this very low bar with our current framework.  Adding more math to make the treatment estimate more precise doesn’t help us generalize beyond the data on our laptops.

Theories need to tell us more than whether the correlation between variables is positive or negative. We need to subject them to risky tests. Theories need to make varied, precise predictions.  Only then does a precise measurement of these predicted empirical values matter. Reducing all question answering to Fisherian statistics will not solve these problems. But that’s where we seem to be stuck.

By Ben Recht

On the Expressibility of the Reconstructional Color Refinement

from arXiv: Computational Complexity

Authors: V. Arvind, Johannes Köbler, Oleg Verbitsky

One of the most basic facts related to the famous Ulam reconstruction conjecture is that the connectedness of a graph can be determined by the deck of its vertex-deleted subgraphs, which are considered up to isomorphism. We strengthen this result by proving that connectedness can still be determined when the subgraphs in the deck are given up to equivalence under the color refinement isomorphism test. Consequently, this implies that connectedness is recognizable by Reconstruction Graph Neural Networks, a recently introduced GNN architecture inspired by the reconstruction conjecture (Cotta, Morris, Ribeiro 2021).
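
The color refinement test mentioned above is short enough to write down. Here is a minimal sketch (mine, not the paper’s): two graphs are deemed equivalent by the test if refining their disjoint union gives both parts the same histogram of stable colors.

```python
from collections import Counter

def refine(adj):
    """Color refinement: repeatedly recolor each vertex by its old color together
    with the multiset of its neighbors' old colors, until the coloring stabilizes.
    `adj` maps each vertex to a list of neighbors."""
    colors = {v: 0 for v in adj}
    while True:
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: relabel[sig[v]] for v in adj}
        if new == colors:
            return colors
        colors = new

def cr_equivalent(adj1, adj2):
    """Run color refinement on the disjoint union and compare the two parts'
    stable color histograms."""
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj1.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj2.items()})
    colors = refine(union)
    hist = lambda tag: Counter(c for (t, _), c in colors.items() if t == tag)
    return hist("a") == hist("b")

# A 3-vertex path and a triangle are distinguished (different degree profiles).
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(cr_equivalent(path, triangle))   # False
```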

A Refinement of the McCreight-Meyer Union Theorem

from arXiv: Computational Complexity

Authors: Matthew Fox, Chaitanya Karamchedu

Using properties of Blum complexity measures and certain complexity class operators, we exhibit a total computable and non-decreasing function $t_{\mathsf{poly}}$ such that for all $k$, $\Sigma_k\mathsf{P} = \Sigma_k\mathsf{TIME}(t_{\mathsf{poly}})$, $\mathsf{BPP} = \mathsf{BPTIME}(t_{\mathsf{poly}})$, $\mathsf{RP} = \mathsf{RTIME}(t_{\mathsf{poly}})$, $\mathsf{UP} = \mathsf{UTIME}(t_{\mathsf{poly}})$, $\mathsf{PP} = \mathsf{PTIME}(t_{\mathsf{poly}})$, $\mathsf{Mod}_k\mathsf{P} = \mathsf{Mod}_k\mathsf{TIME}(t_{\mathsf{poly}})$, $\mathsf{PSPACE} = \mathsf{DSPACE}(t_{\mathsf{poly}})$, and so forth. A similar statement holds for any collection of language classes, provided that each class is definable by applying a certain complexity class operator to some Blum complexity class.

ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects

from arXiv: Computational Geometry

Authors: Boshen Wang, Bowei Ye, Lin Xu, Jie Liang

In silico methods evaluating the mutation effects of missense mutations are providing an important approach for understanding mutations in personal genomes and identifying disease-relevant biomarkers. However, existing methods, including deep learning methods, heavily rely on sequence-aware information, and do not fully leverage the potential of available 3D structural information. In addition, these methods may exhibit an inability to predict mutations in domains difficult to formulate sequence-based embeddings. In this study, we introduce a novel rationale-guided graph neural network AlphaGMut to evaluate mutation effects and to distinguish pathogenic mutations from neutral mutations. We compute the alpha shapes of protein structures to obtain atomic-resolution edge connectivities and map them to an accurate residue-level graph representation. We then compute structural-, topological-, biophysical-, and sequence properties of the mutation sites, which are assigned as node attributes in the graph. These node attributes could effectively guide the graph neural network to learn the difference between pathogenic and neutral mutations using k-hop message passing with a short training period. We demonstrate that AlphaGMut outperforms state-of-the-art methods, including DeepMind's AlphaMissense, in many performance metrics. In addition, AlphaGMut has the advantage of performing well in alignment-free settings, which provides broader prediction coverage and better generalization compared to current methods requiring deep sequence-aware information.
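
The “k-hop message passing” ingredient mentioned above is a standard graph neural network primitive. The numpy sketch below illustrates only that primitive on a toy graph; it is not the AlphaGMut architecture, and the adjacency matrix and node attributes are placeholders:

```python
import numpy as np

def k_hop_message_passing(A, X, k):
    """k rounds of mean-neighborhood aggregation.
    A: (n, n) adjacency matrix of a residue graph; X: (n, d) node attributes."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalize: mean over neighborhood
    H = X
    for _ in range(k):
        H = P @ H      # after k rounds, each node has mixed in k-hop information
    return H

# Toy graph on 4 nodes with 3-dimensional node attributes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
print(k_hop_message_passing(A, X, k=2).shape)   # (4, 3)
```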

Maximizing the Maximum Degree in Ordered Yao Graphs

from arXiv: Computational Geometry

Authors: Péter Ágoston, Adrian Dumitrescu, Arsenii Sagdeev, Karamjeet Singh, Ji Zeng

For an ordered point set in a Euclidean space or, more generally, in an abstract metric space, the ordered Yao graph is obtained by connecting each of the points to its closest predecessor by a directed edge. We show that for every set of $n$ points in $\mathbb{R}^d$, there exists an order such that the corresponding ordered Yao graph has maximum degree at least $\log{n}/(4d)$. Apart from the $1/(4d)$ factor, this bound is the best possible. As for the abstract setting, we show that for every $n$-element metric space, there exists an order such that the corresponding ordered Yao graph has maximum degree $\Omega(\sqrt{\log{n}/\log\log{n}})$.
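
As a quick illustration of the object under study (a brute-force toy, not the paper’s construction), the following builds the ordered Yao graph for every ordering of a small point set and reports the largest maximum degree any ordering achieves:

```python
import itertools
import math

def ordered_yao_max_degree(points):
    """Connect each point to its closest predecessor by a directed edge and
    return the maximum degree of the resulting (underlying undirected) graph."""
    degree = [0] * len(points)
    for i in range(1, len(points)):
        j = min(range(i), key=lambda p: math.dist(points[i], points[p]))
        degree[i] += 1
        degree[j] += 1
    return max(degree)

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (2.0, 0.5)]
best = max(ordered_yao_max_degree(p) for p in itertools.permutations(pts))
print(best)
```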

Efficient Discrepancy Testing for Learning with Distribution Shift

from arXiv: Data Structures and Algorithms

Authors: Gautam Chandrasekaran, Adam R. Klivans, Vasilis Kontonis, Konstantinos Stavropoulos, Arsen Vasilyan

A fundamental notion of distance between train and test distributions from the field of domain adaptation is discrepancy distance. While in general hard to compute, here we provide the first set of provably efficient algorithms for testing localized discrepancy distance, where discrepancy is computed with respect to a fixed output classifier. These results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift (TDS learning) due to Klivans et al. (2023). Our approach generalizes and improves all prior work on TDS learning: (1) we obtain universal learners that succeed simultaneously for large classes of test distributions, (2) achieve near-optimal error rates, and (3) give exponential improvements for constant depth circuits. Our methods further extend to semi-parametric settings and imply the first positive results for low-dimensional convex sets. Additionally, we separate learning and testing phases and obtain algorithms that run in fully polynomial time at test time.

Computing congruences of finite inverse semigroups

from arXiv: Data Structures and Algorithms

Authors: Luna Elliott, Alex Levine, James D. Mitchell

In this paper we present an algorithm for computing a congruence on an inverse semigroup from a collection of generating pairs. This algorithm uses a myriad of techniques from computational group theory, automata, and the theory of inverse semigroups. An initial implementation of this algorithm outperforms existing implementations by several orders of magnitude.

Compact Parallel Hash Tables on the GPU

from arXiv: Data Structures and Algorithms

Authors: Steef Hegeman, Daan Wöltgens, Anton Wijs, Alfons Laarman

On the GPU, hash table operation speed is determined in large part by cache line efficiency, and state-of-the-art hashing schemes thus divide tables into cache line-sized buckets. This raises the question whether performance can be further improved by increasing the number of entries that fit in such buckets. Known compact hashing techniques have not yet been adapted to the massively parallel setting, nor have they been evaluated on the GPU. We consider a compact version of bucketed cuckoo hashing, and a version of compact iceberg hashing suitable for the GPU. We discuss the tables from a theoretical perspective, and provide an open source implementation of both schemes in CUDA for comparative benchmarking. In terms of performance, the state-of-the-art cuckoo hashing benefits from compactness on lookups and insertions (most experiments show at least 10-20% increase in throughput), and the iceberg table benefits significantly, to the point of being comparable to compact cuckoo hashing--while supporting performant dynamic operation.
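
For readers who have not seen the data structure, here is a toy single-threaded Python model of bucketed cuckoo hashing, with each bucket standing in for one cache line. It only illustrates the structure the abstract refers to; it says nothing about the paper’s GPU kernels or its compact and iceberg variants:

```python
import random

class BucketedCuckoo:
    """Two candidate buckets per key; each bucket holds up to `slots` entries."""

    def __init__(self, num_buckets=64, slots=8, max_kicks=500):
        self.n, self.slots, self.max_kicks = num_buckets, slots, max_kicks
        self.table = [[] for _ in range(num_buckets)]

    def _buckets(self, key):
        return hash(("a", key)) % self.n, hash(("b", key)) % self.n

    def insert(self, key):
        b1, b2 = self._buckets(key)
        for b in (b1, b2):
            if len(self.table[b]) < self.slots:
                self.table[b].append(key)
                return True
        # Both buckets full: evict a random resident and push it to its other bucket.
        b = random.choice((b1, b2))
        for _ in range(self.max_kicks):
            victim = random.randrange(self.slots)
            key, self.table[b][victim] = self.table[b][victim], key
            alt = [x for x in self._buckets(key) if x != b]
            b = alt[0] if alt else b
            if len(self.table[b]) < self.slots:
                self.table[b].append(key)
                return True
        return False   # give up; a real table would rehash or grow

    def contains(self, key):
        return any(key in self.table[b] for b in self._buckets(key))

t = BucketedCuckoo()
for i in range(300):
    t.insert(i)
print(t.contains(7), t.contains(10_000))   # expect: True False
```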

Reducing the Space Used by the Sieve of Eratosthenes When Factoring

from arXiv: Data Structures and Algorithms

Authors: Samuel Hartman, Jonathan P. Sorenson

We present a version of the sieve of Eratosthenes that can factor all integers $\le x$ in $O(x \log\log x)$ arithmetic operations using at most $O(\sqrt{x}/\log\log x)$ bits of space. This is an improved space bound under the condition that the algorithm takes at most $O(x\log\log x)$ time. We also show our algorithm performs well in practice.
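
For context, here is the standard smallest-prime-factor sieve the paper starts from; it runs in O(x log log x) time but stores Θ(x) words, which is exactly the space cost the authors reduce (a baseline sketch, not the paper’s algorithm):

```python
def spf_sieve(x):
    """Sieve of Eratosthenes variant that records a smallest prime factor (SPF)
    for every integer up to x, from which any n <= x can be factored quickly."""
    spf = list(range(x + 1))                  # spf[i] = smallest prime factor of i
    for p in range(2, int(x ** 0.5) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, x + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf

def factor(n, spf):
    fs = []
    while n > 1:
        fs.append(spf[n])
        n //= spf[n]
    return fs

spf = spf_sieve(10_000)
print(factor(9_972, spf))   # [2, 2, 3, 3, 277]
```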

Dynamic Correlation Clustering in Sublinear Update Time

from arXiv: Data Structures and Algorithms

Authors: Vincent Cohen-Addad, Silvio Lattanzi, Andreas Maggiori, Nikos Parotsidis

We study the classic problem of correlation clustering in dynamic node streams. In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge. The objective is to continuously find a partition which minimizes the sum of positive edges crossing clusters and negative edges within clusters. We present an algorithm that maintains an $O(1)$-approximation with $O$(polylog $n$) amortized update time. Prior to our work, Behnezhad, Charikar, Ma, and L. Tan achieved a $5$-approximation with $O(1)$ expected update time in edge streams which translates in node streams to an $O(D)$-update time where $D$ is the maximum possible degree. Finally we complement our theoretical analysis with experiments on real world data.

The Behavior of Tree-Width and Path-Width under Graph Operations and Graph Transformations

from arXiv: Data Structures and Algorithms

Authors: Frank Gurski, Robin Weishaupt

Tree-width and path-width are well-known graph parameters. Many NP-hard graph problems allow polynomial-time solutions, when restricted to graphs of bounded tree-width or bounded path-width. In this work, we study the behavior of tree-width and path-width under various unary and binary graph transformations. Doing so, for considered transformations we provide upper and lower bounds for the tree-width and path-width of the resulting graph in terms of the tree-width and path-width of the initial graphs or argue why such bounds are impossible to specify. Among the studied, unary transformations are vertex addition, vertex deletion, edge addition, edge deletion, subgraphs, vertex identification, edge contraction, edge subdivision, minors, powers of graphs, line graphs, edge complements, local complements, Seidel switching, and Seidel complementation. Among the studied, binary transformations we consider the disjoint union, join, union, substitution, graph product, 1-sum, and corona of two graphs.

Roping in Uncertainty: Robustness and Regularization in Markov Games

from arXiv: Data Structures and Algorithms

Authors: Jeremy McMahan, Giovanni Artiglio, Qiaomin Xie

We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a general equivalence between computing a robust Nash equilibrium (RNE) of a $s$-rectangular RMG and computing a Nash equilibrium (NE) of an appropriately constructed regularized MG. The equivalence result yields a planning algorithm for solving $s$-rectangular RMGs, as well as provable robustness guarantees for policies computed using regularized methods. However, we show that even for just reward-uncertain two-player zero-sum matrix games, computing an RNE is PPAD-hard. Consequently, we derive a special uncertainty structure called efficient player-decomposability and show that RNE for two-player zero-sum RMG in this class can be provably solved in polynomial time. This class includes commonly used uncertainty sets such as $L_1$ and $L_\infty$ ball uncertainty sets.

Matching with Nested and Bundled Pandora Boxes

from arXiv: Data Structures and Algorithms

Authors: Robin Bowers, Bo Waggoner

We consider max-weighted matching with costs for learning the weights, modeled as a "Pandora's Box" on each endpoint of an edge. Each vertex has an initially-unknown value for being matched to a neighbor, and an algorithm must pay some cost to observe this value. The goal is to maximize the total matched value minus costs. Our model is inspired by two-sided settings, such as matching employees to employers. Importantly for such settings, we allow for negative values which cause existing approaches to fail. We first prove upper bounds for algorithms in two natural classes. Any algorithm that "bundles" the two Pandora boxes incident to an edge is an $o(1)$-approximation. Likewise, any "vertex-based" algorithm, which uses properties of the separate Pandora's boxes but does not consider the interaction of their value distributions, is an $o(1)$-approximation. Instead, we utilize Pandora's Nested-Box Problem, i.e. multiple stages of inspection. We give a self-contained, fully constructive optimal solution to the nested-boxes problem, which may have structural observations of interest compared to prior work. By interpreting each edge as a nested box, we leverage this solution to obtain a constant-factor approximation algorithm. Finally, we show any ``edge-based'' algorithm, which considers the interactions of values along an edge but not with the rest of the graph, is also an $o(1)$-approximation.

A Sublinear Algorithm for Approximate Shortest Paths in Large Networks

from arXiv: Data Structures and Algorithms

Authors: Sabyasachi Basu, Nadia Kōshima, Talya Eden, Omri Ben-Eliezer, C. Seshadhri

Computing distances and finding shortest paths in massive real-world networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS) with no preprocessing step and slow individual distance inquiries. On the other hand are indexing-based approaches, which maintain a large index. This allows for answering individual inquiries very fast; however, index creation is prohibitively expensive. We seek to bridge these two extremes: quickly answer distance inquiries without the need for costly preprocessing. In this work, we propose a new algorithm and data structure, WormHole, for approximate shortest path computations. WormHole leverages structural properties of social networks to build a sublinearly sized index, drawing upon the explicit core-periphery decomposition of Ben-Eliezer et al. Empirically, the preprocessing time of WormHole improves upon index-based solutions by orders of magnitude, and individual inquiries are consistently much faster than in BiBFS. The acceleration comes at the cost of a minor accuracy trade-off. Nonetheless, our empirical evidence demonstrates that WormHole accurately answers essentially all inquiries within a maximum additive error of 2. We complement these empirical results with provable theoretical guarantees, showing that WormHole requires $n^{o(1)}$ node queries per distance inquiry in random power-law networks. In contrast, any approach without a preprocessing step requires $n^{\Omega(1)}$ queries for the same task. WormHole does not require reading the whole graph. Unlike the vast majority of index-based algorithms, it returns paths, not just distances. For faster inquiry times, it can be combined effectively with other index-based solutions, by running them only on the sublinear core.
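
For reference, the traversal baseline named in the abstract is simple to state; below is a compact, generic bidirectional BFS (not WormHole, and not the authors’ code), which expands the smaller frontier on each side until the two searches meet:

```python
def bibfs_distance(adj, s, t):
    """Exact shortest-path distance via bidirectional BFS, or None if disconnected.
    `adj` maps each node to a list of neighbors."""
    if s == t:
        return 0
    dist = ({s: 0}, {t: 0})        # distances labeled by the forward / backward search
    frontier = ({s}, {t})
    while frontier[0] and frontier[1]:
        side = 0 if len(frontier[0]) <= len(frontier[1]) else 1
        nxt = set()
        for u in frontier[side]:
            for v in adj[u]:
                if v in dist[1 - side]:                        # the searches met
                    return dist[side][u] + 1 + dist[1 - side][v]
                if v not in dist[side]:
                    dist[side][v] = dist[side][u] + 1
                    nxt.add(v)
        frontier = (nxt, frontier[1]) if side == 0 else (frontier[0], nxt)
    return None

# Toy graph: a path 0-1-2-3-4 with a shortcut edge 1-3.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(bibfs_distance(adj, 0, 4))   # 3
```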

Approximating Maximum Matching Requires Almost Quadratic Time

from arXiv: Data Structures and Algorithms

Authors: Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein

We study algorithms for estimating the size of maximum matching. This problem has been subject to extensive research. For $n$-vertex graphs, Bhattacharya, Kiss, and Saranurak [FOCS'23] (BKS) showed that an estimate that is within $\varepsilon n$ of the optimal solution can be achieved in $n^{2-\Omega_\varepsilon(1)}$ time, where $n$ is the number of vertices. While this is subquadratic in $n$ for any fixed $\varepsilon > 0$, it gets closer and closer to the trivial $\Theta(n^2)$ time algorithm that reads the entire input as $\varepsilon$ is made smaller and smaller. In this work, we close this gap and show that the algorithm of BKS is close to optimal. In particular, we prove that for any fixed $\delta > 0$, there is another fixed $\varepsilon = \varepsilon(\delta) > 0$ such that estimating the size of maximum matching within an additive error of $\varepsilon n$ requires $\Omega(n^{2-\delta})$ time in the adjacency list model.

Thursday, June 13

Crud Hypothesis Testing

from Ben Recht

Meehl's Philosophical Psychology, Interlude 3.

This post contains some of my own thoughts reacting to Paul Meehl’s course “Philosophical Psychology.” I’m taking a brief interlude from my run-down of Lecture 7. Here’s the full table of contents of my blogging through the class.

Meehl’s course has already emphasized that significance testing is a very weak form of theory corroboration. Testing if some correlation is non-zero is very different from the earlier examples in the course. Saying “it will rain in April” is much less compelling than predicting next year’s precise daily rainfall in a specific city. It’s frankly less compelling than predicting a numerical value of the pressure of a gas from its volume and temperature. I’m a bit reluctant to plead for a “better” form of significance testing. Part of the issue with the human-facing sciences is the obsession with reducing all cause and effect, all experimental evidence, to Fisher’s exact test. Randomized controlled experiments are a particular experiment design, not the only experiment design. Someday, we’ll all break free from this bizarre, simplistic epistemology.

But that won’t be today. Let me ask something incremental rather than revolutionary for a moment. What would null hypothesis significance testing look like if we took crud seriously? We know the standard null hypothesis (i.e., that the means of two groups are equal) is never true. What seems to be true is that if we draw two random variables out of a pot, they will be surprisingly correlated. If that’s true, what should we test?

Here’s a crudded-up null hypothesis:

H0: Someone sampled your two variables X and Y from the crud distribution.

We could ask what is the probability of seeing a recorded correlation if H0 is true. What would the test look like? We’d need to compute a distribution of the potentially observed Pearson r values. Since we’re working with finite data, that distribution would be the convolution of the distribution of a sample correlation coefficient r (perhaps making a normal assumption) with the crud distribution. While you probably couldn’t compute this convolution in closed form, you could get a reasonable numerical approximation. The “p-value” now is synonymous with how far your data’s correlation is into the tail of this computed empirical crud distribution. If it’s more than two standard deviations from the mean crud, maybe you’re onto something.
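
Here is one way to make that concrete as a quick Monte Carlo sketch. None of this is from the post: the crud distribution below is a made-up placeholder (a normal with standard deviation 0.25), and the Fisher-z approximation stands in for the exact sampling distribution of the sample correlation r:

```python
import numpy as np

rng = np.random.default_rng(0)

def cruddy_p_value(r_obs, n, crud_rhos, sims=100_000):
    """Tail probability of observing a sample correlation at least as extreme as
    r_obs when the true correlation is itself drawn from the crud distribution
    (given as an empirical sample of correlation coefficients)."""
    rho = rng.choice(crud_rhos, size=sims)
    # Fisher-z approximation: atanh(r) ~ Normal(atanh(rho), 1/sqrt(n - 3)).
    z = np.arctanh(rho) + rng.normal(scale=1 / np.sqrt(n - 3), size=sims)
    r_sim = np.tanh(z)
    return np.mean(np.abs(r_sim) >= abs(r_obs))

# Placeholder crud distribution of field-wide correlations.
crud = np.clip(rng.normal(0.0, 0.25, size=5_000), -0.99, 0.99)
print(cruddy_p_value(r_obs=0.30, n=400, crud_rhos=crud))   # well within the crud
print(cruddy_p_value(r_obs=0.70, n=400, crud_rhos=crud))   # far out in the crud tail
```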

Note that this sort of testing can’t cheat by growing n. In standard null hypothesis significance testing, a small correlation will be significant if n is large enough. But big n does not mean you’ll refute the cruddy null hypothesis. In fact, all that happens with growing n here is that the “empirical” crud distribution converges to the “population” crud distribution. That is, the convolution doesn’t change the distribution much. When n is moderate, you will be more or less testing whether your correlation is more than two standard deviations away from the mean of the crud distribution.

Again, I don’t think this cruddy null testing solves everything, but it is definitely better than what we do now. We should know what is a reasonably low bar for an effect size. We should power our studies to refute that low bar. This doesn’t seem like an unreasonable request, does it?

What stops this from happening is that we don’t seem too enthusiastic to measure these crud distributions carefully. What would that look like?  Since the crud distribution is a distribution of correlation coefficients, we’d need to find a somewhat reasonable set of pairings of treatments and control variables specific to a field. We’d need reasonable datasets from which we could sample these pairings and compute the crud distribution. To me, this sounds like what Meehl and Lykken did in the 1960s: finding surveys with candidly answered questionnaires and tabulating correlations. In 2024, we have so many different tabulated spreadsheets we can download. I’m curious to see what crud we’d find.

For people who are familiar with his writing, I don’t think my suggestions here are different than Jacob Cohen’s. In the 1960s, Cohen tried to formalize reasonable standardized effect size measures and use these to guide experiment design and analysis in psychology. One of Cohen’s more popular measures, Cohen’s d, is more or less equal to twice the correlation coefficient:
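
$$d \;=\; \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}} \;=\; \frac{2r}{\sqrt{1 - r^2}} \;\approx\; 2r \quad \text{for small } r,$$

where r is the point-biserial correlation between the group indicator and the outcome (the exact conversion above assumes equal group sizes).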

Cohen asked that people compute d, and then evaluate the effect on a relative scale (small effects are d<0.2, large effects are d>0.8). One problem with Cohen is he assumed the scale for d was universal. But it certainly varies from field to field. It varies within fields as well, depending on the questions you’re asking. As I noted yesterday, in epidemiology we will always have Cohen’s d less than 0.2 for diseases like cancer. So to merge Meehl with Cohen, we’d need to look at the right distribution of effect sizes of random interactions and use this to set a relative scale for the believability of stories about correlations.

After my dives into the history of machine learning, I’m not at all surprised that I’m rediscovering sensible advice from the 1960s. In fact, I wrote a book about why we keep reinventing ideas from the Cold War that will be out next year. (More on that later). My point today is that some ideas from the 1960s shouldn’t go out of style. Everyone pays lip service to Cohen, but then he gets ignored in practice. Cohen laments this disregard in the preface to the 1988 edition of his book. Perhaps this means that incremental changes aren’t the answer, and the system of mindless significance testing exists to maintain a powerful status quo. If that’s the case, maybe we need a revolution after all.

Wait! Didn’t we have a revolution? You know, a “credibility revolution?” Did that fix anything? Let me take on that question in the next post.

By Ben Recht

Favorite Theorems: Algebraic Circuits

from Computational Complexity

Most of my favorite theorems tell us something new about the world of complexity. But let's not forget the greatest technical challenges in our area: proving separations that are "obviously" true. Here's the most exciting such result from the past decade.  

Superpolynomial Lower Bounds Against Low-Depth Algebraic Circuits
Nutan Limaye, Srikanth Srinivasan and Sébastien Tavenas

In this model, the inputs are variables and constants, and the goal is to compute a specific formal polynomial using the gate operations of plus and times. Limaye, Srinivasan and Tavenas find an explicit polynomial that no polynomial-size constant-depth algebraic circuit can compute.

How explicit? Here it is: Take d \(n \times n\) matrices, multiply them together, and output the top-left entry of the product. The \(N=dn^2\) variables are the entries of the matrices. The top-left entry is a polynomial in the inputs that can be computed by a simple polynomial-size circuit that just carries out the iterated multiplication, only not in constant depth. The paper shows that for an unbounded d that is \(o(\log n)\), there is no constant-depth polynomial-size algebraic circuit.
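
In symbols, the hard polynomial is the \((1,1)\) entry of the product of symbolic matrices, the iterated matrix multiplication polynomial:

\[ \mathrm{IMM}_{n,d} \;=\; (X_1 X_2 \cdots X_d)_{1,1} \;=\; \sum_{i_1, \ldots, i_{d-1} \in [n]} (X_1)_{1, i_1} (X_2)_{i_1, i_2} \cdots (X_d)_{i_{d-1}, 1}, \]

where each \(X_j\) is an \(n \times n\) matrix of distinct variables.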

The authors first prove a lower bound for set multilinear circuits and then extend to more general algebraic circuits.

By Lance Fortnow

TR24-105 | Communication Complexity vs Randomness Complexity in Interactive Proofs | Benny Applebaum, Kaartik Bhushan, Manoj Prabhakaran

from ECCC Papers

In this note, we study the interplay between the communication from a verifier in a general private-coin interactive protocol and the number of random bits it uses in the protocol. Under worst-case derandomization assumptions, we show that it is possible to transform any $I$-round interactive protocol that uses $\rho$ random bits into another one for the same problem with the additional property that the verifier's communication is bounded by $O(I\cdot \rho)$. Importantly, this is done with a minor, logarithmic, increase in the communication from the prover to the verifier and while preserving the randomness complexity. Along the way, we introduce a new compression game between computationally-bounded compressor and computationally-unbounded decompressor and a new notion of conditioned efficient distributions that may be of independent interest. Our solutions are based on a combination of perfect hashing and pseudorandom generators.

Near-Optimal Learning and Planning in Separated Latent MDPs

from arXiv: Computational Complexity

Authors: Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin

We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of separation of the constituent MDPs. The main thrust of this paper is in establishing a nearly-sharp *statistical threshold* for the horizon length necessary for efficient learning. On the computational side, we show that under a weaker assumption of separability under the optimal policy, there is a quasi-polynomial algorithm with time complexity scaling in terms of the statistical threshold. We further show a near-matching time complexity lower bound under the exponential time hypothesis.

Nerve Models of Subdivision Bifiltrations

from arXiv: Computational Geometry

Authors: Michael Lesnick, Ken McCabe

We study the size of Sheehy's subdivision bifiltrations, up to homotopy. We focus in particular on the subdivision-Rips bifiltration $\mathcal{SR}(X)$ of a metric space $X$, the only density-sensitive bifiltration on metric spaces known to satisfy a strong robustness property. Given a simplicial filtration $\mathcal{F}$ with a total of $m$ maximal simplices across all indices, we introduce a nerve-based simplicial model for its subdivision bifiltration $\mathcal{SF}$ whose $k$-skeleton has size $O(m^{k+1})$. We also show that the $0$-skeleton of any simplicial model of $\mathcal{SF}$ has size at least $m$. We give several applications: For an arbitrary metric space $X$, we introduce a $\sqrt{2}$-approximation to $\mathcal{SR}(X)$, denoted $\mathcal{J}(X)$, whose $k$-skeleton has size $O(|X|^{k+2})$. This improves on the previous best approximation bound of $\sqrt{3}$, achieved by the degree-Rips bifiltration, which implies that $\mathcal{J}(X)$ is more robust than degree-Rips. Moreover, we show that the approximation factor of $\sqrt{2}$ is tight; in particular, there exists no exact model of $\mathcal{SR}(X)$ with poly-size skeleta. On the other hand, we show that for $X$ in a fixed-dimensional Euclidean space with the $\ell_p$-metric, there exists an exact model of $\mathcal{SR}(X)$ with poly-size skeleta for $p\in \{1, \infty\}$, as well as a $(1+\epsilon)$-approximation to $\mathcal{SR}(X)$ with poly-size skeleta for any $p \in (1, \infty)$ and fixed ${\epsilon > 0}$.

DeepJEB: 3D Deep Learning-based Synthetic Jet Engine Bracket Dataset

from arXiv: Computational Geometry

Resource Leveling: Complexity of a UET two-processor scheduling variant and related problems

from arXiv: Data Structures and Algorithms

Authors: Pascale Bendotti, Luca Brunod Indrigo, Philippe Chrétienne, Bruno Escoffier

This paper mainly focuses on a resource leveling variant of a two-processor scheduling problem. The latter problem is to schedule a set of dependent UET jobs on two identical processors with minimum makespan. It is known to be polynomial-time solvable. In the variant we consider, the resource constraint on processors is relaxed and the objective is no longer to minimize makespan. Instead, a deadline is imposed on the makespan and the objective is to minimize the total resource use exceeding a threshold resource level of two. This resource leveling criterion is known as the total overload cost. Sophisticated matching arguments allow us to provide a polynomial algorithm computing the optimal solution as a function of the makespan deadline. It extends a solving method from the literature for the two-processor scheduling problem. Moreover, the complexity of related resource leveling problems sharing the same objective is studied. These results lead to polynomial or pseudo-polynomial algorithms or NP-hardness proofs, allowing for an interesting comparison with classical machine scheduling problems.

DPSW-Sketch: A Differentially Private Sketch Framework for Frequency Estimation over Sliding Windows (Technical Report)

from arXiv: Data Structures and Algorithms

Authors: Yiping Wang, Yanhao Wang, Cen Chen

The sliding window model of computation captures scenarios in which data are continually arriving in the form of a stream, and only the most recent $w$ items are used for analysis. In this setting, an algorithm needs to accurately track some desired statistics over the sliding window using a small space. When data streams contain sensitive information about individuals, the algorithm is also urgently needed to provide a provable guarantee of privacy. In this paper, we focus on the two fundamental problems of privately (1) estimating the frequency of an arbitrary item and (2) identifying the most frequent items (i.e., \emph{heavy hitters}), in the sliding window model. We propose \textsc{DPSW-Sketch}, a sliding window framework based on the count-min sketch that not only satisfies differential privacy over the stream but also approximates the results for frequency and heavy-hitter queries within bounded errors in sublinear time and space w.r.t.~$w$. Extensive experiments on five real-world and synthetic datasets show that \textsc{DPSW-Sketch} provides significantly better utility-privacy trade-offs than state-of-the-art methods.
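
To give a feel for the building block involved, here is a toy count-min sketch whose counters get Laplace noise before queries are answered. This is only a sketch of the general "noisy sketch" idea with made-up parameters; it is not the DPSW-Sketch mechanism, which additionally handles sliding windows and carries formal privacy and utility guarantees.

```python
import random


class NoisyCountMin:
    """Count-min sketch whose counters are perturbed with Laplace noise.

    A toy illustration of combining sketching with differential-privacy-style
    noise. It is NOT the DPSW-Sketch algorithm from the paper, which also
    handles sliding windows and comes with formal guarantees.
    """

    def __init__(self, width=2048, depth=5, epsilon=1.0, seed=0):
        self.width, self.depth, self.epsilon = width, depth, epsilon
        self.table = [[0.0] * width for _ in range(depth)]
        self.salts = [random.Random(seed + r).getrandbits(64) for r in range(depth)]

    def _buckets(self, item):
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.width

    def update(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def publish(self, seed=1):
        """Add Laplace(depth / epsilon) noise to every counter, once."""
        rng = random.Random(seed)
        scale = self.depth / self.epsilon
        for row in self.table:
            for j in range(len(row)):
                # A Laplace sample is the difference of two exponentials.
                row[j] += rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._buckets(item))


if __name__ == "__main__":
    cms = NoisyCountMin()
    for ch in "abracadabra":
        cms.update(ch)
    cms.publish()
    print("noisy estimate for 'a':", cms.estimate("a"))  # true count is 5
```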

Approximating Optimum Online for Capacitated Resource Allocation

from arXiv: Data Structures and Algorithms

Authors: Alexander Braun, Thomas Kesselheim, Tristan Pollner, Amin Saberi

We study online capacitated resource allocation, a natural generalization of online stochastic max-weight bipartite matching. This problem is motivated by ride-sharing and Internet advertising applications, where online arrivals may have the capacity to serve multiple offline users. Our main result is a polynomial-time online algorithm which is $(1/2 + \kappa)$-approximate to the optimal online algorithm for $\kappa = 0.0115$. This can be contrasted to the (tight) $1/2$-competitive algorithms to the optimum offline benchmark from the prophet inequality literature. Optimum online is a recently popular benchmark for online Bayesian problems which can use unbounded computation, but not "prophetic" knowledge of future inputs. Our algorithm (which also works for the case of stochastic rewards) rounds a generalized LP relaxation from the unit-capacity case via a two-proposal algorithm, as in previous works in the online matching literature. A key technical challenge in deriving our guarantee is bounding the positive correlation among users introduced when rounding our LP relaxation online. Unlike in the case of unit capacities, this positive correlation is unavoidable for guarantees beyond $1/2$. Conceptually, our results show that the study of optimum online as a benchmark can reveal problem-specific insights that are irrelevant to competitive analysis.

A square root algorithm faster than Newton's method for multiprecision numbers, using floating-point arithmetic

from arXiv: Data Structures and Algorithms

Authors: Fabio Romano

In this paper, an optimized version of the classical Bombelli algorithm for computing integer square roots is presented. In particular, floating-point arithmetic is used to compute the initial guess of each digit of the root, following ideas similar to those used in "The Art of Computer Programming", Vol. 2, Sec. 4.3.1 for division. A program with an implementation of the algorithm in Java is also presented, and its running time is compared with that of the algorithm provided by the Java standard library, which uses Newton's method. From these tests, the algorithm presented here turns out to be much faster.
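
For context, here is a generic float-seeded integer square root in Python (my own sketch, not the paper's digit-by-digit Bombelli variant and not its Java implementation): a hardware floating-point square root supplies the initial guess, and exact integer arithmetic corrects it.

```python
import math
import random


def isqrt_float_seeded(n: int) -> int:
    """Return floor(sqrt(n)) for a non-negative integer n.

    Sketch only: seed with the hardware floating-point sqrt when n is small
    enough, otherwise with a power of two above the root, then refine with
    exact integer Newton steps and an off-by-one adjustment.
    """
    if n < 0:
        raise ValueError("square root of a negative number")
    if n == 0:
        return 0
    if n.bit_length() <= 52:
        x = int(math.sqrt(n))                 # may be off by one either way
    else:
        x = 1 << ((n.bit_length() + 1) // 2)  # guaranteed >= sqrt(n)
    # Integer Newton iteration; converges once the guess is at or above the root.
    while True:
        y = (x + n // x) // 2
        if y >= x:
            break
        x = y
    # Final adjustment in case the float seed landed just below or above the root.
    while x * x > n:
        x -= 1
    while (x + 1) * (x + 1) <= n:
        x += 1
    return x


if __name__ == "__main__":
    for _ in range(1000):
        n = random.getrandbits(256)
        assert isqrt_float_seeded(n) == math.isqrt(n)
    print("ok")
```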

Efficient Parallel Multi-Hop Reasoning: A Scalable Approach for Knowledge Graph Analysis

from arXiv: Data Structures and Algorithms

Authors: Jesmin Jahan Tithi, Fabio Checconi, Fabrizio Petrini

Multi-hop reasoning (MHR) is a process in artificial intelligence and natural language processing where a system needs to make multiple inferential steps to arrive at a conclusion or answer. In the context of knowledge graphs or databases, it involves traversing multiple linked entities and relationships to understand complex queries or perform tasks requiring a deeper understanding. Multi-hop reasoning is a critical function in various applications, including question answering, knowledge base completion, and link prediction. It has garnered significant interest in artificial intelligence, machine learning, and graph analytics. This paper focuses on optimizing MHR for time efficiency on large-scale graphs, diverging from the traditional emphasis on accuracy which is an orthogonal goal. We introduce a novel parallel algorithm that harnesses domain-specific learned embeddings to efficiently identify the top K paths between vertices in a knowledge graph to find the best answers to a three-hop query. Our contributions are: (1) We present a new parallel algorithm to enhance MHR performance, scalability and efficiency. (2) We demonstrate the algorithm's superior performance on leading-edge Intel and AMD architectures through empirical results. We showcase the algorithm's practicality through a case study on identifying academic affiliations of potential Turing Award laureates in Deep Learning, highlighting its capability to handle intricate entity relationships. This demonstrates the potential of our approach to enabling high-performance MHR, useful to navigate the growing complexity of modern knowledge graphs.
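
To make the "top-K paths for a three-hop query" task concrete, here is a tiny sequential beam-search sketch over an edge-scored toy graph. The graph, scores, and names are made up, and beam search is only a heuristic stand-in; the paper's contribution is a parallel, embedding-driven search.

```python
import heapq

# Toy knowledge graph: adjacency list of (neighbor, edge_score) pairs.
# The scores stand in for learned-embedding relevance.
graph = {
    "alice": [("mit", 0.9), ("google", 0.6)],
    "mit": [("hinton", 0.4), ("lecun", 0.7)],
    "google": [("hinton", 0.8), ("bengio", 0.5)],
    "hinton": [("toronto", 0.9)],
    "lecun": [("nyu", 0.8)],
    "bengio": [("montreal", 0.7)],
    "toronto": [], "nyu": [], "montreal": [],
}


def top_k_paths(source, hops=3, k=3):
    """Return up to k high-scoring simple paths of exactly `hops` edges,
    scoring a path by the product of its edge scores (beam search, beam = k)."""
    beam = [(1.0, [source])]                       # (score, path)
    for _ in range(hops):
        candidates = []
        for score, path in beam:
            for nxt, w in graph.get(path[-1], []):
                if nxt not in path:                # keep paths simple
                    candidates.append((score * w, path + [nxt]))
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return beam


for score, path in top_k_paths("alice"):
    print(f"{score:.3f}", " -> ".join(path))
```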

A Unified Framework for Integer Programming Formulation of Graph Matching Problems

from arXiv: Data Structures and Algorithms

Authors: Bahram Alidaee, Haibo Wang, Hugh Sloan

Graph theory has been a powerful tool in solving difficult and complex problems arising in all disciplines. In particular, graph matching is a classical problem in pattern analysis with enormous applications. Many graph problems have been formulated as a mathematical program and then solved using exact, heuristic, and/or approximation-guaranteed procedures. On the other hand, graph theory has been a powerful tool in visualizing and understanding complex mathematical programming problems, especially integer programs. Formulating a graph problem as a natural integer program (IP) is often a challenging task. However, an IP formulation of the problem has many advantages. Several researchers have noted the need for natural IP formulations of graph theoretic problems. The present study aims to provide a unified framework for IP formulation of graph-matching problems. Although there are many surveys on graph matching problems, none is concerned with IP formulation. This paper is the first to provide a comprehensive IP formulation for such problems. The framework includes a variety of graph optimization problems in the literature. While these problems have been studied by different research communities, the framework presented here helps to bring together efforts from different disciplines to tackle such diverse and complex problems. We hope the present study can significantly help to simplify some of the difficult problems arising in practice, especially in pattern analysis.
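
As a small example of the kind of natural IP formulation the survey is concerned with, here is maximum-weight bipartite matching written as an integer program and solved through its LP relaxation, which is integral for bipartite graphs. This is a generic textbook formulation added for illustration, not a formulation taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Weighted bipartite graph: w[i, j] is the value of matching left i to right j.
w = np.array([[4.0, 1.0, 3.0],
              [2.0, 0.0, 5.0],
              [3.0, 2.0, 2.0]])
n, m = w.shape

# IP: maximize sum w_ij x_ij subject to each vertex being matched at most once,
# x_ij in {0, 1}. The constraint matrix is totally unimodular, so the LP
# relaxation solved below already has an integral optimum.
A_ub, b_ub = [], []
for i in range(n):                       # sum_j x_ij <= 1 for every left vertex
    row = np.zeros(n * m)
    row[i * m:(i + 1) * m] = 1
    A_ub.append(row)
    b_ub.append(1)
for j in range(m):                       # sum_i x_ij <= 1 for every right vertex
    row = np.zeros(n * m)
    row[j::m] = 1
    A_ub.append(row)
    b_ub.append(1)

res = linprog(c=-w.ravel(), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 1)] * (n * m), method="highs")
x = res.x.reshape(n, m).round().astype(int)
print("matching matrix:\n", x)
print("total weight:", (w * x).sum())
```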

Bipartite Matching in Massive Graphs: A Tight Analysis of EDCS

from arXiv: Data Structures and Algorithms

Authors: Amir Azarmehr, Soheil Behnezhad, Mohammad Roghani

Maximum matching is one of the most fundamental combinatorial optimization problems with applications in various contexts such as balanced clustering, data mining, resource allocation, and online advertisement. In many of these applications, the input graph is massive. The sheer size of these inputs makes it impossible to store the whole graph in the memory of a single machine and process it there. Graph sparsification has been an extremely powerful tool to alleviate this problem. In this paper, we study a highly successful and versatile sparsifier for the matching problem: the *edge-degree constrained subgraph (EDCS)* introduced first by Bernstein and Stein [ICALP'15]. The EDCS has a parameter $\beta \geq 2$ which controls the density of the sparsifier. It has been shown through various proofs in the literature that by picking a subgraph with $O(n\beta)$ edges, the EDCS includes a matching of size at least $2/3-O(1/\beta)$ times the maximum matching size. As such, by increasing $\beta$ the approximation ratio of EDCS gets closer and closer to $2/3$. In this paper, we propose a new approach for analyzing the approximation ratio of EDCS. Our analysis is *tight* for any value of $\beta$. Namely, we pinpoint the precise approximation ratio of EDCS for any sparsity parameter $\beta$. Our analysis reveals that one does not necessarily need to increase $\beta$ to improve approximation, as suggested by previous analysis. In particular, the best choice turns out to be $\beta = 6$, which achieves an approximation ratio of $.677$! This is arguably surprising as it is even better than $2/3 \sim .666$, the bound that was widely believed to be the limit for EDCS.
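
For intuition about the object being analyzed, here is a deliberately simple (and slow) local-fixing construction of an EDCS on a toy graph; the paper is about tightly analyzing the matching such a subgraph contains, not about this construction.

```python
import random


def edcs(edges, beta):
    """Build an edge-degree constrained subgraph (EDCS) by local fixing.

    H satisfies: every edge (u, v) in H has deg_H(u) + deg_H(v) <= beta, and
    every edge of G missing from H has deg_H(u) + deg_H(v) >= beta - 1.
    Repeatedly fixing violated edges terminates; this is the simplest
    (not the fastest) way to obtain an EDCS.
    """
    H = set()
    deg = {}

    def d(v):
        return deg.get(v, 0)

    changed = True
    while changed:
        changed = False
        for u, v in edges:
            e = (u, v)
            if e in H and d(u) + d(v) > beta:
                H.remove(e)
                deg[u] -= 1
                deg[v] -= 1
                changed = True
            elif e not in H and d(u) + d(v) < beta - 1:
                H.add(e)
                deg[u] = d(u) + 1
                deg[v] = d(v) + 1
                changed = True
    return H


# Small demo on a random bipartite graph (made-up instance).
random.seed(0)
L, R = range(8), range(8, 16)
G = [(u, v) for u in L for v in R if random.random() < 0.5]
H = edcs(G, beta=6)
print(len(G), "edges in G ->", len(H), "edges in the EDCS")
```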

Wednesday, June 12

Bill Fefferman, Soumik Ghosh, Michael Gullans, Kohdai Kuroiwa, and Kunal Sharma’s paper on the Effect of Non-unital Noise on Random Circuit Sampling

from Gil Kalai

I would like to discuss the following remarkable paper posted on the arXiv in June 2023. Effect of Non-unital Noise on Random Circuit Sampling, by Bill Fefferman, Soumik Ghosh, Michael Gullans, Kohdai Kuroiwa, and Kunal Sharma. Abstract: In this work, … Continue reading →

[Figure 1 from Fefferman et al.]

I would like to discuss the following remarkable paper posted on the arXiv in June 2023.

Effect of Non-unital Noise on Random Circuit Sampling,

by Bill Fefferman, Soumik Ghosh, Michael Gullans, Kohdai Kuroiwa, and Kunal Sharma

Abstract: In this work, drawing inspiration from the type of noise present in real hardware, we study the output distribution of random quantum circuits under practical non-unital noise sources with constant noise rates. We show that even in the presence of unital sources like the depolarizing channel, the distribution, under the combined noise channel, never resembles a maximally entropic distribution at any depth. To show this, we prove that the output distribution of such circuits never anticoncentrates meaning it is never too “flat” regardless of the depth of the circuit. This is in stark contrast to the behavior of noiseless random quantum circuits or those with only unital noise, both of which anticoncentrate at sufficiently large depths. As consequences, our results have interesting algorithmic implications on both the hardness and easiness of noisy random circuit sampling, since anticoncentration is a critical property exploited by both state-of-the-art classical hardness and easiness results.

I am thankful to Soumik and Kunal, two of the authors, for bringing the paper to my attention and to Soumik for helpful clarifications and useful discussion. This work was featured on the UChicago CS departmental website.

In this post I will describe some findings of the new paper and possible relevance to my study (with Yosi Rinott and Tomer Shoham) of the Google 2019 quantum supremacy paper.  As far as we can see, the phenomenon described in the new paper is not witnessed in the experimental data of the Google 2019 experiment. 

Noisy random circuit sampling 

Random quantum circuits play an important role in the study of NISQ (noisy intermediate-scale quantum) computers, which, in turn, represent much of the current effort toward demonstrating quantum computation. There have been several important theoretical papers on understanding the effect of noise on the ideal probability distribution and on connecting it with the classical computational hardness and easiness of the sampling task performed by the quantum computer. We devoted a post to the paper by Xun Gao et al., where we also mention a more recent paper by Aharonov et al.

Update: Here is a very interesting recent paper on random circuit sampling, Incompressibility and spectral gaps of random circuits, by Chi-Fang Chen, Jeongwan Haah, Jonas Haferkamp, Yunchao Liu, Tony Metger, and Xinyu Tan, and a related paper by Lucas Gretta, William He, and Angelos Pelecanos, More efficient approximate k-wise independent permutations from random reversible circuits via log-Sobolev inequalities. There are interesting (speculative) connections between random quantum circuits and black holes.

Anticoncentration and some findings of the new paper

A central problem studied in the new paper is 

Do random quantum circuits, under the influence of physically motivated noise models, anticoncentrate?

Anticoncentration means that the distribution is not concentrated on a sufficiently small number of outcomes, and it is an important property for RCS samples under basic noise models. In mathematical terms, if the probability of a bitstring $x$ is denoted by $p(x)$, anticoncentration refers to a situation where $\sum_x p(x)^2 = c/2^n$ for some constant $c \ge 1$. When the probability distribution is uniform we have $c=1$, and for the Porter-Thomas distribution we have $c=2$.
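
To make the constant $c$ concrete, here is a tiny numerical illustration (my own sketch): for the uniform distribution $2^n\sum_x p(x)^2$ equals 1, while for the output distribution of a Haar-random state (Porter-Thomas statistics) it concentrates around 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
N = 2**n

# Uniform distribution over bitstrings.
p_uniform = np.full(N, 1 / N)
print("uniform:       ", N * np.sum(p_uniform**2))   # exactly 1

# Output distribution of a Haar-random state (Porter-Thomas statistics):
# amplitudes are iid complex Gaussians, normalized.
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
p_haar = np.abs(psi)**2 / np.sum(np.abs(psi)**2)
print("Porter-Thomas: ", N * np.sum(p_haar**2))       # close to 2
```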

In the words of the authors, what the new paper shows is that 

“In this work, we answer this question in the negative: we show how random quantum circuits, under noise models inspired by real hardware, which is a mixture of both unital and non-unital noise of certain types, exhibit a lack of anticoncentration, regardless of the depth of the circuit. This shows that the output distribution does not resemble the uniform distribution, or close variants of the same. Interestingly, we find that this behaviour is exhibited even when there are also unital noise sources, like the depolarizing noise, present in the system, whose property is to push the system towards the maximally mixed state — which, when measured in the standard basis, gives the uniform distribution. In this sense, these non-unital sources “dominate” the other sources.”

(In mathematical terms, this means that under non-unital noise the normalized collision probability $2^n\sum_x p(x)^2$ diverges with $n$.) The paper studies specifically the amplitude damping noise, which (according to Soumik) is closely related to T1 decay of a gate. (T1 decay is a feature of all realistic hardware.) The effect of non-unital gate errors like amplitude damping errors is to produce a larger fraction of zeroes in the samples produced by the quantum computer.

Rethinking computational hardness and easiness 

According to the authors a key takeaway of the results is that we need to think carefully about new techniques to prove asymptotic easiness or hardness for random circuit sampling with constant noise rates. Existing techniques for both make strong assumptions on the nature of noise present — for instance, modeling the noise as completely depolarizing — which is unsatisfactory, because these techniques are very sensitive to such assumptions and break down for more realistic noise models. The paper suggests that the moment realistic noise models come into play, the true complexity of random circuit sampling is still an unresolved open question.

Relevance to the statistical analysis and the concerns regarding the Google 2019 experiment

In the past few years, Yosi Rinott, Tomer Shoham and I were engaged in a comprehensive statistical study of the Google’s 2019 quantum computer experiment. We wrote four papers:

  1. Y. Rinott, T. Shoham, and G. Kalai, Statistical Aspects of the Quantum Supremacy Demonstration, Statistical Science  (2022)
  2. G. Kalai, Y. Rinott and T. Shoham, Google’s 2019 “Quantum Supremacy” Claims: Data, Documentation, & Discussion  (see this post).
  3. G. Kalai, Y. Rinott and T. Shoham, Questions and Concerns About Google’s Quantum Supremacy Claim (see this post).
  4. G. Kalai, Y. Rinott and T. Shoham, Random circuit sampling: Fourier expansion and statistics. (this post)

The new paper by Fefferman et al. is related to this statistical study and in particular to the bias toward ‘0’ in the experimental bitstrings. It appears that the phenomenon described in the paper, a bias toward ‘0’ as a consequence of non-unital gate errors, is not witnessed in the Google experimental data.

This reminds me of the paper by Xun Gao et al., which influenced our fourth paper. The story is similar: theoretical analysis suggests that gate errors contribute additional effects of a similar mathematical nature to the effect of readout errors. Our paper used Fourier methods and statistical tools to study these effects and showed that, while these gate-error effects are confirmed by simulations, they are not witnessed in the Google 2019 experimental data.

Asymmetric readout errors

One basic form of error is readout error: the probability that a measured qubit will read as ‘0’ instead of ‘1’ and vice versa. For the Google 2019 experiment, the paper reports that the probability $q_1$ that 1 is read as 0 is 0.055 and the probability $q_2$ that 0 is read as 1 is 0.023.

There are several ways to study the values of $q_1$ and $q_2$:

  1. We can estimate $q_1-q_2$ from the statistics of 0s and 1s in the empirical samples: the expected fraction of 0s is $1/2+(q_1-q_2)/2$. (This gives estimates for individual qubits as well; see the simulation sketch after this list.)
  2. Google's method was simply to initialize (computational states representing) many known random bitstrings and then measure many times. This method provides readout-error values for individual qubits; in fact, it gave readout errors for a specific set of qubits.
  3. We developed a statistical method (based on knowing the amplitudes of the ideal circuit) to estimate $q_1$ and $q_2$, and implemented it for one circuit with 12 qubits.
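
Here is a toy simulation of the estimator in item 1 (my own sketch with made-up sample sizes, not code from our papers): flip ideal, roughly balanced bits with asymmetric readout errors $q_1$ and $q_2$ and recover $q_1-q_2$ from the fraction of observed zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
q1, q2 = 0.055, 0.023          # P(read 0 | true 1), P(read 1 | true 0)
m = 2_000_000                  # number of measured bits (made-up sample size)

true_bits = rng.integers(0, 2, size=m)           # ideal bits, roughly balanced
flip_1_to_0 = (true_bits == 1) & (rng.random(m) < q1)
flip_0_to_1 = (true_bits == 0) & (rng.random(m) < q2)
observed = np.where(flip_1_to_0, 0, np.where(flip_0_to_1, 1, true_bits))

frac_zeros = np.mean(observed == 0)
print("fraction of zeros:", frac_zeros)          # about 1/2 + (q1 - q2)/2
print("estimated q1 - q2:", 2 * frac_zeros - 1)  # about 0.032
```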

Note that the first and third methods account not only for the bias toward 0 from asymmetric readout errors but also for the additional bias toward 0 coming from the non-unital gate errors considered in the new paper by Fefferman et al. It appears that Google's method (item 2) does not capture the effect of non-unital gate errors. The raw data for the readout errors studied in the second item was provided by the Google team in January 2021. (The file type is not supported by WordPress, but if you are interested drop me a comment/email.)

Are non-unital gate errors manifested for the Google 2019 data?

Comparing the different methods for estimating the parameters $q_1$ and $q_2$, we observe:

  • There is a good fit between the readout errors $q_1, q_2$ obtained by Google's method of item 2 and the fractions of 0s and 1s in the Google experimental samples (item 1).
  • For $n=12$ (and circuit number 0), our MLE estimates for $q_1$ and $q_2$ are also consistent with (and somewhat lower than) Google's estimates: our estimation gave $q_1=0.0465$, $q_2=0.0196$.

Both these findings show no sign in the Google experimental samples of additional asymmetry between 0s and 1s caused by non-unital gate errors. This is quite interesting; it may simply reflect the fact that the non-unital error rates are very low, it may be a genuine finding about NISQ devices or a specific finding about the Sycamore quantum computer, it may be a legitimate effect of Google's calibration method, or it may be an artifact of methodological problems with the 2019 experiment of the kind suggested by our previous papers (mainly the third paper).

Of course, this may also reflect some misunderstanding or mistake on my part; a few weeks ago I asked the Google team members (and other participants in our email discussion about the 2019 experiment) about this matter. As always, comments are welcome.

Let me also mention that the dependence of the ratio of 0s to 1s on the location of the bit, which is witnessed in the empirical samples, does not appear to be manifested in Google's raw readout data.

Fourier methods for the asymmetric case

Fourier-theoretic aspects of asymmetric readout errors are quite interesting and are related to the $p$-biased version of the Fourier-Walsh basis that appears in the works of Talagrand, Friedgut, and others. We also still do not know how to apply variants of the fast Fourier transform for quick computation of estimators under asymmetric noise.

Soumik Ghosh’s remark: The low and high noise regimes

Soumik Ghosh made the following insightful remark:

“Here’s my perspective on why Google potentially did not see the effect of gate errors in their data, which you write about in your blog post.

There are two noise regimes: low and high. High is when the noise rate is a constant. Low is when the noise rate scales as ~1/n (i.e., goes down with system size). These two noise regimes have very different properties (see here for instance: https://arxiv.org/abs/2305.04954).

Implicitly, Google’s claims are in the low noise regime. For their 53 and 67 qubit devices, their noise rates are indeed of the order ~ 1/50 or 1/67. It doesn’t mean they will always be in the low noise regime in the future too, without major technological breakthroughs in noise reduction, but so far, there’s a claim to be made that they probably are.  

What our results imply is that if they are ever not in the low noise regime — say if the noise reduction capabilities bottom out and the noise stays a constant with an increase in system size — then they will see effects of gate errors very prominently. This will be true even if they mitigate readout bias.

What if they are always in the low noise regime? Then it’s an open question as to what happens. Theoretically, it is known that if the noise is purely unital (which is unrealistic), then the entire noise manifests as just a global depolarizing noise in the readout (the so-called white noise model: https://arxiv.org/abs/2111.14907). It is an open question as to what happens with realistic non-unital noise models in this regime.”

From my point of view, the distinction between the low and high error regimes and Soumik's remark are very much to the point. The distinction is indeed important for both theoretical and practical purposes. In my work on noise sensitivity, I argue that in the high error regime the noisy samples will have a noise-stable component (representing low-degree Fourier-Walsh coefficients), which represents a very primitive computation, and a noise-sensitive component (representing high-degree Fourier-Walsh coefficients), which does not enable computation at all. In the low error regime, interesting computation and, more importantly, the creation of good-quality error-correcting codes are possible.

My theory predicts that at the intermediate scale, the low error regime that would allow quantum supremacy and good-quality quantum error-correcting codes cannot be reached. (See, e.g., this post and this paper.)

Let me move now from theory to the matter of scrutinizing the Google 2019 experiment and other NISQ experimental assertions. To check the effect of non-unital gate noise on the ratio of zeroes and ones for the Google experiments we probably need to run simulations with the noise parameters of the experiment. I hope that the Google team, other teams involved in the 2019 experiment, like those from NASA, and teams involved in other recent NISQ experiments will double-check this matter by themselves.

By Gil Kalai

Shoal++: High Throughput DAG-BFT Can Be Fast!

from Decentralized Thoughts

TL;DR: Shoal++ is a novel DAG-BFT system that supercharges Shoal to achieve near-optimal theoretical latency while preserving the high throughput and robustness of state-of-the-art certified DAG BFT protocols. We evaluated Shoal++ against state-of-the-art DAG BFT protocols, such as Bullshark and Shoal — as well as a concurrent DAG effort, Mysticeti...

The Technical Depths of Crud

from Ben Recht

Meehl's Philosophical Psychology, Lecture 7, part 1.

This post digs into Lecture 7 of Paul Meehl’s course “Philosophical Psychology.” You can watch the video here. Here’s the full table of contents of my blogging through the class.

Meehl begins Lecture 7 by clarifying his rant about statistics from Lecture 6: “I love statisticians, and I like statistics.”  It’s certainly true that Meehl should not be confused as someone who is against statistical methodology. Lectures 6 through 10 are almost entirely about probability and statistics, after all. And after his five-minute quasi-apology to the “subgroup of statisticians who have a certain arrogance toward the social or medical sciences,” he spends the next 45 minutes of Lecture 7 diving into numerical examples of how the crud factor might manifest itself even when theories are false.

In the spirit of these technical calculations, let me take this post to work through a few mathy-ish loose ends on crud. There will be more equations than have been the norm in these blog posts, but that’s because we’re pushing into arguments with statisticians. I’m setting the stage here for the subsequent posts where I want to try to rethink statistical practices with crud in mind.

Thresholded Variables

Meehl works an extended example where the treatment variable is a thresholded normal. A potential example he gives would be groups that score high on a test versus groups that score low on a test. Perhaps you’d look at the mean of some attribute in people above the mean on an introversion scale and compare that to the mean of people low on the scale. If the introversion scale is a normal distribution, then the treatment variable is a thresholded normal distribution.

The correlation coefficients between thresholded normal random variables are close to those of the unthresholded variables. There are lots of fun integrals you can compute. Let $\theta$ denote the Heaviside function: $\theta(t)$ equals 1 if $t$ is greater than 0 and equals 0 otherwise. If $X$ and $Y$ are standard normal with correlation $\rho$, then

$\mathrm{corr}(\theta(X), Y) = \sqrt{2/\pi}\,\rho \approx 0.8\,\rho.$

If you threshold one variable, the resulting correlation equals about 0.8 times the initial correlation. Meehl alludes to this formula in his whiteboard calculations in Lecture 7. We can go a step further and threshold both $X$ and $Y$:

$\mathrm{corr}(\theta(X), \theta(Y)) = \frac{2}{\pi}\arcsin(\rho).$

For small $\rho$, this is about $0.64\,\rho$: if $X$ and $Y$ are correlated, their thresholded counterparts will be similarly correlated. Thresholding normal distributions does not eliminate the worry about crud.
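
As a quick sanity check on the two thresholding formulas above, here is a small Monte Carlo simulation (my own sketch, not part of Meehl's lecture): it draws correlated bivariate normals and compares the empirical correlations of the thresholded variables with $\sqrt{2/\pi}\,\rho$ and $\frac{2}{\pi}\arcsin(\rho)$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.3          # correlation of the underlying bivariate normal
n = 1_000_000      # Monte Carlo sample size

# Draw (X, Y) standard bivariate normal with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Heaviside thresholding at the mean.
tx, ty = (x > 0).astype(float), (y > 0).astype(float)

corr = lambda a, b: np.corrcoef(a, b)[0, 1]

print("corr(theta(X), Y):       ", corr(tx, y), "theory:", np.sqrt(2 / np.pi) * rho)
print("corr(theta(X), theta(Y)):", corr(tx, ty), "theory:", 2 / np.pi * np.arcsin(rho))
```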

Epidemiological Crud

I don’t exactly know how to best estimate the modern crud factor, but I think it’s worth giving some scale. In Monday’s post, I called out this JAMA Internal Medicine article that claimed people who ate organic diets had lower cancer rates. We all know these nutrition papers are absurd and easy to pick on. And yet they still consistently get credulously written up in the New York Times. This paper doesn’t seem to be any more egregious than any other in the field. The whole field is very bad! But it does help give a sense of scale.

In this paper, the authors come up with a score for how much organic food people eat. They find that the top quartile of scorers has lower cancer rates. This is clearly a dressed-up correlation with wealth and socioeconomic status. Bear with me anyway.

In their main finding, they have 50,914 respondents with low organic scores and 16,962 with high organic scores. Of these survey respondents, 1,071 in the low-organic group reported cancer while only 269 in the high-organic group did. That's about a 25% relative risk reduction. While it's not proper to treat this as an RCT, the z-score here is more than 4 and the p-value is less than 0.0001. So I could imagine (as the paper does) some sort of "causal correction" mumbo jumbo that "corrects for confounders" or whatever and still gets you a p-value less than 0.05. Eat organic, everyone!

OK, so what's the correlation coefficient? We have a formula for it. Take the z-score and divide it by $\sqrt{n} = \sqrt{67{,}876} \approx 261$. It's about 0.02.
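
Here is that arithmetic spelled out (my own sketch of the two-proportion z-test and the $r \approx z/\sqrt{n}$ conversion, not code from the paper):

```python
import math

# Counts reported in the JAMA Internal Medicine study discussed above.
n_low, cancer_low = 50_914, 1_071    # low organic-score group
n_high, cancer_high = 16_962, 269    # high organic-score group

p_low, p_high = cancer_low / n_low, cancer_high / n_high
print("relative risk reduction:", 1 - p_high / p_low)          # ~0.25

# Two-proportion z-test with pooled variance.
n = n_low + n_high
p_pool = (cancer_low + cancer_high) / n
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_low + 1 / n_high))
z = (p_low - p_high) / se
print("z-score:", z)                                            # ~4.2

# Convert to a correlation coefficient: r is roughly z / sqrt(n).
print("sqrt(n):", math.sqrt(n))                                 # ~261
print("r:", z / math.sqrt(n))                                   # ~0.016
```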

I don't yet know what to make of this. The fact that cancer is already rare means the correlation coefficient can only be so high. For binary random variables, when the treatment and control groups are of the same size, the largest the correlation coefficient can be is the square root of the odds of the prevalence $p$:

$r_{\max} = \sqrt{\dfrac{p}{1-p}}.$

This would be the correlation between $X$ and $Y$ even when you have 100% risk reduction; with a prevalence around 2%, that cap is about 0.14. It would be worth thinking more about what Meehl's crud has to do with epidemiology, where we have huge $n$ and low prevalence, and hence all variables with small correlation. What is the crud factor in epidemiology? Somebody should study that!

Varied Variance Estimators

Dean Eckles noted on Twitter that for non-binary outcomes, the common estimator for the variance in the z-test is a combination of the variance in the group when $X=0$ and the group when $X=1$; for equal-sized groups, this is the pooled estimate

$\hat{\sigma}^2 = \tfrac{1}{2}\left(\hat{\sigma}_0^2 + \hat{\sigma}_1^2\right).$

I could quibble that this variance estimator isn't better than the one used in the proportions z-test, but it's a quibble. As I've said before and will say again, these formulas are just rituals and you can't really justify anything with "rigor." And it's fine because we can still calculate stuff. If I use this variance estimator, the formula for z becomes

$z = \dfrac{r\sqrt{n}}{\sqrt{1-r^2}}.$

Carlos Cinelli tells me that Cohen uses this formula in his writings about power and effect sizes. While it is no longer a simple product, nothing in the crud story changes here. A significance test is still computing a simple function of the Pearson r, multiplying that number by the square root of n, and declaring significance when that product is larger than 2. That is the same as declaring significance when

$r > \dfrac{2}{\sqrt{n+4}}.$

That 4 in the denominator isn't doing much work. Also, when $r$ is less than ½, this z-score is at most about 1.15 times larger than when you use the other variance estimator (the ratio is $1/\sqrt{1-r^2}$). We can't escape the fact that significance tests are measurements of correlation. Maybe we should embrace that fact and see what happens.
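
And here is a quick numerical comparison of the two z formulas (again my own sketch), showing that the ratio $1/\sqrt{1-r^2}$ stays small for the correlations in play:

```python
import numpy as np

n = 67_876                      # sample size from the epidemiology example above
for r in [0.01, 0.02, 0.1, 0.3, 0.5]:
    z_simple = r * np.sqrt(n)                      # proportions-style z ~ r * sqrt(n)
    z_pooled = r * np.sqrt(n) / np.sqrt(1 - r**2)  # pooled within-group variance
    print(f"r={r:4}  z_simple={z_simple:8.2f}  z_pooled={z_pooled:8.2f}  "
          f"ratio={z_pooled / z_simple:.3f}")
```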

By Ben Recht

Assistant/Associate/Full Professor and PostDoc at Huazhong University of Science and Technology (apply by June 12, 2025)

from CCI: jobs

The Hopcroft Center on Computer Science (HCCS) at Huazhong University of Science and Technology invites applications for Assistant/Associate/Full Professor and Postdoctoral positions in Theoretical Computer Science and Artificial Intelligence. These positions are based in Wuhan, China, and offer competitive salaries. Please submit your CV and representative work to the following email address. Website: hccs.hust.edu.cn/rczp/Recruitment.htm Email: […]

The Hopcroft Center on Computer Science (HCCS) at Huazhong University of Science and Technology invites applications for Assistant/Associate/Full Professor and Postdoctoral positions in Theoretical Computer Science and Artificial Intelligence. These positions are based in Wuhan, China, and offer competitive salaries.

Please submit your CV and representative work to the following email address.

Website: https://hccs.hust.edu.cn/rczp/Recruitment.htm
Email: hccs_hust@163.com

By shacharlovett

TR24-104 | NP-hardness of testing equivalence to sparse polynomials and to constant-support polynomials | Omkar Baraskar, Agrim Dewan, Chandan Saha, Pulkit Sinha

from ECCC Papers

An $s$-sparse polynomial has at most $s$ monomials with nonzero coefficients. The Equivalence Testing problem for sparse polynomials (ETsparse) asks to decide if a given polynomial $f$ is equivalent to (i.e., in the orbit of) some $s$-sparse polynomial. In other words, given $f \in \mathbb{F}[\mathbf{x}]$ and $s \in \mathbb{N}$, ETsparse asks to check if there exist $A \in \mathrm{GL}(|\mathbf{x}|, \mathbb{F})$ and $\mathbf{b} \in \mathbb{F}^{|\mathbf{x}|}$ such that $f(A\mathbf{x} + \mathbf{b})$ is $s$-sparse. We show that ETsparse is NP-hard over any field $\mathbb{F}$, if $f$ is given in the sparse representation, i.e., as a list of nonzero coefficients and exponent vectors. This answers a question posed by Gupta, Saha and Thankey (SODA 2023) and also, more explicitly, by Baraskar, Dewan and Saha (STACS 2024). The result implies that the Minimum Circuit Size Problem (MCSP) is NP-hard for a dense subclass of depth-$3$ arithmetic circuits if the input is given in sparse representation. We also show that approximating the smallest $s_0$ such that a given $s$-sparse polynomial $f$ is in the orbit of some $s_0$-sparse polynomial to within a factor of $s^{\frac{1}{3} - \epsilon}$ is NP-hard for any $\epsilon >0$; observe that $s$-factor approximation is trivial as the input is $s$-sparse. Finally, we show that for any constant $\sigma \geq 6$, checking if a polynomial (given in sparse representation) is in the orbit of some support-$\sigma$ polynomial is NP-hard. Support of a polynomial $f$ is the maximum number of variables present in any monomial of $f$. These results are obtained via direct reductions from the $\mathrm{3\text{-}SAT}$ problem.

Incompressibility and spectral gaps of random circuits

from arXiv: Computational Complexity

Authors: Chi-Fang Chen, Jeongwan Haah, Jonas Haferkamp, Yunchao Liu, Tony Metger, Xinyu Tan

Random reversible and quantum circuits form random walks on the alternating group $\mathrm{Alt}(2^n)$ and unitary group $\mathrm{SU}(2^n)$, respectively. Known bounds on the spectral gap for the $t$-th moment of these random walks have inverse-polynomial dependence in both $n$ and $t$. We prove that the gap for random reversible circuits is $\Omega(n^{-3})$ for all $t\geq 1$, and the gap for random quantum circuits is $\Omega(n^{-3})$ for $t \leq \Theta(2^{n/2})$. These gaps are independent of $t$ in the respective regimes. We can further improve both gaps to $n^{-1}/\mathrm{polylog}(n, t)$ for $t\leq 2^{\Theta(n)}$, which is tight up to polylog factors. Our spectral gap results have a number of consequences: 1) Random reversible circuits with $\mathcal{O}(n^4 t)$ gates form multiplicative-error $t$-wise independent (even) permutations for all $t\geq 1$; for $t \leq \Theta(2^{n/6.1})$, we show that $\tilde{\mathcal{O}}(n^2 t)$ gates suffice. 2) Random quantum circuits with $\mathcal{O}(n^4 t)$ gates form multiplicative-error unitary $t$-designs for $t \leq \Theta(2^{n/2})$; for $t\leq \Theta(2^{2n/5})$, we show that $\tilde{\mathcal{O}}(n^2t)$ gates suffice. 3) The robust quantum circuit complexity of random circuits grows linearly for an exponentially long time, proving the robust Brown--Susskind conjecture [BS18,BCHJ+21]. Our spectral gap bounds are proven by reducing random quantum circuits to a more structured walk: a modification of the ``$\mathrm{PFC}$ ensemble'' from [MPSY24] together with an expander on the alternating group due to Kassabov [Kas07a], for which we give an efficient implementation using reversible circuits. In our reduction, we approximate the structured walk with local random circuits without losing the gap, which uses tools from the study of frustration-free Hamiltonians.

Disrupting Bipartite Trading Networks: Matching for Revenue Maximization

from arXiv: Computational Complexity

Authors: Luca D'Amico-Wong, Yannai A. Gonczarowski, Gary Qiurui Ma, David C. Parkes

We model the role of an online platform disrupting a market with unit-demand buyers and unit-supply sellers. Each seller can transact with a subset of the buyers whom she already knows, as well as with any additional buyers to whom she is introduced by the platform. Given these constraints on trade, prices and transactions are induced by a competitive equilibrium. The platform's revenue is proportional to the total price of all trades between platform-introduced buyers and sellers. In general, we show that the platform's revenue-maximization problem is computationally intractable. We provide structural results for revenue-optimal matchings and isolate special cases in which the platform can efficiently compute them. Furthermore, in a market where the maximum increase in social welfare that the platform can create is $\Delta W$, we prove that the platform can attain revenue $\Omega(\Delta W/\log(\min\{n,m\}))$, where $n$ and $m$ are the numbers of buyers and sellers, respectively. When $\Delta W$ is large compared to welfare without the platform, this gives a polynomial-time algorithm that guarantees a logarithmic approximation of the optimal welfare as revenue. We also show that even when the platform optimizes for revenue, the social welfare is at least an $O(\log(\min\{n,m\}))$-approximation to the optimal welfare. Finally, we prove significantly stronger bounds for revenue and social welfare in homogeneous-goods markets.

PSMC: Provable and Scalable Algorithms for Motif Conductance Based Graph Clustering

from arXiv: Computational Complexity

Authors: Longlong Lin, Tao Jia, Zeli Wang, Jin Zhao, Rong-Hua Li

Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting computing framework, needing to enumerate all motif instances to obtain an edge-weighted graph for partitioning. However, such a framework has two-fold vital defects: (1) It can only provide a quadratic bound for the motif with three vertices, and whether there is provable clustering quality for other motifs is still an open question. (2) The enumeration procedure of motif instances incurs prohibitively high costs against large motifs or large dense graphs due to combinatorial explosions. Besides, expensive spectral clustering or local graph diffusion on the edge-weighted graph also makes existing methods unable to handle massive graphs with millions of nodes. To overcome these dilemmas, we propose a Provable and Scalable Motif Conductance algorithm PSMC, which has a fixed and motif-independent approximation ratio for any motif. Specifically, PSMC first defines a new vertex metric Motif Resident based on the given motif, which can be computed locally. Then, it iteratively deletes the vertex with the smallest motif resident value very efficiently using novel dynamic update technologies. Finally, it outputs the locally optimal result during the above iterative process. To further boost efficiency, we propose several effective bounds to estimate the motif resident value of each vertex, which can greatly reduce computational costs. Empirical results show that our proposed algorithms achieve 3.2-32 times speedup and improve the quality by at least 12 times than the baselines.
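
As a rough illustration of the peel-and-check idea (emphatically not the PSMC algorithm, just a crude triangle-based heuristic on a toy graph), here is a sketch that peels vertices by triangle count and keeps the intermediate set with the best triangle conductance:

```python
import networkx as nx


def triangle_conductance(G, S):
    """Motif (triangle) conductance of vertex set S: cut triangles divided by
    the smaller triangle-volume of the two sides."""
    S = set(S)
    cut = vol_S = vol_rest = 0
    for tri in (set(t) for t in nx.enumerate_all_cliques(G) if len(t) == 3):
        inside = len(tri & S)
        vol_S += inside            # each triangle contributes 1 per member vertex
        vol_rest += 3 - inside
        if 0 < inside < 3:
            cut += 1
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")


def peel_by_triangles(G):
    """Crude peeling heuristic in the spirit of motif-conductance clustering:
    repeatedly remove the vertex in the fewest triangles and keep the
    intermediate set with the best triangle conductance."""
    H = G.copy()
    best_set, best_phi = set(G), float("inf")
    while H.number_of_nodes() > 2:
        v = min(H.nodes, key=lambda u: nx.triangles(H, u))
        H.remove_node(v)
        phi = triangle_conductance(G, H.nodes)
        if phi < best_phi:
            best_phi, best_set = phi, set(H.nodes)
    return best_set, best_phi


G = nx.barbell_graph(6, 1)         # two cliques joined through a middle vertex
S, phi = peel_by_triangles(G)
print("best cluster:", sorted(S), "triangle conductance:", phi)
```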

On the power of adaption and randomization

from arXiv: Computational Complexity

Authors: David Krieg, Erich Novak, Mario Ullrich

We present bounds between different widths of convex subsets of Banach spaces, including Gelfand and Bernstein widths. Using this, and some relations between widths and minimal errors, we obtain bounds on the maximal gain of adaptive and randomized algorithms over non-adaptive, deterministic ones for approximating linear operators on convex sets. Our results also apply to the approximation of embeddings into the space of bounded functions based on function evaluations, i.e., to sampling recovery in the uniform norm. We conclude with a list of open problems.

Data Complexity in Expressive Description Logics With Path Expressions

from arXiv: Computational Complexity

Authors: Bartosz Bednarczyk

We investigate the data complexity of the satisfiability problem for the very expressive description logic ZOIQ (a.k.a. ALCHb Self reg OIQ) over quasi-forests and establish its NP-completeness. This completes the data complexity landscape for decidable fragments of ZOIQ, and reproves known results on decidable fragments of OWL2 (SR family). Using the same technique, we establish coNEXPTIME-completeness (w.r.t. the combined complexity) of the entailment problem of rooted queries in ZIQ.

Differentiability and Optimization of Multiparameter Persistent Homology

from arXiv: Computational Geometry

Authors: Luis Scoccola, Siddharth Setlur, David Loiseaux, Mathieu Carrière, Steve Oudot

Real-valued functions on geometric data -- such as node attributes on a graph -- can be optimized using descriptors from persistent homology, allowing the user to incorporate topological terms in the loss function. When optimizing a single real-valued function (the one-parameter setting), there is a canonical choice of descriptor for persistent homology: the barcode. The operation mapping a real-valued function to its barcode is differentiable almost everywhere, and the convergence of gradient descent for losses using barcodes is relatively well understood. When optimizing a vector-valued function (the multiparameter setting), there is no unique choice of descriptor for multiparameter persistent homology, and many distinct descriptors have been proposed. This calls for the development of a general framework for differentiability and optimization that applies to a wide range of multiparameter homological descriptors. In this article, we develop such a framework and show that it encompasses well-known descriptors of different flavors, such as signed barcodes and the multiparameter persistence landscape. We complement the theory with numerical experiments supporting the idea that optimizing multiparameter homological descriptors can lead to improved performances compared to optimizing one-parameter descriptors, even when using the simplest and most efficiently computable multiparameter descriptors.

Faster Spectral Density Estimation and Sparsification in the Nuclear Norm

from arXiv: Data Structures and Algorithms

Authors: Yujia Jin, Ishani Karmarkar, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh

We consider the problem of estimating the spectral density of the normalized adjacency matrix of an $n$-node undirected graph. We provide a randomized algorithm that, with $O(n\epsilon^{-2})$ queries to a degree and neighbor oracle and in $O(n\epsilon^{-3})$ time, estimates the spectrum up to $\epsilon$ accuracy in the Wasserstein-1 metric. This improves on previous state-of-the-art methods, including an $O(n\epsilon^{-7})$ time algorithm from [Braverman et al., STOC 2022] and, for sufficiently small $\epsilon$, a $2^{O(\epsilon^{-1})}$ time method from [Cohen-Steiner et al., KDD 2018]. To achieve this result, we introduce a new notion of graph sparsification, which we call nuclear sparsification. We provide an $O(n\epsilon^{-2})$-query and $O(n\epsilon^{-2})$-time algorithm for computing $O(n\epsilon^{-2})$-sparse nuclear sparsifiers. We show that this bound is optimal in both its sparsity and query complexity, and we separate our results from the related notion of additive spectral sparsification. Of independent interest, we show that our sparsification method also yields the first deterministic algorithm for spectral density estimation that scales linearly with $n$ (sublinear in the representation size of the graph).
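
To make the objects in this abstract concrete, here is a small numerical sketch, not the paper's sublinear algorithm: it computes the exact spectrum of a normalized adjacency matrix and the Wasserstein-1 distance between two spectra via sorted eigenvalues. The random graph, the perturbation, and the function names are illustrative assumptions.

```python
import numpy as np

# Illustration of the problem setup only: spectral density of the normalized
# adjacency matrix, and Wasserstein-1 distance between two spectra.
def normalized_adjacency_spectrum(A):
    d = np.maximum(A.sum(axis=1), 1e-12)        # guard against isolated vertices
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.sort(np.linalg.eigvalsh(Dinv_sqrt @ A @ Dinv_sqrt))

def wasserstein1(spec_a, spec_b):
    # For two empirical spectra with the same number of atoms, W1 is the
    # average absolute difference of the sorted eigenvalues.
    return np.mean(np.abs(np.sort(spec_a) - np.sort(spec_b)))

rng = np.random.default_rng(0)
n = 200
A = np.triu((rng.random((n, n)) < 0.05).astype(float), 1)
A = A + A.T                                     # random undirected graph
B = A.copy()
B[0, 1] = B[1, 0] = 1.0                         # perturb a single edge
print(wasserstein1(normalized_adjacency_spectrum(A),
                   normalized_adjacency_spectrum(B)))
```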

Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization

from arXiv: Data Structures and Algorithms

Authors: Arun Jambulapati, Aaron Sidford, Kevin Tian

We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state-of-the-art methods our result closes a gap between the best-known query depth and the best-known computational depth of parallel algorithms. Our method starts with a ball acceleration framework of previous parallel methods, i.e., [CJJJLST20, ACJJS21], which reduce the problem to minimizing a regularized Gaussian convolution of the function constrained to Euclidean balls. By developing and leveraging new stability properties of the Hessian of this induced function, we depart from prior parallel algorithms and reduce these ball-constrained optimization problems to stochastic unconstrained quadratic minimization problems. Although we are unable to prove concentration of the asymmetric matrices that we use to approximate this Hessian, we nevertheless develop an efficient parallel method for solving these quadratics. Interestingly, our algorithms can be improved using fast matrix multiplication and use nearly-linear work if the matrix multiplication exponent is 2.

Optimal Electrical Oblivious Routing on Expanders

from arXiv: Data Structures and Algorithms

Authors: Cella Florescu, Rasmus Kyng, Maximilian Probst Gutenberg, Sushant Sachdeva

In this paper, we investigate the question of whether the electrical flow routing is a good oblivious routing scheme on an $m$-edge graph $G = (V, E)$ that is a $\Phi$-expander, i.e. where $\lvert \partial S \rvert \geq \Phi \cdot \mathrm{vol}(S)$ for every $S \subseteq V, \mathrm{vol}(S) \leq \mathrm{vol}(V)/2$. Beyond its simplicity and structural importance, this question is well-motivated by the current state-of-the-art of fast algorithms for $\ell_{\infty}$ oblivious routings that reduce to the expander-case which is in turn solved by electrical flow routing. Our main result proves that the electrical routing is an $O(\Phi^{-1} \log m)$-competitive oblivious routing in the $\ell_1$- and $\ell_\infty$-norms. We further observe that the oblivious routing is $O(\log^2 m)$-competitive in the $\ell_2$-norm and, in fact, $O(\log m)$-competitive if $\ell_2$-localization is $O(\log m)$ which is widely believed. Using these three upper bounds, we can smoothly interpolate to obtain upper bounds for every $p \in [2, \infty]$ and $q$ given by $1/p + 1/q = 1$. Assuming $\ell_2$-localization in $O(\log m)$, we obtain that in $\ell_p$ and $\ell_q$, the electrical oblivious routing is $O(\Phi^{-(1-2/p)}\log m)$ competitive. Using the currently known result for $\ell_2$-localization, this ratio deteriorates by at most a sublogarithmic factor for every $p, q \neq 2$. We complement our upper bounds with lower bounds that show that the electrical routing for any such $p$ and $q$ is $\Omega(\Phi^{-(1-2/p)}\log m)$-competitive. This renders our results in $\ell_1$ and $\ell_{\infty}$ unconditionally tight up to constants, and the result in any $\ell_p$- and $\ell_q$-norm to be tight in case of $\ell_2$-localization in $O(\log m)$.
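
For readers unfamiliar with electrical routing, the sketch below routes a single unit of demand by solving a Laplacian system and reading off potential differences as edge flows. It illustrates the routing scheme being analyzed, not the paper's competitive-ratio results, and the small example graph is an assumption of ours.

```python
import numpy as np

# Minimal sketch of electrical flow routing: route one unit of demand from
# s to t by solving L * phi = e_s - e_t (unit resistances) and taking
# potential differences as edge flows.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]   # small example graph
n = 4
L = np.zeros((n, n))
for u, v in edges:                                  # build the graph Laplacian
    L[u, u] += 1.0
    L[v, v] += 1.0
    L[u, v] -= 1.0
    L[v, u] -= 1.0

s, t = 0, 2
b = np.zeros(n)
b[s], b[t] = 1.0, -1.0
phi = np.linalg.pinv(L) @ b                         # electrical potentials
flows = {(u, v): phi[u] - phi[v] for u, v in edges} # flow on each edge
print(flows)                                        # net flow out of s sums to 1
```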

Streaming Algorithms with Few State Changes

from arXiv: Data Structures and Algorithms

Authors: Rajesh Jayaram, David P. Woodruff, Samson Zhou

In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to capture the asymmetric costs, inherent in modern hardware and database systems, of reading versus writing to memory. In fact, most streaming algorithms write to their memory on every update, which is undesirable when writing is significantly more expensive than reading. This raises the question of whether streaming algorithms with small space and number of memory writes are possible. We first demonstrate that, for the fundamental $F_p$ moment estimation problem with $p\ge 1$, any streaming algorithm that achieves a constant factor approximation must make $\Omega(n^{1-1/p})$ internal state changes, regardless of how much space it uses. Perhaps surprisingly, we show that this lower bound can be matched by an algorithm that also has near-optimal space complexity. Specifically, we give a $(1+\varepsilon)$-approximation algorithm for $F_p$ moment estimation that uses a near-optimal $\widetilde{\mathcal{O}}_\varepsilon(n^{1-1/p})$ number of state changes, while simultaneously achieving near-optimal space, i.e., for $p\in[1,2]$, our algorithm uses $\text{poly}\left(\log n,\frac{1}{\varepsilon}\right)$ bits of space, while for $p>2$, the algorithm uses $\widetilde{\mathcal{O}}_\varepsilon(n^{1-2/p})$ space. We similarly design streaming algorithms that are simultaneously near-optimal in both space complexity and the number of state changes for the heavy-hitters problem, sparse support recovery, and entropy estimation. Our results demonstrate that an optimal number of state changes can be achieved without sacrificing space complexity.

Fast White-Box Adversarial Streaming Without a Random Oracle

from arXiv: Data Structures and Algorithms

Authors: Ying Feng, Aayush Jain, David P. Woodruff

Recently, the question of adversarially robust streaming, where the stream is allowed to depend on the randomness of the streaming algorithm, has gained a lot of attention. In this work, we consider a strong white-box adversarial model (Ajtai et al. PODS 2022), in which the adversary has access to all past random coins and the parameters used by the streaming algorithm. We focus on the sparse recovery problem and extend our result to other tasks such as distinct element estimation and low-rank approximation of matrices and tensors. The main drawback of previous work is that it requires a random oracle, which is especially problematic in the streaming model since the amount of randomness is counted in the space complexity of a streaming algorithm. Also, the previous work suffers from large update time. We construct a near-optimal solution for the sparse recovery problem in white-box adversarial streams, based on the subexponentially secure Learning with Errors assumption. Importantly, our solution does not require a random oracle and has a polylogarithmic per item processing time. We also give results in a related white-box adversarially robust distributed model. Our constructions are based on homomorphic encryption schemes satisfying very mild structural properties that are currently satisfied by most known schemes.

Lookback Prophet Inequalities

from arXiv: Data Structures and Algorithms

Authors: Ziyad Benomar, Dorian Baudry, Vianney Perchet

Prophet inequalities are fundamental optimal stopping problems, where a decision-maker observes sequentially items with values sampled independently from known distributions, and must decide at each new observation to either stop and gain the current value or reject it irrevocably and move to the next step. This model is often too pessimistic and does not adequately represent real-world online selection processes. Potentially, rejected items can be revisited and a fraction of their value can be recovered. To analyze this problem, we consider general decay functions $D_1,D_2,\ldots$, quantifying the value to be recovered from a rejected item, depending on how far it has been observed in the past. We analyze how lookback improves, or not, the competitive ratio in prophet inequalities in different order models. We show that, under mild monotonicity assumptions on the decay functions, the problem can be reduced to the case where all the decay functions are equal to the same function $x \mapsto \gamma x$, where $\gamma = \inf_{x>0} \inf_{j \geq 1} D_j(x)/x$. Consequently, we focus on this setting and refine the analyses of the competitive ratios, with upper and lower bounds expressed as increasing functions of $\gamma$.
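
The following toy Monte Carlo illustrates the gamma-decay lookback model described above: a naive threshold rule either stops at the first value above a fixed threshold or, failing that, goes back and recovers gamma times the best rejected value. The rule, the value distribution, and the parameters are illustrative assumptions, not the paper's policies or bounds.

```python
import numpy as np

# Toy Monte Carlo in the gamma-decay lookback model: stop at the first value
# >= tau; if the rule never stops, go back and recover gamma * (best rejected
# value). Compare the expected payoff to the prophet benchmark E[max].
rng = np.random.default_rng(0)
n, gamma, tau, trials = 20, 0.5, 0.8, 100_000

alg, prophet = [], []
for _ in range(trials):
    x = rng.random(n)                        # i.i.d. Uniform(0,1) item values
    hits = np.flatnonzero(x >= tau)
    alg.append(x[hits[0]] if hits.size else gamma * x.max())
    prophet.append(x.max())

print("empirical ratio E[ALG]/E[max] ~", np.mean(alg) / np.mean(prophet))
```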

Fast Sampling Based Sketches for Tensors

from arXiv: Data Structures and Algorithms

Authors: William Swartworth, David P. Woodruff

We introduce a new approach for applying sampling-based sketches to two and three mode tensors. We illustrate our technique to construct sketches for the classical problems of $\ell_0$ sampling and producing $\ell_1$ embeddings. In both settings we achieve sketches that can be applied to a rank one tensor in $(\mathbb{R}^d)^{\otimes q}$ (for $q=2,3$) in time scaling with $d$ rather than $d^2$ or $d^3$. Our main idea is a particular sampling construction based on fast convolution which allows us to quickly compute sums over sufficiently random subsets of tensor entries.

Tuesday, June 11

Postdoctoral Researcher at George Mason University (apply by June 22, 2024)

from CCI: jobs

Postdoctoral position at George Mason University (Washington, DC) in Foundations of AI available starting Fall’24. Host: Grigory Yaroslavtsev.

Competitive salary (~$70K) with a possibility of a joint appointment with other institutions. Please be ready to provide three recommendations upon request. This position will close on June 30.

Website: http://grigory.ai
Email: grigory@grigory.us

By shacharlovett

Correlation Is All We Have

from Ben Recht

Meehl's Philosophical Psychology, Lecture 6, part 7/7.

This post digs into Lecture 6 of Paul Meehl’s course “Philosophical Psychology.” You can watch the video here. Here’s the full table of contents of my blogging through the class.

To understand why the crud factor is a major problem, it’s worth unpacking how hypothesis tests, though the foundation of causal inference, are always about correlation.

Let’s focus on the most standard hypothesis tests that attempt to estimate whether some outcome is larger on average in one group versus another. Meehl used the example of comparing the difference in color naming aptitude between boys and girls. I worked through an example last week comparing the effectiveness of a vaccine to a placebo. Almost all hypothesis tests work by computing the difference, D, between the means of the outcomes in each group. Since the groups are always samples of a population, the computed difference on the sample is only an estimate of the difference in the broader population. We thus also compute an estimate of the standard error, SE, which is the standard deviation of this mean difference. This gives us a sense of the precision of our difference estimate. 

To test statistical significance, we compute the probability, under the null hypothesis that the population's mean difference is literally zero, of seeing a sampled mean difference at least as large as the one we computed, purely because of the standard error. Ugh, what a silly and confusing convention! But bear with me. In a few paragraphs, you will know what this is really doing and why it's not only annoying but also pernicious.

The probability of seeing such a sampled difference under the null hypothesis is the probability of seeing a value of size D/SE or larger when taking a sample from a normal distribution with mean zero and variance 1. The quantity D/SE is all we need to compute the p-value and test statistical significance. The ratio D/SE hence gets a special name: the z-score. The z-score is always the estimated mean difference divided by the estimated standard error. If it is greater than 2 (more precisely, 1.96), the finding is declared statistically significant at the conventional 0.05 level. That's because 95% of the normal distribution is within two standard deviations of the mean.

Now, that’s the motivation for the z-score. I hope it’s clear that there is nothing particularly rigorous about this methodology. I reminded us yesterday that the null hypothesis–that the mean difference equals zero—is never literally true. But it’s also never literally true that the mean difference is normally distributed. I also can never figure out why we should care whether a probability is small under the null hypothesis. And then what? No one has ever explained to me what you do with this information. As Gerd Gigerenzer likes to remind us, null hypothesis significance testing (NHST) is just a mindless ritual.

Let me give you a different interpretation of NHST that I think might be more intuitive but also raises more concern about the whole enterprise. Let me introduce two new variables: Y is the outcome we want to compare between groups. X is a binary variable equal to 0 if the individual is in group A and equal to 1 if the individual is in group B. Let's say you have a sample of n pairs of (X,Y). The mean of group A is then

$$\bar{Y}_A = \frac{\sum_{i=1}^{n} (1-X_i)\, Y_i}{\sum_{i=1}^{n} (1-X_i)}.$$
This identity holds because when $X_i$ is 0, $(1-X_i)$ equals 1: the numerator sums the Y values for which X equals 0, and the denominator counts the number of times X equals 0. Similarly, the mean of group B is

$$\bar{Y}_B = \frac{\sum_{i=1}^{n} X_i\, Y_i}{\sum_{i=1}^{n} X_i}.$$
With my notation, if you do a little algebra, it turns out that the z-score takes a simple form:

$$z = \sqrt{n}\, r_{XY}.$$
In words: the z-score is the Pearson correlation between X and Y times the square root of n.¹ So when you ritually perform NHST, all you are doing is checking whether the Pearson r of the treatment and outcome variables is greater than 2 over root n. That's it.
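
Here is a quick numerical check of this identity. One caveat: it is exact under one particular convention for the standard error, namely SE = sd(Y) * sqrt(1/n_A + 1/n_B) with the overall, population-normalized standard deviation of Y; other common conventions, such as pooled within-group variance, match it only approximately. The data and variable names below are made up for illustration.

```python
import numpy as np

# Numerical check of z = sqrt(n) * r for a binary treatment X and outcome Y,
# with SE computed from the overall standard deviation of Y (an assumption;
# other SE conventions agree only approximately).
rng = np.random.default_rng(0)
n = 1000
X = rng.integers(0, 2, size=n)            # 0 = group A, 1 = group B
Y = 0.3 * X + rng.normal(size=n)          # outcome with a weak group effect

D = Y[X == 1].mean() - Y[X == 0].mean()   # difference in group means
SE = Y.std() * np.sqrt(1 / (X == 1).sum() + 1 / (X == 0).sum())
z = D / SE

r = np.corrcoef(X, Y)[0, 1]               # Pearson correlation of X and Y
print(z, r * np.sqrt(n))                  # the two numbers should match
```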

I want to emphasize that my formula for the z-score is true even in randomized controlled trials! RCTs, it turns out, are also only about correlation. But because we explicitly randomize the X variable, we know that correlation is measuring the effect of causation. We’re not proving that X causes Y in an RCT. We’re measuring the size of the influence of X on Y knowing there is already some causal connection between the two. Correlation doesn’t imply causation, but it’s the only thing we can measure with statistics.

OK, but now it should be crystal clear why the NHST framework is an annoyingly obfuscatory convention and also why the crud factor is a huge problem. If you have two weakly correlated variables X and Y, you’ll find a statistically significant result with a large enough sample. Two things can bring us to statistical significance: Either the treatment and the outcome are highly correlated OR n is large. When n is small, you are “underpowered” insofar as you may fail to reject the null hypothesis even though X and Y are strongly correlated. But when n is moderately sized, you will reject the null hypothesis for weakly correlated variables.

Let’s come back to the crud factor and crud distribution. I’m going to be a bit more precise about crud today. Given a population of variables, the crud distribution is the distribution of the Pearson correlation coefficients between all pairs in the population when the pairs are chosen uniformly at random. Following the suggestion of Matt Hoffman and Jordan Ellenberg, the crud factor is the average magnitude of the Pearson r. In the example yesterday from Webster and Starbuck, the crud distribution was approximately Gaussian with mean 0.1 and standard deviation 0.2. The crud factor of this distribution is roughly 0.2.
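
As a sketch of what this definition means operationally, the snippet below fabricates a data matrix with some shared latent structure, samples random pairs of variables, and reports the mean absolute Pearson correlation. The data, the latent structure, and the sample sizes are made up purely for illustration.

```python
import numpy as np

# Back-of-the-envelope crud factor on a fabricated data matrix: sample random
# pairs of distinct variables, record their Pearson correlations (the crud
# distribution), and report the mean absolute correlation (the crud factor).
rng = np.random.default_rng(0)
n_subjects, n_vars = 500, 40
latent = rng.normal(size=(n_subjects, 3))            # shared structure => nonzero crud
data = 0.3 * latent @ rng.normal(size=(3, n_vars)) + rng.normal(size=(n_subjects, n_vars))

pairs = [rng.choice(n_vars, size=2, replace=False) for _ in range(2000)]
crud = np.array([np.corrcoef(data[:, i], data[:, j])[0, 1] for i, j in pairs])
print("crud factor ~", np.abs(crud).mean())
```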

Now let’s imagine Meehl’s silly thought experiment from the end of Lecture 6: I have a pot of binary treatments, I have a pot of outcomes, I have a pot of theories. I pick out one thing from each pot at random. A theory T, a treatment X, and an outcome Y. I then assert T implies that X causes Y, even though there is no connection here between the logical content of the theory and the relationship between treatment and outcome. Then I gather a sample and test for significance. 

What happens in the presence of a crud distribution? Let’s use the rough numbers from Webster and Starbuck as parameters for the crud distribution. If I gather a cohort of 800 participants, then the probability of rejecting the null at 0.05 is over 75%. For two variables I chose at random, the probability of finding a statistically significant result is 3 in 4. You might say, well, maybe we could fix this by setting the p-value threshold lower. Say to 0.001? At this p-value threshold, the probability of finding a statistically significant result between randomly chosen variables is over 60%. Even with a five-nines p-value threshold (rejecting only if $p < 10^{-5}$), the chance of rejecting the null for a nonsensical comparison is over 50%.
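
A rough Monte Carlo along these lines, assuming the crud distribution is N(0.1, 0.2), a cohort of n = 800, and the usual normal approximation z | r ~ N(sqrt(n)·r, 1) for the test statistic given the population correlation. These are modeling assumptions of mine, so the exact percentages may differ slightly from the ones quoted above.

```python
import numpy as np

# Rough Monte Carlo of the thought experiment: draw a population correlation r
# for a randomly chosen treatment/outcome pair from the crud distribution,
# then draw an approximate test statistic and check how often we reject.
rng = np.random.default_rng(0)
n, trials = 800, 200_000
r = rng.normal(0.1, 0.2, size=trials)            # draws from the crud distribution
z = np.sqrt(n) * r + rng.normal(size=trials)     # approximate test statistics

# Two-sided normal rejection thresholds for alpha = 0.05, 0.001, and 1e-5.
for alpha, z_crit in [(0.05, 1.96), (0.001, 3.29), (1e-5, 4.42)]:
    frac = np.mean(np.abs(z) > z_crit)
    print(f"alpha = {alpha:g}: reject the null in {frac:.0%} of draws")
```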

We find ourselves in the midst of yet another statistical paradox. If I sufficiently “power” a study that looks at relationships between randomly chosen attributes, and if the crud factor is not negligible, then I can corroborate theories that have no connection with facts. And the corroboration only increases if I make n larger. Oh crud!

¹ I’ve never seen the z-score written this way before. If you’ve seen this formula in print, could you please send me a reference? And if you haven’t seen this identity before, are you as outraged as I am about how we teach hypothesis testing?

By Ben Recht