Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed, or follow on Twitter, to stay up to date.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Wednesday, March 29

Complexity of Reconfiguration in Surface Chemical Reaction Networks

Authors: Robert M. Alaniz, Josh Brunner, Michael Coulombe, Erik D. Demaine, Yevhenii Diomidov, Ryan Knobel, Timothy Gomez, Elise Grizzell, Jayson Lynch, Andrew Rodriguez, Robert Schweller, Tim Wylie

We analyze the computational complexity of basic reconfiguration problems for the recently introduced surface Chemical Reaction Networks (sCRNs), where ordered pairs of adjacent species nondeterministically transform into a different ordered pair of species according to a predefined set of allowed transition rules (chemical reactions). In particular, two questions that are fundamental to the simulation of sCRNs are whether a given configuration of molecules can ever transform into another given configuration, and whether a given cell can ever contain a given species, given a set of transition rules. We show that these problems can be solved in polynomial time, are NP-complete, or are PSPACE-complete in a variety of different settings, including when adjacent species just swap instead of arbitrary transformation (swap sCRNs), and when cells can change species a limited number of times (k-burnout). Most problems turn out to be at least NP-hard except with very few distinct species (2 or 3).

We analyze the computational complexity of basic reconfiguration problems for the recently introduced surface Chemical Reaction Networks (sCRNs), where ordered pairs of adjacent species nondeterministically transform into a different ordered pair of species according to a predefined set of allowed transition rules (chemical reactions). In particular, two questions that are fundamental to the simulation of sCRNs are whether a given configuration of molecules can ever transform into another given configuration, and whether a given cell can ever contain a given species, given a set of transition rules. We show that these problems can be solved in polynomial time, are NP-complete, or are PSPACE-complete in a variety of different settings, including when adjacent species just swap instead of arbitrary transformation (swap sCRNs), and when cells can change species a limited number of times (k-burnout). Most problems turn out to be at least NP-hard except with very few distinct species (2 or 3).

Towards Crossing-Free Hamiltonian Cycles in Simple Drawings of Complete Graphs

Authors: Oswin Aichholzer, Joachim Orthaber, Birgit Vogtenhuber

It is a longstanding conjecture that every simple drawing of a complete graph on $n\geq 3$ vertices contains a crossing-free Hamiltonian cycle. We confirm this conjecture for cylindrical drawings, strongly $c$-monotone drawings, as well as $x$-bounded drawings. Moreover, we introduce the stronger question of whether a crossing-free Hamiltonian path between each pair of vertices always exists.

It is a longstanding conjecture that every simple drawing of a complete graph on $n\geq 3$ vertices contains a crossing-free Hamiltonian cycle. We confirm this conjecture for cylindrical drawings, strongly $c$-monotone drawings, as well as $x$-bounded drawings. Moreover, we introduce the stronger question of whether a crossing-free Hamiltonian path between each pair of vertices always exists.

Online embedding of metrics

Authors: Ilan Newman, Yuri Rabinovich

We study deterministic online embeddings of metrics spaces into normed spaces and into trees against an adaptive adversary. Main results include a polynomial lower bound on the (multiplicative) distortion of embedding into Euclidean spaces, a tight exponential upper bound on embedding into the line, and a $(1+\epsilon)$-distortion embedding in $\ell_\infty$ of a suitably high dimension.

Authors: Ilan Newman, Yuri Rabinovich

We study deterministic online embeddings of metrics spaces into normed spaces and into trees against an adaptive adversary. Main results include a polynomial lower bound on the (multiplicative) distortion of embedding into Euclidean spaces, a tight exponential upper bound on embedding into the line, and a $(1+\epsilon)$-distortion embedding in $\ell_\infty$ of a suitably high dimension.

Randomized rounding algorithms for large scale unsplittable flow problems

Authors: François Lamothe, Emmanuel Rachelson, Alain Haït, Cedric Baudoin, Jean-Baptiste Dupe

Unsplittable flow problems cover a wide range of telecommunication and transportation problems and their efficient resolution is key to a number of applications. In this work, we study algorithms that can scale up to large graphs and important numbers of commodities. We present and analyze in detail a heuristic based on the linear relaxation of the problem and randomized rounding. We provide empirical evidence that this approach is competitive with state-of-the-art resolution methods either by its scaling performance or by the quality of its solutions. We provide a variation of the heuristic which has the same approximation factor as the state-of-the-art approximation algorithm. We also derive a tighter analysis for the approximation factor of both the variation and the state-of-the-art algorithm. We introduce a new objective function for the unsplittable flow problem and discuss its differences with the classical congestion objective function. Finally, we discuss the gap in practical performance and theoretical guarantees between all the aforementioned algorithms.

Unsplittable flow problems cover a wide range of telecommunication and transportation problems and their efficient resolution is key to a number of applications. In this work, we study algorithms that can scale up to large graphs and important numbers of commodities. We present and analyze in detail a heuristic based on the linear relaxation of the problem and randomized rounding. We provide empirical evidence that this approach is competitive with state-of-the-art resolution methods either by its scaling performance or by the quality of its solutions. We provide a variation of the heuristic which has the same approximation factor as the state-of-the-art approximation algorithm. We also derive a tighter analysis for the approximation factor of both the variation and the state-of-the-art algorithm. We introduce a new objective function for the unsplittable flow problem and discuss its differences with the classical congestion objective function. Finally, we discuss the gap in practical performance and theoretical guarantees between all the aforementioned algorithms.

On de novo Bridging Paired-end RNA-seq Data

Authors: Xiang Li, Mingfu Shao

The high-throughput short-reads RNA-seq protocols often produce paired-end reads, with the middle portion of the fragments being unsequenced. We explore if the full-length fragments can be computationally reconstructed from the sequenced two ends in the absence of the reference genome - a problem here we refer to as de novo bridging. Solving this problem provides longer, more informative RNA-seq reads, and benefits downstream RNA-seq analysis such as transcript assembly, expression quantification, and splicing differential analysis. However, de novo bridging is a challenging and complicated task owing to alternative splicing, transcript noises, and sequencing errors. It remains unclear if the data provides sufficient information for accurate bridging, let alone efficient algorithms that determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of reference genome (called reference-based bridging), but the algorithms are far away from scaling for de novo bridging as the underlying compacted de Bruijn graph(cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra's algorithm for this problem, and proposed a novel algorithm that reuses the shortest path tree to avoid running the truncated Dijkstra's algorithm from scratch for all vertices for further speeding up. These innovative techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at github.com/Shao-Group/rnabridge-denovo.

Authors: Xiang Li, Mingfu Shao

The high-throughput short-reads RNA-seq protocols often produce paired-end reads, with the middle portion of the fragments being unsequenced. We explore if the full-length fragments can be computationally reconstructed from the sequenced two ends in the absence of the reference genome - a problem here we refer to as de novo bridging. Solving this problem provides longer, more informative RNA-seq reads, and benefits downstream RNA-seq analysis such as transcript assembly, expression quantification, and splicing differential analysis. However, de novo bridging is a challenging and complicated task owing to alternative splicing, transcript noises, and sequencing errors. It remains unclear if the data provides sufficient information for accurate bridging, let alone efficient algorithms that determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of reference genome (called reference-based bridging), but the algorithms are far away from scaling for de novo bridging as the underlying compacted de Bruijn graph(cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra's algorithm for this problem, and proposed a novel algorithm that reuses the shortest path tree to avoid running the truncated Dijkstra's algorithm from scratch for all vertices for further speeding up. These innovative techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at https://github.com/Shao-Group/rnabridge-denovo.

Overcoming Probabilistic Faults in Disoriented Linear Search

Authors: Konstantinos Georgiou, Nikos Giachoudis, Evangelos Kranakis

We consider search by mobile agents for a hidden, idle target, placed on the infinite line. Feasible solutions are agent trajectories in which all agents reach the target sooner or later. A special feature of our problem is that the agents are $p$-faulty, meaning that every attempt to change direction is an independent Bernoulli trial with known probability $p$, where $p$ is the probability that a turn fails. We are looking for agent trajectories that minimize the worst-case expected termination time, relative to competitive analysis.

First, we study linear search with one deterministic $p$-faulty agent, i.e., with no access to random oracles, $p\in (0,1/2)$. For this problem, we provide trajectories that leverage the probabilistic faults into an algorithmic advantage. Our strongest result pertains to a search algorithm (deterministic, aside from the adversarial probabilistic faults) which, as $p\to 0$, has optimal performance $4.59112+\epsilon$, up to the additive term $\epsilon$ that can be arbitrarily small. Additionally, it has performance less than $9$ for $p\leq 0.390388$. When $p\to 1/2$, our algorithm has performance $\Theta(1/(1-2p))$, which we also show is optimal up to a constant factor.

Second, we consider linear search with two $p$-faulty agents, $p\in (0,1/2)$, for which we provide three algorithms of different advantages, all with a bounded competitive ratio even as $p\rightarrow 1/2$. Indeed, for this problem, we show how the agents can simulate the trajectory of any $0$-faulty agent (deterministic or randomized), independently of the underlying communication model. As a result, searching with two agents allows for a solution with a competitive ratio of $9+\epsilon$, or a competitive ratio of $4.59112+\epsilon$. Our final contribution is a novel algorithm for searching with two $p$-faulty agents that achieves a competitive ratio $3+4\sqrt{p(1-p)}$.

We consider search by mobile agents for a hidden, idle target, placed on the infinite line. Feasible solutions are agent trajectories in which all agents reach the target sooner or later. A special feature of our problem is that the agents are $p$-faulty, meaning that every attempt to change direction is an independent Bernoulli trial with known probability $p$, where $p$ is the probability that a turn fails. We are looking for agent trajectories that minimize the worst-case expected termination time, relative to competitive analysis.

First, we study linear search with one deterministic $p$-faulty agent, i.e., with no access to random oracles, $p\in (0,1/2)$. For this problem, we provide trajectories that leverage the probabilistic faults into an algorithmic advantage. Our strongest result pertains to a search algorithm (deterministic, aside from the adversarial probabilistic faults) which, as $p\to 0$, has optimal performance $4.59112+\epsilon$, up to the additive term $\epsilon$ that can be arbitrarily small. Additionally, it has performance less than $9$ for $p\leq 0.390388$. When $p\to 1/2$, our algorithm has performance $\Theta(1/(1-2p))$, which we also show is optimal up to a constant factor.

Second, we consider linear search with two $p$-faulty agents, $p\in (0,1/2)$, for which we provide three algorithms of different advantages, all with a bounded competitive ratio even as $p\rightarrow 1/2$. Indeed, for this problem, we show how the agents can simulate the trajectory of any $0$-faulty agent (deterministic or randomized), independently of the underlying communication model. As a result, searching with two agents allows for a solution with a competitive ratio of $9+\epsilon$, or a competitive ratio of $4.59112+\epsilon$. Our final contribution is a novel algorithm for searching with two $p$-faulty agents that achieves a competitive ratio $3+4\sqrt{p(1-p)}$.

Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model

Authors: Rashmi Ranjan Bhuyan, Adel Javanmard, Sungchul Kim, Gourab Mukherjee, Ryan A. Rossi, Tong Yu, Handong Zhao

We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic probit model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as functions of time, the temporal variability in the model parameters as well as the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up.

We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic probit model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as functions of time, the temporal variability in the model parameters as well as the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up.

Algorithms for subgraph complementation to some classes of graphs

Authors: Dhanyamol Antony, Sagartanu Pal, R.B. Sandeep

For a class $\mathcal{G}$ of graphs, the objective of \textsc{Subgraph Complementation to} $\mathcal{G}$ is to find whether there exists a subset $S$ of vertices of the input graph $G$ such that modifying $G$ by complementing the subgraph induced by $S$ results in a graph in $\mathcal{G}$. We obtain a polynomial-time algorithm for the problem when $\mathcal{G}$ is the class of graphs with minimum degree at least $k$, for a constant $k$, answering an open problem by Fomin et al. (Algorithmica, 2020). When $\mathcal{G}$ is the class of graphs without any induced copies of the star graph on $t+1$ vertices (for any constant $t\geq 3$) and diamond, we obtain a polynomial-time algorithm for the problem. This is in contrast with a result by Antony et al. (Algorithmica, 2022) that the problem is NP-complete and cannot be solved in subexponential-time (assuming the Exponential Time Hypothesis) when $\mathcal{G}$ is the class of graphs without any induced copies of the star graph on $t+1$ vertices, for every constant $t\geq 5$.

For a class $\mathcal{G}$ of graphs, the objective of \textsc{Subgraph Complementation to} $\mathcal{G}$ is to find whether there exists a subset $S$ of vertices of the input graph $G$ such that modifying $G$ by complementing the subgraph induced by $S$ results in a graph in $\mathcal{G}$. We obtain a polynomial-time algorithm for the problem when $\mathcal{G}$ is the class of graphs with minimum degree at least $k$, for a constant $k$, answering an open problem by Fomin et al. (Algorithmica, 2020). When $\mathcal{G}$ is the class of graphs without any induced copies of the star graph on $t+1$ vertices (for any constant $t\geq 3$) and diamond, we obtain a polynomial-time algorithm for the problem. This is in contrast with a result by Antony et al. (Algorithmica, 2022) that the problem is NP-complete and cannot be solved in subexponential-time (assuming the Exponential Time Hypothesis) when $\mathcal{G}$ is the class of graphs without any induced copies of the star graph on $t+1$ vertices, for every constant $t\geq 5$.

Faster Deterministic Distributed MIS and Approximate Matching

Authors: Mohsen Ghaffari, Christoph Grunau

$\renewcommand{\tilde}{\widetilde}$We present an $\tilde{O}(\log^2 n)$ round deterministic distributed algorithm for the maximal independent set problem. By known reductions, this round complexity extends also to maximal matching, $\Delta+1$ vertex coloring, and $2\Delta-1$ edge coloring. These four problems are among the most central problems in distributed graph algorithms and have been studied extensively for the past four decades. This improved round complexity comes closer to the $\tilde{\Omega}(\log n)$ lower bound of maximal independent set and maximal matching [Balliu et al. FOCS '19]. The previous best known deterministic complexity for all of these problems was $\Theta(\log^3 n)$. Via the shattering technique, the improvement permeates also to the corresponding randomized complexities, e.g., the new randomized complexity of $\Delta+1$ vertex coloring is now $\tilde{O}(\log^2\log n)$ rounds.

Our approach is a novel combination of the previously known two methods for developing deterministic algorithms for these problems, namely global derandomization via network decomposition (see e.g., [Rozhon, Ghaffari STOC'20; Ghaffari, Grunau, Rozhon SODA'21; Ghaffari et al. SODA'23]) and local rounding of fractional solutions (see e.g., [Fischer DISC'17; Harris FOCS'19; Fischer, Ghaffari, Kuhn FOCS'17; Ghaffari, Kuhn FOCS'21; Faour et al. SODA'23]). We consider a relaxation of the classic network decomposition concept, where instead of requiring the clusters in the same block to be non-adjacent, we allow each node to have a small number of neighboring clusters. We also show a deterministic algorithm that computes this relaxed decomposition faster than standard decompositions. We then use this relaxed decomposition to significantly improve the integrality of certain fractional solutions, before handing them to the local rounding procedure that now has to do fewer rounding steps.

Authors: Mohsen Ghaffari, Christoph Grunau

$\renewcommand{\tilde}{\widetilde}$We present an $\tilde{O}(\log^2 n)$ round deterministic distributed algorithm for the maximal independent set problem. By known reductions, this round complexity extends also to maximal matching, $\Delta+1$ vertex coloring, and $2\Delta-1$ edge coloring. These four problems are among the most central problems in distributed graph algorithms and have been studied extensively for the past four decades. This improved round complexity comes closer to the $\tilde{\Omega}(\log n)$ lower bound of maximal independent set and maximal matching [Balliu et al. FOCS '19]. The previous best known deterministic complexity for all of these problems was $\Theta(\log^3 n)$. Via the shattering technique, the improvement permeates also to the corresponding randomized complexities, e.g., the new randomized complexity of $\Delta+1$ vertex coloring is now $\tilde{O}(\log^2\log n)$ rounds.

Our approach is a novel combination of the previously known two methods for developing deterministic algorithms for these problems, namely global derandomization via network decomposition (see e.g., [Rozhon, Ghaffari STOC'20; Ghaffari, Grunau, Rozhon SODA'21; Ghaffari et al. SODA'23]) and local rounding of fractional solutions (see e.g., [Fischer DISC'17; Harris FOCS'19; Fischer, Ghaffari, Kuhn FOCS'17; Ghaffari, Kuhn FOCS'21; Faour et al. SODA'23]). We consider a relaxation of the classic network decomposition concept, where instead of requiring the clusters in the same block to be non-adjacent, we allow each node to have a small number of neighboring clusters. We also show a deterministic algorithm that computes this relaxed decomposition faster than standard decompositions. We then use this relaxed decomposition to significantly improve the integrality of certain fractional solutions, before handing them to the local rounding procedure that now has to do fewer rounding steps.

Universal Coating in the 3D Hybrid Model

Authors: Irina Kostitsyna, David Liedtke, Christian Scheideler

Motivated by the prospect of nano-robots that assist human physiological functions at the nanoscale, we investigate the coating problem in the three-dimensional model for hybrid programmable matter. In this model, a single agent with strictly limited viewing range and the computational capability of a deterministic finite automaton can act on passive tiles by picking up a tile, moving, and placing it at some spot. The goal of the coating problem is to fill each node of some surface graph of size $n$ with a tile. We first solve the problem on a restricted class of graphs with a single tile type, and then use constantly many tile types to encode this graph in certain surface graphs capturing the surface of 3D objects. Our algorithm requires $\mathcal{O}(n^2)$ steps, which is worst-case optimal compared to an agent with global knowledge and no memory restrictions.

Motivated by the prospect of nano-robots that assist human physiological functions at the nanoscale, we investigate the coating problem in the three-dimensional model for hybrid programmable matter. In this model, a single agent with strictly limited viewing range and the computational capability of a deterministic finite automaton can act on passive tiles by picking up a tile, moving, and placing it at some spot. The goal of the coating problem is to fill each node of some surface graph of size $n$ with a tile. We first solve the problem on a restricted class of graphs with a single tile type, and then use constantly many tile types to encode this graph in certain surface graphs capturing the surface of 3D objects. Our algorithm requires $\mathcal{O}(n^2)$ steps, which is worst-case optimal compared to an agent with global knowledge and no memory restrictions.

Tuesday, March 28

An unexpected democracy slogan

from Scott Aaronson

At least six readers have by now sent me the following photo, which was taken in Israel a couple nights ago during the historic street protests against Netanyahu’s attempted putsch: (Update: The photo was also featured on Gil Kalai’s blog, and was credited there to Alon Rosen.) This is surely the first time that “P=NP” […]

At least six readers have by now sent me the following photo, which was taken in Israel a couple nights ago during the historic street protests against Netanyahu’s attempted putsch:

(Update: The photo was also featured on Gil Kalai’s blog, and was credited there to Alon Rosen.)

This is surely the first time that “P=NP” has emerged as a viral rallying cry for the preservation of liberal democracy, even to whatever limited extent it has.

But what was the graffiti artist’s intended meaning? A few possibilities:

1. The government has flouted so many rules of Israel’s social compact that our side needs to flout the rules too: shut down the universities, shut down the airport, block the roads, even assert that P=NP (!).
2. As a protest movement up against overwhelming odds, we need to shoot for the possibly-impossible, like solving 3SAT in polynomial time.
3. A shibboleth for scientific literate people following the news: “Israel is full of sane people who know what ‘P=NP’ means as you know what it means, are amused by its use as political graffiti as you’d be amused by it, and oppose Netanyahu’s putsch for the same reasons you’d oppose it.”
4. No meaning, the artist was just amusing himself or herself.
5. The artist reads Shtetl-Optimized and wanted effectively to force me to feature his or her work here.

Anyway, if the artist becomes aware of this post, he or she is warmly welcomed to clear things up for us.

And when this fight resumes after Passover, may those standing up for the checks and balances of a liberal-democratic society achieve … err … satisfaction, however exponentially unlikely it seems.

By Scott

TR23-040 | Certified Hardness vs. Randomness for Log-Space | Edward Pyne, Wei Zhan, Ran Raz

from ECCC Papers

Let $\mathcal{L}$ be a language that can be decided in linear space and let $\epsilon >0$ be any constant. Let $\mathcal{A}$ be the exponential hardness assumption that for every $n$, membership in $\mathcal{L}$ for inputs of length~$n$ cannot be decided by circuits of size smaller than $2^{\epsilon n}$. We prove that for every function $f :\{0,1\}^* \rightarrow \{0,1\}$, computable by a randomized logspace algorithm $R$, there exists a deterministic logspace algorithm $D$ (attempting to compute $f$), such that on every input $x$ of length $n$, the algorithm $D$ outputs one of the following: 1: The correct value $f(x)$. 2: The string: I am unable to compute $f(x)$ because the hardness assumption $\mathcal{A}$ is false'', followed by a (provenly correct) circuit of size smaller than $2^{\epsilon n'}$ for membership in $\mathcal{L}$ for inputs of length~$n'$, for some $n' = \Theta (\log n)$; that is, a circuit that refutes $\mathcal{A}$. Moreover, $D$ is explicitly constructed, given $R$. We note that previous works on the hardness-versus-randomness paradigm give derandomized algorithms that rely blindly on the hardness assumption. If the hardness assumption is false, the algorithms may output incorrect values, and thus a user cannot trust that an output given by the algorithm is correct. Instead, our algorithm $D$ verifies the computation so that it never outputs an incorrect value. Thus, if $D$ outputs a value for $f(x)$, that value is certified to be correct. Moreover, if $D$ does not output a value for $f(x)$, it alerts that the hardness assumption was found to be false, and refutes the assumption. Our next result is a universal derandomizer for $BPL$ (the class of problems solvable by bounded-error randomized logspace algorithms): We give a deterministic algorithm $U$ that takes as an input a randomized logspace algorithm $R$ and an input $x$ and simulates the computation of $R$ on $x$, deteriministically. Under the widely believed assumption $BPL=L$, the space used by $U$ is at most $C_R \cdot \log n$ (where $C_R$ is a constant depending on~$R$). Moreover, for every constant $c \geq 1$, if $BPL\subseteq SPACE[(\log(n))^{c}]$ then the space used by $U$ is at most $C_R \cdot (\log(n))^{c}$. Finally, we prove that if optimal hitting sets for ordered branching programs exist then there is a deterministic logspace algorithm that, given a black-box access to an ordered branching program $B$ of size $n$, estimates the probability that $B$ accepts on a uniformly random input. This extends the result of (Cheng and Hoza CCC 2020), who proved that an optimal hitting set implies a white-box two-sided derandomization.
Let $\mathcal{L}$ be a language that can be decided in linear space and let $\epsilon >0$ be any constant. Let $\mathcal{A}$ be the exponential hardness assumption that for every $n$, membership in $\mathcal{L}$ for inputs of length~$n$ cannot be decided by circuits of size smaller than $2^{\epsilon n}$. We prove that for every function $f :\{0,1\}^* \rightarrow \{0,1\}$, computable by a randomized logspace algorithm $R$, there exists a deterministic logspace algorithm $D$ (attempting to compute $f$), such that on every input $x$ of length $n$, the algorithm $D$ outputs one of the following: 1: The correct value $f(x)$. 2: The string: I am unable to compute $f(x)$ because the hardness assumption $\mathcal{A}$ is false'', followed by a (provenly correct) circuit of size smaller than $2^{\epsilon n'}$ for membership in $\mathcal{L}$ for inputs of length~$n'$, for some $n' = \Theta (\log n)$; that is, a circuit that refutes $\mathcal{A}$. Moreover, $D$ is explicitly constructed, given $R$. We note that previous works on the hardness-versus-randomness paradigm give derandomized algorithms that rely blindly on the hardness assumption. If the hardness assumption is false, the algorithms may output incorrect values, and thus a user cannot trust that an output given by the algorithm is correct. Instead, our algorithm $D$ verifies the computation so that it never outputs an incorrect value. Thus, if $D$ outputs a value for $f(x)$, that value is certified to be correct. Moreover, if $D$ does not output a value for $f(x)$, it alerts that the hardness assumption was found to be false, and refutes the assumption. Our next result is a universal derandomizer for $BPL$ (the class of problems solvable by bounded-error randomized logspace algorithms): We give a deterministic algorithm $U$ that takes as an input a randomized logspace algorithm $R$ and an input $x$ and simulates the computation of $R$ on $x$, deteriministically. Under the widely believed assumption $BPL=L$, the space used by $U$ is at most $C_R \cdot \log n$ (where $C_R$ is a constant depending on~$R$). Moreover, for every constant $c \geq 1$, if $BPL\subseteq SPACE[(\log(n))^{c}]$ then the space used by $U$ is at most $C_R \cdot (\log(n))^{c}$. Finally, we prove that if optimal hitting sets for ordered branching programs exist then there is a deterministic logspace algorithm that, given a black-box access to an ordered branching program $B$ of size $n$, estimates the probability that $B$ accepts on a uniformly random input. This extends the result of (Cheng and Hoza CCC 2020), who proved that an optimal hitting set implies a white-box two-sided derandomization.

TR23-039 | Query Complexity of Search Problems | Yogesh Dahiya, Arkadev Chattopadhyay, Meena Mahajan

from ECCC Papers

We relate various complexity measures like sensitivity, block sensitivity, certificate complexity for multi-output functions to the query complexities of such functions. Using these relations, we improve upon the known relationship between pseudo-deterministic query complexity and deterministic query complexity for total search problems: We show that pseudo-deterministic query complexity is at most the third power of its deterministic query complexity. (Previously a fourth-power relation was shown by Goldreich,Goldwasser,Ron (ITCS13).) We then obtain a significantly simpler and self-contained proof of a separation between pseudodeterminism and randomized query complexity recently proved by Goldwasser, Impagliazzo, Pitassi, Santhanam (CCC 2021). We also separate pseudodeterminism from randomness in AND decision trees, and determinism from pseudodeterminism in PARITY decision trees. For a hypercube colouring problem closely related to the pseudodeterministic complexity of a complete problem in $TFNP^{dt}$, we prove that either the monotone block-sensitivity or the anti-monotone block sensitivity is $\Omega(n^{1/3})$; previously an $\Omega(n^{1/2})$ bound was known but for general block-sensitivity.
We relate various complexity measures like sensitivity, block sensitivity, certificate complexity for multi-output functions to the query complexities of such functions. Using these relations, we improve upon the known relationship between pseudo-deterministic query complexity and deterministic query complexity for total search problems: We show that pseudo-deterministic query complexity is at most the third power of its deterministic query complexity. (Previously a fourth-power relation was shown by Goldreich,Goldwasser,Ron (ITCS13).) We then obtain a significantly simpler and self-contained proof of a separation between pseudodeterminism and randomized query complexity recently proved by Goldwasser, Impagliazzo, Pitassi, Santhanam (CCC 2021). We also separate pseudodeterminism from randomness in AND decision trees, and determinism from pseudodeterminism in PARITY decision trees. For a hypercube colouring problem closely related to the pseudodeterministic complexity of a complete problem in $TFNP^{dt}$, we prove that either the monotone block-sensitivity or the anti-monotone block sensitivity is $\Omega(n^{1/3})$; previously an $\Omega(n^{1/2})$ bound was known but for general block-sensitivity.

TR23-038 | Indistinguishability Obfuscation, Range Avoidance, and Bounded Arithmetic | Rahul Ilango, Jiatu Li, Ryan Williams

from ECCC Papers

The range avoidance problem (denoted by Avoid) asks to find a string outside of the range of a given circuit $C:\{0,1\}^n\to\{0,1\}^m$, where $m>n$. Although at least half of the strings of length $m$ are correct answers, it is not clear how to deterministically find one. Recent results of Korten (FOCS'21) and Ren, Wang, and Santhanam (FOCS' 22) show that efficient deterministic algorithms for Avoid would have far-reaching consequences, including strong circuit lower bounds and explicit constructions of combinatorial objects (e.g., Ramsey graphs, extractors, rigid matrices). This strongly motivates the question: does an efficient deterministic algorithm for Avoid actually exist? In this work, we prove under the existence of subexponentially secure indistinguishability obfuscation (iO) that deterministic polynomial-time algorithms for Avoid imply NP=coNP. Combining this with Jain, Lin, and Sahai's recent breakthrough construction of iO from well-founded assumptions (STOC'21, EUROCRYPT'22), we provide the first plausible evidence that Avoid has no efficient deterministic algorithm. Moreover, we also prove the hardness of Avoid based on polynomially-secure iO and a weaker variant of the Nondeterministic Exponential Time Hypothesis (NETH). Extending our techniques, we prove a surprising separation in bounded arithmetic, conditioned on similar assumptions. Assuming subexponentially secure iO and coNP is not infinitely often in AM, we show that Avoid has no deterministic polynomial-time algorithm even when we are allowed $O(1)$ queries to an oracle that can invert the given input circuit on an arbitrarily chosen $m$-bit string. It follows that the dual Weak Pigeonhole Principle, the combinatorial principle underlying Avoid, is not provable in Cook's theory $PV_1$. This gives (under plausible assumptions) the first separation of Cook's theory $PV_1$ for polynomial-time reasoning and Jerabek's theory $APC_1$ for probabilistic polynomial-time reasoning.
The range avoidance problem (denoted by Avoid) asks to find a string outside of the range of a given circuit $C:\{0,1\}^n\to\{0,1\}^m$, where $m>n$. Although at least half of the strings of length $m$ are correct answers, it is not clear how to deterministically find one. Recent results of Korten (FOCS'21) and Ren, Wang, and Santhanam (FOCS' 22) show that efficient deterministic algorithms for Avoid would have far-reaching consequences, including strong circuit lower bounds and explicit constructions of combinatorial objects (e.g., Ramsey graphs, extractors, rigid matrices). This strongly motivates the question: does an efficient deterministic algorithm for Avoid actually exist? In this work, we prove under the existence of subexponentially secure indistinguishability obfuscation (iO) that deterministic polynomial-time algorithms for Avoid imply NP=coNP. Combining this with Jain, Lin, and Sahai's recent breakthrough construction of iO from well-founded assumptions (STOC'21, EUROCRYPT'22), we provide the first plausible evidence that Avoid has no efficient deterministic algorithm. Moreover, we also prove the hardness of Avoid based on polynomially-secure iO and a weaker variant of the Nondeterministic Exponential Time Hypothesis (NETH). Extending our techniques, we prove a surprising separation in bounded arithmetic, conditioned on similar assumptions. Assuming subexponentially secure iO and coNP is not infinitely often in AM, we show that Avoid has no deterministic polynomial-time algorithm even when we are allowed $O(1)$ queries to an oracle that can invert the given input circuit on an arbitrarily chosen $m$-bit string. It follows that the dual Weak Pigeonhole Principle, the combinatorial principle underlying Avoid, is not provable in Cook's theory $PV_1$. This gives (under plausible assumptions) the first separation of Cook's theory $PV_1$ for polynomial-time reasoning and Jerabek's theory $APC_1$ for probabilistic polynomial-time reasoning.

Some Problems

from Gil Kalai

In the previous post, Two Three Four posts ago I wrote about three recent breakthroughs in combinatorics and here I want to mention some problems that I posed over the years that are loosely related to these advances. Rank of … Continue reading →

In the previous post, Two Three Four posts ago I wrote about three recent breakthroughs in combinatorics and here I want to mention some problems that I posed over the years that are loosely related to these advances.

Rank of incidence matrices and q-analogs

The goal of finding q-analogs of combinatorial results where, roughly speaking, sets are replaced by subspaces of vector spaces over a field with $q$ elements, is common both in enumerative combinatorics and in extremal combinatorics. A recent breakthrough we discussed by Keevash, Sah, and Sawhney was about the existence of $q$-analogs of designs (subspace designs).

I will mention a problem (Question 4) in this direction about incidence matrices, following three questions that were largely solved.

Incidence matrices and weighted incidence matrices for sets

The incidence matrix of $k$-subsets vs. $m$-subsets of $[n]$, is a matrix $I_{k,m}(n)$ whose rows correspond to $k$-subsets $R$ of $[n]$, whose columns correspond to $m$-subsets $S$ of $[n]$, and the entry $I(S,T)$ equals 1 if $R \subset S$, and equals 0 if $R \not \subset S$.

A weighted incidence matrix of $k$-subsets vs. $m$-subsets of $[n]$, is a matrix $J_{k,m}(n)$ whose rows correspond to $k$-subsets $R$ of $[n]$, whose columns correspond to $m$-subsets $S$ of $[n]$, and $J(R,S) \ne 0$ if $R \subset S$ and $J(R,S)=0$  if $R \not \subset S$.

Question 1: What is the rank $r_p(n,k,m)$ of the incidence matrix of $k$-subsets vs. $m$-subsets of $[n]$, over a field of characteristic $p$.

This question was beautifully answered by Richard Wilson in 1990 . The problem was posed by Nati Linial and Bruce Rothschild in 1981 and they settled the case $p=2$.  (The answer for $p=2$, $m=k-1$, that motivated the question, was observed earlier by Perles and by Frankl.) It is unforgivable that I did not present the statement of Wilson’s theorem here on the blog.

Question 2: What is the minimum rank, denoted by $s_p(n,k,m)$, of a weighted incidence matrix of $k$-subsets vs. $m$-subsets of $[n]$ over a field of characteristic $p$.

This question was answered by me in the early 80s (it is related also to various results by others around the same time). The answer is ${n-k+m} \choose {m}$, and remarkably it is independent from the characteristic $p$.

Incidence matrices and weighted incidence matrices for subspaces

The incidence matrix $I^q=I^q_{k,m}(n)$ of $k$-dimensional subspaces vs. $m$-dimensional subspaces of $F_q^n$ ($k has entries $I_{V,U}=1$ if $V \subset U$ and $I_{V,U}=0$ if $V \not \subset U$.

A weighted incidence matrix $J^q=J^q_{k,m}(n)$ of $k$-dimensional subspaces vs. $m$-dimensional subspaces of $F_q^n$ ($k) has entries $J_{V,U}\ne 0$ if $V \subset U$ and $J_{V,U}=0$ if $V \not \subset U$.

We pose two additional questions which are the “q-analogs” of Questions 1 and 2.

Question 3:  What is the rank $r_q^p(n,k,m)$ of the incidence matrix $I^q_{k,m}(n)$ over a field of characteristic $p$ (you can simply take the field $F_p$).

Frumkin and Yakir settled problem 3 when $q$ is not a power of $p$.

The problem I want to pose (again) here is:

Question 4:  What is the minimum rank denoted by $s_q^p(n,k,m)$ of a weighted incidence matrix $J^q_{k,m}(n)$ over a field of characteristic $p$.

In particular, I would like to know if the answer does not depend on $p$ and if it agrees with some easy lower bounds (obtained from certain identity submatrices) like in the case of a field with one element $p=1$ (namely, subsets).

q-trees

Qoestion 5: What are the q-analogs of trees? (and hypertrees).

The basic idea is first to find weights so that the incidence matrix of 1-subspace vs 2-subspaces has rank $q+q^2+\cdots +q^n$, and then the $q$-trees will correspond to collection of $q+q^2+\cdots +q^n$ 2-spaces with linearly independent columns. (I don’t expect uniqueness.) This is a little related to a $q$-analog of the notion of symmetric matroids that I studied in the late 80s.

A year ago Ferdinand Ihringer, Motaz Mokatren and I made some very preliminary steps in this direction before moving on (separately) to other projects. It will be nice to come back to it.

Remark: There is even greater generalities where problems can be extended from set systems (graphs and hypergraphs) to more general algebraic objects. Those could be related to general primitive permutation groups, to association schemes, and to other objects in algebraic combinatorics.

Unit distances and related problems in discrete geometry.

Let $V$ be a normed space. For a set $S \subset V$ the unit distance graph $G(V)$ is the graph whose vertices are points in $V$ and two vertices are adjacent if their distance is one.

We can consider the following quantities

1) $a(V)$: The maximum size of a unit distance set in $V$. (In other words, the maximum clique in $G(V)$.)

2) $\chi(V)$:  The number of colors needed for $V$ if two points of unit distance are colored with different colors.

3) $b(V)$: The maximum number of colors needed to color points in a set $S$ of diameter 1 if  every color set has diameter smaller than 1. (This is the Borsuk number of $V$.)

4) $b_f(V)$ The maximum number of colors needed to color points in a finite set $S$ of diameter 1 if  two points of unit distance are colored with different colors.

5) $kiss(V)$ The maximum number of points of norm 1 with pairwise distances at least 1.

6) The maximum over all sets $S$ with points of pairwise distance at least one, of the minimum degree in the unit distance graph $G(S)$.

7) The maximum over all sets $S$ with points of pairwise distance at least one of the chromatic number of the unit distance graph of $S$.

Estimating these seven quantities for Euclidean spaces and for other normed spaces are well-known problems. (See my survey article on problems around Borsuk’s problem.) Alon, Bucić, and Sauermann made a remarkable breakthrough on the first problem of the largest clique in a unit distance graphs for arbitrary normed spaces.

Jordan Ellenberg asked: “Does the Alon-Bucic-Sauermann result give you upper bounds for the chromatic number of (the unit distance graph of) $R^d$ with a typical norm? (Or is that already easy for some reason?) “.  But I know little about Jordan’s question.

There is much to say about them but I will not discuss these problems here. I will mention a single annoying problem.

Question 6: Is there an example of a normed space $V$ such that $b(V) \ne b_f(V)$?

(I am not even sure if for the seventh item it makes a difference to consider finite $S$.)

Intersection patterns of standard boxes

In the post that I mentioned we also discussed Tomon’s remarkable result on intersection patterns of standard boxes. Here is some loosely related problem. In short, we want to find topological analogs for results on intersection patterns of standard boxes.

Topological Helly-type theorems is an important direction in geometric and topological combinatorics. The idea is to prove Helly type theorems about convex sets in a much wider topological contexts.

A primary goal of Topological Helly-type theorems is to extend results for nerves of families of convex sets in $\mathbb R^d$ to the class of $d$-Leray simplcial complexes. Among the results achieved so far are: The upper bound theorem; Eckhoff’s conjecture; Alon and Kleitman’s (p,q)-theorem; colorful and matroidal Helly’s theorem; topological Amenta’s theorem, and more.

Another goal of Topological Helly-type theorems is to study if results on nerves of standard boxes can be extended to flag $d$-Leray simplicial complexes?

In this direction the immediate goal is to extend Eckhoff’s upper bound theorem. (Item 3 in this post.)

Question 7. Conjecture: Let $K$ be a $d$ Leray flag complex of dimension $d$ with $n$ vertices. Then the $f$-vector of $K$ obeys Eckhoff’s upper bound theorem for standard boxes.

A closely related question is

Question 8. Conjecture: Let $K$ be a $d$-Leray flag complex of dimension $m$, then the $f$-vector of $K$ is the $f$-vector of a completely balanced $m$-dimensional $d$-Leray complex.

Studying the equality cases of the conjecture is also of interest and the extremal complexes can also be regarded as some sort of ultra-trees.

Roy Meshulam and I worked together on topological Helly type theorems for more than two decades.

By Gil Kalai

TR23-037 | Capturing One-Way Functions via NP-Hardness of Meta-Complexity | Shuichi Hirahara

from ECCC Papers

A one-way function is a function that is easy to compute but hard to invert *on average*. We establish the first characterization of a one-way function by *worst-case* hardness assumptions, by introducing a natural meta-computational problem whose NP-hardness (and the worst-case hardness of NP) characterizes the existence of a one-way function. Specifically, we generalize the notion of time-bounded conditional Kolmogorov complexity to *distributional Kolmogorov complexity*, and prove that a one-way function exists if and only if it is NP-hard to approximate the distributional Kolmogorov complexity under randomized polynomial-time reductions and NP is hard in the worst case. We also propose the *Meta-Complexity Padding Conjecture*, which postulates that distributional Kolmogorov complexity is paddable by an approximation-preserving reduction. Under this conjecture, we prove that the worst-case hardness of an approximate version of the Minimum Circuit Size Problem characterizes the existence of a one-way function. Our results extend the emerging paradigm of meta-complexity, which suggests that proving NP-hardness of meta-computational problems (i.e., problems that ask to compute complexity) is sufficient to exclude errorless Heuristica and error-prone Pessiland from Impagliazzo's five worlds. The key technical contribution is to conditionally close the gap between errorless and error-prone average-case complexities by combining Nanashima's proof techniques of showing "limits" of black-box reductions (ITCS'21) with non-black-box worst-case-to-average-case reductions of Hirahara (FOCS'18).
A one-way function is a function that is easy to compute but hard to invert *on average*. We establish the first characterization of a one-way function by *worst-case* hardness assumptions, by introducing a natural meta-computational problem whose NP-hardness (and the worst-case hardness of NP) characterizes the existence of a one-way function. Specifically, we generalize the notion of time-bounded conditional Kolmogorov complexity to *distributional Kolmogorov complexity*, and prove that a one-way function exists if and only if it is NP-hard to approximate the distributional Kolmogorov complexity under randomized polynomial-time reductions and NP is hard in the worst case. We also propose the *Meta-Complexity Padding Conjecture*, which postulates that distributional Kolmogorov complexity is paddable by an approximation-preserving reduction. Under this conjecture, we prove that the worst-case hardness of an approximate version of the Minimum Circuit Size Problem characterizes the existence of a one-way function. Our results extend the emerging paradigm of meta-complexity, which suggests that proving NP-hardness of meta-computational problems (i.e., problems that ask to compute complexity) is sufficient to exclude errorless Heuristica and error-prone Pessiland from Impagliazzo's five worlds. The key technical contribution is to conditionally close the gap between errorless and error-prone average-case complexities by combining Nanashima's proof techniques of showing "limits" of black-box reductions (ITCS'21) with non-black-box worst-case-to-average-case reductions of Hirahara (FOCS'18).

Efficient Lipschitzian Global Optimization of H\"older Continuous Multivariate Functions

Authors: Kaan Gokcesu, Hakan Gokcesu

This study presents an effective global optimization technique designed for multivariate functions that are H\"older continuous. Unlike traditional methods that construct lower bounding proxy functions, this algorithm employs a predetermined query creation rule that makes it computationally superior. The algorithm's performance is assessed using the average or cumulative regret, which also implies a bound for the simple regret and reflects the overall effectiveness of the approach. The results show that with appropriate parameters the algorithm attains an average regret bound of $O(T^{-\frac{\alpha}{n}})$ for optimizing a H\"older continuous target function with H\"older exponent $\alpha$ in an $n$-dimensional space within a given time horizon $T$. We demonstrate that this bound is minimax optimal.

Authors: Kaan Gokcesu, Hakan Gokcesu

This study presents an effective global optimization technique designed for multivariate functions that are H\"older continuous. Unlike traditional methods that construct lower bounding proxy functions, this algorithm employs a predetermined query creation rule that makes it computationally superior. The algorithm's performance is assessed using the average or cumulative regret, which also implies a bound for the simple regret and reflects the overall effectiveness of the approach. The results show that with appropriate parameters the algorithm attains an average regret bound of $O(T^{-\frac{\alpha}{n}})$ for optimizing a H\"older continuous target function with H\"older exponent $\alpha$ in an $n$-dimensional space within a given time horizon $T$. We demonstrate that this bound is minimax optimal.

On the Efficiency of An Election Game of Two or More Parties: How Bad Can It Be?

Authors: Chuang-Chieh Lin, Chi-Jen Lu, Po-An Chen

We extend our previous work on two-party election competition [Lin, Lu & Chen 2021] to the setting of three or more parties. An election campaign among two or more parties is viewed as a game of two or more players. Each of them has its own candidates as the pure strategies to play. People, as voters, comprise supporters for each party, and a candidate brings utility for the the supporters of each party. Each player nominates exactly one of its candidates to compete against the other party's. \emph{A candidate is assumed to win the election with higher odds if it brings more utility for all the people.} The payoff of each player is the expected utility its supporters get. The game is egoistic if every candidate benefits her party's supporters more than any candidate from the competing party does. In this work, we first argue that the election game always has a pure Nash equilibrium when the winner is chosen by the hardmax function, while there exist game instances in the three-party election game such that no pure Nash equilibrium exists even the game is egoistic. Next, we propose two sufficient conditions for the egoistic election game to have a pure Nash equilibrium. Based on these conditions, we propose a fixed-parameter tractable algorithm to compute a pure Nash equilibrium of the egoistic election game. Finally, perhaps surprisingly, we show that the price of anarchy of the egoistic election game is upper bounded by the number of parties. Our findings suggest that the election becomes unpredictable when more than two parties are involved and, moreover, the social welfare deteriorates with the number of participating parties in terms of possibly increasing price of anarchy. This work alternatively explains why the two-party system is prevalent in democratic countries.

Authors: Chuang-Chieh Lin, Chi-Jen Lu, Po-An Chen

We extend our previous work on two-party election competition [Lin, Lu & Chen 2021] to the setting of three or more parties. An election campaign among two or more parties is viewed as a game of two or more players. Each of them has its own candidates as the pure strategies to play. People, as voters, comprise supporters for each party, and a candidate brings utility for the the supporters of each party. Each player nominates exactly one of its candidates to compete against the other party's. \emph{A candidate is assumed to win the election with higher odds if it brings more utility for all the people.} The payoff of each player is the expected utility its supporters get. The game is egoistic if every candidate benefits her party's supporters more than any candidate from the competing party does. In this work, we first argue that the election game always has a pure Nash equilibrium when the winner is chosen by the hardmax function, while there exist game instances in the three-party election game such that no pure Nash equilibrium exists even the game is egoistic. Next, we propose two sufficient conditions for the egoistic election game to have a pure Nash equilibrium. Based on these conditions, we propose a fixed-parameter tractable algorithm to compute a pure Nash equilibrium of the egoistic election game. Finally, perhaps surprisingly, we show that the price of anarchy of the egoistic election game is upper bounded by the number of parties. Our findings suggest that the election becomes unpredictable when more than two parties are involved and, moreover, the social welfare deteriorates with the number of participating parties in terms of possibly increasing price of anarchy. This work alternatively explains why the two-party system is prevalent in democratic countries.

The limited-memory recursive variational Gaussian approximation (L-RVGA)

Authors: Marc Lambert (DGA, SIERRA), Silvère Bonnabel (CAOR, ISEA), Francis Bach (LIENS, SIERRA)

We consider the problem of computing a Gaussian approximation to the posterior distribution of a parameter given a large number N of observations and a Gaussian prior, when the dimension of the parameter d is also large. To address this problem we build on a recently introduced recursive algorithm for variational Gaussian approximation of the posterior, called recursive variational Gaussian approximation (RVGA), which is a single pass algorithm, free of parameter tuning. In this paper, we consider the case where the parameter dimension d is high, and we propose a novel version of RVGA that scales linearly in the dimension d (as well as in the number of observations N), and which only requires linear storage capacity in d. This is afforded by the use of a novel recursive expectation maximization (EM) algorithm applied for factor analysis introduced herein, to approximate at each step the covariance matrix of the Gaussian distribution conveying the uncertainty in the parameter. The approach is successfully illustrated on the problems of high dimensional least-squares and logistic regression, and generalized to a large class of nonlinear models.

Authors: Marc Lambert (DGA, SIERRA), Silvère Bonnabel (CAOR, ISEA), Francis Bach (LIENS, SIERRA)

We consider the problem of computing a Gaussian approximation to the posterior distribution of a parameter given a large number N of observations and a Gaussian prior, when the dimension of the parameter d is also large. To address this problem we build on a recently introduced recursive algorithm for variational Gaussian approximation of the posterior, called recursive variational Gaussian approximation (RVGA), which is a single pass algorithm, free of parameter tuning. In this paper, we consider the case where the parameter dimension d is high, and we propose a novel version of RVGA that scales linearly in the dimension d (as well as in the number of observations N), and which only requires linear storage capacity in d. This is afforded by the use of a novel recursive expectation maximization (EM) algorithm applied for factor analysis introduced herein, to approximate at each step the covariance matrix of the Gaussian distribution conveying the uncertainty in the parameter. The approach is successfully illustrated on the problems of high dimensional least-squares and logistic regression, and generalized to a large class of nonlinear models.

Exact Short Products From Truncated Multipliers

Authors: Daniel Lemire

We sometimes need to compute the most significant digits of the product of small integers with a multiplier requiring much storage: e.g., a large integer (e.g., $5^{100}$) or an irrational number ($\pi$). We only need to access the most significant digits of the multiplier-as long as the integers are sufficiently small. We provide an efficient algorithm to compute the range of integers given a truncated multiplier and a desired number of digits.

Authors: Daniel Lemire

We sometimes need to compute the most significant digits of the product of small integers with a multiplier requiring much storage: e.g., a large integer (e.g., $5^{100}$) or an irrational number ($\pi$). We only need to access the most significant digits of the multiplier-as long as the integers are sufficiently small. We provide an efficient algorithm to compute the range of integers given a truncated multiplier and a desired number of digits.

Orbits, schemes and dynamic programming procedures for the TSP 4-OPT neighborhood

Authors: Giuseppe Lancia, Marcello Dalpasso

We discuss the way to group all 25 possible 4-OPT moves into 7 orbits of equivalent moves. We then describe two implementations, one for a $\Theta(n^3)$ algorithm by de Berg's et al. and one of a $\Theta(n^2)$ algorithm by Glover, for finding the best 4-OPT move via dynamic programming.

Authors: Giuseppe Lancia, Marcello Dalpasso

We discuss the way to group all 25 possible 4-OPT moves into 7 orbits of equivalent moves. We then describe two implementations, one for a $\Theta(n^3)$ algorithm by de Berg's et al. and one of a $\Theta(n^2)$ algorithm by Glover, for finding the best 4-OPT move via dynamic programming.

A Survey on the Densest Subgraph Problem and its Variants

Authors: Tommaso Lanciano, Atsushi Miyauchi, Adriano Fazzone, Francesco Bonchi

The Densest Subgraph Problem requires to find, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature over the last five decades, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest on this problem with several interesting contributions, including some groundbreaking results, published in 2022 and 2023. This survey provides a deep overview of the fundamental results and an exhaustive coverage of the many variants proposed in the literature, with a special attention on the most recent results. The survey also presents a comprehensive overview of applications and discusses some interesting open problems for this evergreen research topic.

The Densest Subgraph Problem requires to find, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature over the last five decades, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest on this problem with several interesting contributions, including some groundbreaking results, published in 2022 and 2023. This survey provides a deep overview of the fundamental results and an exhaustive coverage of the many variants proposed in the literature, with a special attention on the most recent results. The survey also presents a comprehensive overview of applications and discusses some interesting open problems for this evergreen research topic.

Monday, March 27

Introducing Bocconi’s new M.Sc. in Artificial Intelligence

from Luca Trevisan

This September, Bocconi will start a new M.Sc. in Artificial Intelligence. It will be a two-year computer science degree meant for students with Bachelor degrees in computer science, engineering, math, statistics, physics and related quantitative fields. In the first year, … Continue reading →

This September, Bocconi will start a new M.Sc. in Artificial Intelligence. It will be a two-year computer science degree meant for students with Bachelor degrees in computer science, engineering, math, statistics, physics and related quantitative fields.

In the first year, courses on algorithms, mathematical methods, optimization, information theory, and software engineering will build a foundation in math and CS, then courses on deep learning, reinforcement learning, natural language processing and computer vision and image processing will go in depth on machine learning and some of its applications. In the second year there are various options and elective courses, with the possibility to study, for example, cryptography and blockchains, or bio-medical applications. As common for the second year of Bocconi’s M.Sc. degrees, there will be options for exchange programs to spend a semester abroad. Students also take a seminar on ethics in AI, a project-oriented AI lab, and a foreign language (not English and not the student’s native language) course. The language of instruction is English.

Tomorrow at 5pm CET there will be an online information session: those interested can sign up here.

Applications open today and are due by May 25th.

By luca

TR23-036 | Derandomization with Minimal Memory Footprint | Dean Doron, Roei Tell

from ECCC Papers

Existing proofs that deduce BPL=L from circuit lower bounds convert randomized algorithms into deterministic algorithms with large constant overhead in space. We study space-bounded derandomization with minimal footprint, and ask what is the minimal possible space overhead for derandomization. We show that $BPSPACE[S] \subseteq DSPACE[c \cdot S]$ for $c \approx 2$, assuming space-efficient cryptographic PRGs, and, either: (1) lower bounds against bounded-space algorithms with advice, or: (2) lower bounds against certain uniform compression algorithms. Under additional assumptions regarding the power of catalytic computation, in a new setting of parameters that was not studied before, we are even able to get $c \approx 1$. Our results are constructive: Given a candidate hard function (and a candidate cryptographic PRG) we show how to transform the randomized algorithm into an efficient deterministic one. This follows from new PRGs and targeted PRGs for space-bounded algorithms, which we combine with novel space-efficient evaluation methods. A central ingredient in all our constructions is hardness amplification reductions in logspace-uniform $TC^0$, that were not known before.
Existing proofs that deduce BPL=L from circuit lower bounds convert randomized algorithms into deterministic algorithms with large constant overhead in space. We study space-bounded derandomization with minimal footprint, and ask what is the minimal possible space overhead for derandomization. We show that $BPSPACE[S] \subseteq DSPACE[c \cdot S]$ for $c \approx 2$, assuming space-efficient cryptographic PRGs, and, either: (1) lower bounds against bounded-space algorithms with advice, or: (2) lower bounds against certain uniform compression algorithms. Under additional assumptions regarding the power of catalytic computation, in a new setting of parameters that was not studied before, we are even able to get $c \approx 1$. Our results are constructive: Given a candidate hard function (and a candidate cryptographic PRG) we show how to transform the randomized algorithm into an efficient deterministic one. This follows from new PRGs and targeted PRGs for space-bounded algorithms, which we combine with novel space-efficient evaluation methods. A central ingredient in all our constructions is hardness amplification reductions in logspace-uniform $TC^0$, that were not known before.

PhD student at Jönköping university (apply by April 17, 2023)

from CCI: jobs

Jönköping University (JU) advertises one position as PhD student in Computer Science, in collaboration with the theoretical computer science laboratory, Linköping university (LiU). The PhD student will join a research project in the intersection of artificial intelligence, complexity theory, and universal algebra, directed by Johannes Schmidt (JU) and Victor Lagerkvist (LiU). Website: ju.varbi.com/se/what:job/jobID:601050/where:4/ Email: victor.lagerkvist@liu.se

Jönköping University (JU) advertises one position as PhD student in Computer Science, in collaboration with the theoretical computer science laboratory, Linköping university (LiU). The PhD student will join a research project in the intersection of artificial intelligence, complexity theory, and universal algebra, directed by Johannes Schmidt (JU) and Victor Lagerkvist (LiU).

Website: https://ju.varbi.com/se/what:job/jobID:601050/where:4/
Email: victor.lagerkvist@liu.se

By shacharlovett

Critical Times in Israel: Last Night’s Demonstrations

from Gil Kalai

Last night, the demonstrations in Israel regarding the “judicial reforms” escalated after the prime minister Netanyahu fired the defense minister Galant who called to stop the legislation. My wife and I enjoyed a concert and after it we heard loud … Continue reading →

Last night, the demonstrations in Israel regarding the “judicial reforms” escalated after the prime minister Netanyahu fired the defense minister Galant who called to stop the legislation. My wife and I enjoyed a concert and after it we heard loud calls “Bibi fired Galant” and even “The dictator fired Galant,” and were surprised by the news and the huge groups of young people, very angry for a very good reason. And, of course, we joined them. The picture above provided by Alon Rosen is from a major highway in Tel Aviv that was blocked for 9 hours. Whether the picture is genuine or not, it shows the anarchist nature of at least some of the protestors.  (We know for sure that some computer scientists were there.) I am not sure if a proof of this bold claim was also provided.

By Gil Kalai

Euler Characteristic Tools For Topological Data Analysis

Authors: Olympio Hacquard, Vadim Lebovici

In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieve state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performances in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.

Authors: Olympio Hacquard, Vadim Lebovici

In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieve state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performances in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.

Motion Planning for Triple-Axis Spectrometers

Authors: Tobias Weber

We present the free and open source software TAS-Paths, a novel system which calculates optimal, collision-free paths for the movement of triple-axis spectrometers. The software features an easy to use graphical user interface, but can also be scripted and used as a library. It allows the user to plan and visualise the motion of the instrument before the experiment and can be used during measurements to circumvent obstacles. The instrument path is calculated in angular configuration space in order to keep a maximum angular distance from any obstacle.

Authors: Tobias Weber

We present the free and open source software TAS-Paths, a novel system which calculates optimal, collision-free paths for the movement of triple-axis spectrometers. The software features an easy to use graphical user interface, but can also be scripted and used as a library. It allows the user to plan and visualise the motion of the instrument before the experiment and can be used during measurements to circumvent obstacles. The instrument path is calculated in angular configuration space in order to keep a maximum angular distance from any obstacle.

Simplexwise Distance Distributions for finite spaces with metrics and measures

Authors: Vitaliy Kurlin

A finite set of unlabelled points in Euclidean space is the simplest representation of many real objects from mineral rocks to sculptures. Since most solid objects are rigid, their natural equivalence is rigid motion or isometry maintaining all inter-point distances. More generally, any finite metric space is an example of a metric-measure space that has a probability measure and a metric satisfying all axioms.

This paper develops Simplexwise Distance Distributions (SDDs) for any finite metric spaces and metric-measures spaces. These SDDs classify all known non-equivalent spaces that were impossible to distinguish by simpler invariants. We define metrics on SDDs that are Lipschitz continuous and allow exact computations whose parametrised complexities are polynomial in the number of given points.

Authors: Vitaliy Kurlin

A finite set of unlabelled points in Euclidean space is the simplest representation of many real objects from mineral rocks to sculptures. Since most solid objects are rigid, their natural equivalence is rigid motion or isometry maintaining all inter-point distances. More generally, any finite metric space is an example of a metric-measure space that has a probability measure and a metric satisfying all axioms.

This paper develops Simplexwise Distance Distributions (SDDs) for any finite metric spaces and metric-measures spaces. These SDDs classify all known non-equivalent spaces that were impossible to distinguish by simpler invariants. We define metrics on SDDs that are Lipschitz continuous and allow exact computations whose parametrised complexities are polynomial in the number of given points.

Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

Authors: Mohammad Pedramfar, Vaneet Aggarwal

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other words, the delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components. Three models of delayed feedback: bounded adversarial, stochastic independent, and stochastic conditionally independent are studied, and regret bounds are derived for each of the delay models. Ignoring the problem dependent parameters, we show that regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in the three cases, thus demonstrating an additive term in regret with delay in all the three delay models. The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.

Authors: Mohammad Pedramfar, Vaneet Aggarwal

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other words, the delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components. Three models of delayed feedback: bounded adversarial, stochastic independent, and stochastic conditionally independent are studied, and regret bounds are derived for each of the delay models. Ignoring the problem dependent parameters, we show that regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in the three cases, thus demonstrating an additive term in regret with delay in all the three delay models. The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.

Differentially Private Synthetic Control

Authors: Saeyoung Rho, Rachel Cummings, Vishal Misra

Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool ) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.

Authors: Saeyoung Rho, Rachel Cummings, Vishal Misra

Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool ) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.

Distributed Silhouette Algorithm: Evaluating Clustering on Big Data

Authors: Marco Gaido

In the big data era, the key feature that each algorithm needs to have is the possibility of efficiently running in parallel in a distributed environment. The popular Silhouette metric to evaluate the quality of a clustering, unfortunately, does not have this property and has a quadratic computational complexity with respect to the size of the input dataset. For this reason, its execution has been hindered in big data scenarios, where clustering had to be evaluated otherwise. To fill this gap, in this paper we introduce the first algorithm that computes the Silhouette metric with linear complexity and can easily execute in parallel in a distributed environment. Its implementation is freely available in the Apache Spark ML library.

Authors: Marco Gaido

In the big data era, the key feature that each algorithm needs to have is the possibility of efficiently running in parallel in a distributed environment. The popular Silhouette metric to evaluate the quality of a clustering, unfortunately, does not have this property and has a quadratic computational complexity with respect to the size of the input dataset. For this reason, its execution has been hindered in big data scenarios, where clustering had to be evaluated otherwise. To fill this gap, in this paper we introduce the first algorithm that computes the Silhouette metric with linear complexity and can easily execute in parallel in a distributed environment. Its implementation is freely available in the Apache Spark ML library.

TRAK-ing Model Behavior with Data

Homepage    Code    Paper

In our latest paper, we revisit the problem of data attribution and introduce TRAK—a scalable and effective method for attributing machine learning predictions to training data. TRAK achieves dramatically better speed-efficacy tradeoffs than prior methods, allowing us to apply it across a variety of settings, from image classifiers to language models.

As machine learning models become more capable (and simultaneously, more complex and opaque), the question of “why did my model make this prediction?” is becoming increasingly important. And the key to answering this question often lies within the data that the model was trained on.

Consequently, there’s been a lot of interest in (and work on) data attribution: how do you attribute a given prediction back to the individual data points in the training set?

Data attribution methods help us understand how the composition of the training data impacts model predictions. In particular, these methods have been shown to be useful for variety of downstream applications, such as identifying brittle model predictions, assigning data valuations, detecting mislabeled data, and comparing learning algorithms, among many others.

What is a data attribution method, exactly? One way to define a data attribution method is as a function that takes in a specific example $z$ and outputs a set of scores (one per training example) that denote the “importance” of each training example to a model’s prediction on $z$. In the context of image classification model, an example output of a data attribution method could look like this:

Prior works on data attribution, such as influence functions, Shapley values, empirical influences, and datamodels, all fit naturally into this definition.

Now, given all these data attribution methods and their various uses, what’s the need for yet another one?

Two desiderata for data attribution

There are two natural dimensions along which we’d want to evaluate data attribution methods: efficacy and speed.

• Efficacy: We want methods that provide data attributions that are reliable and faithful to the model’s decision process. (We’ll make this more precise soon.)
• Speed: At the same time, we’d like methods that are going to scale. Indeed, the ever-increasing scale of models and datasets makes only data attribution methods with a (very) modest dependence on model and dataset size practical.

Evaluating speed is pretty straightforward: we can just use wall-clock time In our paper, we also consider an implementation-agnostic metric that still leads to the same conclusions. on a single A100 as a measure of speed.

Evaluating efficacy, on the other hand, turns out to be somewhat tricky. Indeed, when we look at prior works, there is no unifying metric, and many works rely on qualitative heuristics such as visual inspection or proxies such as the recall rate for identifying mislabelled examples.

So how should one evaluate the efficacy of data attribution methods? Well, Inspired by our earlier work on this topic, we adopt the perspective that the overarching goal of data attribution is to make accurate counterfactual predictions: e.g., answers to questions of the form “what would happen to this prediction if a given set images were removed from the training set”?

To capture this intuition, in our paper, we propose measuring efficacy with a new metric that we call linear datamodeling score (or LDS), which measures this predictiveness as a single value between zero and one Intuitively, when LDS=1, it means that the corresponding method can perfectly predict model behavior given only the examples are included in the training set. .

Efficacy vs. Speed tradeoffs

With the above metrics in hand, let’s take a look at existing data attribution methods:

As we can see, attaining both efficacy and speed is a challenge for all data attribution methods that we evaluated—there’s nothing in the top left corner!

Indeed, on one hand, resampling-based methods (the green and purple points in the top right) are very effective, but also very costly. After all, these method requires re-training the model many times—and while this might be feasible when we’re studying ResNets on CIFAR-10 or ImageNet, it’s unclear what to do when studying models like CLIP or GPT-N on datasets like Open Images or Common Crawl.

On the other hand, there are methods like influence functions or TracIn that are fast but not very effective (they fall into the bottom left part of the above plot). Fundamentally, this is because the approximations made by these methods don’t quite capture the structure of complex models such as deep neural networks.

The above results raise the question:

Do we really have to pay such a heavy price for accurate data attribution?

TRAK: Effective, efficient data attribution

In our new paper, we introduce TRAK (Tracing with the Randomly-Projected After Kernel), a new data attribution that significantly improves upon existing tradeoffs between efficacy and speed in data attribution.

We’ll leave the math to the paper, but the principle behind TRAK is quite simple: we approximate the model of interest with an instance of kernel regression (using the so-called “after kernel”); we randomly project the corresponding features to a lower-dimensional space; and then we apply (known) heuristics for data attributions in the kernel regression regime to perform data attribution on the original model.

Here is our previous plot with TRAK added to it (note the x axis scale!):

As we can see, TRAK fares pretty well here: it is 100x faster than models of comparable efficacy, and (at least) 10x more effective than models of comparable speed. As an example: for BERT models finetuned on the QNLI dataset, TRAK scores computed in just 17 hours on a single A100 GPU are more effective than re-training based datamodels computed in 125 GPU-days. TRAK is also significantly more effective than comparably fast methods, such as influence function-based methods or TracIn.

Applying TRAK across models and modalities

TRAK applies out-of-the-box to a wide variety of machine learning problems—including across modalities (e.g. language & vision among others), architectures (e.g. CNNs, Transformers, etc) and tasks (classification, contrastive learning, question answering, etc).

In our paper, we apply TRAK to ResNets trained on CIFAR and ImageNet, BERT-base models trained on QNLI, CLIP models trained on MS COCO, and mT5 models fine-tuned on a fact tracing benchmark. In each of these applications, we show that TRAK is significantly more effective at counterfactual prediction than existing baselines.

Next, we’ll take a closer look at two of such applications: CLIP, and (transformer-based) language models—two families of models that have become very popular and useful in the last few years.

Application #1: Attributing CLIP

CLIP (Contrastive Langauge-Image Pre-training) representations are a powerful tool for translating between vision and language domains. State-of-the-art CLIP models are trained on gigantic datasets of image-caption pairs, and properties of the training datasets seem to play a big role in the quality of the resulting representations.

But which training examples actually play the key roles in learning a given image-caption pair association? We can use TRAK to study this question. Below, for a few different image-caption pairs, we show the top training examples identified by TRAK:

♦ ♦

You can see that nearest neighbors identified both by TRAK and by the CLIP representations themselves are semantically similar to the target images. But good attribution is more than just finding similar examples—they must actually be important to how the model makes decisions.

To evaluate this, we first, for each image-caption pair we remove the $k=400$ examples with the most positive TRAK scores; train a new CLIP model on the rest of the training set; and measure the model’s alignment (in terms of cosine similarity) on the image-caption pair being studied. The resulting models are on average 36% less aligned on the corresponding image-caption pairs. Even with removing just $k=50$ examples, you can see that the cosine similarity degrades quite a bit (a 16% drop).

Now, if we do the same but instead remove the most similar examples according to CLIP (or remove similar examples according to TracIn, an existing data attribution method), the resulting drop in cosine similarity is much less profound: compare 36% from TRAK to 11% for CLIP representations and 4% for TracIn scores, respectively.

TRAK thus identifies training examples that are counterfactually importantWhile we only studied the above counterfactual question in our paper, we think that our results open the door to many interesting questions---e.g., Can we find a much smaller subset of LAION-7B that will get us similar embeddings?!

Application #2: TRAK for Model-Faithful Fact Tracing

Next, we look at another application of TRAK that we found to be particularly interesting: the problem of fact tracing. That is, tracing factual assertions made by a language model back to corresponding data sources in the training set expressing those facts.

Specifically, we applied TRAK to a fact-tracing benchmark called FTRACE-TRex. Briefly, the goal of this benchmark is to identify the data sources responsible for a given fact expressed by a large language model.

The test set of FTRACE-TRex is a set of “queries”. Each query pertains to a particular fact, and is annotated with a set of “ground-truth proponent” training examples that express the same fact. So if “Paris is the capital of France” was a test set query, its ground-truth proponents in the training set might include, say,

1. “The capital of France is Paris,”
2. “The beautiful capital city of Paris, France, comes alive at night,”
3. “Paris, France is one of the busiest capital cities in Europe”

The FTRACE-TRex benchmark captures a data attribution method’s ability to surface these ground-truth proponents as the “most important” training examples for their corresponding queries. (That is, a good data attribution method should assign high scores to a query’s ground-truth proponents, relative to the rest of the training set.)

To our surprise, TRAK, while outperforming the previous best data attribution method (TracIn) on this task, did worse on this benchmark than a simple information-retrieval baseline (BM25)—this baseline is based only on the lexical overlap between the query and the possible data sources, and doesn’t even look at the model at all!

To understand the possible roots of TRAK’s underperformance here, we carried out a counterfactual analysis. What we found was that when we removed the training examples that TRAK identified as the most important (despite these being the “wrong” ones according to the FTRACE-TRex benchmark), the model was indeed less likely to express the corresponding facts. More crucially, though, this effect was actually stronger when we removed TRAK-identified examples than when we removed BM25-identified examples, and even stronger than when we removed the ground-truth primary sources!

We’ll leave the details of our experiment to the paper, but our result highlights a difference between “fact tracing” and data attribution (which might be more appropriately called “behavior tracing”). That is, finding a data source that factually supports a language model’s output is a different task than identifying the actual data sources that caused the model to generate that output. One might try to solve the former task with techniques such as information retrieval or web search (to “cite” sources). But the latter task is fundamentally one that cannot be decoupled from the model—and TRAK seems to be a promising approach here.

More broadly, we think that TRAK can be a useful tool in getting a more data-driven understanding of learning mechanisms underlying LLMs.

Attributing with TRAK: The PyTorch API

With the evaluation and examples above, hopefully we made a case that TRAK can be an effective and efficient data attribution method. But what does it take to apply it to your task?

We made sure that using TRAK is as easy as possible. To this end, we’re releasing trak as a fastFor our engineering-oriented reader: this involved writing a custom CUDA kernel for fast random projections, to avoid materializing a random matrix with billions of entries., minimal PyTorch-based API that allows users to compute TRAK scores with just a few lines of code:

We provide built-in TRAK functions for popular applications such as analyzing image classifiers, CLIP models, and text classifiers. The API also makes it easy to adapt TRAK for new, custom tasks.

You can get started with installation here. See our code and documentation, as well as tutorials and examples for more details.

Conclusion

In this post, we considered the problem of data attribution—i.e., tracing the behavior of machine learning models back to their training data. Our new method, TRAK, can provide attributions that are accurate and orders of magnitude more efficient than comparably effective methods.

Prior works have already shown the wide utility and potential of data attribution methods, including explaining predictions, debugging model behavior, assigning data valuations, and filtering or optimizing data. But, the main bottleneck in deploying these methods was their lack of scalability or predictiveness in modern settings. We believe that TRAK significantly reduces that bottleneck and that using TRAK in these downstream applications will accelerate and unlock many new capabilities, particularly in settings where training is very expensive, such as large language models.

We are excited about the possibilities enabled by TRAK. See the paper for more details, or get started right away applying TRAK to your problem with our library!

In our latest paper, we revisit the problem of data attribution and introduce TRAK—a scalable and effective method for attributing machine learning predictions to training data. TRAK achieves dramatically better speed-efficacy tradeoffs than prior methods, allowing us to apply it across a variety of settings, from image classifiers to language models.

As machine learning models become more capable (and simultaneously, more complex and opaque), the question of “why did my model make this prediction?” is becoming increasingly important. And the key to answering this question often lies within the data that the model was trained on.

Consequently, there’s been a lot of interest in (and work on) data attribution: how do you attribute a given prediction back to the individual data points in the training set?

Data attribution methods help us understand how the composition of the training data impacts model predictions. In particular, these methods have been shown to be useful for variety of downstream applications, such as identifying brittle model predictions, assigning data valuations, detecting mislabeled data, and comparing learning algorithms, among many others.

What is a data attribution method, exactly? One way to define a data attribution method is as a function that takes in a specific example $z$ and outputs a set of scores (one per training example) that denote the “importance” of each training example to a model’s prediction on $z$. In the context of image classification model, an example output of a data attribution method could look like this:

Prior works on data attribution, such as influence functions, Shapley values, empirical influences, and datamodels, all fit naturally into this definition.

Now, given all these data attribution methods and their various uses, what’s the need for yet another one?

Two desiderata for data attribution

There are two natural dimensions along which we’d want to evaluate data attribution methods: efficacy and speed.

• Efficacy: We want methods that provide data attributions that are reliable and faithful to the model’s decision process. (We’ll make this more precise soon.)
• Speed: At the same time, we’d like methods that are going to scale. Indeed, the ever-increasing scale of models and datasets makes only data attribution methods with a (very) modest dependence on model and dataset size practical.

Evaluating speed is pretty straightforward: we can just use wall-clock time In our paper, we also consider an implementation-agnostic metric that still leads to the same conclusions. on a single A100 as a measure of speed.

Evaluating efficacy, on the other hand, turns out to be somewhat tricky. Indeed, when we look at prior works, there is no unifying metric, and many works rely on qualitative heuristics such as visual inspection or proxies such as the recall rate for identifying mislabelled examples.

So how should one evaluate the efficacy of data attribution methods? Well, Inspired by our earlier work on this topic, we adopt the perspective that the overarching goal of data attribution is to make accurate counterfactual predictions: e.g., answers to questions of the form “what would happen to this prediction if a given set images were removed from the training set”?

To capture this intuition, in our paper, we propose measuring efficacy with a new metric that we call linear datamodeling score (or LDS), which measures this predictiveness as a single value between zero and one Intuitively, when LDS=1, it means that the corresponding method can perfectly predict model behavior given only the examples are included in the training set. .

Efficacy vs. Speed tradeoffs

With the above metrics in hand, let’s take a look at existing data attribution methods:

As we can see, attaining both efficacy and speed is a challenge for all data attribution methods that we evaluated—there’s nothing in the top left corner!

Indeed, on one hand, resampling-based methods (the green and purple points in the top right) are very effective, but also very costly. After all, these method requires re-training the model many times—and while this might be feasible when we’re studying ResNets on CIFAR-10 or ImageNet, it’s unclear what to do when studying models like CLIP or GPT-N on datasets like Open Images or Common Crawl.

On the other hand, there are methods like influence functions or TracIn that are fast but not very effective (they fall into the bottom left part of the above plot). Fundamentally, this is because the approximations made by these methods don’t quite capture the structure of complex models such as deep neural networks.

The above results raise the question:

Do we really have to pay such a heavy price for accurate data attribution?

TRAK: Effective, efficient data attribution

In our new paper, we introduce TRAK (Tracing with the Randomly-Projected After Kernel), a new data attribution that significantly improves upon existing tradeoffs between efficacy and speed in data attribution.

We’ll leave the math to the paper, but the principle behind TRAK is quite simple: we approximate the model of interest with an instance of kernel regression (using the so-called “after kernel”); we randomly project the corresponding features to a lower-dimensional space; and then we apply (known) heuristics for data attributions in the kernel regression regime to perform data attribution on the original model.

Here is our previous plot with TRAK added to it (note the x axis scale!):

As we can see, TRAK fares pretty well here: it is 100x faster than models of comparable efficacy, and (at least) 10x more effective than models of comparable speed. As an example: for BERT models finetuned on the QNLI dataset, TRAK scores computed in just 17 hours on a single A100 GPU are more effective than re-training based datamodels computed in 125 GPU-days. TRAK is also significantly more effective than comparably fast methods, such as influence function-based methods or TracIn.

Applying TRAK across models and modalities

TRAK applies out-of-the-box to a wide variety of machine learning problems—including across modalities (e.g. language & vision among others), architectures (e.g. CNNs, Transformers, etc) and tasks (classification, contrastive learning, question answering, etc).

In our paper, we apply TRAK to ResNets trained on CIFAR and ImageNet, BERT-base models trained on QNLI, CLIP models trained on MS COCO, and mT5 models fine-tuned on a fact tracing benchmark. In each of these applications, we show that TRAK is significantly more effective at counterfactual prediction than existing baselines.

Next, we’ll take a closer look at two of such applications: CLIP, and (transformer-based) language models—two families of models that have become very popular and useful in the last few years.

Application #1: Attributing CLIP

CLIP (Contrastive Langauge-Image Pre-training) representations are a powerful tool for translating between vision and language domains. State-of-the-art CLIP models are trained on gigantic datasets of image-caption pairs, and properties of the training datasets seem to play a big role in the quality of the resulting representations.

But which training examples actually play the key roles in learning a given image-caption pair association? We can use TRAK to study this question. Below, for a few different image-caption pairs, we show the top training examples identified by TRAK:

You can see that nearest neighbors identified both by TRAK and by the CLIP representations themselves are semantically similar to the target images. But good attribution is more than just finding similar examples—they must actually be important to how the model makes decisions.

To evaluate this, we first, for each image-caption pair we remove the $k=400$ examples with the most positive TRAK scores; train a new CLIP model on the rest of the training set; and measure the model’s alignment (in terms of cosine similarity) on the image-caption pair being studied. The resulting models are on average 36% less aligned on the corresponding image-caption pairs. Even with removing just $k=50$ examples, you can see that the cosine similarity degrades quite a bit (a 16% drop).

Now, if we do the same but instead remove the most similar examples according to CLIP (or remove similar examples according to TracIn, an existing data attribution method), the resulting drop in cosine similarity is much less profound: compare 36% from TRAK to 11% for CLIP representations and 4% for TracIn scores, respectively.

TRAK thus identifies training examples that are counterfactually importantWhile we only studied the above counterfactual question in our paper, we think that our results open the door to many interesting questions---e.g., Can we find a much smaller subset of LAION-7B that will get us similar embeddings?!

Application #2: TRAK for Model-Faithful Fact Tracing

Next, we look at another application of TRAK that we found to be particularly interesting: the problem of fact tracing. That is, tracing factual assertions made by a language model back to corresponding data sources in the training set expressing those facts.

Specifically, we applied TRAK to a fact-tracing benchmark called FTRACE-TRex. Briefly, the goal of this benchmark is to identify the data sources responsible for a given fact expressed by a large language model.

The test set of FTRACE-TRex is a set of “queries”. Each query pertains to a particular fact, and is annotated with a set of “ground-truth proponent” training examples that express the same fact. So if “Paris is the capital of France” was a test set query, its ground-truth proponents in the training set might include, say,

1. “The capital of France is Paris,”
2. “The beautiful capital city of Paris, France, comes alive at night,”
3. “Paris, France is one of the busiest capital cities in Europe”

The FTRACE-TRex benchmark captures a data attribution method’s ability to surface these ground-truth proponents as the “most important” training examples for their corresponding queries. (That is, a good data attribution method should assign high scores to a query’s ground-truth proponents, relative to the rest of the training set.)

To our surprise, TRAK, while outperforming the previous best data attribution method (TracIn) on this task, did worse on this benchmark than a simple information-retrieval baseline (BM25)—this baseline is based only on the lexical overlap between the query and the possible data sources, and doesn’t even look at the model at all!

To understand the possible roots of TRAK’s underperformance here, we carried out a counterfactual analysis. What we found was that when we removed the training examples that TRAK identified as the most important (despite these being the “wrong” ones according to the FTRACE-TRex benchmark), the model was indeed less likely to express the corresponding facts. More crucially, though, this effect was actually stronger when we removed TRAK-identified examples than when we removed BM25-identified examples, and even stronger than when we removed the ground-truth primary sources!

We’ll leave the details of our experiment to the paper, but our result highlights a difference between “fact tracing” and data attribution (which might be more appropriately called “behavior tracing”). That is, finding a data source that factually supports a language model’s output is a different task than identifying the actual data sources that caused the model to generate that output. One might try to solve the former task with techniques such as information retrieval or web search (to “cite” sources). But the latter task is fundamentally one that cannot be decoupled from the model—and TRAK seems to be a promising approach here.

More broadly, we think that TRAK can be a useful tool in getting a more data-driven understanding of learning mechanisms underlying LLMs.

Attributing with TRAK: The PyTorch API

With the evaluation and examples above, hopefully we made a case that TRAK can be an effective and efficient data attribution method. But what does it take to apply it to your task?

We made sure that using TRAK is as easy as possible. To this end, we’re releasing trak as a fastFor our engineering-oriented reader: this involved writing a custom CUDA kernel for fast random projections, to avoid materializing a random matrix with billions of entries., minimal PyTorch-based API that allows users to compute TRAK scores with just a few lines of code:

We provide built-in TRAK functions for popular applications such as analyzing image classifiers, CLIP models, and text classifiers. The API also makes it easy to adapt TRAK for new, custom tasks.

You can get started with installation here. See our code and documentation, as well as tutorials and examples for more details.

Conclusion

In this post, we considered the problem of data attribution—i.e., tracing the behavior of machine learning models back to their training data. Our new method, TRAK, can provide attributions that are accurate and orders of magnitude more efficient than comparably effective methods.

Prior works have already shown the wide utility and potential of data attribution methods, including explaining predictions, debugging model behavior, assigning data valuations, and filtering or optimizing data. But, the main bottleneck in deploying these methods was their lack of scalability or predictiveness in modern settings. We believe that TRAK significantly reduces that bottleneck and that using TRAK in these downstream applications will accelerate and unlock many new capabilities, particularly in settings where training is very expensive, such as large language models.

We are excited about the possibilities enabled by TRAK. See the paper for more details, or get started right away applying TRAK to your problem with our library!

Sunday, March 26

William Wulf, 1939–2023

from Richard Lipton

A great teacher and a great leader Bill Wulf just passed away. We send our best thoughts to his dear wife Anita Jones and the rest of his family. He is greatly missed. From the UVa

A great teacher and a great leader

Bill Wulf just passed away. We send our best thoughts to his dear wife Anita Jones and the rest of his family. He is greatly missed.

From the UVa obit: This image was algorithmically composed of more than 2,000 photos to illustrate Wulf’s influence on computer science, engineering and the University of Virginia.

CMU

Wulf started his great career as a faculty at CMU in 1968. He earned the first Ph.D. from the University of Virginia in the then newly founded Department of Applied Mathematics and Computer Science.

I met him after I started at CMU as a graduate student. He taught one of the required classes that I took. I still recall his ability to lecture on systems—an area that I did not expect to work in—it is not theory. But I recall his ability to explain and get us to be excited about systems. Thanks Bill.

Beyond CMU

Wulf had several impacts on CS. In 1970, when compiler technology was arguably only a dozen years old, he pioneered methods of optimizing the object code. He wrote a book about this titled, The Design of an Optimizing Compiler. He and Anita created the startup Tartan Laboratories with John Nestor to channel compiler optimization, especially for the then-new Ada programming language. Guy Steele is among several notable former employees of Tartan.

Wulf was elected to NAE in 1993 “for professional leadership and for contributions to programming systems and computer architecture”. He then served as NAE president from 1996 to 2007. He helped guide the NAE and get it redirected and made it a better organization.

He also did a famous job of resigning from the University of Virginia to protest the recent conduct of the UVa Board of Visitors in removing President Teresa Sullivan.

See this for the original news and his appeal, I urge my fellow faculty to join me. The links from that page are unrecoverable, but see this article in the Washington Post, which quotes his entire statement on why he refused to change his mind even after Sullivan was reinstated.

Open Problems

Again we thanks Wulf for his work that helped make CS a better place. He will be missed.

By rjlipton

The SIGACT Book Review column list of books it wants reviewed

I am posting this for Nick Tran who is the current SIGACT Book Review Editor (before him it was Fred Green for about 6 years, and before him it was me (Bill Gasarch) for 18 years. Nobody should have the job for more than 6 years. No TV show should to on more than 6 years. The 6-year rule probably has other applications.)

Nick asked me to post the list of books that need reviewing. It is most of this post.

If you spot one you want to review then email him (email address later)  the name of the  book you want to review and your postal address so he can send it to you or have it sent to you. Here are his specs:

Reviews of recently published or bucket-list books of interest to the TCS community are welcome. Manuscripts (NOTE FROM BILL manuscripts'? Really? Sounds like the kind of thing you would FAX or postal mail) should be between 3 and 6 pages and include a brief introduction, a detailed content summary, an assessment of the work, and a recommendation to the book's targeted audience.

Nick's email is ntran@scu.edu

The books are:

ALGORITHMS

Knebl, H. (2020).  Algorithms and Data Structures: Foundations and Probabilistic Methods for Design and Analysis. Springer.

Roughgarden, T. (2022). Algorithms Illuminated: Omnibus Edition. Cambridge University Press.

MISCELLANEOUS COMPUTER SCIENCE

Amaral Turkman, M., Paulino, C., & Müller, P. (2019). Computational Bayesian Statistics: An Introduction (Institute of Mathematical Statistics Textbooks). Cambridge University Press.

Nakajima, S., Watanabe, K., & Sugiyama, M. (2019). Variational Bayesian Learning Theory. Cambridge University Press.

Hidary, J. D. (2021). Quantum Computing: An Applied Approach (2nd ed.). Springer.

Apt, K. R., & Hoare, T. (Eds.). (2022). Edsger Wybe Dijkstra: His Life, Work, and Legacy (ACM Books). Morgan & Claypool.

Burton, E., Goldsmith, J., Mattei, N., Siler, C., & Swiatek, S. (2023). Computing and Technology Ethics: Engaging through Science Fiction. The MIT Press.

DISCRETE MATHEMATICS AND COMPUTING

O’Regan, G. (2020). Mathematics in Computing: An Accessible Guide to Historical, Foundational and Application Contexts. Springer Publishing.

Rosenberg, A. L., & Trystram, D. (2020). Understand Mathematics, Understand Computing: Discrete Mathematics That All Computing Students Should Know. Springer Publishing.

Liben-Nowell, D. (2022). Connecting Discrete Mathematics and Computer Science (2nd ed.). Cambridge University Press.

CRYPTOGRAPHY AND SECURITY

Oorschot, P. . C. (2020). Computer Security and the Internet: Tools and Jewels (Information Security and Cryptography). Springer.

COMBINATORICS AND GRAPH THEORY

Golumbic, M. C., & Sainte-Laguë, A. (2021). The Zeroth Book of Graph Theory: An Annotated Translation of Les Réseaux (ou Graphes)—André Sainte-Laguë (1926) (Lecture Notes in Mathematics). Springer.

Beineke, L., Golumbic, M., & Wilson, R. (Eds.). (2021). Topics in Algorithmic Graph Theory (Encyclopedia of Mathematics and its Applications).  Cambridge University Press.

PROGRAMMING ETC.

Nielson, F., & Nielson, R. H. (2019). Formal Methods: An Appetizer. Springer.

Sanders, P., Mehlhorn, K., Dietzfelbinger, M., & Dementiev, R. (2019). Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox. Springer.

MISCELLANEOUS MATHEMATICS

Kurgalin, S., & Borzunov, S. (2022). Algebra and Geometry with Python. Springer.

By gasarch

I am posting this for Nick Tran who is the current SIGACT Book Review Editor (before him it was Fred Green for about 6 years, and before him it was me (Bill Gasarch) for 18 years. Nobody should have the job for more than 6 years. No TV show should to on more than 6 years. The 6-year rule probably has other applications.)

Nick asked me to post the list of books that need reviewing. It is most of this post.

If you spot one you want to review then email him (email address later)  the name of the  book you want to review and your postal address so he can send it to you or have it sent to you. Here are his specs:

Reviews of recently published or bucket-list books of interest to the TCS community are welcome. Manuscripts (NOTE FROM BILL manuscripts'? Really? Sounds like the kind of thing you would FAX or postal mail) should be between 3 and 6 pages and include a brief introduction, a detailed content summary, an assessment of the work, and a recommendation to the book's targeted audience.

Nick's email is ntran@scu.edu

The books are:

ALGORITHMS

Knebl, H. (2020).  Algorithms and Data Structures: Foundations and Probabilistic Methods for Design and Analysis. Springer.

Roughgarden, T. (2022). Algorithms Illuminated: Omnibus Edition. Cambridge University Press.

MISCELLANEOUS COMPUTER SCIENCE

Amaral Turkman, M., Paulino, C., & Müller, P. (2019). Computational Bayesian Statistics: An Introduction (Institute of Mathematical Statistics Textbooks). Cambridge University Press.

Nakajima, S., Watanabe, K., & Sugiyama, M. (2019). Variational Bayesian Learning Theory. Cambridge University Press.

Hidary, J. D. (2021). Quantum Computing: An Applied Approach (2nd ed.). Springer.

Apt, K. R., & Hoare, T. (Eds.). (2022). Edsger Wybe Dijkstra: His Life, Work, and Legacy (ACM Books). Morgan & Claypool.

Burton, E., Goldsmith, J., Mattei, N., Siler, C., & Swiatek, S. (2023). Computing and Technology Ethics: Engaging through Science Fiction. The MIT Press.

DISCRETE MATHEMATICS AND COMPUTING

O’Regan, G. (2020). Mathematics in Computing: An Accessible Guide to Historical, Foundational and Application Contexts. Springer Publishing.

Rosenberg, A. L., & Trystram, D. (2020). Understand Mathematics, Understand Computing: Discrete Mathematics That All Computing Students Should Know. Springer Publishing.

Liben-Nowell, D. (2022). Connecting Discrete Mathematics and Computer Science (2nd ed.). Cambridge University Press.

CRYPTOGRAPHY AND SECURITY

Oorschot, P. . C. (2020). Computer Security and the Internet: Tools and Jewels (Information Security and Cryptography). Springer.

COMBINATORICS AND GRAPH THEORY

Golumbic, M. C., & Sainte-Laguë, A. (2021). The Zeroth Book of Graph Theory: An Annotated Translation of Les Réseaux (ou Graphes)—André Sainte-Laguë (1926) (Lecture Notes in Mathematics). Springer.

Beineke, L., Golumbic, M., & Wilson, R. (Eds.). (2021). Topics in Algorithmic Graph Theory (Encyclopedia of Mathematics and its Applications).  Cambridge University Press.

PROGRAMMING ETC.

Nielson, F., & Nielson, R. H. (2019). Formal Methods: An Appetizer. Springer.

Sanders, P., Mehlhorn, K., Dietzfelbinger, M., & Dementiev, R. (2019). Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox. Springer.

MISCELLANEOUS MATHEMATICS

Kurgalin, S., & Borzunov, S. (2022). Algebra and Geometry with Python. Springer.

By gasarch

TR23-035 | A Duality Between One-Way Functions and Average-Case Symmetry of Information | Shuichi Hirahara, Rahul Ilango, Zhenjian Lu, Mikito Nanashima, Igor Carboni Oliveira

from ECCC Papers

Symmetry of Information (SoI) is a fundamental property of Kolmogorov complexity that relates the complexity of a pair of strings and their conditional complexities. Understanding if this property holds in the time-bounded setting is a longstanding open problem. In the nineties, Longpré and Mocas (1993) and Longpré and Watanabe (1995) established that if SoI holds for time-bounded Kolmogorov complexity then cryptographic one-way functions do not exist, and asked if a converse holds. We show that one-way functions exist if and only if (probabilistic) time-bounded SoI fails on average, i.e., if there is a samplable distribution of pairs (x,y) of strings such that SoI for pK$^t$ complexity fails for many of these pairs. Our techniques rely on recent perspectives offered by probabilistic Kolmogorov complexity and meta-complexity, and reveal further equivalences between inverting one-way functions and the validity of key properties of Kolmogorov complexity in the time-bounded setting: (average-case) language compression and (average-case) conditional coding. Motivated by these results, we investigate correspondences of this form for the worst-case hardness of NP (i.e., NP ? BPP) and for the average-case hardness of NP (i.e., DistNP ? HeurBPP), respectively. Our results establish the existence of similar dualities between these computational assumptions and the failure of results from Kolmogorov complexity in the time-bounded setting. In particular, these characterizations offer a novel way to investigate the main hardness conjectures of complexity theory (and the relationships among them) through the lens of Kolmogorov complexity and its properties.
Symmetry of Information (SoI) is a fundamental property of Kolmogorov complexity that relates the complexity of a pair of strings and their conditional complexities. Understanding if this property holds in the time-bounded setting is a longstanding open problem. In the nineties, Longpré and Mocas (1993) and Longpré and Watanabe (1995) established that if SoI holds for time-bounded Kolmogorov complexity then cryptographic one-way functions do not exist, and asked if a converse holds. We show that one-way functions exist if and only if (probabilistic) time-bounded SoI fails on average, i.e., if there is a samplable distribution of pairs (x,y) of strings such that SoI for pK$^t$ complexity fails for many of these pairs. Our techniques rely on recent perspectives offered by probabilistic Kolmogorov complexity and meta-complexity, and reveal further equivalences between inverting one-way functions and the validity of key properties of Kolmogorov complexity in the time-bounded setting: (average-case) language compression and (average-case) conditional coding. Motivated by these results, we investigate correspondences of this form for the worst-case hardness of NP (i.e., NP ? BPP) and for the average-case hardness of NP (i.e., DistNP ? HeurBPP), respectively. Our results establish the existence of similar dualities between these computational assumptions and the failure of results from Kolmogorov complexity in the time-bounded setting. In particular, these characterizations offer a novel way to investigate the main hardness conjectures of complexity theory (and the relationships among them) through the lens of Kolmogorov complexity and its properties.

The 2022 Turing Award

from Richard Lipton

Bob Metcalfe is the sole winner of the 2022 Turing Award. He keyed the development of Ethernet technology growing out of his PhD thesis while at Xerox PARC in the early 1970s. Previous recognitions of his work include the IEEE Medal of Honor and the National Medal of Technology and Innovation. Congrats Bob. If you […]

Bob Metcalfe is the sole winner of the 2022 Turing Award. He keyed the development of Ethernet technology growing out of his PhD thesis while at Xerox PARC in the early 1970s. Previous recognitions of his work include the IEEE Medal of Honor and the National Medal of Technology and Innovation. Congrats Bob.

If you wish to know the key to what he invented it was the insight:

Listen before you talk.

 Source with photo by Kim Kulish

Simple. More in a moment.

The Insight

The concept of Ethernet has its roots in the late 1960s and the University of Hawaii’s Aloha Network. Aloha was a radio-communications network connecting the Hawaiian Islands to a central computer on the main campus, in Oahu.

Aloha was often referred to as one of the first wireless packet networks. It used two radio frequencies, separating send and receive data that passed between user terminals and the main hub connected to the computer. Designed for simplicity, the network followed two key rules:

• Send a packet when you’re ready and wait for a receipt acknowledgement.

• If no acknowledgement occurs, resend at some random time.

Kind of like:

Talk before you listen.

As use of the network grew, it became obvious that packet collision would severely limit the capacity of the network. Researching the problem for his doctoral thesis, Bob Metcalfe devised a solution. His innovation earned him not only his Harvard Ph.D. but a place in history as the inventor of the Ethernet. Metcalfe’s solution: Listen before you talk. Simple insight, but brilliant insight. Here’s how it led to the Turing Award.

How It Worked

The two methods are either Talk-Listen or Listen-Talk. The point is that under reasonable traffic the first method can essentially stop totally while the second will continue to get some information thru.

• Talk-Listen: Imagine some packet is in progress and someone else talks. Then the two packets are destroyed. This can happen at a high enough rate so that essentially all traffic fails.

• Listen-Talk: Imagine some packet is in progress. Then others will listen and they will not send. Thus the information will get thru.

Is there extra work for listening for the proximity of something that might not get delivered to you? Yes. But the local effort to listen is linear. Moreover, the expected enforced delay is a constant—it goes as the local network load. Any impact on rate can thus be more-than-compensated by improvements in the base speed of hardware.

Good Laws and Bad Predictions

Metcalfe is among those we’ve sometimes blogged about who have a “law” named for them. Metcalfe’s Law, in its original form, states

The value of an ${n}$-user network scales as ${n^2}$.

This makes sense on the face of it: if the network is connecting ${n}$ peer users then each avails ${n-1}$ connections of equal value. The equal-value assertion was rejected in a 2006 paper by Robert Briscoe, Andrew Odlyzko, and Benjamin Tilly, who argued that value scales as ${n\log n}$. Recent studies of large-scale network usage, however, have inclined toward the higher rate—including a 2013 paper by Metcalfe himself.

Metcalfe is also known for a spectacularly wrong prediction. He wrote in 1995 that owing to an expected overload as people tried to connect, the Internet would “go spectacularly supernova and in 1996 catastrophically collapse.” It did not. He later jokingly ate his words on a paper copy of the article including this comment and swallowing it before the audience of a keynote speech.

According to this telling, his view of the overload was that ${n}$ would grow exponentially, not that order-${n^2}$ usage would overwhelm it. He had, after all, designed Ethernet to scale up.

Open Problems

While on the subject of predictions, we wonder whether Turing Awards are any easier to predict than Oscars for film or Nobels in science. Some insights into the makeup of Turing Awards are in this paper, including this rundown of winners by university affiliation at the time of the main cited work:

• Stanford University 22

• Massachusetts Institute of Technology 19

• University of California, Berkeley 13

• Carnegie Mellon University 10

• Princeton University 7

• Harvard University 7

• New York University 5

• University of Oxford 4

• University of Toronto 4

• Weizmann Institute of Science 3

• Cornell University 3

• University of Edinburgh 2

By RJLipton+KWRegan

RANDOM & APPROX 2023

Guest post by the 2023 Program Committee Chairs: Nicole Megow (APPROX) and  Adam Smith (RANDOM) The 27th International Workshop on Randomization and Computation (RANDOM 2023) and the 26th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX 2023) will be held in person in Atlanta, GA, USA from September 11-13, 2023.   RANDOM 2023 focuses on applications of randomness to computational and combinatorial problems while APPROX 2023 focuses on algorithmic and complexity theoretic issues relevant to the development of efficient approximate solutions to computationally difficult problems. IMPORTANT DATES: Submissions:  May 4, 2023, 18:00 EDT (UTC-4)Notifications: June 26, 2023 Camera ready: July 10, 2023 For more information, see the CFP on the conference websites:randomconference.com/approxconference.wordpress.com

Guest post by the 2023 Program Committee Chairs: Nicole Megow (APPROX) and  Adam Smith (RANDOM)

The 27th International Workshop on Randomization and Computation (RANDOM 2023) and the 26th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX 2023) will be held in person in Atlanta, GA, USA from September 11-13, 2023.

RANDOM 2023 focuses on applications of randomness to computational and combinatorial problems while APPROX 2023 focuses on algorithmic and complexity theoretic issues relevant to the development of efficient approximate solutions to computationally difficult problems.

IMPORTANT DATES:

Submissions:  May 4, 2023, 18:00 EDT (UTC-4)
Notifications: June 26, 2023

Camera ready: July 10, 2023

For more information, see the CFP on the conference websites:
http://randomconference.com/

http://approxconference.wordpress.com

By Omer Reingold

Saturday, March 25

Postdoc at Université Paris-Dauphine (apply by April 15, 2023)

from CCI: jobs

We are offering a 1+1 year post-doc position as part of the ANR funded project Sub-EXponential APproximation and ParametErized ALgorithms (S-EX-AP-PE-AL). The topic of the position is the intersection of FPT algorithms and approximation. The post-doc will be supervised by Michael Lampis and will be based in LAMSADE, Université Paris-Dauphine, located in central Paris. Website: […]

We are offering a 1+1 year post-doc position as part of the ANR funded project Sub-EXponential APproximation and ParametErized ALgorithms (S-EX-AP-PE-AL). The topic of the position is the intersection of FPT algorithms and approximation. The post-doc will be supervised by Michael Lampis and will be based in LAMSADE, Université Paris-Dauphine,
located in central Paris.

Email: michail.lampis@dauphine.fr

By shacharlovett

An Aperiodic Monotile

from Gil Kalai

I suppose that most of you already heard about the first ever aperiodic planar tiling with one type of tiles. It was discovered by David Smith, Joseph Samuel Myers, Craig S. Kaplan, and Chaim Goodman-Strauss. Amazing!!! Here are  blogposts … Continue reading →

I suppose that most of you already heard about the first ever aperiodic planar tiling with one type of tiles. It was discovered by David Smith, Joseph Samuel Myers, Craig S. Kaplan, and Chaim Goodman-Strauss. Amazing!!!

Here are  blogposts from Complex Projective 4-Space and from the Aperiodical.

An aperiodic monotile

by David Smith, Joseph Samuel Myers, Craig S. Kaplan, and Chaim Goodman-Strauss

Abstract:  A longstanding open problem asks for an aperiodic monotile, also known as an “einstein”: a shape that admits tilings of the plane, but never periodic tilings. We answer this problem for topological disk tiles by exhibiting a continuum of combinatorially equivalent aperiodic polygons. We first show that a representative example, the “hat” polykite, can form clusters called “metatiles”, for which substitution rules can be defined. Because the metatiles admit tilings of the plane, so too does the hat. We then prove that generic members of our continuum of polygons are aperiodic, through a new kind of geometric incommensurability argument. Separately, we give a combinatorial, computer-assisted proof that the hat must form hierarchical — and hence aperiodic — tilings.

The new paper starts with a description of the very interesting history. From Hilbert’s Entscheidungsproblem, that led to notions of undecidability, Wang’s 1961 work, and Berger’s 1966 undecidability result that showed that some tiles admit only apriodic tilings. (This required 20426 types of tiles.) Penrose’s famous apreiodic tiling from 1978 consists of just two tiles.

There are lovely related results for the hyperbolic plane starting from  1974 paper by Boroczky, and later papers by  Block and Weinberger, Margulis and Mozes, Goodman-Strauss and others.

I could not figure out from browsing the paper about the status of the question: “Can we have aperiodic tiling with convex tiles?”

Half a year ago Rachel Greenfeld and Terry Tao disproved in the paper  “A counterexample to the periodic tiling conjecture,“ the periodic tiling conjecture (Grünbaum-Shephard and Lagarias-Wang) that asserted that any finite subset of a lattice $\mathbb Z^d$ which tiles that lattice by translations, in fact tiles periodically.

By Gil Kalai

Postdoctoral Researcher at University of Bergen (Norway) (apply by April 30, 2023)

from CCI: jobs

The Algorithms group at the University of Bergen has an open 2-year Postdoctoral Researcher position in SAT-solving (deadline 30 April 2023). The goal is to apply SAT-solving techniques in the development of algorithms and in the investigation of mathematical conjectures arising in combinatorics, graph theory, and related areas. Website: www.jobbnorge.no/en/available-jobs/job/242720/researcher-in-informatics-sat-solving Email: mateus.oliveira@uib.no

The Algorithms group at the University of Bergen has an open 2-year Postdoctoral Researcher position in SAT-solving (deadline 30 April 2023). The goal is to apply SAT-solving techniques in the development of algorithms and in the investigation of mathematical conjectures arising in combinatorics, graph theory, and related areas.

Website: https://www.jobbnorge.no/en/available-jobs/job/242720/researcher-in-informatics-sat-solving
Email: mateus.oliveira@uib.no

By shacharlovett

Friday, March 24

Mathematics of the impossible, Chapter 7, Space

from Emanuele Viola

As mentioned in Chapter 2, Time is only one of several resources we with to study. Another important one is space, which is another name for the memory of the machine. Computing under space constraints may be less familiar to us than computing under time constraints, and many surprises lay ahead in this chapter that […]

As mentioned in Chapter 2, Time is only one of several resources we with to study. Another important one is space, which is another name for the memory of the machine. Computing under space constraints may be less familiar to us than computing under time constraints, and many surprises lay ahead in this chapter that challenge our intuition of efficient computation.

If only space is under consideration, and one is OK with a constant-factor slackness, then TMs and RAMs are equivalent; much like $\text {P}$ is invariant under power changes in time. In a sense, changing the space by a constant factor is like changing the time by a power; from this point of view the equivalence is not surprising.

We shall consider both space bounds bigger than the input length and smaller. For the latter, we have to consider the input separately. The machine should be able to read the input, but not write on its cells. One way to formalize this is to consider 2TMs, where one tape holds the input and is read-only. The other is a standard read-write tape.

We also want to compute functions $f$ whose output is more than $1$ bit. One option is to equip the machine with yet another tape, which is write-only. We prefer to stick to two tapes and instead require that given $x,i$ the $i$ output bit of $f(x)$ is computable efficiently.

Definition 7.1. A function $f:X\subseteq \{0,1\}^* \to \{0,1\}^*$ is computable in $\text {Space}(s(n))$ if there is a 2TM which on input $(x,i)$ on the first tape, where $x\in X$ and $i\le |f(x)|$, outputs the $i$ bit of $f(x)$, and the machine never writes on the first tape and never uses more than $s(|x|)$ cells on the second.

We define:

\begin{aligned} \text {L}:= & \bigcup _{d}\text {Space}(d\log n),\\ \text {PSpace}:= & \bigcup _{d}\text {Space}(n^{d}). \end{aligned}

We investigate next some basic relationship between space and time.

Theorem 7.1. For every functions $t$ and $s$:

1. $\text {\ensuremath {k}TM-Time}(t)\subseteq \text {Space}(c_{k}t)$,
2. $\text {Time}(t)\subseteq \text {Space}(ct\log (nt))$,
3. $\text {Space}(s)\subseteq 2\text {TM-Time}(c^{s(n)}\cdot n^{c})$.

Proof1. A TM running in time $t$ can only move its heads at most $t$ tape cells. We can write all these contents in one tape. To simulate one step of the $k\text {TM}$ we do a pass on all the contents and update them accordingly.

2. A RAM running in time $t$ can only access $\le t$ memory cells, each containing at most $c\log nt$ bits; the factor $n$ is to take into account that the machine starts with word length $\ge \log n$. We simulate this machine and for each Write operation we add a record on the tape with the memory cell index and the content, similar to Theorem ??. When the RAM reads from memory, we scan the records and read off the memory values from one of them. If the record isn’t found, then the simulation returns the value corresponding to the initial configuration.

We also allocate $c\log nt$ bits for the registers of the RAM. It can be shown that the operations among them can be performed in the desired space, since they only involve a logarithmic number of bits. A stronger result is proved later in Theorem 7.5.

3. On an input $x$ with $|x|=n$, a $\text {Space}(s)$ machine can be in at most $n^{c}\cdot c^{s(n)}$ configurations. The first $n^{c}$ factor is for the head position on the input. The second factor is for the contents of the second tape. Since the machine ultimately stops, configurations cannot repeat, hence the same machine (with no modification) will stop in the desired time. QED

A non-trivial saving is given by the following theorem.

Theorem 7.2. [12] $k\text {-TM-Time}(t)\subseteq \text {Space}(c_{k}t/\log t)$, for every function $t$.

This result is not specific to MTMs but can be proved for other models such as RAMs.

From Theorem 7.1 and the next exercise we have

\begin{aligned} \text {L}\subseteq \text {P}\subseteq \text {NP}\subseteq \text {PH}\subseteq \text {PSpace}\subseteq \text {Exp}. \end{aligned}

Exercise 7.1. Prove $\text {PH}\subseteq \text {PSpace}$.

Just like for Time, for space one has universal machines and a hierarchy theorem. The latter implies $\text {L}\ne \text {PSpace}$. Hence, analogously to the situation for Time and NTime (section º5.1), we know that at least one of the inclusions above between L and PSpace is strict. Most people seem to think that all are, but nobody can prove that any specific one is.

7.1 Branching programs

Branching programs are the non-uniform counterpart of $\text {Space}$, just like circuits are the non-uniform counterpart of $\text {Time}$.

Definition 7.2. A (branching) program is a directed acyclic graph. A node can be labeled with an input variable, in which case it has two outgoing edges labeled $0$ and $1$. Alternatively a node can be labeled with $0$ or $1$, in which case it has no outgoing edges. One special node is the start node.

The space of the program with $S$ nodes is $\log S$. A program computes a function $f:\zonzo$ by following the path from the starting node, following edge labels corresponding to the input, and outputting $b\in \{0,1\}$ as soon as it reaches a node labeled $b$.

Theorem 7.3. Suppose a 2TM $M$ computes $f:\zonzo$ in space $s$. Then $f$ has branching programs of space $c_{M}(s+\log n)$. In particular, any $f\in \text {L}$ has branching programs of size $n^{c_{f}}$.

ProofEach node in the program corresponds to a configuration. QED

Definition 7.3. The branching program given by Theorem 7.3 is called the configuration graph of $M$.

The strongest available impossibility result for branching programs is the following.

Theorem 7.4. [16] There are explicit functions (in $\text {P}$) that require branching programs of size $cn^{2}/\log n$ on inputs of length $n$.

7.2 The power of L

Computing with severe space bounds, as in L, seems quite difficult. Also, it might be somewhat less familiar than, say, computing within a time bound. It turns out that L is a powerful class capable of amazing computational feats that challenge our intuition of efficient computation. Moreover, these computational feats hinge on deep mathematical techniques of wide applicability. We hinted at this in Chapter 1. We now give further examples. At the same time we develop our intuition of what is computable with little space.

To set the stage, we begin with a composition result. In the previous sections we used several times the simple result that the composition of two maps in P is also in P. This is useful as it allows us to break a complicated algorithm in small steps to be analyzed separately – which is a version of the divide et impera paradigm. A similar composition result holds and is useful for space, but the argument is somewhat less obvious.

Lemma 7.1. Let $f_{1}:\{0,1\}^* \to \{0,1\}^*$ be in $\text {Space}(s_{1})$ and satisfy $|f_{1}(x)|\le m(|x|)$ for a function $m$. Suppose $f_{2}:\{0,1\}^* \to \{0,1\}$ is in $\text {Space}(s_{2})$.

Then the composed function $g$ defined as $g(x)=f_{2}(f_{1}(x))$ is computable in space $c(s_{2}(m(n))+s_{1}(n)+\log nm(n))$.

In particular, if $f_{1}$ and $f_{2}$ are in L then $g$ is in L, as long as $m\le n^{d}$ for a constant $d$.

Exercise 7.2. Prove this.

7.2.1 Arithmetic

A first example of the power of L is given by its ability to perform basic arithmetic. Grade school algorithms use a lot of space, for example they employ space $\ge n$ to multiply two $n$-bit integers.

Theorem 7.5. The following problems are in L:

1. Addition of two input integers.
2. Iterated addition: Addition of any number of input integers.
3. Multiplication of two input integers.
4. Iterated multiplication: Multiplication of any number of input integers.
5. Division of two integers.

Iterated multiplication is of particular interest because it can be used to compute “pseudorandom functions.” Such objects shed light on our ability to prove impossibility results via the “Natural Proofs” connection which we will see in Chapter ??.

Proof of 1. in Theorem 7.5 We are given as input $x,y\in \mathbb {N}$ and an index $i$ and need to compute bit $i$ of $x+y$. Starting from the least significant bits, we add the bits of $x$ and $y$, storing the carry of 1 bit in memory. Output bits are discarded until we reach bit $i$, which is output. QED

Exercise 7.3. Prove 2. and 3. in Theorem 7.5.

Proving 4. and 5. is more involved and requires some of those deep mathematical tools of wide applicability we alluded to before. Division can be computed once we can compute iterated multiplication. In a nutshell, the idea is to use the expansion

\begin{aligned} \frac {1}{x}=\sum _{i\ge 0}(-1)^{i}(x-1)^{i}. \end{aligned}

We omit details about bounding the error. Instead, we point out that this requires computing powers $(x-1)^{i}$ which is an example of iterated multiplication (and in fact is no easier).

So for the rest of this section we focus on iterated multiplication. Our main tool for this the Chinese-remaindering representation of integers, abbreviated CRR.

Definition 7.4. We denote by $\mathbb {Z}_{m}$ the integers modulo $m$ equipped with addition and multiplication (modulo $m$).

Theorem 7.6. Let $p_{1},...,p_{\ell }$ be distinct primes and $m:=\prod _{i}p_{i}$. Then $\mathbb {Z}_{m}$ is isomorphic to $\mathbb {Z}_{p_{1}}\times \ldots \times \mathbb {Z}_{p_{\ell }}$.

The forward direction of the isomorphism is given by the map

$x \in \mathbb {Z}_{m} \rightarrow (x \mod p_{1},x \mod p_{2},...,x \mod p_{\ell }).$

For the converse direction, there exist integers $e_{1},...,e_{\ell }\leq (p')^{c}$, depending only on the $p_{i}$ such that the converse direction is given by the map

$(x\mod p_{1},x\mod p_{2},...,x\mod p_{\ell }) \to x:=\sum _{i=1}^{\ell }e_{i}\cdot (x\mod p_{i}).$

Each integer $e_{i}$ is $0$ mod $p_{j}$ for $j\not =i$ and is $1$ mod $p_{i}$.

Example 7.1. $\mathbb {Z}_{6}$ is isomorphic to $\mathbb {Z}_{2}\times \mathbb {Z}_{3}$. The equation $2+3=5$ corresponds to $(0,2)+(1,0)=(1,2)$. The equation $2\cdot 3=6$ corresponds to $(0,2)+(1,0)=(0,0)$. Note how addition and multiplication in CRR are performed in each coordinate separately; how convenient.

To compute iterated multiplication the idea is to move to CRR, perform the multiplications there, and then move back to standard representation. A critical point is that each coordinate in the CRR has a representation of only $c\log n$ bits, which makes it easy to perform iterated multiplication one multiplication at the time, since we can afford to write down intermediate products.

The algorithm is as follows:

Computing the product of input integers $x_{1},\ldots ,x_{t}$. [6]

1. Let $\ell :=n^{c}$ and compute the first $\ell$ prime numbers $p_{1},p_{2},\ldots ,p_{\ell }$.
2. Convert the input into CRR: Compute $(x_{1}\mod p_{1},\ldots ,x_{1}\mod p_{\ell }),\ldots ,(x_{t}\mod p_{1},\ldots ,x_{t}\mod p_{\ell })$.
3. Compute the multiplications in CRR: $(\Pi _{i=1}^{t}x_{i}\mod p_{1}),\ldots ,(\Pi _{i=1}^{t}x_{i}\mod p_{\ell })$.
4. Convert back to standard representation.

Exercise 7.4. Prove the correctness of this algorithm.

Now we explain how steps 1, 2, and 3 can be implemented in L. Step 4 can be implemented in L too, but showing this is somewhat technical due to the computation of the numbers $e_{i}$ in Theorem 7.6. However these numbers only depend on the input length, and so we will be able to give a self-contained proof that iterated multiplication has branching programs of size $n^{c}$.

Step 1

By Theorem ??, the primes $p_{i}$ have magnitude $\le n^{c}$ and so can be represented with $c\log n$ bits. We can enumerate over integers with $\le c\log n$ bits in L. For each integer $x$ we can test if it’s prime by again enumerating over all integers $y$ and $z$ with $\le c\log n$ bits and checking if $x=yz$, say using the space-efficient algorithm for multiplication in Theorem 7.5. (The space required for this step would in fact be $c\log \log n$.)

Step 2

We explain how given $y\in \{0,1\} ^{n}$ we can compute $(y\mod p_{1},\ldots ,y\mod p_{\ell })$ in L. If $y_{j}$ is bit $j$ of $y$ we have that

Step 3

This is a smaller version of the original problem: for each $j\leq \ell$, we want to compute $(\Pi _{i=1}^{t}x_{i}\mod p_{j})$ from $x_{1}\mod p_{j},\ldots ,x_{t}\mod p_{j}$. However, as mentioned earlier, each $(x_{i}\mod p_{j})$ is at most $n^{c}$ in magnitude and thus has a representation of $c\log n$ bits. Hence we can just perform one multiplication at the time in L.

Step 4

By Theorem 7.6, to convert back to standard representation from CRR we have to compute the map

\begin{aligned} (y\mod p_{1},\ldots ,y\mod p_{\ell })\to \sum _{i=1}^{\ell }e_{i}\cdot (y\mod p_{i}). \end{aligned}

Assuming we can compute the $e_{i}$, this is just multiplication and iterated addition, which are in L by Theorem 7.5.

Putting the steps together

Combining the steps together we can compute iterated multiplication in L as long as we are given the integers $e_{i}$ in Theorem 7.6.

Theorem 7.7. Given integers $x_{1},x_{2},\ldots ,x_{t}$, and given the integers $e_{1},e_{2},\ldots ,e_{\ell }$ as in Theorem 7.6, where $\ell =n^{c}$, we can compute $\prod _{i}x_{i}$ in L.

In particular, because the $e_{i}$ only depend on the input length, but not on the $x_{i}$ they can be hardwired in a branching program.

Corollary 7.1. Iterated multiplication has branching programs of size $n^{c}$.

Exercise 7.5. Show that given integers $x_{1},x_{2},\ldots ,x_{t}$ and $y_{1},y_{2},\ldots ,y_{t}$ one can decide if

\begin{aligned} \prod _{i=1}^{t}x_{i}=\prod _{i=1}^{t}y_{i} \end{aligned}

in L. You cannot use the fact that iterated multiplication is in L, a result which we stated but not completely proved.

Exercise 7.6. Show that the iterated multiplication of $d\times d$ matrices over the integers has branching programs of size $n^{c_{d}}$.

7.2.2 Graphs

We now give another example of the power of L.

Definition 7.5. The undirected reachability problem: Given an undirected graph $G$ and two nodes $s$ and $t$ in $G$ determine if there is a path from $s$ to $t$.

Standard time-efficient algorithms to solve this problem mark nodes in the graph. In logarithmic space we can keep track of a constant number of nodes, but it is not clear how we can avoid repeating old steps forever.

Theorem 7.8. [20] Undirected reachability is in L.

The idea behind this result is that a random walk on the graph will visit every node, and can be computed using small space, since we just need to keep track of the current node. Then, one can derandomize the random walk and obtain a deterministic walk, again computable in L.

7.2.3 Linear algebra

Our final example comes from linear algebra. Familiar methods for solving a linear system

\begin{aligned} Ax=b \end{aligned}

can be done requires a lot of space. For example using elimination we need to rewrite the matrix $A$. Similarly, we cannot easily compute determinants using small space. However, a different method exists.

Theorem 7.9. [9] Solving a linear system is computable in $\text {Space}(c\log ^{2}n)$.

7.3 Checkpoints

The checkpoint technique is a fundamental tool in the study of space-bounded computation. Let us illustrate it at a high level. Let us consider a graph $G$, and let us write $u\leadsto ^{t}v$ if there is a path of length $\le t$ from $u$ to $v$. The technique allows us to trade the length of the path with quantifiers. Specifically, for any parameter $b$, we can break down paths from $u$ to $v$ in $b$ smaller paths that go through $b-1$ checkpoints. The length of the smaller paths needs be only $t/b$ (assuming that $b$ divides $t$). We can guess the breakpoints and verify each smaller path separately, at the price of introducing quantifiers but with the gain that the path length got reduced from $t$ to $t/b$. The checkpoint technique is thus an instantiation of the general paradigm of guessing computation and verifying it locally, introduced in Chapter 5. One difference is that now we are only going to guess parts of the computation.

The checkpoint technique   $u\leadsto ^{t}v\Leftrightarrow \exists p_{1},p_{2},\ldots ,p_{b-1}:\forall i\in \{0,1,b-1\}:p_{i}\leadsto ^{t/b}p_{i+1},$   where we denote $p_{0}:=u$ and $p_{b}:=v$.

An important aspect of this technique is that it can be applied recursively: We can apply it again to the problems $p_{i}\leadsto ^{t/b}p_{i+1}$. We need to introduce more quantifiers, but we can reduce the path length to $t/b^{2}$, and so on. We will see several instantiations of this technique, for various settings of parameters, ranging from $b=2$ to $b=n^{c}$.

We now utilize the checkpoint technique to show a simulation of small-space computation by small-depth alternating circuits.

Theorem 7.10. A function computable by a branching program with $S$ nodes is also computable by an alternating circuit of depth $c\log _{b}S$ and size $S^{b\log _{b}S+c}$, for any $b\le S$.

To illustrate the parameters, suppose $S=n^{a}$, and let us pick $b:=n^{\epsilon }$ where $n\ge c_{a,\epsilon }$. Then we have circuits of depth $d:=ca/\epsilon$ and size $\le S^{n^{\epsilon }d+c}=2^{n^{c\epsilon }}$. In other words, we can have depth $d$ and size $2^{n^{c_{a}/d}}$, for every $d$. This in particular holds for every function in $\text {L}$. We will later give explicit functions (also in $\text {P}$) which cannot be computed by circuits of depth $d$ and size $2^{n^{c/d}}$, “just short” of ruling out $\text {L}$. This state of affairs is worth emphasis:

(1) Every $f$ in $\text {L}$ has alternating circuits of depth $d$ and size $2^{n^{c_{f}/d}}$. (2) We can prove that there are explicit functions (also in L) which cannot be computed by circuits of depth $d$ and size $2^{n^{c/d}}$. (3) Improving the constant in the double exponent for a function in P would yield $\text {L}\ne \text {P}$. In this sense, the result in (2) is the best possible short of proving a major separation.

ProofWe apply the checkpoint technique to the branching program, recursively, with parameter $b$. For simplicity we first assume that $S$ is a power of $b$. Each application of the technique reduces the path length by a factor $b$. Hence with $\log _{b}S$ applications we can reduce the path length to $1$.

In one application, we have an $\exists$ quantifier over $b-1$ nodes, corresponding to an Or gate with fan-in $S^{b-1}$, and then a $\forall$ quantifier over $b$ smaller paths, corresponding to an And gate with fan-in $b$. This gives a tree with $S^{b-1}\cdot b\le S^{b}$ leaves. Iterating, the number of leaves will be

\begin{aligned} (S^{b})^{\log _{b}S}. \end{aligned}

Each leaf can be connected to the input bit on which it depends. The size of the circuit is at most $c$ times the number of leaves.

If $S$ is not a power of $b$ we can view the branching program as having $S'\le bS$ nodes where $S'$ is a power of $b$ . QED

The following is a uniform version of Theorem 7.10, and the proof is similar. It shows that we can speed up space-bounded computation with alternations.

Theorem 7.11. [17]Any $f\in L$ is also in $\Sigma _{c_{f}/\epsilon }\text {Time}(n^{\epsilon })$, for any $\epsilon >0$.

ProofLet $G$ be the configuration graph of $f$. Note this graph has $n^{c_{f}}$ nodes. We need to decide if the start configuration reaches the accept configuration in this graph within $t:=n^{c_{f}}$ steps.

We apply to this graph the checkpoint technique recursively, with parameter $b:=n^{\epsilon /2}$. Each application of the technique reduces the path length by a factor $b$. Hence with $c_{f}/\epsilon$ applications we can reduce the path length to

\begin{aligned} \frac {t}{b^{c_{f}/\epsilon }}=\frac {n^{c_{f}}}{n^{c_{f}/2}}\le 1. \end{aligned}

Each quantifier ranges over $b\log n^{c_{f}}=c_{f}n^{\epsilon /2}\log n\le n^{\epsilon }$ bits for large enough $n$.

There remains to check a path of length $1$, i.e., an edge. The endpoints of this edge are two configurations $u$ and $v$ which depend on the quantified bits. The machine can compute the two endpoints in time $\log ^{c}m$ where $m$ is the total number of quantified bits, using rapid access. Once it has $u$ and $v$ it can check if $u$ leads to $v$ in one step by reading one bit from the input. Note $m\le n^{\epsilon }\cdot c_{f}/\epsilon$, so $\log ^{c}m\le c_{f}n^{\epsilon }$. QED

We note that by the generic simulation of alternating time by small-depth circuits in Lemma 6.2, the above theorem also gives a result similar to Theorem 7.10.

7.4 Reductions: L vs. P

Again, we can use reductions to related the space complexity of problems. In particular we can identify the problems in P which have space-efficient algorithms iff every problem in P does.

Definition 7.6. A problem $f$ is P-complete if $f\in \text {P}$ and $f\in \text {L}\Rightarrow \text {L}=\text {P}$.

Definition 7.7. The circuit value problem: Given a circuit $C$ and an input $x$, compute $C(x)$.

Theorem 7.12. Circuit value is P-complete.

ProofCircuit value is in P since we can evaluate one gate at the time. Now let $g\in \text {P}$. We can reduce computing $g$ on input $x$ to a circuit value instance, as in the simulation of TMs by circuits in Theorem ??. The important point is that this reduction is computable in L. Indeed, given an index to a gate in the circuit, we can compute the type of the gate and index to its children via simple arithmetic, which is in L by Theorem 7.5, and some computation which only depends on the description of the P-time machine for $g$. QED

Definition 7.8. The monotone circuit value problem: Given a circuit $C$ with no negations and an input $x$, compute $C(x)$.

Exercise 7.7. Prove that monotone circuit value is P-complete.

Recall from section 7.2.3 that finding solutions to linear systems

\begin{aligned} Ax=b \end{aligned}

has space-efficient algorithms. Interestingly, if we generalize equalities to inequalities the problem becomes P complete. This set of results illustrates a sense in which “linear algebra” is easier than “optimization.”

Definition 7.9. The linear inequalities problem: Given a $d\times d$ matrix $A$ of integers and a $d$-dimensional vector, determine if the system $Ax\le b$ has a solution over the reals.

Theorem 7.13. Linear inequalities is P-complete.

ProofThe ellipsoid algorithm shows that Linear inequalities is in P, but we will not discuss this classic result.

Instead, we focus on showing how given a circuit $C$ and an input $x$ we can construct a set of inequalities that are satisfiable iff $C(x)=1$.

We shall have as many variables $v_{i}$ as gates in the circuit, counting input gates as well.

For an input gate $g_{i}=x_{i}$ add equation $v_{i}=x_{i}$.

For a Not gate $g_{i}=\text {Not}(g_{j})$ add equation $v_{i}=1-v_{j}$.

For an And gate $g_{i}=\text {And}(g_{j},g_{k})$ add equations $0\le v_{i}\le 1,v_{i}\le v_{j},v_{i}\le v_{k},v_{j}+v_{k}-1\le v_{i}$.

The case of Or is similar, or can be dispensed by writing an Or using Not and And.

Finally, if $g_{i}$ is the output gate add equation $v_{i}=1$.

We claim that in every solution to $Av\le b$ the value of $v_{i}$ is the value in $\{0,1\}$ of gate $g_{i}$ on input $x$. This can be proved by induction. For input and Not gates this is immediate. For an And gate, note that if $v_{j}=0$ then $v_{i}=0$ as well because of the equations $v_{i}\ge 0$ and $v_{i}\le v_{j}$. The same holds if $v_{k}=0$. If both $v_{j}$ and $v_{k}$ are 1 then $v_{i}$ is $1$ as well because of the equations $v_{i}\le 1$ and $v_{j}+v_{k}-1\le v_{i}$. QED

7.4.1 Nondeterministic space

Because of the insight we gained from considering non-deterministic time-bounded computation in section º5.1, we are naturally interested in non-deterministic space-bounded computation. In fact, perhaps we will gain even more insight, because this notion will really challenge our understanding of computation.

For starters, let us define non-deterministic space-bounded computation. A naive approach is to define it using the quantifiers from section º6.4, leading to the class $\exists \cdot \text {L}$. This is an ill-fated choice:

Exercise 7.8. Prove $\exists \cdot \text {L}=\exists \cdot \text {P}$.

Instead, non-deterministic space is defined in terms of non-deterministic TMs.

Definition 7.10. A function $f:\{0,1\}^* \to \{0,1\}$ is computable in $\text {NSpace}(s(n))$ if there is a two-tape TM which on input $x$ never writes on the first tape and never uses more than $s(n)$ cells on the second, and moreover:

1. The machine is equipped with a special “Guess” state. Upon entering this state, a guess bit is written on the work tape under the head.

2. $f(x)=1$ iff there exists a choice for the guess bits that causes the machine to output $1$.

We define

\begin{aligned} \text {NL}:=\bigcup _{d}\text {NSpace}(d\log n). \end{aligned}

How can we exploit this non-determinism? Recall from section 7.2.2 that reachability in undirected graphs is in L. It is unknown if the same holds for directed graphs. However, we can solve it in NL.

Definition 7.11. The directed reachability problem: Given a directed graph $G$ and two nodes $s$ and $t$, decide if there is a path from $s$ to $t$.

Theorem 7.14. Directed reachability $\text {is in NL}.$

ProofThe proof simply amounts to guessing a path in the graph. The algorithm is as follows:

“On input $G,s,t$, let $v:=s$.

For $i=0$ to $|G|$:

If $v=t$, accept.

Guess a neighbor $w$ of $v$. Let $v:=w$.

If you haven’t accepted, reject.”

The space needed is $|v|+|i|=c\log |G|$. QED

We can define NL completeness similarly to NP and P completeness, and have the following result.

Theorem 7.15. Directed reachability is NL-complete. That is, it is in NL and it is in L iff $\text {L}=\text {NL}$.

Exercise 7.9. Prove this.

7.4.2 Nspace vs. time

Recall by definition $\text {Space}(s(n))\subseteq \text {NSpace}(s(n))$. We showed $\text {Space}(s(n))\subseteq \text {Time}(n^{c}c^{s(n)})$ in Theorem 7.1. We can strengthen the inclusion to show that it holds even for non-deterministic space.

Theorem 7.16. $\text {NSpace}(s(n))\subseteq \text {Time}(n^{c}c^{s(n)})$.

ProofOn input x, we compute the configuration graph $G$ of $M$ on input $x$. The nodes are the configurations, and there is an edge from $u$ to $v$ if the machine can go from $u$ to $v$ in one step. Then we solve reachability on this graph in power time, using say breadth-first-search. QED

The next theorem shows that non-deterministic space is not much more powerful than deterministic space: it buys at most a square. Contrast this with the P vs. NP question! The best determistic simulation of NP that we know is the trivial $\text {NP}\subseteq \text {Exp}$. Thus the situation for space is entirely different.

How can this be possible? The high-level idea, which was used already in Lemma 7.1, can be cast as follows:

Unlike time, space can be reused.

Theorem 7.17. [23] $\text {NSpace}(s(n))\subseteq \text {Space}(cs^{2}(n))$, for every $s(n)\ge \log n$. In particular, $\text {NPSpace}=\text {PSpace}$.

ProofWe use the checkpoint technique with parameter $b=2$, and re-use the space to verify the smaller paths. Let $N$ be a non-deterministic TM computing a function in $\text {NSpace}(s(n))$. We aim to construct a deterministic TM $M$ that on input $x$ returns

\begin{aligned} \text {Reach}(C_{\text {start}},C_{\text {accept}},c^{s(n)}), \end{aligned}

where $\text {Reach}(u,v,t)$ decides if $v$ is reachable from $u$ in $\le t$ steps in the configuration graph of $N$ on input $x$, and $C_{\text {start}}$ is the start configuration, $C_{\text {accept}}$ is the accept configuration, and $c^{s(n)}$ is the number of configurations of $N$.

The key point is how to implement Reach.

$\text {Computing\text { Reach}\ensuremath {(u,v,t)}}$

For all “middle” configurations $m$

$\text {If both \text {Reach}\ensuremath {(u,m,t/2)=1} and \text {Reach}\ensuremath {(m,v,t/2)=1} then Accept.}$

Reject

Let $S(t)$ denote the space needed for computing $\text {Reach}(u,v,t)$. We have

\begin{aligned} S(t)\le cs(n)+S(t/2). \end{aligned}

This is because we can re-use the space for two calls to Reach. Therefore, the space for $\text {Reach}(C_{\text {start}},C_{\text {accept}},c^{s(n)})$ is

\begin{aligned} \le cs(n)+cs(n)+\ldots +cs(n)\le cs^{2}(n). \end{aligned}

QED

Recall that we do not know if $\text {Ntime}(t)$ is closed under complement. It is generally believed not to be the case, and we showed that if it is then the PH collapses Exercise 6.3.

What about space? Is $\text {NSpace}(s)$ closed under complement? Theorem 7.17 shows $\text {NSpace}(s(n))\subseteq \text {Space}(cs^{2}(n))$. So if $f\in \text {NSpace}(s(n))$ then the complement function $1-f$ is in $\text {Space}(cs^{2}(n))$ (and in particular in $\text {Space}(cs^{2}(n))$). Hence, up to a quadratic loss in space, non-deterministic space is closed under complement.

Can we avoid squaring the space?

Yes! This is weird!

Theorem 7.18. The complement of Path is in NL.

ProofWe want a non-deterministic 2TM that given $G,s,\text {and }t$ accepts if there is no path from $s$ to $t$ in $G$.

For starters, suppose the machine has computed the number $C$ of nodes reachable from $s$. The key idea is that there is no path from $s$ to $t$ iff there are $C$ nodes different from $t$ reachable from $s$. Thus, knowing $C$ we can solve the problem as follows

Algorithm for deciding if there is no path from $s$ to $t$, given $C$:

$\text {Initialize \text {Count}=0; Enumerate over all nodes \ensuremath {v\ne t}}$

$\text {Guess a path from \ensuremath {s} of length \ensuremath {|G|}. If path reaches \ensuremath {v}, increase \text {Count} by \ensuremath {1} }$

If $\text {Count}=C$ Accept, else Reject.

There remains to compute $C$.

Let $A_{i}$ be the nodes at distance $\le i$ from $s$, and let $C_{i}:=|A_{i}|$. Note $A_{0}=\{s\},c_{0}=1$. We seek to compute $C=C_{n}$.

To compute $C_{i+1}$ from $C_{i}$, enumerate nodes $v$ (candidate in $A_{i+1}$). For each $v$, enumerate over all nodes $w$ in $A_{i}$, and check if $w\to v$ is an edge. If so, increase $C_{i+1}$ by $1$.

The enumeration over $A_{i}$ is done guessing $C_{i}$ nodes and paths from $s$. If we don’t find $C_{i}$ nodes, we reject. QED

7.5 TiSp

So far in this chapter we have focused on bounding the space usage. For this, the TM model was sufficient, as remarked at the beginning. It is natural to consider algorithms that operate in little time and space. For this, of course, whether we use TMs or RAMs makes a difference.

Definition 7.12. Let $\text {TiSp}(t,s)$ be the functions computable on a RAM that on every input $x\in \{0,1\} ^{n}$ runs in time $t(n)$ and only uses memory cells $0$ to $s(n)-1$.

Exercise 7.10. Prove $\text {L}=\bigcup _{d}\text {TiSp}(n^{d},d)$.

An alternative definition of TiSp would allow the RAM to access $s(n)$ cells anywhere in memory. One can maintain a data structure to show that this alternative definition is equivalent to Definition 7.12.

To illustrate the relationship between TiSp, Time, and Space, consider undirected reachability. It is solvable in Time$(n\log ^{c}n)$ by bradth-first search, and in logarithmic space by Theorem 7.8. But it isn’t known if it is in $\text {TiSp}(n\log ^{a}n,a\log n)$ for some constant $a$.

Exercise 7.11. Prove that Theorem 7.11 holds even for any function $f$ in $\text {TiSp}(n^{a},n^{1-\epsilon })$ for any $a$ and $\epsilon >0$.

The following is a non-uniform version of TiSp.

Definition 7.13. A branching program of length $t$ and width $W$ is a branching program where the nodes are partitioned in $t$ layers $L_{1},L_{2},\ldots ,L_{t}$ where nodes in $L_{i}$ only lead to nodes in $L_{i+1}$, and $|L_{i}|\le W$ for every $i$.

Thus $t$ represents the time of the computation, and $\log W$ the space.

Recall that Theorem 7.10 gives bounds of the form $\ge cn^{2}/\log n$ on the size of branching program (without distinguishing between length and width). For branching programd og length $t$ and width $W$ this bound gives $t\ge cn^{2}/W\log n$. But it gives nothing for power width like $W=n^{2}$. The state-of-the-art for power width is $t\ge \Omega (n\sqrt {\log n/\log \log n})$ [27] (in fact the bound holds even for subexponential) .

With these definitions in hand we can refine the connection between branching programs and small-depth circuits in Theorem 7.10 for circuits of depth 3.

Theorem 7.19. Let $f:\{0,1\} ^{n}\to \{0,1\}$ be computable by a branching program with width $W$ and time $t$. Then $f$ is computable by an alternating depth-3 circuit with $\le 2^{c\sqrt {t\log W}}$ wires.

We will later show explicit functions that require depth-3 circuits of size $2^{c\sqrt {n}}$. Theorem 7.19 shows that improving this would also improve results for small-width branching programs, a refinement of the message emphasized before after Theorem 7.10.

Exercise 7.12. Prove this.

7.6 Notes

For a discussion of the complexity of division, see [3]. For a compendium of P-complete problems see [11].

References

[1]   Mikl≤s Ajtai. $\Sigma \sp {1}\sb {1}$-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[2]   Mikl≤s Ajtai. A non-linear time lower bound for boolean branching programs. Theory of Computing, 1(1):149–176, 2005.

[3]   Eric Allender. The division breakthroughs. Bulletin of the EATCS, 74:61–77, 2001.

[4]   Noga Alon, Oded Goldreich, Johan Hσstad, and RenΘ Peralta. Simple constructions of almost $k$-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992.

[5]   Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, volume 32, pages 307–314, 1968.

[6]   Paul Beame, Stephen A. Cook, and H. James Hoover. Log depth circuits for division and related problems. SIAM J. Comput., 15(4):994–1003, 1986.

[7]   Paul Beame, Michael Saks, Xiaodong Sun, and Erik Vee. Time-space trade-off lower bounds for randomized computation of decision problems. J. of the ACM, 50(2):154–195, 2003.

[8]   Stephen A. Cook. A hierarchy for nondeterministic time complexity. J. of Computer and System Sciences, 7(4):343–353, 1973.

[9]   L. Csanky. Fast parallel matrix inversion algorithms. SIAM J. Comput., 5(4):618–623, 1976.

[10]   Aviezri S. Fraenkel and David Lichtenstein. Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory, Ser. A, 31(2):199–214, 1981.

[11]   Raymond Greenlaw, H. James Hoover, and Walter Ruzzo. Limits to Parallel Computation: P-Completeness Theory. 02 2001.

[12]   John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. On time versus space. J. ACM, 24(2):332–337, 1977.

[13]   Richard M. Karp and Richard J. Lipton. Turing machines that take advice. L’Enseignement MathΘmatique. Revue Internationale. IIe SΘrie, 28(3-4):191–209, 1982.

[14]   Leonid A. Levin. Universal sequential search problems. Problemy Peredachi Informatsii, 9(3):115–116, 1973.

[15]   Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. on Computing, 22(4):838–856, 1993.

[16]   E. I. Nechiporuk. A boolean function. Soviet Mathematics-Doklady, 169(4):765–766, 1966.

[17]   Valery A. Nepomnjaščiĭ. Rudimentary predicates and Turing calculations. Soviet Mathematics-Doklady, 11(6):1462–1465, 1970.

[18]   NEU. From RAM to SAT. Available at http://www.ccs.neu.edu/home/viola/, 2012.

[19]   Wolfgang J. Paul, Nicholas Pippenger, Endre SzemerΘdi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In IEEE Symp. on Foundations of Computer Science (FOCS), pages 429–438, 1983.

[20]   Omer Reingold. Undirected connectivity in log-space. J. of the ACM, 55(4), 2008.

[21]   J. M. Robson. N by N checkers is exptime complete. SIAM J. Comput., 13(2):252–267, 1984.

[22]   Rahul Santhanam. On separators, segregators and time versus space. In IEEE Conf. on Computational Complexity (CCC), pages 286–294, 2001.

[23]   Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970.

[24]   Michael Sipser. A complexity theoretic approach to randomness. In ACM Symp. on the Theory of Computing (STOC), pages 330–335, 1983.

[25]   Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. on Computing, 20(5):865–877, 1991.

[26]   Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85–93, 1986.

[27]   N. V. Vinodchandran. A note on the circuit complexity of PP. Theor. Comput. Sci., 347(1-2):415–418, 2005.

[28]   Emanuele Viola. On approximate majority and probabilistic time. Computational Complexity, 18(3):337–375, 2009.

By Manu

TR23-034 | On teaching the approximation method for circuit lower bounds | Oded Goldreich

from ECCC Papers

This text provides a basic presentation of the the approximation method of Razborov (Matematicheskie Zametki, 1987) and its application by Smolensky (19th STOC, 1987) for proving lower bounds on the size of ${\cal AC}^0[p]$-circuits that compute sums mod~$q$ (for primes $q\neq p$). The textbook presentations of the latter result concentrate on proving the special case of $q=2$, and do not provide details on the proof of the general case. Furthermore, the presentations I have read tend to be too terse to my taste. The current text provides a detailed exposition of both the special case and the general case. Nevertheless, I agree with the common practice of covering only the case of $q=2$ in class, and suggest leaving the general case (i.e, $q>2$) for advanced reading.
This text provides a basic presentation of the the approximation method of Razborov (Matematicheskie Zametki, 1987) and its application by Smolensky (19th STOC, 1987) for proving lower bounds on the size of ${\cal AC}^0[p]$-circuits that compute sums mod~$q$ (for primes $q\neq p$). The textbook presentations of the latter result concentrate on proving the special case of $q=2$, and do not provide details on the proof of the general case. Furthermore, the presentations I have read tend to be too terse to my taste. The current text provides a detailed exposition of both the special case and the general case. Nevertheless, I agree with the common practice of covering only the case of $q=2$ in class, and suggest leaving the general case (i.e, $q>2$) for advanced reading.

TR23-033 | Fast Numerical Multivariate Multipoint Evaluation | Sumanta Ghosh, Prahladh Harsha, Simao Herdade, Mrinal Kumar, Ramprasad Saptharishi

from ECCC Papers

We design nearly-linear time numerical algorithms for the problem of multivariate multipoint evaluation over the fields of rational, real and complex numbers. We consider both \emph{exact} and \emph{approximate} versions of the algorithm. The input to the algorithms are (1) coefficients of an $m$-variate polynomial $f$ with degree $d$ in each variable, and (2) points $\mathbf{a}_1,\ldots, \mathbf{a}_N$ each of whose coordinate has value bounded by one and bit-complexity $s$. * Approximate Version: Given additionally an accuracy parameter $t$, the algorithm computes rational numbers $\beta_1,\ldots, \beta_N$ such that $|{f(\mathbf{a}_i) - \beta_i} \leq \sfrac{1}{2^t}$ for all $i$, and has a running time of $\left((Nm + d^m)(s + t)\right)^{1 + o(1)}$ for all $m$ and all sufficiently large $d$. * Exact version (when over rationals): Given additionally a bound $c$ on the bit-complexity of all evaluations, the algorithm computes the rational numbers $f(\mathbf{a}_1), \ldots, f(\mathbf{a}_N)$, in time $\left((Nm + d^m)(s + c)\right)^{1 + o(1)}$ for all $m$ and all sufficiently large $d$. Our results also naturally extend to the case when the input is over the field of real or complex numbers under an appropriate standard model of representation of field elements in such fields. Prior to this work, a nearly-linear time algorithm for multivariate multipoint evaluation (exact or approximate) over any infinite field appears to be known only for the case of univariate polynomials, and was discovered in a recent work of Moroz [Proc. 62nd FOCS, 2021]. In this work, we extend this result from the univariate to the multivariate setting. However, our algorithm is based on ideas that seem to be conceptually different from those of Moroz and crucially relies on a recent algorithm of Bhargava, Ghosh, Guo, Kumar & Umans [Proc. 63rd FOCS, 2022] for multivariate multipoint evaluation over finite fields, and known efficient algorithms for the problems of rational number reconstruction and fast Chinese remaindering in computational number theory.
We design nearly-linear time numerical algorithms for the problem of multivariate multipoint evaluation over the fields of rational, real and complex numbers. We consider both \emph{exact} and \emph{approximate} versions of the algorithm. The input to the algorithms are (1) coefficients of an $m$-variate polynomial $f$ with degree $d$ in each variable, and (2) points $\mathbf{a}_1,\ldots, \mathbf{a}_N$ each of whose coordinate has value bounded by one and bit-complexity $s$. * Approximate Version: Given additionally an accuracy parameter $t$, the algorithm computes rational numbers $\beta_1,\ldots, \beta_N$ such that $|{f(\mathbf{a}_i) - \beta_i} \leq \sfrac{1}{2^t}$ for all $i$, and has a running time of $\left((Nm + d^m)(s + t)\right)^{1 + o(1)}$ for all $m$ and all sufficiently large $d$. * Exact version (when over rationals): Given additionally a bound $c$ on the bit-complexity of all evaluations, the algorithm computes the rational numbers $f(\mathbf{a}_1), \ldots, f(\mathbf{a}_N)$, in time $\left((Nm + d^m)(s + c)\right)^{1 + o(1)}$ for all $m$ and all sufficiently large $d$. Our results also naturally extend to the case when the input is over the field of real or complex numbers under an appropriate standard model of representation of field elements in such fields. Prior to this work, a nearly-linear time algorithm for multivariate multipoint evaluation (exact or approximate) over any infinite field appears to be known only for the case of univariate polynomials, and was discovered in a recent work of Moroz [Proc. 62nd FOCS, 2021]. In this work, we extend this result from the univariate to the multivariate setting. However, our algorithm is based on ideas that seem to be conceptually different from those of Moroz and crucially relies on a recent algorithm of Bhargava, Ghosh, Guo, Kumar & Umans [Proc. 63rd FOCS, 2022] for multivariate multipoint evaluation over finite fields, and known efficient algorithms for the problems of rational number reconstruction and fast Chinese remaindering in computational number theory.

Xavier Waintal responds (tl;dr Grover is still quadratically faster)

from Scott Aaronson

This morning Xavier Waintal, coauthor of the new arXiv preprint “””refuting””” Grover’s algorithm, which I dismantled here yesterday, emailed me a two-paragraph response. He remarked that the “classy” thing for me to do would be to post the response on my blog, but: “I would totally understand if you did not want to be contradicted […]

This morning Xavier Waintal, coauthor of the new arXiv preprint “””refuting””” Grover’s algorithm, which I dismantled here yesterday, emailed me a two-paragraph response. He remarked that the “classy” thing for me to do would be to post the response on my blog, but: “I would totally understand if you did not want to be contradicted in your own zone of influence.”

Here is Waintal’s response, exactly as sent to me:

The elephant in the (quantum computing) room: opening the Pandora box of the quantum oracle

One of the problem we face in the field of quantum computing is a vast diversity of cultures between, say, complexity theorists on one hand and physicists on the other hand. The former define mathematical objects and consider any mathematical problem as legitimate. The hypothesis are never questioned, by definition. Physicists on the other hand spend their life questioning the hypothesis, wondering if they do apply to the real world. This dichotomy is particularly acute in the context of the emblematic Grover search algorithm, one of the cornerstone of quantum computing. Grover’s algorithm uses the concept of “oracle”, a black box function that one can call, but of which one is forbidden to see the source code. There are well known complexity theorems that show that in this context a quantum computer can solve the “search problem” faster than a classical computer.

But because we closed our eyes and decided not to look at the source code does not mean it does not exist. In https://arxiv.org/pdf/2303.11317.pdf, Miles Stoudenmire and I deconstruct the concept of oracle and show that as soon as we give the same input to both quantum and classical computers (the quantum circuit used to program the oracle on the actual quantum hardware) then the *generic* quantum advantage disappears. The charge of the proof is reversed: one must prove certain properties of the quantum circuit in order to possibly have a theoretical quantum advantage. More importantly – for the physicist that I am – our classical algorithm is very fast and we show that we can solve large instances of any search problem. This means that even for problems where *asymptotically* quantum computers are faster than classical ones, the crossing point where they become so is for astronomically large computing time, in the tens of millions of years. Who is willing to wait that long for the answer to a single question, even if the answer is 42?

The above explicitly confirms something that I realized immediately on reading the preprint, and that fully explains the tone of my response. Namely, Stoudenmire and Waintal’s beef isn’t merely with Grover’s algorithm, or even with the black-box model; it’s with the entire field of complexity theory. If they were right that complexity theorists never “questioned hypotheses” or wondered what did or didn’t apply to the real world, then complexity theory shouldn’t exist in CS departments at all—at most it should exist in pure math departments.

But a converse claim is also true. Namely, suppose it turned out that complexity theorists had already fully understood, for decades, all the elementary points Stoudenmire and Waintal were making about oracles versus explicit circuits. Suppose complexity theorists hadn’t actually been confused, at all, about under what sorts of circumstances the square-root speedup of Grover’s algorithm was (1) provable, (2) plausible but unproven, or (3) nonexistent. Suppose they’d also been intimately familiar with the phenomenon of asymptotically faster algorithms that get swamped in practice by unfavorable constants, and with the overhead of quantum error-correction. Suppose, indeed, that complexity theorists hadn’t merely understood all this stuff, but expressed it clearly and accurately where Stoudenmire and Waintal’s presentation was garbled and mixed with absurdities (e.g., the Grover problem “being classically solvable with a linear number of queries,” the Grover speedup not being “generic,” their being able to “solve large instances of any search problem” … does that include, for example, CircuitSAT? do they still not get the point about CircuitSAT?). Then Stoudenmire and Waintal’s whole objection would collapse.

Anyway, we don’t have to suppose! In the SciRate discussion of the preprint, a commenter named Bibek Pokharel helpfully digs up some undergraduate lecture notes from 2017 that are perfectly clear about what Stoudenmire and Waintal treat as revelations (though one could even go 20+ years earlier). The notes are focused here on Simon’s algorithm, but the discussion generalizes to any quantum black-box algorithm, including Grover’s:

The difficulty in claiming that we’re getting a quantum speedup [via Simon’s algorithm] is that, once we pin down the details of how we’re computing [the oracle function] f—so, for example, the actual matrix A [such that f(x)=Ax]—we then need to compare against classical algorithms that know those details as well. And as soon as we reveal the innards of the black box, the odds of an efficient classical solution become much higher! So for example, if we knew the matrix A, then we could solve Simon’s problem in classical polynomial time just by calculating A‘s nullspace. More generally, it’s not clear whether anyone to this day has found a straightforward “application” of Simon’s algorithm, in the sense of a class of efficiently computable functions f that satisfy the Simon promise, and for which any classical algorithm plausibly needs exponential time to solve Simon’s problem, even if the algorithm is given the code for f.

In the same lecture notes, one can find the following discussion of Grover’s algorithm, and how its unconditional square-root speedup becomes conditional (albeit, still extremely plausible in many cases) as soon as the black box is instantiated by an explicit circuit:

For an NP-complete problem like CircuitSAT, we can be pretty confident that the Grover speedup is real, because no one has found any classical algorithm that’s even slightly better than brute force. On the other hand, for more “structured” NP-complete problems, we do know exponential-time algorithms that are faster than brute force. For example, 3SAT is solvable classically in about O(1.3n) time. So then, the question becomes a subtle one of whether Grover’s algorithm can be combined with the best classical tricks that we know to achieve a polynomial speedup even compared to a classical algorithm that uses the same tricks. For many NP-complete problems the answer seems to be yes, but it need not be yes for all of them.

The notes in question were written by some random complexity theorist named Scot Aronsen (sp?). But if you don’t want to take it from that guy, then take it from (for example) the Google quantum chemist Ryan Babbush, again on the SciRate page:

It is well understood that applying Grover’s algorithm to 3-SAT in the standard way would not give a quadratic speedup over the best classical algorithm for 3-SAT in the worst case (and especially not on average). But there are problems for which Grover is expected to give a quadratic speedup over any classical algorithm in the worst case. For example, the problem “Circuit SAT” starts by me handing you a specification of a poly-size classical circuit with AND/OR/NOT gates, so it’s all explicit. Then you need to solve SAT on this circuit. Classically we strongly believe it will take time 2^n (this is even the basis of many conjectures in complexity theory, like the exponential time hypothesis), and quantumly we know it can be done with 2^{n/2}*poly(n) quantum gates using Grover and the explicitly given classical circuit. So while I think there are some very nice insights in this paper, the statement in the title “Grover’s Algorithm Offers No Quantum Advantage” seems untrue in a general theoretical sense. Of course, this is putting aside issues with the overheads of error-correction for quadratic speedups (a well understood practical matter that is resolved by going to large problem sizes that wouldn’t be available to the first fault-tolerant quantum computers). What am I missing?

More generally, over the past few days, as far as I can tell, every actual expert in quantum algorithms who’s looked at Stoudenmire and Waintal’s preprint—every one, whether complexity theorist or physicist or chemist—has reached essentially the same conclusions about it that I did. The one big difference is that many of the experts, who are undoubtedly better people than I am, extended a level of charity to Stoudenmire and Waintal (“well, this of course seems untrue, but here’s what it could have meant”) that Stoudenmire and Waintal themselves very conspicuously failed to extend to complexity theory.

By Scott

TR23-032 | Linear Independence, Alternants and Applications | Vishwas Bhargava, Shubhangi Saraf, Ilya Volkovich

from ECCC Papers

We develop a new technique for analyzing linear independence of multivariate polynomials. One of our main technical contributions is a \emph{Small Witness for Linear Independence} (SWLI) lemma which states the following. If the polynomials $f_1,f_2, \ldots, f_k \in \F[X]$ over $X=\{x_1, \ldots, x_n\}$ are $\F$-linearly independent then there exists a subset $S \subseteq X$ of size at most $k-1$ such that $f_1,f_2, \ldots, f_k$ are also $\F(X\setminus S)$-linearly independent. We show how to effectively combine this lemma with the use of the \emph{alternant} matrix to analyze linear independence of polynomials. We also give applications of our technique to the questions of polynomial identity testing and arithmetic circuit reconstruction. \begin{enumerate} \item We give a general technique for lifting efficient polynomial identity testing algorithms from basic classes of circuits, satisfying some closure properties, to more general classes of circuits. As one of the corollaries of this result, we obtain the first algorithm for polynomial identity testing for depth-$4$, constant-occur circuits that works over \textbf{all} fields. This strengthens a result by \cite{ASSS16} (\emph{STOC '12}) that works in the case when the characteristic is $0$ or sufficiently large. Another corollary is an identity testing algorithm for a special case of depth-$5$ circuits. To the best of our knowledge, this is the first algorithm for this class of circuits. \item We give new and efficient black-box reconstruction algorithms for the class of set-multilinear depth-3 circuits of constant top fan-in, where the set-multilinear variable partition is \emph{unknown}. This generalizes the results of \cite{BSV21} (\emph{STOC '21}) and \cite{PSV22} (\emph{ECCC '22}) which work in the case of known variable partition, and correspond to tensor decomposition of constant-rank tensors. \end{enumerate}
We develop a new technique for analyzing linear independence of multivariate polynomials. One of our main technical contributions is a \emph{Small Witness for Linear Independence} (SWLI) lemma which states the following. If the polynomials $f_1,f_2, \ldots, f_k \in \F[X]$ over $X=\{x_1, \ldots, x_n\}$ are $\F$-linearly independent then there exists a subset $S \subseteq X$ of size at most $k-1$ such that $f_1,f_2, \ldots, f_k$ are also $\F(X\setminus S)$-linearly independent. We show how to effectively combine this lemma with the use of the \emph{alternant} matrix to analyze linear independence of polynomials. We also give applications of our technique to the questions of polynomial identity testing and arithmetic circuit reconstruction. \begin{enumerate} \item We give a general technique for lifting efficient polynomial identity testing algorithms from basic classes of circuits, satisfying some closure properties, to more general classes of circuits. As one of the corollaries of this result, we obtain the first algorithm for polynomial identity testing for depth-$4$, constant-occur circuits that works over \textbf{all} fields. This strengthens a result by \cite{ASSS16} (\emph{STOC '12}) that works in the case when the characteristic is $0$ or sufficiently large. Another corollary is an identity testing algorithm for a special case of depth-$5$ circuits. To the best of our knowledge, this is the first algorithm for this class of circuits. \item We give new and efficient black-box reconstruction algorithms for the class of set-multilinear depth-3 circuits of constant top fan-in, where the set-multilinear variable partition is \emph{unknown}. This generalizes the results of \cite{BSV21} (\emph{STOC '21}) and \cite{PSV22} (\emph{ECCC '22}) which work in the case of known variable partition, and correspond to tensor decomposition of constant-rank tensors. \end{enumerate}

Analyzing Innermost Runtime Complexity Through Tuple Interpretations

Authors: Liye Guo (Radboud University), Deivid Vale (Radboud University)

Time complexity in rewriting is naturally understood as the number of steps needed to reduce terms to normal forms. Establishing complexity bounds to this measure is a well-known problem in the rewriting community. A vast majority of techniques to find such bounds consist of modifying termination proofs in order to recover complexity information. This has been done for instance with semantic interpretations, recursive path orders, and dependency pairs. In this paper, we follow the same program by tailoring tuple interpretations to deal with innermost complexity analysis. A tuple interpretation interprets terms as tuples holding upper bounds to the cost of reduction and size of normal forms. In contrast with the full rewriting setting, the strongly monotonic requirement for cost components is dropped when reductions are innermost. This weakened requirement on cost tuples allows us to prove the innermost version of the compatibility result: if all rules in a term rewriting system can be strictly oriented, then the innermost rewrite relation is well-founded. We establish the necessary conditions for which tuple interpretations guarantee polynomial bounds to the runtime of compatible systems and describe a search procedure for such interpretations.

Authors: Liye Guo (Radboud University), Deivid Vale (Radboud University)

Time complexity in rewriting is naturally understood as the number of steps needed to reduce terms to normal forms. Establishing complexity bounds to this measure is a well-known problem in the rewriting community. A vast majority of techniques to find such bounds consist of modifying termination proofs in order to recover complexity information. This has been done for instance with semantic interpretations, recursive path orders, and dependency pairs. In this paper, we follow the same program by tailoring tuple interpretations to deal with innermost complexity analysis. A tuple interpretation interprets terms as tuples holding upper bounds to the cost of reduction and size of normal forms. In contrast with the full rewriting setting, the strongly monotonic requirement for cost components is dropped when reductions are innermost. This weakened requirement on cost tuples allows us to prove the innermost version of the compatibility result: if all rules in a term rewriting system can be strictly oriented, then the innermost rewrite relation is well-founded. We establish the necessary conditions for which tuple interpretations guarantee polynomial bounds to the runtime of compatible systems and describe a search procedure for such interpretations.

Dual-Quaternion Interpolation

Authors: Benjamin Kenwright

Transformations in the field of computer graphics and geometry are one of the most important concepts for efficient manipulation and control of objects in 2-dimensional and 3-dimensional space. Transformations take many forms each with their advantages and disadvantages. A particularly powerful tool for representing transforms in a unified form are dual-quaternions. A benefit of this unified form is the interpolation properties, which address a range of limitations (compact form that allows a rotational and translational components to be coupled). In this article, we examine various dual-quaternion interpolation options that achieve different trade-offs between computational cost, aesthetic factors and coupling dependency. Surprisingly, despite dual-quaternions being a common tool in graphics libraries, there are limited details on the interpolation details. Here we attempt to explain interpolation concept, elaborating on underpinning theories, while explaining concepts and bespoke modifications for added control.

Authors: Benjamin Kenwright

Transformations in the field of computer graphics and geometry are one of the most important concepts for efficient manipulation and control of objects in 2-dimensional and 3-dimensional space. Transformations take many forms each with their advantages and disadvantages. A particularly powerful tool for representing transforms in a unified form are dual-quaternions. A benefit of this unified form is the interpolation properties, which address a range of limitations (compact form that allows a rotational and translational components to be coupled). In this article, we examine various dual-quaternion interpolation options that achieve different trade-offs between computational cost, aesthetic factors and coupling dependency. Surprisingly, despite dual-quaternions being a common tool in graphics libraries, there are limited details on the interpolation details. Here we attempt to explain interpolation concept, elaborating on underpinning theories, while explaining concepts and bespoke modifications for added control.

The strength of a simplex is the key to a continuous isometry classification of Euclidean clouds of unlabelled points

Authors: Vitaliy Kurlin

This paper solves the continuous classification problem for finite clouds of unlabelled points under Euclidean isometry. The Lipschitz continuity of required invariants in a suitable metric under perturbations of points is motivated by the inevitable noise in measurements of real objects.

The best solved case of this isometry classification is known as the SSS theorem in school geometry saying that any triangle up to congruence (isometry in the plane) has a continuous complete invariant of three side lengths.

However, there is no easy extension of the SSS theorem even to four points in the plane partially due to a 4-parameter family of 4-point clouds that have the same six pairwise distances. The computational time of most past metrics that are invariant under isometry was exponential in the size of the input. The final obstacle was the discontinuity of previous invariants at singular configurations, for example, when a triangle degenerates to a straight line.

All the challenges above are now resolved by the Simplexwise Centred Distributions that combine inter-point distances of a given cloud with the new strength of a simplex that finally guarantees the Lipschitz continuity. The computational times of new invariants and metrics are polynomial in the number of points for a fixed Euclidean dimension.

Authors: Vitaliy Kurlin

This paper solves the continuous classification problem for finite clouds of unlabelled points under Euclidean isometry. The Lipschitz continuity of required invariants in a suitable metric under perturbations of points is motivated by the inevitable noise in measurements of real objects.

The best solved case of this isometry classification is known as the SSS theorem in school geometry saying that any triangle up to congruence (isometry in the plane) has a continuous complete invariant of three side lengths.

However, there is no easy extension of the SSS theorem even to four points in the plane partially due to a 4-parameter family of 4-point clouds that have the same six pairwise distances. The computational time of most past metrics that are invariant under isometry was exponential in the size of the input. The final obstacle was the discontinuity of previous invariants at singular configurations, for example, when a triangle degenerates to a straight line.

All the challenges above are now resolved by the Simplexwise Centred Distributions that combine inter-point distances of a given cloud with the new strength of a simplex that finally guarantees the Lipschitz continuity. The computational times of new invariants and metrics are polynomial in the number of points for a fixed Euclidean dimension.

Parameterized Algorithms for Topological Indices in Chemistry

Authors: Giovanna K. Conrado, Amir K. Goharshady, Harshit J. Motwani, Sergei Novozhilov

We have developed efficient parameterized algorithms for the enumeration problems of graphs arising in chemistry. In particular, we have focused on the following problems: enumeration of Kekul\'e structures, computation of Hosoya index, computation of Merrifield-Simmons index, and computation of graph entropy based on matchings and independent sets. All these problems are known to be $\# P$-complete. We have developed FPT algorithms for bounded treewidth and bounded pathwidth for these problems with a better time complexity than the known state-of-the-art in the literature. We have also conducted experiments on the entire PubChem database of chemical compounds and tested our algorithms. We also provide a comparison with naive baseline algorithms for these problems, along with a distribution of treewidth for the chemical compounds available in the PubChem database.

We have developed efficient parameterized algorithms for the enumeration problems of graphs arising in chemistry. In particular, we have focused on the following problems: enumeration of Kekul\'e structures, computation of Hosoya index, computation of Merrifield-Simmons index, and computation of graph entropy based on matchings and independent sets. All these problems are known to be $\# P$-complete. We have developed FPT algorithms for bounded treewidth and bounded pathwidth for these problems with a better time complexity than the known state-of-the-art in the literature. We have also conducted experiments on the entire PubChem database of chemical compounds and tested our algorithms. We also provide a comparison with naive baseline algorithms for these problems, along with a distribution of treewidth for the chemical compounds available in the PubChem database.