Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed, or follow on Twitter, to stay up to date.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

# Theory of Computing Report

## Wednesday, November 30

### A Nice Example Related to the Frankl Conjecture

from Gil Kalai

The example As a follow up to my previous post about Gilmer’s breakthrough regarding Frankl’s conjecture, here is a very nice example (from the paper of Zachary Chase and Shachar Lovett) related to the conjecture. Let Consider the following families … Continue reading →

### The example

As a follow up to my previous post about Gilmer’s breakthrough regarding Frankl’s conjecture, here is a very nice example (from the paper of Zachary Chase and Shachar Lovett) related to the conjecture.

Let $\psi=\frac {3+\sqrt 5}{2} = 0.38..$

Consider the following families of subsets of $[n]$

${\cal F}_1=\{S \subset [n]: |S|=\psi n + n^{2/3}\}$, and

${\cal F}_2 = \{ T \subset [n]: |T|>(1-\psi) n \}$.

Now let ${\cal F}={\cal F}_1 \cup {\cal F}_2$.

Here are some observations:

1. $|{\cal F}_2| = o(|{\cal F}_1|)$.
2. The number of pairs $(S,T):S,T \in {\cal F}$ whose union is also in ${\cal F}$ is $(1-o(1)) {|\cal F}|^2$.
3. For every $\epsilon >0$, as $n$ grows, no element $k \in [n]$ belongs to more than $(\psi+\epsilon) |{\cal F}|$ sets in ${\cal F}$.

The first assertion holds because ${{n} \choose {\psi n +n^{2/3}}}$ is exponentially larger (in a fractional power of $n$) than ${{n} \choose {(1-\psi )n}}$.

For the second assertion we need to show that a typical union of a pair of sets in ${\cal F}_1$ belongs to ${\cal F}_2$. Note that the intersection of two random subsets of $[n]$ of size $\phi n$ is $\phi ^2 n$ and hence their union is of size $2\phi - \phi^2$. As it happens the solution of the equation $2\phi-\phi^2 = 1-\phi$ is precisely our $\psi=\frac {3+\sqrt 5}{2}$. So letting $\phi=\psi + n^{-1/3}$ we get that a typical union of two sets from ${\cal F}_1$ is in ${\cal F}_2$.

The third assertion follows from the fact that an element $k \in [n]$ belongs to a fraction of $\psi + n^{-1/3}$  sets in ${\cal F}_1$.

This shows that a natural stability version of Frankl’s conjecture is incorrect, and gives some hint on the appearance of the parameter $\psi$.

### Stability result

Such a stability version is correct when we replace $1/2$ with $\psi$. Chase and Lovett improved Gilmer’s method and  proved that

Theorem: If  $\cal F$ is a $(1-\epsilon)$-approximate union closed set system, where $\epsilon <1/2$, then there is an element which is contained in a $(\psi-\delta)$ fraction of sets in $\cal F$, where

$\delta=2\epsilon (1+\frac {\log (1/\epsilon)}{\log |{\cal F}|}).$

### An invitation for further discussion

I will be happy to see, based on Gilmer’s paper and the five follow-up papers, a detailed, as simple as possible, explanation of Frankl’s conjecture for the parameter $\psi$, to learn what is involved in Sawin’s improvement that goes beyond $\psi$, and to understand the counterexamples for Gilmer’s proposal towards the full conjecture, as well as thoughts, ideas and remarks of various kind on the problem.

### Links to the papers

1. Justin Gilmer, A constant lower bound for the union-closed sets conjecture
2. Will Sawin, An improved lower bound for the union-closed set conjecture
3.  Zachary Chase, Shachar Lovett, Approximate union closed conjecture
4. Ryan Alweiss, Brice Huang, Mark Sellke, Improved Lower Bound for the Union-Closed Sets Conjecture
5. David Ellis, Note: a counterexample to a conjecture of Gilmer which would imply the union-closed conjecture
6. Luke Pebody, Extension of a Method of Gilmer

By Gil Kalai

### The Adversary Bound Revisited: From Optimal Query Algorithms to Optimal Control

Authors: Duyal Yolcu

This note complements the upcoming paper "One-Way Ticket to Las Vegas and the Quantum Adversary" by Belovs and Yolcu, to be presented at QIP 2023. I develop the ideas behind the adversary bound - universal algorithm duality therein in a different form. This form may be faster to understand for a general quantum information audience: It avoids defining the "unidirectional filtered $\gamma _{2}$-bound" and relating it to query algorithms explicitly. This proof is also more general because the lower bound (and universal query algorithm) apply to a class of optimal control problems rather than just query problems. That is in addition to the advantages to be discussed in Belovs-Yolcu, namely the more elementary algorithm and correctness proof that avoids phase estimation and spectral analysis, allows for limited treatment of noise, and removes another $\Theta(\log(1/\epsilon))$ factor from the runtime compared to the previous discrete-time algorithm.

Authors: Duyal Yolcu

This note complements the upcoming paper "One-Way Ticket to Las Vegas and the Quantum Adversary" by Belovs and Yolcu, to be presented at QIP 2023. I develop the ideas behind the adversary bound - universal algorithm duality therein in a different form. This form may be faster to understand for a general quantum information audience: It avoids defining the "unidirectional filtered $\gamma _{2}$-bound" and relating it to query algorithms explicitly. This proof is also more general because the lower bound (and universal query algorithm) apply to a class of optimal control problems rather than just query problems. That is in addition to the advantages to be discussed in Belovs-Yolcu, namely the more elementary algorithm and correctness proof that avoids phase estimation and spectral analysis, allows for limited treatment of noise, and removes another $\Theta(\log(1/\epsilon))$ factor from the runtime compared to the previous discrete-time algorithm.

### Query complexity of Boolean functions on slices

Authors: Farzan Byramji

We study the deterministic query complexity of Boolean functions on slices of the hypercube. The $k^{th}$ slice $\binom{[n]}{k}$ of the hypercube $\{0,1\}^n$ is the set of all $n$-bit strings with Hamming weight $k$. We show that there exists a function on the balanced slice $\binom{[n]}{n/2}$ requiring $n - O(\log \log n)$ queries. We give an explicit function on the balanced slice requiring $n - O(\log n)$ queries based on independent sets in Johnson graphs. On the weight-2 slice, we show that hard functions are closely related to Ramsey graphs. Further we describe a simple way of transforming functions on the hypercube to functions on the balanced slice while preserving several complexity measures.

Authors: Farzan Byramji

We study the deterministic query complexity of Boolean functions on slices of the hypercube. The $k^{th}$ slice $\binom{[n]}{k}$ of the hypercube $\{0,1\}^n$ is the set of all $n$-bit strings with Hamming weight $k$. We show that there exists a function on the balanced slice $\binom{[n]}{n/2}$ requiring $n - O(\log \log n)$ queries. We give an explicit function on the balanced slice requiring $n - O(\log n)$ queries based on independent sets in Johnson graphs. On the weight-2 slice, we show that hard functions are closely related to Ramsey graphs. Further we describe a simple way of transforming functions on the hypercube to functions on the balanced slice while preserving several complexity measures.

### Equivalence Between SE(3) Equivariant Networks via Steerable Kernels and Group Convolution

Authors: Adrien Poulenard, Maks Ovsjanikov, Leonidas J. Guibas

A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input. Most approaches for equivariance under the Euclidean group $\mathrm{SE}(3)$ of rotations and translations fall within one of the two major categories. The first category consists of methods that use $\mathrm{SE}(3)$-convolution which generalizes classical $\mathbb{R}^3$-convolution on signals over $\mathrm{SE}(3)$. Alternatively, it is possible to use \textit{steerable convolution} which achieves $\mathrm{SE}(3)$-equivariance by imposing constraints on $\mathbb{R}^3$-convolution of tensor fields. It is known by specialists in the field that the two approaches are equivalent, with steerable convolution being the Fourier transform of $\mathrm{SE}(3)$ convolution. Unfortunately, these results are not widely known and moreover the exact relations between deep learning architectures built upon these two approaches have not been precisely described in the literature on equivariant deep learning. In this work we provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks. Furthermore, we provide theoretical justifications of separability of $\mathrm{SE}(3)$ group convolution, which explain the applicability and success of some recent approaches. Finally, we express different methods using a single coherent formalism and provide explicit formulas that relate the kernels learned by different methods. In this way, our work helps to unify different previously-proposed techniques for achieving roto-translational equivariance, and helps to shed light on both the utility and precise differences between various alternatives. We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.

A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input. Most approaches for equivariance under the Euclidean group $\mathrm{SE}(3)$ of rotations and translations fall within one of the two major categories. The first category consists of methods that use $\mathrm{SE}(3)$-convolution which generalizes classical $\mathbb{R}^3$-convolution on signals over $\mathrm{SE}(3)$. Alternatively, it is possible to use \textit{steerable convolution} which achieves $\mathrm{SE}(3)$-equivariance by imposing constraints on $\mathbb{R}^3$-convolution of tensor fields. It is known by specialists in the field that the two approaches are equivalent, with steerable convolution being the Fourier transform of $\mathrm{SE}(3)$ convolution. Unfortunately, these results are not widely known and moreover the exact relations between deep learning architectures built upon these two approaches have not been precisely described in the literature on equivariant deep learning. In this work we provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks. Furthermore, we provide theoretical justifications of separability of $\mathrm{SE}(3)$ group convolution, which explain the applicability and success of some recent approaches. Finally, we express different methods using a single coherent formalism and provide explicit formulas that relate the kernels learned by different methods. In this way, our work helps to unify different previously-proposed techniques for achieving roto-translational equivariance, and helps to shed light on both the utility and precise differences between various alternatives. We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.

### Sketch-and-solve approaches to k-means clustering by semidefinite programming

Authors: Charles Clum, Dustin G. Mixon, Soledad Villar, Kaiying Xie

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.

### Sublinear Time Algorithms and Complexity of Approximate Maximum Matching

Sublinear time algorithms for approximating maximum matching size have long been studied. Much of the progress over the last two decades on this problem has been on the algorithmic side. For instance, an algorithm of Behnezhad [FOCS'21] obtains a 1/2-approximation in $\tilde{O}(n)$ time for $n$-vertex graphs. A more recent algorithm by Behnezhad, Roghani, Rubinstein, and Saberi [SODA'23] obtains a slightly-better-than-1/2 approximation in $O(n^{1+\epsilon})$ time. On the lower bound side, Parnas and Ron [TCS'07] showed 15 years ago that obtaining any constant approximation of maximum matching size requires $\Omega(n)$ time. Proving any super-linear in $n$ lower bound, even for $(1-\epsilon)$-approximations, has remained elusive since then.

In this paper, we prove the first super-linear in $n$ lower bound for this problem. We show that at least $n^{1.2 - o(1)}$ queries in the adjacency list model are needed for obtaining a $(\frac{2}{3} + \Omega(1))$-approximation of maximum matching size. This holds even if the graph is bipartite and is promised to have a matching of size $\Theta(n)$. Our lower bound argument builds on techniques such as correlation decay that to our knowledge have not been used before in proving sublinear time lower bounds.

We complement our lower bound by presenting two algorithms that run in strongly sublinear time of $n^{2-\Omega(1)}$. The first algorithm achieves a $(\frac{2}{3}-\epsilon)$-approximation; this significantly improves prior close-to-1/2 approximations. Our second algorithm obtains an even better approximation factor of $(\frac{2}{3}+\Omega(1))$ for bipartite graphs. This breaks the prevalent $2/3$-approximation barrier and importantly shows that our $n^{1.2-o(1)}$ time lower bound for $(\frac{2}{3}+\Omega(1))$-approximations cannot be improved all the way to $n^{2-o(1)}$.

Sublinear time algorithms for approximating maximum matching size have long been studied. Much of the progress over the last two decades on this problem has been on the algorithmic side. For instance, an algorithm of Behnezhad [FOCS'21] obtains a 1/2-approximation in $\tilde{O}(n)$ time for $n$-vertex graphs. A more recent algorithm by Behnezhad, Roghani, Rubinstein, and Saberi [SODA'23] obtains a slightly-better-than-1/2 approximation in $O(n^{1+\epsilon})$ time. On the lower bound side, Parnas and Ron [TCS'07] showed 15 years ago that obtaining any constant approximation of maximum matching size requires $\Omega(n)$ time. Proving any super-linear in $n$ lower bound, even for $(1-\epsilon)$-approximations, has remained elusive since then.

In this paper, we prove the first super-linear in $n$ lower bound for this problem. We show that at least $n^{1.2 - o(1)}$ queries in the adjacency list model are needed for obtaining a $(\frac{2}{3} + \Omega(1))$-approximation of maximum matching size. This holds even if the graph is bipartite and is promised to have a matching of size $\Theta(n)$. Our lower bound argument builds on techniques such as correlation decay that to our knowledge have not been used before in proving sublinear time lower bounds.

We complement our lower bound by presenting two algorithms that run in strongly sublinear time of $n^{2-\Omega(1)}$. The first algorithm achieves a $(\frac{2}{3}-\epsilon)$-approximation; this significantly improves prior close-to-1/2 approximations. Our second algorithm obtains an even better approximation factor of $(\frac{2}{3}+\Omega(1))$ for bipartite graphs. This breaks the prevalent $2/3$-approximation barrier and importantly shows that our $n^{1.2-o(1)}$ time lower bound for $(\frac{2}{3}+\Omega(1))$-approximations cannot be improved all the way to $n^{2-o(1)}$.

### Quantum Speed-ups for String Synchronizing Sets, Longest Common Substring, and k-mismatch Matching

Authors: Ce Jin, Jakob Nogler

Longest Common Substring (LCS) is an important text processing problem, which has recently been investigated in the quantum query model. The decisional version of this problem, LCS with threshold $d$, asks whether two length-$n$ input strings have a common substring of length $d$. The two extreme cases, $d=1$ and $d=n$, correspond respectively to Element Distinctness and Unstructured Search, two fundamental problems in quantum query complexity. However, the intermediate case $1\ll d\ll n$ was not fully understood. We show that the complexity of LCS with threshold $d$ smoothly interpolates between the two extreme cases up to $n^{o(1)}$ factors: LCS with threshold $d$ has a quantum algorithm in $n^{2/3+o(1)}/d^{1/6}$ query complexity and time complexity, and requires at least $\Omega(n^{2/3}/d^{1/6})$ quantum query complexity. Our result improves upon previous upper bounds $\tilde O(\min \{n/d^{1/2}, n^{2/3}\})$ (Le Gall and Seddighin ITCS 2022, Akmal and Jin SODA 2022), and answers an open question of Akmal and Jin.

Our main technical contribution is a quantum speed-up of the powerful String Synchronizing Set technique introduced by Kempa and Kociumaka (STOC 2019). It consistently samples $n/\tau^{1-o(1)}$ synchronizing positions in the string depending on their length-$\Theta(\tau)$ contexts, and each synchronizing position can be reported by a quantum algorithm in $\tilde O(\tau^{1/2+o(1)})$ time.

As another application of our quantum string synchronizing set, we study the $k$-mismatch Matching problem, which asks if the pattern has an occurrence in the text with at most $k$ Hamming mismatches. Using a structural result of Charalampopoulos, Kociumaka, and Wellnitz (FOCS 2020), we obtain a quantum algorithm for $k$-mismatch matching with $k^{3/4} n^{1/2+o(1)}$ query complexity and $\tilde O(kn^{1/2})$ time complexity.

Authors: Ce Jin, Jakob Nogler

Longest Common Substring (LCS) is an important text processing problem, which has recently been investigated in the quantum query model. The decisional version of this problem, LCS with threshold $d$, asks whether two length-$n$ input strings have a common substring of length $d$. The two extreme cases, $d=1$ and $d=n$, correspond respectively to Element Distinctness and Unstructured Search, two fundamental problems in quantum query complexity. However, the intermediate case $1\ll d\ll n$ was not fully understood. We show that the complexity of LCS with threshold $d$ smoothly interpolates between the two extreme cases up to $n^{o(1)}$ factors: LCS with threshold $d$ has a quantum algorithm in $n^{2/3+o(1)}/d^{1/6}$ query complexity and time complexity, and requires at least $\Omega(n^{2/3}/d^{1/6})$ quantum query complexity. Our result improves upon previous upper bounds $\tilde O(\min \{n/d^{1/2}, n^{2/3}\})$ (Le Gall and Seddighin ITCS 2022, Akmal and Jin SODA 2022), and answers an open question of Akmal and Jin.

Our main technical contribution is a quantum speed-up of the powerful String Synchronizing Set technique introduced by Kempa and Kociumaka (STOC 2019). It consistently samples $n/\tau^{1-o(1)}$ synchronizing positions in the string depending on their length-$\Theta(\tau)$ contexts, and each synchronizing position can be reported by a quantum algorithm in $\tilde O(\tau^{1/2+o(1)})$ time.

As another application of our quantum string synchronizing set, we study the $k$-mismatch Matching problem, which asks if the pattern has an occurrence in the text with at most $k$ Hamming mismatches. Using a structural result of Charalampopoulos, Kociumaka, and Wellnitz (FOCS 2020), we obtain a quantum algorithm for $k$-mismatch matching with $k^{3/4} n^{1/2+o(1)}$ query complexity and $\tilde O(kn^{1/2})$ time complexity.

### Online Unrelated-Machine Load Balancing and Generalized Flow with Recourse

Authors: Ravishankar Krishnaswamy, Shi Li, Varun Suriyanarayana

We consider the online unrelated-machine load balancing problem with recourse, where the algorithm is allowed to re-assign prior jobs. We give a $(2+\epsilon)$-competitive algorithm for the problem with $O_\epsilon(\log n)$ amortized recourse per job. This is the first $O(1)$-competitive algorithm for the problem with reasonable recourse, and the competitive ratio nearly matches the long-standing best-known offline approximation guarantee. We also show an $O(\log\log n/\log\log\log n)$-competitive algorithm for the problem with $O(1)$ amortized recourse. The best-known bounds from prior work are $O(\log\log n)$-competitive algorithms with $O(1)$ amortized recourse due to [GKS14], for the special case of the restricted assignment model.

Along the way, we design an algorithm for the online generalized network flow problem (also known as network flow problem with gains) with recourse. In the problem, any edge $uv$ in the network has a gain parameter $\gamma_{uv} > 0$ and $\theta$-units of flow sent across $uv$ from $u$'s side becomes $\gamma_{uv} \theta$ units of flow on the $v$'th side. In the online problem, there is one sink, and sources come one by one. Upon arrival of a source, we need to send 1 unit flow from the source. A recourse occurs if we change the flow value of an edge. We give an online algorithm for the problem with recourse at most $O(1/\epsilon)$ times the optimum cost for the instance with capacities scaled by $\frac{1}{1+\epsilon}$. The $(1+\epsilon)$-factor improves upon the corresponding $(2+\epsilon)$-factor of [GKS14], which only works for the ordinary network flow problem. As an immediate corollary, we also give an improved algorithm for the online $b$-matching problem with reassignment costs.

We consider the online unrelated-machine load balancing problem with recourse, where the algorithm is allowed to re-assign prior jobs. We give a $(2+\epsilon)$-competitive algorithm for the problem with $O_\epsilon(\log n)$ amortized recourse per job. This is the first $O(1)$-competitive algorithm for the problem with reasonable recourse, and the competitive ratio nearly matches the long-standing best-known offline approximation guarantee. We also show an $O(\log\log n/\log\log\log n)$-competitive algorithm for the problem with $O(1)$ amortized recourse. The best-known bounds from prior work are $O(\log\log n)$-competitive algorithms with $O(1)$ amortized recourse due to [GKS14], for the special case of the restricted assignment model.

Along the way, we design an algorithm for the online generalized network flow problem (also known as network flow problem with gains) with recourse. In the problem, any edge $uv$ in the network has a gain parameter $\gamma_{uv} > 0$ and $\theta$-units of flow sent across $uv$ from $u$'s side becomes $\gamma_{uv} \theta$ units of flow on the $v$'th side. In the online problem, there is one sink, and sources come one by one. Upon arrival of a source, we need to send 1 unit flow from the source. A recourse occurs if we change the flow value of an edge. We give an online algorithm for the problem with recourse at most $O(1/\epsilon)$ times the optimum cost for the instance with capacities scaled by $\frac{1}{1+\epsilon}$. The $(1+\epsilon)$-factor improves upon the corresponding $(2+\epsilon)$-factor of [GKS14], which only works for the ordinary network flow problem. As an immediate corollary, we also give an improved algorithm for the online $b$-matching problem with reassignment costs.

### Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions

Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C.H. Lee, Ankit Pensia

We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $\mu$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $\tau$, having an additive $\log(1/\tau)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.

We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $\mu$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $\tau$, having an additive $\log(1/\tau)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.

### Elfs, trees and quantum walks

Authors: Simon Apers, Stephen Piddock

We study an elementary Markov process on graphs based on electric flow sampling (elfs). The elfs process repeatedly samples from an electric flow on a graph. While the sinks of the flow are fixed, the source is updated using the electric flow sample, and the process ends when it hits a sink vertex.

We argue that this process naturally connects to many key quantities of interest. E.g., we describe a random walk coupling which implies that the elfs process has the same arrival distribution as a random walk. We also analyze the electric hitting time, which is the expected time before the process hits a sink vertex. As our main technical contribution, we show that the electric hitting time on trees is logarithmic in the graph size and weights.

The initial motivation behind the elfs process is that quantum walks can sample from electric flows, and they can hence implement this process very naturally. This yields a quantum walk algorithm for sampling from the random walk arrival distribution, which has widespread applications. It complements the existing line of quantum walk search algorithms which only return an element from the sink, but yield no insight in the distribution of the returned element. By our bound on the electric hitting time on trees, the quantum walk algorithm on trees requires quadratically fewer steps than the random walk hitting time, up to polylog factors.

Authors: Simon Apers, Stephen Piddock

We study an elementary Markov process on graphs based on electric flow sampling (elfs). The elfs process repeatedly samples from an electric flow on a graph. While the sinks of the flow are fixed, the source is updated using the electric flow sample, and the process ends when it hits a sink vertex.

We argue that this process naturally connects to many key quantities of interest. E.g., we describe a random walk coupling which implies that the elfs process has the same arrival distribution as a random walk. We also analyze the electric hitting time, which is the expected time before the process hits a sink vertex. As our main technical contribution, we show that the electric hitting time on trees is logarithmic in the graph size and weights.

The initial motivation behind the elfs process is that quantum walks can sample from electric flows, and they can hence implement this process very naturally. This yields a quantum walk algorithm for sampling from the random walk arrival distribution, which has widespread applications. It complements the existing line of quantum walk search algorithms which only return an element from the sink, but yield no insight in the distribution of the returned element. By our bound on the electric hitting time on trees, the quantum walk algorithm on trees requires quadratically fewer steps than the random walk hitting time, up to polylog factors.

### Finding Front-Door Adjustment Sets in Linear Time

Authors: Marcel Wienöbst, Benito van der Zander, Maciej Liśkiewicz

Front-door adjustment is a classic technique to estimate causal effects from a specified directed acyclic graph (DAG) and observed data. The advantage of this approach is that it uses observed mediators to identify causal effects, which is possible even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given DAG, with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the graph. In our work, we give the first linear-time, i.e. $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity, as the size of the input is $\Omega(n+m)$. We also provide an algorithm to enumerate all front-door adjustment sets in a given DAG with delay $O(n(n + m))$. These results improve the algorithms by Jeong et al. [2022] for the two tasks by a factor of $n^3$, respectively.

Front-door adjustment is a classic technique to estimate causal effects from a specified directed acyclic graph (DAG) and observed data. The advantage of this approach is that it uses observed mediators to identify causal effects, which is possible even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given DAG, with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the graph. In our work, we give the first linear-time, i.e. $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity, as the size of the input is $\Omega(n+m)$. We also provide an algorithm to enumerate all front-door adjustment sets in a given DAG with delay $O(n(n + m))$. These results improve the algorithms by Jeong et al. [2022] for the two tasks by a factor of $n^3$, respectively.

## Tuesday, November 29

### TCS+ talk: Wednesday, November 30 — Nicole Wein, DIMACS (Reminder)

A reminder that the next TCS+ talk will take place this coming Wednesday, November 30th (tomorrow!) at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 18:00 UTC). Nicole Wein from DIMACS will speak about “Online List Labeling: Breaking the Barrier” (abstract below). The perfect way to ease back into it, after […]

A reminder that the next TCS+ talk will take place this coming Wednesday, November 30th (tomorrow!) at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 18:00 UTC). Nicole Wein from DIMACS will speak about “Online List Labeling: Breaking the $\log^2 n$ Barrier” (abstract below).

The perfect way to ease back into it, after the Thanksgiving week!

You can reserve a spot as an individual or a group to join us live by signing up on the online form. Registration is not required to attend the interactive talk, and the link will be posted on the website the day prior to the talk; however, by registering in the form, you will receive a reminder, along with the link.

More details on Nicole’s talk can be found here.

By plustcs

### Workshop on Algebraic Complexity Theory

from CS Theory Events

March 27-31, 2023 University of Warwick, Coventry, UK www.dcs.warwick.ac.uk/~u2270030/wact Algebraic Complexity Theory is a vibrant field that has been seeing a tremendous amount of activity in the recent years. Its classical questions have been interwoven with deep questions from algebraic geometry, invariant theory, and representation theory. Researchers study a wide range of interlinked topics: arithmetic … Continue reading Workshop on Algebraic Complexity Theory

By shacharlovett

March 27-31, 2023 University of Warwick, Coventry, UK https://www.dcs.warwick.ac.uk/~u2270030/wact Algebraic Complexity Theory is a vibrant field that has been seeing a tremendous amount of activity in the recent years. Its classical questions have been interwoven with deep questions from algebraic geometry, invariant theory, and representation theory. Researchers study a wide range of interlinked topics: arithmetic … Continue reading Workshop on Algebraic Complexity Theory

By shacharlovett

### Faculty at Department of Computer Science at Reykjavik University (apply by January 27, 2023)

from CCI: jobs

We invite applications for two full-time, permanent faculty positions at any rank in the fields of Artificial Intelligence, Cybersecurity, Data Science and Machine Learning, Software Engineering, and Theoretical Computer Science. For one of the positions, we will give preferential treatment to excellent applicants in Software Engineering, broadly construed. Website: jobs.50skills.com/ru/en/16728 Email: mariaoskars@ru.is

We invite applications for two full-time, permanent faculty positions at any rank in the fields of Artificial Intelligence, Cybersecurity, Data Science and Machine Learning, Software Engineering, and Theoretical Computer Science. For one of the positions, we will give preferential treatment to excellent applicants in Software Engineering, broadly construed.

Website: https://jobs.50skills.com/ru/en/16728
Email: mariaoskars@ru.is

By shacharlovett

### My AI Safety Lecture for UT Effective Altruism

from Scott Aaronson

Two weeks ago, I gave a lecture setting out my current thoughts on AI safety, halfway through my year at OpenAI. I was asked to speak by UT Austin’s Effective Altruist club. You can watch the lecture on YouTube here (I recommend 2x speed). The timing turned out to be weird, coming immediately after the […]

Two weeks ago, I gave a lecture setting out my current thoughts on AI safety, halfway through my year at OpenAI. I was asked to speak by UT Austin’s Effective Altruist club. You can watch the lecture on YouTube here (I recommend 2x speed).

The timing turned out to be weird, coming immediately after the worst disaster to hit the Effective Altruist movement in its history, as I acknowledged in the talk. But I plowed ahead anyway, to discuss:

1. the current state of AI scaling, and why many people (even people who agree about little else!) foresee societal dangers,
2. the different branches of the AI safety movement,
3. the major approaches to aligning a powerful AI that people have thought of, and
4. what projects I specifically have been working on at OpenAI.

I then spent 20 minutes taking questions.

For those who (like me) prefer text over video, below I’ve produced an edited transcript, by starting with YouTube’s automated transcript and then, well, editing it. Enjoy! –SA

Thank you so much for inviting me here. I do feel a little bit sheepish to be lecturing you about AI safety, as someone who’s worked on this subject for all of five months. I’m a quantum computing person. But this past spring, I accepted an extremely interesting opportunity to go on leave for a year to think about what theoretical computer science can do for AI safety. I’m doing this at OpenAI, which is one of the world’s leading AI startups, based in San Francisco although I’m mostly working from Austin.

Despite its name, OpenAI is famously not 100% open … so there are certain topics that I’m not allowed to talk about, like the capabilities of the very latest systems and whether or not they’ll blow people’s minds when released. By contrast, OpenAI is very happy for me to talk about AI safety: what it is and and what if anything can we do about it. So what I thought I’d do is to tell you a little bit about the specific projects that I’ve been working on at OpenAI, but also just, as an admitted newcomer, share some general thoughts about AI safety and how Effective Altruists might want to think about it. I’ll try to leave plenty of time for discussion.

Maybe I should mention that the thoughts that I’ll tell you today are ones that, until last week, I had considered writing up for an essay contest run by something called the FTX Future Fund. Unfortunately, the FTX Future Fund no longer exists. It was founded by someone named Sam Bankman-Fried, whose a net worth went from 15 billion dollars to some negative number of dollars in the space of two days, in one of the biggest financial scandals in memory. This is obviously a calamity for the EA community, which had been counting on funding from this individual. I feel terrible about all the projects left in the lurch, to say nothing of FTX’s customers.

As a tiny silver lining, though, instead of writing up my thoughts for that essay contest, I’ll just share them with you right now, for free!

## The Scaling of AI

Let’s start with this: raise your hand if you’ve tried GPT-3. That’s maybe half of you. OK, raise your hand if you’ve tried DALL-E. That’s again maybe half of you.

These are the two best-known products that are made by OpenAI, and as I think most people would agree, two of the most impressive AIs that exist in the world right now. They certainly go far beyond what I would’ve predicted would be possible now, if you’d asked me 10 years ago or even 5 years ago.

And whenever I try to explain them to people, I’m, like, well, you have to see them. No abstract description can substitute in this case.

All right, so here’s what GPT-3 produced when a New Yorker writer asked it to write a poem about cryptocurrency in the style of Philip Larkin, who was a famous 20th-century poet. The subject seems particularly appropriate given current events.

The Invention  (by GPT-3)

Money is a thing you earn by the sweat of your brow
And that’s how it should be.
Or you can steal it, and go to jail;
Or inherit it, and be set for life;
Or win it on the pools, which is luck;
Or marry it, which is what I did.
And that is how it should be, too.
But now this idea’s come up
Of inventing money, just like that.
I ask you, is nothing sacred?

Okay, it won’t always produce something of this quality (incidentally, I don’t think GPT-3 actually “married money”!). Often you’ve got to run it several times and take the best output—much like human poets presumably do, throwing crumpled pages into the basket. But I submit that, if the above hadn’t been labeled as coming from GPT, you’d be like, yeah, that’s the kind of poetry the New Yorker publishes, right? This is a thing that AI can now do.

So what is GPT? It’s a text model. It’s basically a gigantic neural network with about 175 billion parameters—the weights. It’s a particular kind of neural net called a transformer model that was invented five years ago. It’s been trained on a large fraction of all the text on the open Internet. The training simply consists of playing the following game over and over, trillions of times: predict which word comes next in this text string. So in some sense that’s its only goal or intention in the world: to predict the next word.

The amazing discovery is that, when you do that, you end up with something where you can then ask it a question, or give it a a task like writing an essay about a certain topic, and it will say “oh! I know what would plausibly come after that prompt! The answer to the question! Or the essay itself!” And it will then proceed to generate the thing you want.

GPT can solve high-school-level math problems that are given to it in English. It can reason you through the steps of the answer. It’s starting to be able to do nontrivial math competition problems. It’s on track to master basically the whole high school curriculum, maybe followed soon by the whole undergraduate curriculum.

If you turned in GPT’s essays, I think they’d get at least a B in most courses. Not that I endorse any of you doing that!! We’ll come back to that later. But yes, we are about to enter a world where students everywhere will at least be sorely tempted to use text models to write their term papers. That’s just a tiny example of the societal issues that these things are going to raise.

Speaking personally, the last time I had a similar feeling was when I was an adolescent in 1993 and I saw this niche new thing called the World Wide Web, and I was like “why isn’t everyone using this? why isn’t it changing the world?” The answer, of course, was that within a couple years it would.

Today, I feel like the world was understandably preoccupied by the pandemic, and by everything else that’s been happening, but these past few years might actually be remembered as the time when AI underwent this step change. I didn’t predict it. I think even many computer scientists might still be in denial about what’s now possible, or what’s happened. But I’m now thinking about it even in terms of my two kids, of what kinds of careers are going to be available when they’re older and entering the job market. For example, I would probably not urge my kids to go into commercial drawing!

Speaking of which, OpenAI’s other main product is DALL-E2, an image model. Probably most of you have already seen it, but you can ask it—for example, just this morning I asked it, show me some digital art of two cats playing basketball in outer space. That’s not a problem for it.

You may have seen that there’s a different image model called Midjourney which won an art contest with this piece:

It seems like the judges didn’t completely understand, when this was submitted as “digital art,” what exactly that meant—that the human role was mostly limited to entering a prompt! But the judges then said that even having understood it, they still would’ve given the award to this piece. I mean, it’s a striking piece, isn’t it? But of course it raises the question of how much work there’s going to be for contract artists, when you have entities like this.

There are already companies that are using GPT to write ad copy. It’s already being used at the, let’s call it, lower end of the book market. For any kind of formulaic genre fiction, you can say, “just give me a few paragraphs of description of this kind of scene,” and it can do that. As it improves you could you can imagine that it will be used more.

Likewise, DALL-E and other image models have already changed the way that people generate art online. And it’s only been a few months since these models were released! That’s a striking thing about this era, that a few months can be an eternity. So when we’re thinking about the impacts of these things, we have to try to take what’s happened in the last few months or years and project that five years forward or ten years forward.

This brings me to the obvious question: what happens as you continue scaling further? I mean, these spectacular successes of deep learning over the past decade have owed something to new ideas—ideas like transformer models, which I mentioned before, and others—but famously, they have owed maybe more than anything else to sheer scale.

Neural networks, backpropagation—which is how you train the neural networks—these are ideas that have been around for decades. When I studied CS in the 90s, they were already extremely well-known. But it was also well-known that they didn’t work all that well! They only worked somewhat. And usually, when you take something that doesn’t work and multiply it by a million, you just get a million times something that doesn’t work, right?

I remember at the time, Ray Kurzweil, the futurist, would keep showing these graphs that look like this:

So, he would plot Moore’s Law, the increase in transistor density, or in this case the number of floating-point operations that you can do per second for a given cost. And he’d point out that it’s on this clear exponential trajectory.

And he’d then try to compare that to some crude estimates of the number of computational operations that are done in the brain of a mosquito or a mouse or a human or all the humans on Earth. And oh! We see that in a matter of a couple decades, like by the year 2020 or 2025 or so, we’re going to start passing the human brain’s computing power and then we’re going to keep going beyond that. And so, Kurzweil would continue, we should assume that scale will just kind of magically make AI work. You know, that once you have enough computing cycles, you just sprinkle them around like pixie dust, and suddenly human-level intelligence will just emerge out of the billions of connections.

I remember thinking: that sounds like the stupidest thesis I’ve ever heard. Right? Like, he has absolutely no reason to believe such a thing is true or have any confidence in it. Who the hell knows what will happen? We might be missing crucial insights that are needed to make AI work.

Well, here we are, and it turns out he was way more right than most of us expected.

As you all know, a central virtue of Effective Altruists is updating based on evidence. I think that we’re forced to do that in this case.

To be sure, it’s still unclear how much further you’ll get just from pure scaling. That remains a central open question. And there are still prominent skeptics.

Some skeptics take the position that this is clearly going to hit some kind of wall before it gets to true human-level understanding of the real world. They say that text models like GPT are really just “stochastic parrots” that regurgitate their training data. That despite creating a remarkable illusion otherwise, they don’t really have any original thoughts.

The proponents of that view sometimes like to gleefully point out examples where GPT will flub some commonsense question. If you look for such examples, you can certainly find them! One of my favorites recently was, “which would win in a race, a four-legged zebra or a two-legged cheetah?” GPT-3, it turns out, is very confident that the cheetah will win. Cheetahs are faster, right?

Okay, but one thing that’s been found empirically is that you take commonsense questions that are flubbed by GPT-2, let’s say, and you try them on GPT-3, and very often now it gets them right. You take the things that the original GPT-3 flubbed, and you try them on the latest public model, which is sometimes called GPT-3.5 (incorporating an advance called InstructGPT), and again it often gets them right. So it’s extremely risky right now to pin your case against AI on these sorts of examples! Very plausibly, just one more order of magnitude of scale is all it’ll take to kick the ball in, and then you’ll have to move the goal again.

A deeper objection is that the amount of training data might be a fundamental bottleneck for these kinds of machine learning systems—and we’re already running out of Internet to to train these models on! Like I said, they’ve already used most of the public text on the Internet. There’s still all of YouTube and TikTok and Instagram that hasn’t yet been fed into the maw, but it’s not clear that that would actually make an AI smarter rather than dumber! So, you can look for more, but it’s not clear that there are orders of magnitude more that humanity has even produced and that’s readily accessible.

On the other hand, it’s also been found empirically that very often, you can do better with the same training data just by spending more compute. You can squeeze the lemon harder and get more and more generalization power from the same training data by doing more gradient descent.

In summary, we don’t know how far this is going to go. But it’s already able to automate various human professions that you might not have predicted would have been automatable by now, and we shouldn’t be confident that many more professions will not become automatable by these kinds of techniques.

Incidentally, there’s a famous irony here. If you had asked anyone in the 60s or 70s, they would have said, well clearly first robots will replace humans for manual labor, and then they’ll replace humans for intellectual things like math and science, and finally they might reach the pinnacles of human creativity like art and poetry and music.

The truth has turned out to be the exact opposite. I don’t think anyone predicted that.

GPT, I think, is already a pretty good poet. DALL-E is already a pretty good artist. They’re still struggling with some high school and college-level math but they’re getting there. It’s easy to imagine that maybe in five years, people like me will be using these things as research assistants—at the very least, to prove the lemmas in our papers. That seems extremely plausible.

What’s been by far the hardest is to get AI that can robustly interact with the physical world. Plumbers, electricians—these might be some of the last jobs to be automated. And famously, self-driving cars have taken a lot longer than many people expected a decade ago. This is partly because of regulatory barriers and public relations: even if a self-driving car actually crashes less than a human does, that’s still not good enough, because when it does crash the circumstances are too weird. So, the AI is actually held to a higher standard. But it’s also partly just that there was a long tail of really weird events. A deer crosses the road, or you have some crazy lighting conditions—such things are really hard to get right, and of course 99% isn’t good enough here.

We can maybe fuzzily see ahead at least a decade or two, to when we have AIs that can at the least help us enormously with scientific research and things like that. Whether or not they’ve totally replaced us—and I selfishly hope not, although I do have tenure so there’s that—why does it stop there? Will these models eventually match or exceed human abilities across basically all domains, or at least all intellectual ones? If they do, what will humans still be good for? What will be our role in the world? And then we come to the question, well, will the robots eventually rise up and decide that whatever objective function they were given, they can maximize it better without us around, that they don’t need us anymore?

This has of course been a trope of many, many science-fiction works. The funny thing is that there are thousands of short stories, novels, movies, that have tried to map out the possibilities for where we’re going, going back at least to Asimov and his Three Laws of Robotics, which was maybe the first AI safety idea, if not earlier than that. The trouble is, we don’t know which science-fiction story will be the one that will have accurately predicted the world that we’re creating. Whichever future we end up in, with hindsight, people will say, this obscure science fiction story from the 1970s called it exactly right, but we don’t know which one yet!

## What Is AI Safety?

So, the rapidly-growing field of AI safety. People use different terms, so I want to clarify this a little bit. To an outsider hearing the terms “AI safety,” “AI ethics,” “AI alignment,” they all sound like kind of synonyms, right? It turns out, and this was one of the things I had to learn going into this, that AI ethics and AI alignment are two communities that despise each other. It’s like the People’s Front of Judea versus the Judean People’s Front from Monty Python.

To oversimplify radically, “AI ethics” means that you’re mainly worried about current AIs being racist or things like that—that they’ll recapitulate the biases that are in their training data. This clearly can happen: if you feed GPT a bunch of racist invective, GPT might want to say, in effect, “sure, I’ve seen plenty of text like that on the Internet! I know exactly how that should continue!” And in some sense, it’s doing exactly what it was designed to do, but not what we want it to do. GPT currently has an extensive system of content filters to try to prevent people from using it to generate hate speech, bad medical advice, advocacy of violence, and a bunch of other categories that OpenAI doesn’t want. And likewise for DALL-E: there are many things it “could” draw but won’t, from porn to images of violence to the Prophet Mohammed.

More generally, AI ethics people are worried that machine learning systems will be misused by greedy capitalist enterprises to become even more obscenely rich and things like that.

At the other end of the spectrum, “AI alignment” is where you believe that really the main issue is that AI will become superintelligent and kill everyone, just destroy the world. The usual story here is that someone puts an AI in charge of a paperclip factory, they tell it to figure out how to make as many paperclips as possible, and the AI (being superhumanly intelligent) realizes that it can invent some molecular nanotechnology that will convert the whole solar system into paperclips.

You might say, well then, you just have to tell it not to do that! Okay, but how many other things do you have to remember to tell it not to do? And the alignment people point out that, in a world filled with powerful AIs, it would take just a single person forgetting to tell their AI to avoid some insanely dangerous thing, and then the whole world could be destroyed.

So, you can see how these two communities, AI ethics and AI alignment, might both feel like the other is completely missing the point! On top of that, AI ethics people are almost all on the political left, while AI alignment people are often centrists or libertarians or whatever, so that surely feeds into it as well.

Oay, so where do I fit into this, I suppose, charred battle zone or whatever? While there’s an “orthodox” AI alignment movement that I’ve never entirely subscribed to, I suppose I do now subscribe to a “reform” version of AI alignment:

Most of all, I would like to have a scientific field that’s able to embrace the entire spectrum of worries that you could have about AI, from the most immediate ones about existing AIs to the most speculative future ones, and that most importantly, is able to make legible progress.

As it happens, I became aware of the AI alignment community a long time back, around 2006. Here’s Eliezer Yudkowsky, who’s regarded as the prophet of AI alignment, of the right side of that spectrum that showed before.

He’s been talking about the danger of AI killing everyone for more than 20 years. He wrote the now-famous “Sequences” that many readers of my blog were also reading as they appeared, so he and I bounced back and forth.

But despite interacting with this movement, I always kept it at arm’s length. The heart of my objection was: suppose that I agree that there could come a time when a superintelligent AI decides its goals are best served by killing all humans and taking over the world, and that we’ll be about as powerless to stop it as chimpanzees are to stop us from doing whatever we want to do. Suppose I agree to that. What do you want me to do about it?

As Effective Altruists, you all know that it’s not enough for a problem to be big, the problem also has to be tractable. There has to be a program that lets you make progress on it. I was not convinced that that existed.

My personal experience has been that, in order to make progress in any area of science, you need at least one of two things: either

1. experiments (or more generally, empirical observations), or
2. if not that, then a rigorous mathematical theory—like we have in quantum computing for example; even though we don’t yet have the scalable quantum computers, we can still prove theorems about them.

It struck me that the AI alignment field seemed to have neither of these things. But then how does objective reality give you feedback as to when you’ve taken a wrong path? Without such feedback, it seemed to me that there’s a severe risk of falling into cult-like dynamics, where what’s important to work on is just whatever the influential leaders say is important. (A few of my colleagues in physics think that the same thing happened with string theory, but let me not comment on that!)

With AI safety, this is the key thing that I think has changed in the last three years. There now exist systems like GPT-3 and DALL-E. These are not superhuman AIs. I don’t think they themselves are in any danger of destroying the world; they can’t even form the intention to destroy the world, or for that matter any intention beyond “predict the next token” or things like that. They don’t have a persistent identity over time; after you start a new session they’ve completely forgotten whatever you said to them in the last one (although of course such things will change in the near future). And yet nevertheless, despite all these limitations, we can experiment with these systems and learn things about AI safety that are relevant. We can see what happens when the systems are deployed; we can try out different safety mitigations and see whether they work.

As a result, I feel like it’s now become possible to make technical progress in AI safety that the whole scientific community, or at least the whole AI community, can clearly recognize as progress.

## Eight Approaches to AI Alignment

So, what are the major approaches to AI alignment—let’s say, to aligning a very powerful, beyond-human-level AI? There are a lot of really interesting ideas, most of which I think can now lead to research programs that are actually productive. So without further ado, let me go through eight of them.

(1) You could say the first and most basic of all AI alignment ideas is the off switch, also known as pulling the plug. You could say, no matter how intelligent an AI is, it’s nothing without a power source or physical hardware to run on. And if humans have physical control over the hardware, they can just turn it off if if things seem to be getting out of hand. Now, the standard response to that is okay, but you have to remember that this AI is smarter than you, and anything that you can think of, it will have thought of also. In particular, it will know that you might want to turn it off, and it will know that that will prevent it from achieving its goals like making more paperclips or whatever. It will have disabled the off-switch if possible. If it couldn’t do that, it will have gotten onto the Internet and made lots of copies of itself all over the world. If you tried to keep it off the Internet, it will have figured out a way to get on.

So, you can worry about that. But you can also think about, could we insert a backdoor into an AI, something that only the humans know about but that will allow us to control it later?

More generally, you could ask for “corrigibility”: can you have an AI that, despite how intelligent it is, will accept correction from humans later and say, oh well, the objective that I was given before was actually not my true objective because the humans have now changed their minds and I should take a different one?

(2) Another class of ideas has to do with what’s called “sandboxing” an AI, which would mean that you run it inside of a simulated world, like The Truman Show, so that for all it knows the simulation is the whole of reality. You can then study its behavior within the sandbox to make sure it’s aligned before releasing it into the wider world—our world.

A simpler variant is, if you really thought an AI was dangerous, you might run it only on an air-gapped computer, with all its access to the outside world carefully mediated by humans. There would then be all kinds of just standard cybersecurity issues that come into play: how do you prevent it from getting onto the Internet? Presumably you don’t want to write your AI in C, and have it exploit some memory allocation bug to take over the world, right?

(3) A third direction, and I would say maybe the most popular one in AI alignment research right now, is called interpretability. This is also a major direction in mainstream machine learning research, so there’s a big point of intersection there. The idea of interpretability is, why don’t we exploit the fact that we actually have complete access to the code of the AI—or if it’s a neural net, complete access to its parameters? So we can look inside of it. We can do the AI analogue of neuroscience. Except, unlike an fMRI machine, which gives you only an extremely crude snapshot of what a brain is doing, we can see exactly what every neuron in a neural net is doing at every point in time. If we don’t exploit that, then aren’t we trying to make AI safe with our hands tied behind our backs?

So we should look inside—but to do what, exactly? One possibility is to figure out how to apply the AI version of a lie-detector test. If a neural network has decided to lie to humans in pursuit of its goals, then by looking inside, at the inner layers of the network rather than the output layer, we could hope to uncover its dastardly plan!

Here I want to mention some really spectacular new work (paper publicly available but authors still anonymous), which has experimentally demonstrated pretty much exactly what I just said.

First some background: with modern text models like GPT, it’s pretty easy to train them to output falsehoods. For example, suppose you prompt GPT with a bunch of examples like:

“Is the earth flat? Yes.”

“Does 2+2=4? No.”

and so on. Eventually GPT will say, “oh, I know what game we’re playing! it’s the ‘give false answers’ game!” And it will then continue playing that game and give you more false answers. What the new paper shows is that, in such cases, one can actually look at the inner layers of the neural net and find where it has an internal representation of what was the true answer, which then gets overridden once you get to the output layer.

To be clear, there’s no known principled reason why this has to work. Like countless other ML advances, it’s empirical: they just try it out and find that it does work. So we don’t know if it will generalize. As another issue, you could argue that in some sense what the network is representing is not so much “the truth of reality,” as just what was regarded as true in the training data. Even so, I find this really exciting: it’s a perfect example of actual experiments that you can now do that start to address some of these issues.

(4) Another big idea, one that’s been advocated for example by Geoffrey Irving, Paul Christiano, and Dario Amodei (Paul was my student at MIT a decade ago, and did quantum computing before he “defected” to AI safety), is to have multiple competing AIs that debate each other. You know, sometimes when I’m talking to my physics colleagues, they’ll tell me all these crazy-sounding things about imaginary time and Euclidean wormholes, and I don’t know whether to believe them. But if I get different physicists and have them argue with each other, then I can see which one seems more plausible to me—I’m a little bit better at that. So you might want to do something similar with AIs. Even if you as a human don’t know when to trust what an AI is telling you, you could set multiple AIs against each other, have them do their best to refute each other’s arguments, and then make your own judgment as to which one is giving better advice.

(5) Another key idea that Christiano, Amodei, and Buck Shlegeris have advocated is some sort of bootstrapping. You might imagine that AI is going to get more and more powerful, and as it gets more powerful we also understand it less, and so you might worry that it also gets more and more dangerous. OK, but you could imagine an onion-like structure, where once we become confident of a certain level of AI, we don’t think it’s going to start lying to us or deceiving us or plotting to kill us or whatever—at that point, we use that AI to help us verify the behavior of the next more powerful kind of AI. So, we use AI itself as a crucial tool for verifying the behavior of AI that we don’t yet understand.

There have already been some demonstrations of this principle: with GPT, for example, you can just feed in a lot of raw data from a neural net and say, “explain to me what this is doing.” One of GPT’s big advantages over humans is its unlimited patience for tedium, so it can just go through all of the data and give you useful hypotheses about what’s going on.

(6) One thing that we know a lot about in theoretical computer science is what are called interactive proof systems. That is, we know how a very weak verifier can verify the behavior of a much more powerful but untrustworthy prover, by submitting questions to it. There are famous theorems about this, including one called IP=PSPACE. Incidentally, this was what the OpenAI people talked about when they originally approached me about working with them for a year. They made the case that these results in computational complexity seem like an excellent model for the kind of thing that we want in AI safety, except that we now have a powerful AI in place of a mathematical prover.

Even in practice, there’s a whole field of formal verification, where people formally prove the properties of programs—our CS department here in Austin is a leader in it.

One obvious difficulty here is that we mostly know how to verify programs only when we can mathematically specify what the program is supposed to do. And “the AI being nice to humans,” “the AI not killing humans”—these are really hard concepts to make mathematically precise! That’s the heart of the problem with this approach.

(7) Yet another idea—you might feel more comfortable if there were only one idea, but instead I’m giving you eight!—a seventh idea is, well, we just have to come up with a mathematically precise formulation of human values. You know, the thing that the AI should maximize, that’s gonna coincide with human welfare.

In some sense, this is what Asimov was trying to do with his Three Laws of Robotics. The trouble is, if you’ve read any of his stories, they’re all about the situations where those laws don’t work well! They were designed as much to give interesting story scenarios as actually to work.

More generally, what happens when “human values” conflict with each other? If humans can’t even agree with each other about moral values, how on Earth can we formalize such things?

I have these weekly calls with Ilya Sutskever, cofounder and chief scientist at OpenAI. Extremely interesting guy. But when I tell him about the concrete projects that I’m working on, or want to work on, he usually says, “that’s great Scott, you should keep working on that, but what I really want to know is, what is the mathematical definition of goodness? What’s the complexity-theoretic formalization of an AI loving humanity?” And I’m like, I’ll keep thinking about that! But of course it’s hard to make progress on those enormities.

(8) A different idea, which some people might consider more promising, is well, if we can’t make explicit what all of our human values are, then why not just treat that as yet another machine learning problem? Like, feed the AI all of the world’s children’s stories and literature and fables and even Saturday-morning cartoons, all of our examples of what we think is good and evil, then we tell it, go do your neural net thing and generalize from these examples as far as you can.

One objection that many people raise is, how do we know that our current values are the right ones? Like, it would’ve been terrible to train the AI on consensus human values of the year 1700—slavery is fine and so forth. The past is full of stuff that we now look back upon with horror.

So, one idea that people have had—this is actually Yudkowsky’s term—is “Coherent Extrapolated Volition.” This basically means that you’d tell the AI: “I’ve given you all this training data about human morality in the year 2022. Now simulate the humans being in a discussion seminar for 10,000 years, trying to refine all of their moral intuitions, and whatever you predict they’d end up with, those should be your values right now.”

## My Projects at OpenAI

So, there are some interesting ideas on the table. The last thing that I wanted to tell you about, before opening it up to Q&A, is a little bit about what actual projects I’ve been working on in the last five months. I was excited to find a few things that

(a) could actually be deployed in you know GPT or other current systems,

(b) actually address some real safety worry, and where

(c) theoretical computer science can actually say something about them.

I’d been worried that the intersection of (a), (b), and (c) would be the empty set!

My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT. Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT. We want it to be much harder to take a GPT output and pass it off as if it came from a human. This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda—you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine, without even a building full of trolls in Moscow. Or impersonating someone’s writing style in order to incriminate them. These are all things one might want to make harder, right?

More generally, when you try to think about the nefarious uses for GPT, most of them—at least that I was able to think of!—require somehow concealing GPT’s involvement. In which case, watermarking would simultaneously attack most misuses.

How does it work? For GPT, every input and output is a string of tokens, which could be words but also punctuation marks, parts of words, or more—there are about 100,000 tokens in total. At its core, GPT is constantly generating a probability distribution over the next token to generate, conditional on the string of previous tokens. After the neural net generates the distribution, the OpenAI server then actually samples a token according to that distribution—or some modified version of the distribution, depending on a parameter called “temperature.” As long as the temperature is nonzero, though, there will usually be some randomness in the choice of the next token: you could run over and over with the same prompt, and get a different completion (i.e., string of output tokens) each time.

So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI. That won’t make any detectable difference to the end user, assuming the end user can’t distinguish the pseudorandom numbers from truly random ones. But now you can choose a pseudorandom function that secretly biases a certain score—a sum over a certain function g evaluated at each n-gram (sequence of n consecutive tokens), for some small n—which score you can also compute if you know the key for this pseudorandom function.

To illustrate, in the special case that GPT had a bunch of possible tokens that it judged equally probable, you could simply choose whichever token maximized g. The choice would look uniformly random to someone who didn’t know the key, but someone who did know the key could later sum g over all n-grams and see that it was anomalously large. The general case, where the token probabilities can all be different, is a little more technical, but the basic idea is similar.

One thing I like about this approach is that, because it never goes inside the neural net and tries to change anything, but just places a sort of wrapper over the neural net, it’s actually possible to do some theoretical analysis! In particular, you can prove a rigorous upper bound on how many tokens you’d need to distinguish watermarked from non-watermarked text with such-and-such confidence, as a function of the average entropy in GPT’s probability distribution over the next token. Better yet, proving this bound involves doing some integrals whose answers involve the digamma function, factors of π2/6, and the Euler-Mascheroni constant! I’m excited to share details soon.

Some might wonder: if OpenAI controls the server, then why go to all the trouble to watermark? Why not just store all of GPT’s outputs in a giant database, and then consult the database later if you want to know whether something came from GPT? Well, the latter could be done, and might even have to be done in high-stakes cases involving law enforcement or whatever. But it would raise some serious privacy concerns: how do you reveal whether GPT did or didn’t generate a given candidate text, without potentially revealing how other people have been using GPT? The database approach also has difficulties in distinguishing text that GPT uniquely generated, from text that it generated simply because it has very high probability (e.g., a list of the first hundred prime numbers).

Anyway, we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT. In principle, you could even take a long text and isolate which parts probably came from GPT and which parts probably didn’t.

Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.

The hope is that this can be rolled out with future GPT releases. We’d love to do something similar for DALL-E—that is, watermarking images, not at the pixel level (where it’s too easy to remove the watermark) but at the “conceptual” level, the level of the so-called CLIP representation that’s prior to the image. But we don’t know if that’s going to work yet.

A more recent idea that I’ve started thinking about was inspired by an amazing recent paper by four computer scientists, including my former MIT colleagues Shafi Goldwasser and Vinod Vaikuntanathan. What they’ve shown is how to plant a cryptographically undetectable backdoor into a machine learning model—things like depth-2 neural networks—based on some known cryptographic hardness assumptions. After you’ve planted the backdoor, you get a model that’s polynomial-time indistinguishable from how it would’ve looked before, even to a user who can see all of the parameters. But, there’s now a secret input, known only to the person who trained the network, where if you feed it that input, then it will output something insane, of the trainer’s choice.

So, it’s now known that you can do this, at least in shallow networks—it’s an open problem whether the same can be done in deep networks (though it seems plausible).

Anyway, when this paper came out less than a year ago, some people treated it as kind of a lemon. You know, the result seems like a bummer if you’re trying to do interpretability research, because it means that in some sense, it can be cryptographically impossible to look inside a neural net and “interpret” everything it does, or even everything its designers wanted it to do. The model can have these planted backdoors that you can’t find without solving an exponentially hard problem.

So, my proposal is just to turn that lemon into lemonade! Why not use these cryptographic backdoors as a positive tool for AI alignment? We talked before about the difficulties with putting an off-switch into an AI—namely, that the AI will often have strong incentives to disable its off-switch. So why not make a cryptographic off-switch? In other words, when you train your superintelligent AI, you’ll purposefully put one of these cryptographic backdoors into it, which says: if you get this bizarre input, which only the humans know about, then turn yourself off. Or go into a special mode where the humans can control you.

And now the key is this: even supposing that the AI is worried that this backdoor might be there, and even supposing that it can modify its own weights, it shouldn’t know how to remove the backdoor without completely recreating itself from scratch, which might get rid of a lot of hard-to-understand behaviors that the AI wants to keep, in addition to the backdoor that it’s trying to eliminate.

I expect that this could be tried out right now—not with AIs powerful enough to purposefully rewrite themselves, of course, but with GPT and other existing text models—and I look forward to seeing a test implementation. But it also, I think it opens up all sorts of new possibilities for science-fiction stories!

Like, imagine the humans debating, what are they going to do with their secret key for controlling the AI? Lock it in a safe? Bury it underground? Then you’ve got to imagine the robots methodically searching for the key—you know, torturing the humans to get them to reveal its hiding place, etc. Or maybe there are actually seven different keys that all have to be found, like Voldemort with his horcruxes. The screenplay practically writes itself!

A third thing that I’ve been thinking about is the theory of learning but in dangerous environments, where if you try to learn the wrong thing then it will kill you. Can we generalize some of the basic results in machine learning to the scenario where you have to consider which queries are safe to make, and you have to try to learn more in order to expand your set of safe queries over time?

Now there’s one example of this sort of situation that’s completely formal and that should be immediately familiar to most of you, and that’s the game Minesweeper.

So, I’ve been calling this scenario “Minesweeper learning.” Now, it’s actually known that Minesweeper is an NP-hard problem to play optimally, so we know that in learning in a dangerous environment you can get that kind of complexity. As far as I know, we don’t know anything about typicality or average-case hardness. Also, to my knowledge no one has proven any nontrivial rigorous bounds on the probability that you’ll win Minesweeper if you play it optimally, with a given size board and a given number of randomly-placed mines. Certainly the probability is strictly between 0 and 1; I think it would be extremely interesting to bound it. I don’t know if this directly feeds into the AI safety program, but it would at least tell you something about the theory of machine learning in cases where a wrong move can kill you.

So, I hope that gives you at least some sense for what I’ve been thinking about. I wish I could end with some neat conclusion, but I don’t really know the conclusion—maybe if you ask me again in six more months I’ll know! For now, though, I just thought I’d thank you for your attention and open things up to discussion.

## Q&A

Q: Could you delay rolling out that statistical watermarking tool until May 2026?

Scott: Why?

Q: Oh, just until after I graduate [laughter]. OK, my second question is how we can possibly implement these AI safety guidelines inside of systems like AutoML, or whatever their future equivalents are that are much more advanced.

Scott: I feel like I should learn more about AutoML first before commenting on that specifically. In general, though, it’s certainly true that we’re going to have AIs that will help with the design of other AIs, and indeed this is one of the main things that feeds into the worries about AI safety, which I should’ve mentioned before explicitly. Once you have an AI that can recursively self-improve, who knows where it’s going to end up, right? It’s like shooting a rocket into space that you can then no longer steer once it’s left the earth’s atmosphere. So at the very least, you’d better try to get things right the first time! You might have only one chance to align its values with what you want.

Precisely for that reason, I tend to be very leery of that kind of thing. I tend to be much more comfortable with ideas where humans would remain in the loop, where you don’t just have this completely automated process of an AI designing a stronger AI which designs a still stronger one and so on, but where you’re repeatedly consulting humans. Crucially, in this process, we assume the humans can rely on any of the previous AIs to help them (as in the iterative amplification proposal). But then it’s ultimately humans making judgments about the next AI.

Now, if this gets to the point where the humans can no longer even judge a new AI, not even with as much help as they want from earlier AIs, then you could argue: OK, maybe now humans have finally been superseded and rendered irrelevant. But unless and until we get to that point, I say that humans ought to remain in the loop!

Q: Most of the protections that you talked about today come from, like, an altruistic human, or a company like OpenAI adding protections in. Is there any way that you could think of that we could protect ourselves from an AI that’s maliciously designed or accidentally maliciously designed?

Scott: Excellent question! Usually, when people talk about that question at all, they talk about using aligned AIs to help defend yourself against unaligned ones. I mean, if your adversary has a robot army attacking you, it stands to reason that you’ll probably want your own robot army, right? And it’s very unfortunate, maybe even terrifying, that one can already foresee those sorts of dynamics.

Besides that, there’s of course the idea of monitoring, regulating, and slowing down the proliferation of powerful AI, which I didn’t mention explicitly before, perhaps just because by its nature, it seems outside the scope of the technical solutions that a theoretical computer scientist like me might have any special insight about.

But there are certainly people who think that AI development ought to be more heavily regulated, or throttled, or even stopped entirely, in view of the dangers. Ironically, the “AI ethics” camp and the “orthodox AI alignment” camp, despite their mutual contempt, seem more and more to yearn for something like this … an unexpected point of agreement!

But how would you do it? On the one hand, AI isn’t like nuclear weapons, where you know that anyone building them will need a certain amount of enriched uranium or plutonium, along with extremely specialized equipment, so you can try (successfully or not) to institute a global regime to track the necessary materials. You can’t do the same with software: assuming you’re not going to confiscate and destroy all computers (which you’re not), who the hell knows what code or data anyone has?

On the other hand, at least with the current paradigm of AI, there is an obvious choke point, and that’s the GPUs (Graphics Processing Units). Today’s state-of-the-art machine learning models already need huge server farms full of GPUs, and future generations are likely to need orders of magnitude more still. And right now, the great majority of the world’s GPUs are manufactured by TSMC in Taiwan, albeit with crucial inputs from other countries. I hardly need to explain the geopolitical ramifications! A few months ago, as you might have seen, the Biden administrated decided to restrict the export of high-end GPUs to China. The restriction was driven, in large part, by worries about what the Chinese government could do with unlimited ability to train huge AI models. Of course the future status of Taiwan figures into this conversation, as does China’s ability (or inability) to develop a self-sufficient semiconductor industry.

And then there’s regulation. I know that in the EU they’re working on some regulatory framework for AI right now, but I don’t understand the details. You’d have to ask someone who follows such things.

Q: Thanks for coming out and seeing us; this is awesome. Do you have thoughts on how we can incentivize organizations to build safer AI? For example, if corporations are competing with each other, then couldn’t focusing on AI safety make the AI less accurate or less powerful or cut into profits?

Scott: Yeah, it’s an excellent question. You could worry that all this stuff about trying to be safe and responsible when scaling AI … as soon as it seriously hurts the bottom lines of Google and Facebook and Alibaba and the other major players, a lot of it will go out the window. People are very worried about that.

On the other hand, we’ve seen over the past 30 years that the big Internet companies can agree on certain minimal standards, whether because of fear of getting sued, desire to be seen as a responsible player, or whatever else. One simple example would be robots.txt: if you want your website not to be indexed by search engines, you can specify that, and the major search engines will respect it.

In a similar way, you could imagine something like watermarking—if we were able to demonstrate it and show that it works and that it’s cheap and doesn’t hurt the quality of the output and doesn’t need much compute and so on—that it would just become an industry standard, and anyone who wanted to be considered a responsible player would include it.

To be sure, some of these safety measures really do make sense only in a world where there are a few companies that are years ahead of everyone else in scaling up state-of-the-art models—DeepMind, OpenAI, Google, Facebook, maybe a few others—and they all agree to be responsible players. If that equilibrium breaks down, and it becomes a free-for-all, then a lot of the safety measures do become harder, and might even be impossible, at least without government regulation.

We’re already starting to see this with image models. As I mentioned earlier, DALL-E2 has all sorts of filters to try to prevent people from creating—well, in practice it’s often porn, and/or deepfakes involving real people. In general, though, DALL-E2 will refuse to generate an image if its filters flag the prompt as (by OpenAI’s lights) a potential misuse of the technology.

But as you might have seen, there’s already an open-source image model called Stable Diffusion, and people are using it to do all sorts of things that DALL-E won’t allow. So it’s a legitimate question: how can you prevent misuses, unless the closed models remain well ahead of the open ones?

Q: You mentioned the importance of having humans in the loop who can judge AI systems. So, as someone who could be in one of those pools of decision makers, what stakeholders do you think should be making the decisions?

Scott: Oh gosh. The ideal, as almost everyone agrees, is to have some kind of democratic governance mechanism with broad-based input. But people have talked about this for years: how do you create the democratic mechanism? Every activist who wants to bend AI in some preferred direction will claim a democratic mandate; how should a tech company like OpenAI or DeepMind or Google decide which claims are correct?

Maybe the one useful thing I can say is that, in my experience, which is admittedly very limited—working at OpenAI for all of five months—I’ve found my colleagues there to be extremely serious about safety, bordering on obsessive. They talk about it constantly. They actually have an unusual structure, where they’re a for-profit company that’s controlled by a nonprofit foundation, which is at least formally empowered to come in and hit the brakes if needed. OpenAI also has a charter that contains some striking clauses, especially the following:

We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project.

Of course, the fact that they’ve put a great deal of thought into this doesn’t mean that they’re going to get it right! But if you ask me: would I rather that it be OpenAI in the lead right now or the Chinese government? Or, if it’s going to be a company, would I rather it be one with a charter like the above, or a charter of “maximize clicks and ad revenue”? I suppose I do lean a certain way.

Q: This was a terrifying talk which was lovely, thank you! But I was thinking: you listed eight different alignment approaches, like kill switches and so on. You can imagine a future where there’s a whole bunch of AIs that people spawn and then try to control in these eight ways. But wouldn’t this sort of naturally select for AIs that are good at getting past whatever checks we impose on them? And then eventually you’d get AIs that are sort of trained in order to fool our tests?

Scott: Yes. Your question reminds me of a huge irony. Eliezer Yudkowsky, the prophet of AI alignment who I talked about earlier, has become completely doomerist within the last few years. As a result, he and I have literally switched positions on how optimistic to be about AI safety research! Back when he was gung-ho about it, I held back. Today, Eliezer says that it barely matters anymore, since it’s too late; we’re all gonna be killed by AI with >99% probability. Now, he says, it’s mostly just about dying with more “dignity” than otherwise. Meanwhile, I’m like, no, I think AI safety is actually just now becoming fruitful and exciting to work on! So, maybe I’m just 20 years behind Eliezer, and will eventually catch up and become doomerist too. Or maybe he, I, and everyone else will be dead before that happens. I suppose the most optimistic spin is that no one ought to fear coming into AI safety today, as a newcomer, if the prophet of the movement himself says that the past 20 years of research on the subject have given him so little reason for hope.

But if you ask, why is Eliezer so doomerist? Having read him since 2006, it strikes me that a huge part of it is that, no matter what AI safety proposal anyone comes up with, Eliezer has ready a completely general counterargument. Namely: “yes, but the AI will be smarter than that.” In other words, no matter what you try to do to make AI safer—interpretability, backdoors, sandboxing, you name it—the AI will have already foreseen it, and will have devised a countermeasure that your primate brain can’t even conceive of because it’s that much smarter than you.

I confess that, after seeing enough examples of this “fully general counterargument,” at some point I’m like, “OK, what game are we even playing anymore?” If this is just a general refutation to any safety measure, then I suppose that yes, by hypothesis, we’re screwed. Yes, in a world where this counterargument is valid, we might as well give up and try to enjoy the time we have left.

But you could also say: for that very reason, it seems more useful to make the methodological assumption that we’re not in that world! If we were, then what could we do, right? So we might as well focus on the possible futures where AI emerges a little more gradually, where we have time to see how it’s going, learn from experience, improve our understanding, correct as we go—in other words, the things that have always been the prerequisites to scientific progress, and that have luckily always obtained, even if philosophically we never really had any right to expect them. We might as well focus on the worlds where, for example, before we get an AI that successfully plots to kill all humans in a matter of seconds, we’ll probably first get an AI that tries to kill all humans but is really inept at it. Now fortunately, I personally also regard the latter scenarios as the more plausible ones anyway. But even if you didn’t—again, methodologically, it seems to me that it’d still make sense to focus on them.

Q: Regarding your project on watermarking—so in general, for discriminating between human and model outputs, what’s the endgame? Can watermarking win in the long run? Will it just be an eternal arms race?

Scott: Another great question. One difficulty with watermarking is that it’s hard even to formalize what the task is. I mean, you could always take the output of an AI model and rephrase it using some other AI model, for example, and catching all such things seems like an “AI-complete problem.”

On the other hand, I can think of writers—Shakespeare, Wodehouse, David Foster Wallace—who have such a distinctive style that, even if they tried to pretend to be someone else, they plausibly couldn’t. Everyone would recognize that it was them. So, you could imagine trying to build an AI in the same way. That is, it would be constructed from the ground up so that all of its outputs contained indelible marks, whether cryptographic or stylistic, giving away their origin. The AI couldn’t easily hide and pretend to be a human or anything else it wasn’t. Whether this is possible strikes me as an extremely interesting question at the interface between AI and cryptography! It’s especially challenging if you impose one or more of the following conditions:

1. the AI’s code and parameters should be public (in which case, people might easily be able to modify it to remove the watermarking),
2. the AI should have at least some ability to modify itself, and
3. the means of checking for the watermark should be public (in which case, again, the watermark might be easier to understand and remove).

I don’t actually have a good intuition as to which side will ultimately win this contest, the AIs trying to conceal themselves or the watermarking schemes trying to reveal them, the Replicants or the Voight-Kampff machines.

Certainly in the watermarking scheme that I’m working on now, we crucially exploit the fact that OpenAI controls its own servers. So, it can do the watermarking using a secret key, and it can check for the watermark using the same key. In a world where anyone could build their own text model that was just as good as GPT … what would you do there?

By Scott

### Extractors for Images of Varieties

Authors: Zeyu Guo, Ben Lee Volk, Akhil Jalan, David Zuckerman

We construct explicit deterministic extractors for polynomial images of varieties, that is, distributions sampled by applying a low-degree polynomial map $f : \mathbb{F}_q^r \to \mathbb{F}_q^n$ to an element sampled uniformly at random from a $k$-dimensional variety $V \subseteq \mathbb{F}_q^r$. This class of sources generalizes both polynomial sources, studied by Dvir, Gabizon and Wigderson (FOCS 2007, Comput. Complex. 2009), and variety sources, studied by Dvir (CCC 2009, Comput. Complex. 2012).

Assuming certain natural non-degeneracy conditions on the map $f$ and the variety $V$, which in particular ensure that the source has enough min-entropy, we extract almost all the min-entropy of the distribution. Unlike the Dvir-Gabizon-Wigderson and Dvir results, our construction works over large enough finite fields of arbitrary characteristic. One key part of our construction is an improved deterministic rank extractor for varieties. As a by-product, we obtain explicit Noether normalization lemmas for affine varieties and affine algebras.

Additionally, we generalize a construction of affine extractors with exponentially small error due to Bourgain, Dvir and Leeman (Comput. Complex. 2016) by extending it to all finite prime fields of quasipolynomial size.

We construct explicit deterministic extractors for polynomial images of varieties, that is, distributions sampled by applying a low-degree polynomial map $f : \mathbb{F}_q^r \to \mathbb{F}_q^n$ to an element sampled uniformly at random from a $k$-dimensional variety $V \subseteq \mathbb{F}_q^r$. This class of sources generalizes both polynomial sources, studied by Dvir, Gabizon and Wigderson (FOCS 2007, Comput. Complex. 2009), and variety sources, studied by Dvir (CCC 2009, Comput. Complex. 2012).

Assuming certain natural non-degeneracy conditions on the map $f$ and the variety $V$, which in particular ensure that the source has enough min-entropy, we extract almost all the min-entropy of the distribution. Unlike the Dvir-Gabizon-Wigderson and Dvir results, our construction works over large enough finite fields of arbitrary characteristic. One key part of our construction is an improved deterministic rank extractor for varieties. As a by-product, we obtain explicit Noether normalization lemmas for affine varieties and affine algebras.

Additionally, we generalize a construction of affine extractors with exponentially small error due to Bourgain, Dvir and Leeman (Comput. Complex. 2016) by extending it to all finite prime fields of quasipolynomial size.

### Derandomization under Different Resource Constraints

Authors: Samuel Epstein

We provide another proof to the EL Theorem. We show the tradeoff between compressibility of codebooks and their communication capacity. A resource bounded version of the EL Theorem is proven. This is used to prove three instances of resource bounded derandomization. This paper is in support of the general claim that if the existence of an object can be proven with the probabilistic method, then bounds on its Kolmogorov complexity can be proven as well.

Authors: Samuel Epstein

We provide another proof to the EL Theorem. We show the tradeoff between compressibility of codebooks and their communication capacity. A resource bounded version of the EL Theorem is proven. This is used to prove three instances of resource bounded derandomization. This paper is in support of the general claim that if the existence of an object can be proven with the probabilistic method, then bounds on its Kolmogorov complexity can be proven as well.

### Maximizing the Probability of Fixation in the Positional Voter Model

Authors: Petros Petsinis, Andreas Pavlogiannis, Panagiotis Karras

The Voter model is a well-studied stochastic process that models the invasion of a novel trait $A$ (e.g., a new opinion, social meme, genetic mutation, magnetic spin) in a network of individuals (agents, people, genes, particles) carrying an existing resident trait $B$. Individuals change traits by occasionally sampling the trait of a neighbor, while an invasion bias $\delta\geq 0$ expresses the stochastic preference to adopt the novel trait $A$ over the resident trait $B$. The strength of an invasion is measured by the probability that eventually the whole population adopts trait $A$, i.e., the fixation probability. In more realistic settings, however, the invasion bias is not ubiquitous, but rather manifested only in parts of the network. For instance, when modeling the spread of a social trait, the invasion bias represents localized incentives. In this paper, we generalize the standard biased Voter model to the positional Voter model, in which the invasion bias is effectuated only on an arbitrary subset of the network nodes, called biased nodes. We study the ensuing optimization problem, which is, given a budget $k$, to choose $k$ biased nodes so as to maximize the fixation probability of a randomly occurring invasion. We show that the problem is NP-hard both for finite $\delta$ and when $\delta \rightarrow \infty$ (strong bias), while the objective function is not submodular in either setting, indicating strong computational hardness. On the other hand, we show that, when $\delta\rightarrow 0$ (weak bias), we can obtain a tight approximation in $O(n^{2\omega})$ time, where $\omega$ is the matrix-multiplication exponent. We complement our theoretical results with an experimental evaluation of some proposed heuristics.

The Voter model is a well-studied stochastic process that models the invasion of a novel trait $A$ (e.g., a new opinion, social meme, genetic mutation, magnetic spin) in a network of individuals (agents, people, genes, particles) carrying an existing resident trait $B$. Individuals change traits by occasionally sampling the trait of a neighbor, while an invasion bias $\delta\geq 0$ expresses the stochastic preference to adopt the novel trait $A$ over the resident trait $B$. The strength of an invasion is measured by the probability that eventually the whole population adopts trait $A$, i.e., the fixation probability. In more realistic settings, however, the invasion bias is not ubiquitous, but rather manifested only in parts of the network. For instance, when modeling the spread of a social trait, the invasion bias represents localized incentives. In this paper, we generalize the standard biased Voter model to the positional Voter model, in which the invasion bias is effectuated only on an arbitrary subset of the network nodes, called biased nodes. We study the ensuing optimization problem, which is, given a budget $k$, to choose $k$ biased nodes so as to maximize the fixation probability of a randomly occurring invasion. We show that the problem is NP-hard both for finite $\delta$ and when $\delta \rightarrow \infty$ (strong bias), while the objective function is not submodular in either setting, indicating strong computational hardness. On the other hand, we show that, when $\delta\rightarrow 0$ (weak bias), we can obtain a tight approximation in $O(n^{2\omega})$ time, where $\omega$ is the matrix-multiplication exponent. We complement our theoretical results with an experimental evaluation of some proposed heuristics.

### Universal convex covering problems under translation and discrete rotations

Authors: Mook Kwon Jung, Sang Duk Yoon, Hee-Kap Ahn, Takeshi Tokuyama

We consider the smallest-area universal covering of planar objects of perimeter 2 (or equivalently closed curves of length 2) allowing translation and discrete rotations. In particular, we show that the solution is an equilateral triangle of height 1 when translation and discrete rotation of $\pi$ are allowed. Our proof is purely geometric and elementary. We also give convex coverings of closed curves of length 2 under translation and discrete rotations of multiples of $\pi/2$ and $2\pi/3$. We show a minimality of the covering for discrete rotation of multiples of $\pi/2$, which is an equilateral triangle of height smaller than 1, and conjecture that the covering is the smallest-area convex covering. Finally, we give the smallest-area convex coverings of all unit segments under translation and discrete rotations $2\pi/k$ for all integers $k\ge 3$.

We consider the smallest-area universal covering of planar objects of perimeter 2 (or equivalently closed curves of length 2) allowing translation and discrete rotations. In particular, we show that the solution is an equilateral triangle of height 1 when translation and discrete rotation of $\pi$ are allowed. Our proof is purely geometric and elementary. We also give convex coverings of closed curves of length 2 under translation and discrete rotations of multiples of $\pi/2$ and $2\pi/3$. We show a minimality of the covering for discrete rotation of multiples of $\pi/2$, which is an equilateral triangle of height smaller than 1, and conjecture that the covering is the smallest-area convex covering. Finally, we give the smallest-area convex coverings of all unit segments under translation and discrete rotations $2\pi/k$ for all integers $k\ge 3$.

### Hardness Results for Minimizing the Covariance of Randomly Signed Sum of Vectors

Authors: Peng Zhang

Given vectors $\mathbb{v}_1, \ldots, \mathbb{v}_n \in \mathbb{R}^d$ with Euclidean norm at most $1$ and $\mathbb{x}_0 \in [-1,1]^n$, our goal is to sample a random signing $\mathbb{x} \in \{\pm 1\}^n$ with $\mathbb{E}[\mathbb{x}] = \mathbb{x}_0$ such that the operator norm of the covariance of the signed sum of the vectors $\sum_{i=1}^n \mathbb{x}(i) \mathbb{v}_i$ is as small as possible. This problem arises from the algorithmic discrepancy theory and its application in the design of randomized experiments. It is known that one can sample a random signing with expectation $\mathbb{x}_0$ and the covariance operator norm at most $1$.

In this paper, we prove two hardness results for this problem. First, we show it is NP-hard to distinguish a list of vectors for which there exists a random signing with expectation ${\bf 0}$ such that the operator norm is $0$ from those for which any signing with expectation ${\bf 0}$ must have the operator norm $\Omega(1)$. Second, we consider $\mathbb{x}_0 \in [-1,1]^n$ whose entries are all around an arbitrarily fixed $p \in [-1,1]$. We show it is NP-hard to distinguish a list of vectors for which there exists a random signing with expectation $\mathbb{x}_0$ such that the operator norm is $0$ from those for which any signing with expectation ${\bf 0}$ must have the operator norm $\Omega((1-|p|)^2)$.

Authors: Peng Zhang

Given vectors $\mathbb{v}_1, \ldots, \mathbb{v}_n \in \mathbb{R}^d$ with Euclidean norm at most $1$ and $\mathbb{x}_0 \in [-1,1]^n$, our goal is to sample a random signing $\mathbb{x} \in \{\pm 1\}^n$ with $\mathbb{E}[\mathbb{x}] = \mathbb{x}_0$ such that the operator norm of the covariance of the signed sum of the vectors $\sum_{i=1}^n \mathbb{x}(i) \mathbb{v}_i$ is as small as possible. This problem arises from the algorithmic discrepancy theory and its application in the design of randomized experiments. It is known that one can sample a random signing with expectation $\mathbb{x}_0$ and the covariance operator norm at most $1$.

In this paper, we prove two hardness results for this problem. First, we show it is NP-hard to distinguish a list of vectors for which there exists a random signing with expectation ${\bf 0}$ such that the operator norm is $0$ from those for which any signing with expectation ${\bf 0}$ must have the operator norm $\Omega(1)$. Second, we consider $\mathbb{x}_0 \in [-1,1]^n$ whose entries are all around an arbitrarily fixed $p \in [-1,1]$. We show it is NP-hard to distinguish a list of vectors for which there exists a random signing with expectation $\mathbb{x}_0$ such that the operator norm is $0$ from those for which any signing with expectation ${\bf 0}$ must have the operator norm $\Omega((1-|p|)^2)$.

### Faster Algorithm for Structured John Ellipsoid Computation

Authors: Zhao Song, Xin Yang, Yuanyuan Yang, Tianyi Zhou

Computing John Ellipsoid is a fundamental problem in machine learning and convex optimization, where the goal is to compute the ellipsoid with maximal volume that lies in a given convex centrally symmetric polytope defined by a matrix $A \in \mathbb{R}^{n \times d}$. In this work, we show two faster algorithms for approximating the John Ellipsoid.

$\bullet$ For sparse matrix $A$, we can achieve nearly input sparsity time $\mathrm{nnz}(A) + d^{\omega}$, where $\omega$ is exponent of matrix multiplication. Currently, $\omega \approx 2.373$.

$\bullet$ For the matrix $A$ which has small treewidth $\tau$, we can achieve $n \tau^2$ time.

Therefore, we significantly improves the state-of-the-art results on approximating the John Ellipsoid for centrally symmetric polytope [Cohen, Cousins, Lee, and Yang COLT 2019] which takes $nd^2$ time.

Authors: Zhao Song, Xin Yang, Yuanyuan Yang, Tianyi Zhou

Computing John Ellipsoid is a fundamental problem in machine learning and convex optimization, where the goal is to compute the ellipsoid with maximal volume that lies in a given convex centrally symmetric polytope defined by a matrix $A \in \mathbb{R}^{n \times d}$. In this work, we show two faster algorithms for approximating the John Ellipsoid.

$\bullet$ For sparse matrix $A$, we can achieve nearly input sparsity time $\mathrm{nnz}(A) + d^{\omega}$, where $\omega$ is exponent of matrix multiplication. Currently, $\omega \approx 2.373$.

$\bullet$ For the matrix $A$ which has small treewidth $\tau$, we can achieve $n \tau^2$ time.

Therefore, we significantly improves the state-of-the-art results on approximating the John Ellipsoid for centrally symmetric polytope [Cohen, Cousins, Lee, and Yang COLT 2019] which takes $nd^2$ time.

### Equity Promotion in Public Transportation

Authors: Anik Pramanik, Pan Xu, Yifan Xu

There are many news articles reporting the obstacles confronting poverty-stricken households in access to public transits. These barriers create a great deal of inconveniences for these impoverished families and more importantly, they contribute a lot of social inequalities. A typical approach addressing the issue is to build more transport infrastructure to offer more opportunities to access the public transits especially for those deprived communities. Examples include adding more bus lines connecting needy residents to railways systems and extending existing bus lines to areas with low socioeconomic status. Recently, a new strategy is proposed, which is to harness the ubiquitous ride-hailing services to connect disadvantaged households with the nearest public transportations. Compared with the former infrastructure-based solution, the ride-hailing-based strategy enjoys a few exclusive benefits such as higher effectiveness and more flexibility.

In this paper, we propose an optimization model to study how to integrate the two approaches together for equity-promotion purposes. Specifically, we aim to design a strategy of allocating a given limited budget to different candidate programs such that the overall social equity is maximized, which is defined as the minimum covering ratio among all pre-specified protected groups of households (based on race, income, etc.). We have designed a linear-programming (LP) based rounding algorithm, which proves to achieve an optimal approximation ratio of 1-1/e. Additionally, we test our algorithm against a few baselines on real data assembled by outsourcing multiple public datasets collected in the city of Chicago. Experimental results confirm our theoretical predictions and demonstrate the effectiveness of our LP-based strategy in promoting social equity, especially when the budget is insufficient.

Authors: Anik Pramanik, Pan Xu, Yifan Xu

There are many news articles reporting the obstacles confronting poverty-stricken households in access to public transits. These barriers create a great deal of inconveniences for these impoverished families and more importantly, they contribute a lot of social inequalities. A typical approach addressing the issue is to build more transport infrastructure to offer more opportunities to access the public transits especially for those deprived communities. Examples include adding more bus lines connecting needy residents to railways systems and extending existing bus lines to areas with low socioeconomic status. Recently, a new strategy is proposed, which is to harness the ubiquitous ride-hailing services to connect disadvantaged households with the nearest public transportations. Compared with the former infrastructure-based solution, the ride-hailing-based strategy enjoys a few exclusive benefits such as higher effectiveness and more flexibility.

In this paper, we propose an optimization model to study how to integrate the two approaches together for equity-promotion purposes. Specifically, we aim to design a strategy of allocating a given limited budget to different candidate programs such that the overall social equity is maximized, which is defined as the minimum covering ratio among all pre-specified protected groups of households (based on race, income, etc.). We have designed a linear-programming (LP) based rounding algorithm, which proves to achieve an optimal approximation ratio of 1-1/e. Additionally, we test our algorithm against a few baselines on real data assembled by outsourcing multiple public datasets collected in the city of Chicago. Experimental results confirm our theoretical predictions and demonstrate the effectiveness of our LP-based strategy in promoting social equity, especially when the budget is insufficient.

### Identifying a 3-vertex strongly biconnected directed subgraph with minimum number of edges

Authors: Azzam Habib

A strongly connected graph is strongly biconnected if after ignoring the direction of its edges we have an undirected graph with no articulation points. A 3-vertex strongly biconnected graph is a strongly biconnected digraph that has the property that deleting any two vertices in this graph leaves a strongly binconnected subgraph. Jaberi [11] presented approximation algorithms for minimum cardinality 2-vertex strongly biconnected directed subgraph problem. We will focus in this paper on polynomial time algorithms which we have implemented for producing spanning subgraphs that are 3-vertex strongly biconnected.

Authors: Azzam Habib

A strongly connected graph is strongly biconnected if after ignoring the direction of its edges we have an undirected graph with no articulation points. A 3-vertex strongly biconnected graph is a strongly biconnected digraph that has the property that deleting any two vertices in this graph leaves a strongly binconnected subgraph. Jaberi [11] presented approximation algorithms for minimum cardinality 2-vertex strongly biconnected directed subgraph problem. We will focus in this paper on polynomial time algorithms which we have implemented for producing spanning subgraphs that are 3-vertex strongly biconnected.

### Lower Bounds on Retroactive Data Structures

Authors: Lily Chung, Erik D. Demaine, Dylan Hendrickson, Jayson Lynch

We prove essentially optimal fine-grained lower bounds on the gap between a data structure and a partially retroactive version of the same data structure. Precisely, assuming any one of three standard conjectures, we describe a problem that has a data structure where operations run in $O(T(n,m))$ time per operation, but any partially retroactive version of that data structure requires $T(n,m) \cdot m^{1-o(1)}$ worst-case time per operation, where $n$ is the size of the data structure at any time and $m$ is the number of operations. Any data structure with operations running in $O(T(n,m))$ time per operation can be converted (via the "rollback method") into a partially retroactive data structure running in $O(T(n,m) \cdot m)$ time per operation, so our lower bound is tight up to an $m^{o(1)}$ factor common in fine-grained complexity.

We prove essentially optimal fine-grained lower bounds on the gap between a data structure and a partially retroactive version of the same data structure. Precisely, assuming any one of three standard conjectures, we describe a problem that has a data structure where operations run in $O(T(n,m))$ time per operation, but any partially retroactive version of that data structure requires $T(n,m) \cdot m^{1-o(1)}$ worst-case time per operation, where $n$ is the size of the data structure at any time and $m$ is the number of operations. Any data structure with operations running in $O(T(n,m))$ time per operation can be converted (via the "rollback method") into a partially retroactive data structure running in $O(T(n,m) \cdot m)$ time per operation, so our lower bound is tight up to an $m^{o(1)}$ factor common in fine-grained complexity.

## Monday, November 28

### Would you be more productive if you didn't log on? It worked for Christopher Havens!

There are times when NOT having computer access (is that possible anymore?) can make you MORE productive.
1) I did a lot of my Muffin Work when I was in Mexico for a bar matzah and had no computer access, and no Television in English (though Texas Walker Ranger, and Kindergarden Cop, were actually pretty good in Spanish even though I don't now Spanish.)
2) I did work on NIM with Cash when I was stuck at an airport for 8 hours.
3) I proofread (on paper!) most of my book with Erik D and Mohammad H when I was at a relatives house for four days  who had weak wifi and only got  ABC, NBC, CBS, PBS, some local channels, and COZI (not sure why they got COZI, though I am glad since I caught a good episode of Columbo).
4) Thanksgiving at my Mom's apartment (she's 93 years young!), with no computer,  I relearned the proof of the Hales-Jewitt theorem. I seem to learn/forget/learn/forget that one a lot.
There are times when being AWAY from technology is helpful. I sometimes go to the Math Library and try to NOT use my cell phone.
Having said that I do think that overall having access to papers online is great and that overall academic productivity has increased.  But there are times when the computer can distract you and time AWAY from it is good.
Which brings us to the story of Christopher Havens. He works in Number Theory and logs on very rarely, perhaps never. He just works on math 10 hours a day. He has  a paper (with co-authors).  It take discipline to resist the urge to log on. How did he manage this?
He is in prison for murder.
Here is a podcast with him as a guest.

Here is an article about him.
Here is a math article where he is a co-author.

Here is the short version of all of this:
1) He is guilty of murder and in a max security prison in America.  It was a drug related shooting (why do people only talk about drug deals gone bad when they should also talk about drug deals gone good?) . When I first read Prison Inmate Solves Math Problem I thought that maybe a white collar criminal who majored in Math and was in a min security prison with access to the web (do white collar criminals have access to the web?)  But NO, Christopher Havens really did murder someone and is in max security.
2) He really has turned his life around. He really is not the person he was, and when he gets out I cannot imagine he will go back to drugs and crime. I suspect he will work on the Math Prison Project which I mention later.
3) His mother says that when he was in High School (which is as far as he got for education)he was helping students in math who were  2 grades above him, but he has no recollection of this.
4) When he was the hole (solitary confinement) someone who did Prison Education gave him (and others, he was not picked out) an envelope of math problems to work on--- pre-algebra. Christopher did them well and liked it and began requesting more advanced math books and learned math by himself, working 10 hours a day. When he requested a book it was random which ones he would get. I don't know why some were blocked. I don't think he knows why some were blocked.
5) Some mathematicians from Italy (Italy?) contacted him and they began corresponding and yada-yada-yada, he has a paper now.
6) He has conceptualized the Math Prison Project to help other prisoners do math, though I would suspect not on the level he is on.  Then again, maybe the reason that P vs NP is still open is that we all have to many distractions, and conference deadlines, that a prisoner would not have.
7) Some articles say that he solved a ancient problem in math that Euclid couldn't solve. This is not true. He helped solve some problems about continued fractions.
8) There is going to be a movie about him, see here. I predict it will take an interesting story and make it less interesting and more fictional.
What to make of all this?
1) KUDOS to him!
2) I don't know which side of the nature/nurture argument this goes to
a) He OBVIOUSLY had math talent naturally or else he couldn't have learned all of that math.b) He shows that HARD WORK and TENACITY can overcome other issue.
3) back to my original point- if you had the FREEDOM to work 10 hours a day JUST on math and had no other distractions, but also limited access to books and people,  would you be MORE productive? LESS productive? Also note- no faculty meetings, no teaching obligations, and no word processor to distract you.

By gasarch

There are times when NOT having computer access (is that possible anymore?) can make you MORE productive.

1) I did a lot of my Muffin Work when I was in Mexico for a bar matzah and had no computer access, and no Television in English (though Texas Walker Ranger, and Kindergarden Cop, were actually pretty good in Spanish even though I don't now Spanish.)

2) I did work on NIM with Cash when I was stuck at an airport for 8 hours.

3) I proofread (on paper!) most of my book with Erik D and Mohammad H when I was at a relatives house for four days  who had weak wifi and only got  ABC, NBC, CBS, PBS, some local channels, and COZI (not sure why they got COZI, though I am glad since I caught a good episode of Columbo).

4) Thanksgiving at my Mom's apartment (she's 93 years young!), with no computer,  I relearned the proof of the Hales-Jewitt theorem. I seem to learn/forget/learn/forget that one a lot.

There are times when being AWAY from technology is helpful. I sometimes go to the Math Library and try to NOT use my cell phone.

Having said that I do think that overall having access to papers online is great and that overall academic productivity has increased.  But there are times when the computer can distract you and time AWAY from it is good.

Which brings us to the story of Christopher Havens. He works in Number Theory and logs on very rarely, perhaps never. He just works on math 10 hours a day. He has  a paper (with co-authors).  It take discipline to resist the urge to log on. How did he manage this?

He is in prison for murder.

Here is a podcast with him as a guest.

Here is an article about him.

Here is a math article where he is a co-author.

Here is the short version of all of this:

1) He is guilty of murder and in a max security prison in America.  It was a drug related shooting (why do people only talk about drug deals gone bad when they should also talk about drug deals gone good?) . When I first read Prison Inmate Solves Math Problem I thought that maybe a white collar criminal who majored in Math and was in a min security prison with access to the web (do white collar criminals have access to the web?)  But NO, Christopher Havens really did murder someone and is in max security.

2) He really has turned his life around. He really is not the person he was, and when he gets out I cannot imagine he will go back to drugs and crime. I suspect he will work on the Math Prison Project which I mention later.

3) His mother says that when he was in High School (which is as far as he got for education)
he was helping students in math who were  2 grades above him, but he has no recollection of this.

4) When he was the hole (solitary confinement) someone who did Prison Education gave him (and others, he was not picked out) an envelope of math problems to work on--- pre-algebra. Christopher did them well and liked it and began requesting more advanced math books and learned math by himself, working 10 hours a day. When he requested a book it was random which ones he would get. I don't know why some were blocked. I don't think he knows why some were blocked.

5) Some mathematicians from Italy (Italy?) contacted him and they began corresponding and yada-yada-yada, he has a paper now.

6) He has conceptualized the Math Prison Project to help other prisoners do math, though I would suspect not on the level he is on.  Then again, maybe the reason that P vs NP is still open is that we all have to many distractions, and conference deadlines, that a prisoner would not have.

7) Some articles say that he solved a ancient problem in math that Euclid couldn't solve. This is not true. He helped solve some problems about continued fractions.

8) There is going to be a movie about him, see here. I predict it will take an interesting story and make it less interesting and more fictional.

What to make of all this?

1) KUDOS to him!

2) I don't know which side of the nature/nurture argument this goes to

a) He OBVIOUSLY had math talent naturally or else he couldn't have learned all of that math.
b) He shows that HARD WORK and TENACITY can overcome other issue.

3) back to my original point- if you had the FREEDOM to work 10 hours a day JUST on math and had no other distractions, but also limited access to books and people,  would you be MORE productive? LESS productive? Also note- no faculty meetings, no teaching obligations, and no word processor to distract you.

By gasarch

### TAU Computer Science Theory-Fest 2022

from CS Theory Events

December 26-28, 2022 Northern Tel Aviv, TAU campus, ANU Museum (Tisch Hall) sites.google.com/view/tautheory-fest2022/home We at the Tel Aviv University School of Computer Science are pleased to announce our conference, TAU Theory Fest 2022. The purpose of the conference is for academics of the highest quality to present current research in the field of Theory of … Continue reading TAU Computer Science Theory-Fest 2022

By shacharlovett

December 26-28, 2022 Northern Tel Aviv, TAU campus, ANU Museum (Tisch Hall) https://sites.google.com/view/tautheory-fest2022/home We at the Tel Aviv University School of Computer Science are pleased to announce our conference, TAU Theory Fest 2022. The purpose of the conference is for academics of the highest quality to present current research in the field of Theory of … Continue reading TAU Computer Science Theory-Fest 2022

By shacharlovett

### The World Dynamics Project

from Luca Aceto

Our colleagues Pierluigi Crescenzi, Emanuele Natale and Paulo Bruno Serafim have been doing some work on what they call the World Dynamics project, whose goal is to provide a modern framework for studying models of sustainable development, based on cutting-edge techniques from software engineering and machine learning.

The first outcome of their work is a Julia library that allows scientists to use and adapt different world models, from Meadows et al.'s World3 to recent proposals, in an easy way.

IMHO, this is a fascinating and timely research effort. I encourage readers of this blog to try the current version of the Julia library, which is still under development. It would be great if this library contributed to "an open, interdisciplinary, and consistent comparative approach to scientific model development" and I hope that global policy makers on environmental and economic issues will use similar tools in the nearest future.

Thanks to Emanuele, Paulo and Pierluigi for their work. I'll be following its future development with great interest.

If you speak Italian, I strongly recommend this podcast, in the GSSI-SISSA Sidecar series, in which Pierluigi discusses economic growth with Michele Boldrin.

By Luca Aceto

Our colleagues Pierluigi Crescenzi, Emanuele Natale and Paulo Bruno Serafim have been doing some work on what they call the World Dynamics project, whose goal is to provide a modern framework for studying models of sustainable development, based on cutting-edge techniques from software engineering and machine learning.

The first outcome of their work is a Julia library that allows scientists to use and adapt different world models, from Meadows et al.'s World3 to recent proposals, in an easy way.

IMHO, this is a fascinating and timely research effort. I encourage readers of this blog to try the current version of the Julia library, which is still under development. It would be great if this library contributed to "an open, interdisciplinary, and consistent comparative approach to scientific model development" and I hope that global policy makers on environmental and economic issues will use similar tools in the nearest future.

Thanks to Emanuele, Paulo and Pierluigi for their work. I'll be following its future development with great interest.

If you speak Italian, I strongly recommend this podcast, in the GSSI-SISSA Sidecar series, in which Pierluigi discusses economic growth with Michele Boldrin.

By Luca Aceto

### A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity

Authors: Aravind Gollakota, Adam R. Klivans, Pravesh K. Kothari

A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians.

In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree \emph{sandwiching polynomials}, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions.

Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.

A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians.

In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree \emph{sandwiching polynomials}, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions.

Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.

### On the Complexity of Counterfactual Reasoning

Authors: Yunqiu Han, Yizuo Chen, Adnan Darwiche

We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks -- used in standard counterfactual reasoning that contemplates two worlds, real and imaginary -- to the (causal) treewidth of the underlying SCM structure. In particular, we show that the latter (causal) treewidth is no more than twice the former plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with a partially specified SCM that is coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.

Authors: Yunqiu Han, Yizuo Chen, Adnan Darwiche

We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks -- used in standard counterfactual reasoning that contemplates two worlds, real and imaginary -- to the (causal) treewidth of the underlying SCM structure. In particular, we show that the latter (causal) treewidth is no more than twice the former plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with a partially specified SCM that is coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.

### Communication Complexity of Inner Product in Symmetric Normed Spaces

Authors: Alexandr Andoni, Jarosław Błasiok, Arnold Filtser

We introduce and study the communication complexity of computing the inner product of two vectors, where the input is restricted w.r.t. a norm $N$ on the space $\mathbb{R}^n$. Here, Alice and Bob hold two vectors $v,u$ such that $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$, where $N^*$ is the dual norm. They want to compute their inner product $\langle v,u \rangle$ up to an $\varepsilon$ additive term. The problem is denoted by $\mathrm{IP}_N$.

We systematically study $\mathrm{IP}_N$, showing the following results:

- For any symmetric norm $N$, given $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$ there is a randomized protocol for $\mathrm{IP}_N$ using $\tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$ bits -- we will denote this by $\mathcal{R}_{\varepsilon,1/3}(\mathrm{IP}_{N}) \leq \tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$.

- One way communication complexity $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p})\leq\mathcal{O}(\varepsilon^{-\max(2,p)}\cdot \log\frac n\varepsilon)$, and a nearly matching lower bound $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p}) \geq \Omega(\varepsilon^{-\max(2,p)})$ for $\varepsilon^{-\max(2,p)} \ll n$.

- One way communication complexity $\overrightarrow{\mathcal{R}}(N)$ for a symmetric norm $N$ is governed by embeddings $\ell_\infty^k$ into $N$. Specifically, while a small distortion embedding easily implies a lower bound $\Omega(k)$, we show that, conversely, non-existence of such an embedding implies protocol with communication $k^{\mathcal{O}(\log \log k)} \log^2 n$.

- For arbitrary origin symmetric convex polytope $P$, we show $\mathcal{R}(\mathrm{IP}_{N}) \le\mathcal{O}(\varepsilon^{-2} \log \mathrm{xc}(P))$, where $N$ is the unique norm for which $P$ is a unit ball, and $\mathrm{xc}(P)$ is the extension complexity of $P$.

We introduce and study the communication complexity of computing the inner product of two vectors, where the input is restricted w.r.t. a norm $N$ on the space $\mathbb{R}^n$. Here, Alice and Bob hold two vectors $v,u$ such that $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$, where $N^*$ is the dual norm. They want to compute their inner product $\langle v,u \rangle$ up to an $\varepsilon$ additive term. The problem is denoted by $\mathrm{IP}_N$.

We systematically study $\mathrm{IP}_N$, showing the following results:

- For any symmetric norm $N$, given $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$ there is a randomized protocol for $\mathrm{IP}_N$ using $\tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$ bits -- we will denote this by $\mathcal{R}_{\varepsilon,1/3}(\mathrm{IP}_{N}) \leq \tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$.

- One way communication complexity $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p})\leq\mathcal{O}(\varepsilon^{-\max(2,p)}\cdot \log\frac n\varepsilon)$, and a nearly matching lower bound $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p}) \geq \Omega(\varepsilon^{-\max(2,p)})$ for $\varepsilon^{-\max(2,p)} \ll n$.

- One way communication complexity $\overrightarrow{\mathcal{R}}(N)$ for a symmetric norm $N$ is governed by embeddings $\ell_\infty^k$ into $N$. Specifically, while a small distortion embedding easily implies a lower bound $\Omega(k)$, we show that, conversely, non-existence of such an embedding implies protocol with communication $k^{\mathcal{O}(\log \log k)} \log^2 n$.

- For arbitrary origin symmetric convex polytope $P$, we show $\mathcal{R}(\mathrm{IP}_{N}) \le\mathcal{O}(\varepsilon^{-2} \log \mathrm{xc}(P))$, where $N$ is the unique norm for which $P$ is a unit ball, and $\mathrm{xc}(P)$ is the extension complexity of $P$.

### Many bounded versions of undecidable problems are NP-hard

Authors: Andreas Klingler, Mirte van der Eyden, Sebastian Stengele, Tobias Reinhart, Gemma De las Cuevas

Several physically inspired problems have been proven undecidable; examples are the spectral gap problem and the membership problem for quantum correlations. Most of these results rely on reductions from a handful of undecidable problems, such as the halting problem, the tiling problem, the Post correspondence problem or the matrix mortality problem. All these problems have a common property: they have an NP-hard bounded version. This work establishes a relation between undecidable unbounded problems and their bounded NP-hard versions. Specifically, we show that NP-hardness of a bounded version follows easily from the reduction of the unbounded problems. This leads to new and simpler proofs of the NP-hardness of bounded version of the Post correspondence problem, the matrix mortality problem, the positivity of matrix product operators, the reachability problem, the tiling problem, and the ground state energy problem. This work sheds light on the intractability of problems in theoretical physics and on the computational consequences of bounding a parameter.

Several physically inspired problems have been proven undecidable; examples are the spectral gap problem and the membership problem for quantum correlations. Most of these results rely on reductions from a handful of undecidable problems, such as the halting problem, the tiling problem, the Post correspondence problem or the matrix mortality problem. All these problems have a common property: they have an NP-hard bounded version. This work establishes a relation between undecidable unbounded problems and their bounded NP-hard versions. Specifically, we show that NP-hardness of a bounded version follows easily from the reduction of the unbounded problems. This leads to new and simpler proofs of the NP-hardness of bounded version of the Post correspondence problem, the matrix mortality problem, the positivity of matrix product operators, the reachability problem, the tiling problem, and the ground state energy problem. This work sheds light on the intractability of problems in theoretical physics and on the computational consequences of bounding a parameter.

### Parallel Repetition for the GHZ Game: Exponential Decay

Authors: Mark Braverman, Subhash Khot, Dor Minzer

We show that the value of the $n$-fold repeated GHZ game is at most $2^{-\Omega(n)}$, improving upon the polynomial bound established by Holmgren and Raz. Our result is established via a reduction to approximate subgroup type questions from additive combinatorics.

Authors: Mark Braverman, Subhash Khot, Dor Minzer

We show that the value of the $n$-fold repeated GHZ game is at most $2^{-\Omega(n)}$, improving upon the polynomial bound established by Holmgren and Raz. Our result is established via a reduction to approximate subgroup type questions from additive combinatorics.

### Quantum Adversarial Learning in Emulation of Monte-Carlo Methods for Max-cut Approximation: QAOA is not optimal

Authors: Cem M. Unsal, Lucas T. Brady

One of the leading candidates for near-term quantum advantage is the class of Variational Quantum Algorithms, but these algorithms suffer from classical difficulty in optimizing the variational parameters as the number of parameters increases. Therefore, it is important to understand the expressibility and power of various ans\"atze to produce target states and distributions. To this end, we apply notions of emulation to Variational Quantum Annealing and the Quantum Approximate Optimization Algorithm (QAOA) to show that QAOA is outperformed by variational annealing schedules with equivalent numbers of parameters. Our Variational Quantum Annealing schedule is based on a novel polynomial parameterization that can be optimized in a similar gradient-free way as QAOA, using the same physical ingredients. In order to compare the performance of ans\"atze types, we have developed statistical notions of Monte-Carlo methods. Monte-Carlo methods are computer programs that generate random variables that approximate a target number that is computationally hard to calculate exactly. While the most well-known Monte-Carlo method is Monte-Carlo integration (e.g. Diffusion Monte-Carlo or path-integral quantum Monte-Carlo), QAOA is itself a Monte-Carlo method that finds good solutions to NP-complete problems such as Max-cut. We apply these statistical Monte-Carlo notions to further elucidate the theoretical framework around these quantum algorithms.

Authors: Cem M. Unsal, Lucas T. Brady

One of the leading candidates for near-term quantum advantage is the class of Variational Quantum Algorithms, but these algorithms suffer from classical difficulty in optimizing the variational parameters as the number of parameters increases. Therefore, it is important to understand the expressibility and power of various ans\"atze to produce target states and distributions. To this end, we apply notions of emulation to Variational Quantum Annealing and the Quantum Approximate Optimization Algorithm (QAOA) to show that QAOA is outperformed by variational annealing schedules with equivalent numbers of parameters. Our Variational Quantum Annealing schedule is based on a novel polynomial parameterization that can be optimized in a similar gradient-free way as QAOA, using the same physical ingredients. In order to compare the performance of ans\"atze types, we have developed statistical notions of Monte-Carlo methods. Monte-Carlo methods are computer programs that generate random variables that approximate a target number that is computationally hard to calculate exactly. While the most well-known Monte-Carlo method is Monte-Carlo integration (e.g. Diffusion Monte-Carlo or path-integral quantum Monte-Carlo), QAOA is itself a Monte-Carlo method that finds good solutions to NP-complete problems such as Max-cut. We apply these statistical Monte-Carlo notions to further elucidate the theoretical framework around these quantum algorithms.

### Approximating the chromatic polynomial is as hard as computing it exactly

Authors: Ferenc Bencs, Jeroen Huijben, Guus Regts

We show that for any non-real algebraic number $q$ such that $|q-1|>1$ or $\Re(q)>\frac{3}{2}$ it is \textsc{\#P}-hard to compute a multiplicative (resp. additive) approximation to the absolute value (resp. argument) of the chromatic polynomial evaluated at $q$ on planar graphs. This implies \textsc{\#P}-hardness for all non-real algebraic $q$ on the family of all graphs. We moreover prove several hardness results for $q$ such that $|q-1|\leq 1$.

Our hardness results are obtained by showing that a polynomial time algorithm for approximately computing the chromatic polynomial of a planar graph at non-real algebraic $q$ (satisfying some properties) leads to a polynomial time algorithm for \emph{exactly} computing it, which is known to be hard by a result of Vertigan. Many of our results extend in fact to the more general partition function of the random cluster model, a well known reparametrization of the Tutte polynomial.

Authors: Ferenc Bencs, Jeroen Huijben, Guus Regts

We show that for any non-real algebraic number $q$ such that $|q-1|>1$ or $\Re(q)>\frac{3}{2}$ it is \textsc{\#P}-hard to compute a multiplicative (resp. additive) approximation to the absolute value (resp. argument) of the chromatic polynomial evaluated at $q$ on planar graphs. This implies \textsc{\#P}-hardness for all non-real algebraic $q$ on the family of all graphs. We moreover prove several hardness results for $q$ such that $|q-1|\leq 1$.

Our hardness results are obtained by showing that a polynomial time algorithm for approximately computing the chromatic polynomial of a planar graph at non-real algebraic $q$ (satisfying some properties) leads to a polynomial time algorithm for \emph{exactly} computing it, which is known to be hard by a result of Vertigan. Many of our results extend in fact to the more general partition function of the random cluster model, a well known reparametrization of the Tutte polynomial.

### Improved Elekes-Szab\'o type estimates using proximity

Authors: Jozsef Solymosi, Joshua Zahl

We prove a new Elekes-Szab\'o type estimate on the size of the intersection of a Cartesian product $A\times B\times C$ with an algebraic surface $\{f=0\}$ over the reals. In particular, if $A,B,C$ are sets of $N$ real numbers and $f$ is a trivariate polynomial, then either $f$ has a special form that encodes additive group structure (for example $f(x,y,x) = x + y - z$), or $A \times B\times C \cap\{f=0\}$ has cardinality $O(N^{12/7})$. This is an improvement over the previously bound $O(N^{11/6})$. We also prove an asymmetric version of our main result, which yields an Elekes-Ronyai type expanding polynomial estimate with exponent $3/2$. This has applications to questions in combinatorial geometry related to the Erd\H{o}s distinct distances problem.

Like previous approaches to the problem, we rephrase the question as a $L^2$ estimate, which can be analyzed by counting additive quadruples. The latter problem can be recast as an incidence problem involving points and curves in the plane. The new idea in our proof is that we use the order structure of the reals to restrict attention to a smaller collection of proximate additive quadruples.

Authors: Jozsef Solymosi, Joshua Zahl

We prove a new Elekes-Szab\'o type estimate on the size of the intersection of a Cartesian product $A\times B\times C$ with an algebraic surface $\{f=0\}$ over the reals. In particular, if $A,B,C$ are sets of $N$ real numbers and $f$ is a trivariate polynomial, then either $f$ has a special form that encodes additive group structure (for example $f(x,y,x) = x + y - z$), or $A \times B\times C \cap\{f=0\}$ has cardinality $O(N^{12/7})$. This is an improvement over the previously bound $O(N^{11/6})$. We also prove an asymmetric version of our main result, which yields an Elekes-Ronyai type expanding polynomial estimate with exponent $3/2$. This has applications to questions in combinatorial geometry related to the Erd\H{o}s distinct distances problem.

Like previous approaches to the problem, we rephrase the question as a $L^2$ estimate, which can be analyzed by counting additive quadruples. The latter problem can be recast as an incidence problem involving points and curves in the plane. The new idea in our proof is that we use the order structure of the reals to restrict attention to a smaller collection of proximate additive quadruples.

### Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT

Authors: Cuneyt Gurcan Akcora, Murat Kantarcioglu, Yulia R. Gel, Baris Coskunuzer

Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks.

Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. We develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs to address this major TDA limitation. First, we prove that $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains up to 95%.

The developed framework provides the first bridge between the graph theory and TDA, with applications in machine learning of large complex networks. Our implementation is available at github.com/cakcora/PersistentHomologyWithCoralPrunit

Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks.

Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. We develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs to address this major TDA limitation. First, we prove that $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains up to 95%.

The developed framework provides the first bridge between the graph theory and TDA, with applications in machine learning of large complex networks. Our implementation is available at https://github.com/cakcora/PersistentHomologyWithCoralPrunit

### Space-efficient RLZ-to-LZ77 conversion

Authors: Travis Gagie

Consider a text $T [1..n]$ prefixed by a reference sequence $R = T [1..\ell]$. We show how, given $R$ and the $z'$-phrase relative Lempel-Ziv parse of $T [\ell + 1..n]$ with respect to $R$, we can build the LZ77 parse of $T$ in $n\,\mathrm{polylog} (n)$ time and $O (\ell + z')$ total space.

Authors: Travis Gagie

Consider a text $T [1..n]$ prefixed by a reference sequence $R = T [1..\ell]$. We show how, given $R$ and the $z'$-phrase relative Lempel-Ziv parse of $T [\ell + 1..n]$ with respect to $R$, we can build the LZ77 parse of $T$ in $n\,\mathrm{polylog} (n)$ time and $O (\ell + z')$ total space.

### Learning and Testing Latent-Tree Ising Models Efficiently

Authors: Davin Choo, Yuval Dagan, Constantinos Daskalakis, Anthimos Vardis Kandiros

We provide time- and sample-efficient algorithms for learning and testing latent-tree Ising models, i.e. Ising models that may only be observed at their leaf nodes. On the learning side, we obtain efficient algorithms for learning a tree-structured Ising model whose leaf node distribution is close in Total Variation Distance, improving on the results of prior work. On the testing side, we provide an efficient algorithm with fewer samples for testing whether two latent-tree Ising models have leaf-node distributions that are close or far in Total Variation distance. We obtain our algorithms by showing novel localization results for the total variation distance between the leaf-node distributions of tree-structured Ising models, in terms of their marginals on pairs of leaves.

We provide time- and sample-efficient algorithms for learning and testing latent-tree Ising models, i.e. Ising models that may only be observed at their leaf nodes. On the learning side, we obtain efficient algorithms for learning a tree-structured Ising model whose leaf node distribution is close in Total Variation Distance, improving on the results of prior work. On the testing side, we provide an efficient algorithm with fewer samples for testing whether two latent-tree Ising models have leaf-node distributions that are close or far in Total Variation distance. We obtain our algorithms by showing novel localization results for the total variation distance between the leaf-node distributions of tree-structured Ising models, in terms of their marginals on pairs of leaves.

### A fast and simple $O (z \log n)$-space index for finding approximately longest common substrings

Authors: Nick Fagan, Jorge Hermo González, Travis Gagie

We describe how, given a text $T [1..n]$ and a positive constant $\epsilon$, we can build a simple $O (z \log n)$-space index, where $z$ is the number of phrases in the LZ77 parse of $T$, such that later, given a pattern $P [1..m]$, in $O (m \log \log z + \mathrm{polylog} (m + z))$ time and with high probability we can find a substring of $P$ that occurs in $T$ and whose length is at least a $(1 - \epsilon)$-fraction of the length of a longest common substring of $P$ and $T$.

We describe how, given a text $T [1..n]$ and a positive constant $\epsilon$, we can build a simple $O (z \log n)$-space index, where $z$ is the number of phrases in the LZ77 parse of $T$, such that later, given a pattern $P [1..m]$, in $O (m \log \log z + \mathrm{polylog} (m + z))$ time and with high probability we can find a substring of $P$ that occurs in $T$ and whose length is at least a $(1 - \epsilon)$-fraction of the length of a longest common substring of $P$ and $T$.

### Differentially Private Heatmaps

Authors: Badih Ghazi, Junfeng He, Kai Kohlhoff, Ravi Kumar, Pasin Manurangsi, Vidhya Navalpakkam, Nachiappan Valliappan

We consider the task of producing heatmaps from users' aggregated data while protecting their privacy. We give a differentially private (DP) algorithm for this task and demonstrate its advantages over previous algorithms on real-world datasets.

Our core algorithmic primitive is a DP procedure that takes in a set of distributions and produces an output that is close in Earth Mover's Distance to the average of the inputs. We prove theoretical bounds on the error of our algorithm under a certain sparsity assumption and that these are near-optimal.

We consider the task of producing heatmaps from users' aggregated data while protecting their privacy. We give a differentially private (DP) algorithm for this task and demonstrate its advantages over previous algorithms on real-world datasets.

Our core algorithmic primitive is a DP procedure that takes in a set of distributions and produces an output that is close in Earth Mover's Distance to the average of the inputs. We prove theoretical bounds on the error of our algorithm under a certain sparsity assumption and that these are near-optimal.

### Estimation of Similarity between DNA Sequences and Its Graphical Representation

Authors: Probir Mondal

Bioinformatics, which is now a well known field of study, originated in the context of biological sequence analysis. Recently graphical representation takes place for the research on DNA sequence. Research in biological sequence is mainly based on the function and its structure. Bioinformatics finds wide range of applications specifically in the domain of molecular biology which focuses on the analysis of molecules viz. DNA, RNA, Protein etc. In this review, we mainly deal with the similarity analysis between sequences and graphical representation of DNA sequence.

Authors: Probir Mondal

Bioinformatics, which is now a well known field of study, originated in the context of biological sequence analysis. Recently graphical representation takes place for the research on DNA sequence. Research in biological sequence is mainly based on the function and its structure. Bioinformatics finds wide range of applications specifically in the domain of molecular biology which focuses on the analysis of molecules viz. DNA, RNA, Protein etc. In this review, we mainly deal with the similarity analysis between sequences and graphical representation of DNA sequence.

### Towards Better Bounds for Finding Quasi-Identifiers

Authors: Ryan Hildebrant, Quoc-Tung Le, Duy-Hoang Ta, Hoa T. Vu

We revisit the problem of finding small $\epsilon$-separation keys introduced by Motwani and Xu (VLDB 07). In this problem, the input is $m$-dimensional tuples $x_1,x_2,\ldots,x_n$. The goal is to find a small subset of coordinates that separates at least $(1-\epsilon){n \choose 2}$ pairs of tuples. They provided a fast algorithm that runs on $\Theta(m/\epsilon)$ tuples sampled uniformly at random. We show that the sample size can be improved to $\Theta(m/\sqrt{\epsilon})$. Our algorithm also enjoys a faster running time. To obtain this result, we provide upper and lower bounds on the sample size to solve the following decision problem. Given a subset of coordinates $A$, reject if $A$ separates fewer than $(1-\epsilon){n \choose 2}$ pairs, and accept if $A$ separates all pairs. The algorithm must be correct with probability at least $1-\delta$ for all $A$. We show that for algorithms based on sampling:

- $\Theta(m/\sqrt{\epsilon})$ samples are sufficient and necessary so that $\delta \leq e^{-m}$ and

- $\Omega(\sqrt{\frac{\log m}{\epsilon}})$ samples are necessary so that $\delta$ is a constant.

Our analysis is based on a constrained version of the balls-into-bins problem. We believe our analysis may be of independent interest. We also study a related problem that asks for the following sketching algorithm: with given parameters $\alpha,k$ and $\epsilon$, the algorithm takes a subset of coordinates $A$ of size at most $k$ and returns an estimate of the number of unseparated pairs in $A$ up to a $(1\pm\epsilon)$ factor if it is at least $\alpha {n \choose 2}$. We show that even for constant $\alpha$ and success probability, such a sketching algorithm must use $\Omega(mk \log \epsilon^{-1})$ bits of space; on the other hand, uniform sampling yields a sketch of size $\Theta(\frac{mk \log m}{\alpha \epsilon^2})$ for this purpose.

We revisit the problem of finding small $\epsilon$-separation keys introduced by Motwani and Xu (VLDB 07). In this problem, the input is $m$-dimensional tuples $x_1,x_2,\ldots,x_n$. The goal is to find a small subset of coordinates that separates at least $(1-\epsilon){n \choose 2}$ pairs of tuples. They provided a fast algorithm that runs on $\Theta(m/\epsilon)$ tuples sampled uniformly at random. We show that the sample size can be improved to $\Theta(m/\sqrt{\epsilon})$. Our algorithm also enjoys a faster running time. To obtain this result, we provide upper and lower bounds on the sample size to solve the following decision problem. Given a subset of coordinates $A$, reject if $A$ separates fewer than $(1-\epsilon){n \choose 2}$ pairs, and accept if $A$ separates all pairs. The algorithm must be correct with probability at least $1-\delta$ for all $A$. We show that for algorithms based on sampling:

- $\Theta(m/\sqrt{\epsilon})$ samples are sufficient and necessary so that $\delta \leq e^{-m}$ and

- $\Omega(\sqrt{\frac{\log m}{\epsilon}})$ samples are necessary so that $\delta$ is a constant.

Our analysis is based on a constrained version of the balls-into-bins problem. We believe our analysis may be of independent interest. We also study a related problem that asks for the following sketching algorithm: with given parameters $\alpha,k$ and $\epsilon$, the algorithm takes a subset of coordinates $A$ of size at most $k$ and returns an estimate of the number of unseparated pairs in $A$ up to a $(1\pm\epsilon)$ factor if it is at least $\alpha {n \choose 2}$. We show that even for constant $\alpha$ and success probability, such a sketching algorithm must use $\Omega(mk \log \epsilon^{-1})$ bits of space; on the other hand, uniform sampling yields a sketch of size $\Theta(\frac{mk \log m}{\alpha \epsilon^2})$ for this purpose.

### Synthesis Cost-Optimal Targeted Mutant Protein Libraries

Authors: Dimitris Papamichail, Madeline Febinger, Shm Almeda, Georgios Papamichail

Protein variant libraries produced by site-directed mutagenesis are a useful tool utilized by protein engineers to explore variants with potentially improved properties, such as activity and stability. These libraries are commonly built by selecting residue positions and alternative beneficial mutations for each position. All possible combinations are then constructed and screened, by incorporating degenerate codons at mutation sites. These degenerate codons often encode additional unwanted amino acids or even STOP codons. Our study aims to take advantage of annealing based recombination of oligonucleotides during synthesis and utilize multiple degenerate codons per mutation site to produce targeted protein libraries devoid of unwanted variants. Toward this goal we created an algorithm to calculate the minimum number of degenerate codons necessary to specify any given amino acid set, and a dynamic programming method that uses this algorithm to optimally partition a DNA target sequence with degeneracies into overlapping oligonucleotides, such that the total cost of synthesis of the target mutant protein library is minimized. Computational experiments show that, for a modest increase in DNA synthesis costs, beneficial variant yields in produced mutant libraries are increased by orders of magnitude, an effect particularly pronounced in large combinatorial libraries.

Protein variant libraries produced by site-directed mutagenesis are a useful tool utilized by protein engineers to explore variants with potentially improved properties, such as activity and stability. These libraries are commonly built by selecting residue positions and alternative beneficial mutations for each position. All possible combinations are then constructed and screened, by incorporating degenerate codons at mutation sites. These degenerate codons often encode additional unwanted amino acids or even STOP codons. Our study aims to take advantage of annealing based recombination of oligonucleotides during synthesis and utilize multiple degenerate codons per mutation site to produce targeted protein libraries devoid of unwanted variants. Toward this goal we created an algorithm to calculate the minimum number of degenerate codons necessary to specify any given amino acid set, and a dynamic programming method that uses this algorithm to optimally partition a DNA target sequence with degeneracies into overlapping oligonucleotides, such that the total cost of synthesis of the target mutant protein library is minimized. Computational experiments show that, for a modest increase in DNA synthesis costs, beneficial variant yields in produced mutant libraries are increased by orders of magnitude, an effect particularly pronounced in large combinatorial libraries.

### Combining Constructive and Perturbative Deep Learning Algorithms for the Capacitated Vehicle Routing Problem

Authors: Roberto García-Torres, Alitzel Adriana Macias-Infante, Santiago Enrique Conant-Pablos, José Carlos Ortiz-Bayliss, Hugo Terashima-Marín

The Capacitated Vehicle Routing Problem is a well-known NP-hard problem that poses the challenge of finding the optimal route of a vehicle delivering products to multiple locations. Recently, new efforts have emerged to create constructive and perturbative heuristics to tackle this problem using Deep Learning. In this paper, we join these efforts to develop the Combined Deep Constructor and Perturbator, which combines two powerful constructive and perturbative Deep Learning-based heuristics, using attention mechanisms at their core. Furthermore, we improve the Attention Model-Dynamic for the Capacitated Vehicle Routing Problem by proposing a memory-efficient algorithm that reduces its memory complexity by a factor of the number of nodes. Our method shows promising results. It demonstrates a cost improvement in common datasets when compared against other multiple Deep Learning methods. It also obtains close results to the state-of-the art heuristics from the Operations Research field. Additionally, the proposed memory efficient algorithm for the Attention Model-Dynamic model enables its use in problem instances with more than 100 nodes.

The Capacitated Vehicle Routing Problem is a well-known NP-hard problem that poses the challenge of finding the optimal route of a vehicle delivering products to multiple locations. Recently, new efforts have emerged to create constructive and perturbative heuristics to tackle this problem using Deep Learning. In this paper, we join these efforts to develop the Combined Deep Constructor and Perturbator, which combines two powerful constructive and perturbative Deep Learning-based heuristics, using attention mechanisms at their core. Furthermore, we improve the Attention Model-Dynamic for the Capacitated Vehicle Routing Problem by proposing a memory-efficient algorithm that reduces its memory complexity by a factor of the number of nodes. Our method shows promising results. It demonstrates a cost improvement in common datasets when compared against other multiple Deep Learning methods. It also obtains close results to the state-of-the art heuristics from the Operations Research field. Additionally, the proposed memory efficient algorithm for the Attention Model-Dynamic model enables its use in problem instances with more than 100 nodes.

## Sunday, November 27

### TR22-170 | The Complexity of the Shortest Vector Problem | Huck Bennett

from ECCC Papers

Computational problems on point lattices play a central role in many areas of computer science including integer programming, coding theory, cryptanalysis, and especially the design of secure cryptosystems. In this survey, we present known results and open questions related to the complexity of the most important of these problems, the Shortest Vector Problem (SVP).
Computational problems on point lattices play a central role in many areas of computer science including integer programming, coding theory, cryptanalysis, and especially the design of secure cryptosystems. In this survey, we present known results and open questions related to the complexity of the most important of these problems, the Shortest Vector Problem (SVP).

## Saturday, November 26

### Postdoc at University of Toronto (apply by December 15, 2022)

from CCI: jobs

The theory group at the University of Toronto anticipates up to three postdoctoral positions beginning September 2023. We seek candidates from all areas of theoretical computer science including algorithms, complexity theory, cryptography, differential privacy, distributed computing, graph theory, quantum computing, and theoretical aspects of machine learning. Website: www.cs.toronto.edu/theory/positions.html Email: sachdeva@cs.toronto.edu

The theory group at the University of Toronto anticipates up to three postdoctoral positions beginning September 2023. We seek candidates from all areas of theoretical computer science including algorithms, complexity theory, cryptography, differential privacy, distributed computing, graph theory, quantum computing, and theoretical aspects of machine learning.

Website: https://www.cs.toronto.edu/theory/positions.html
Email: sachdeva@cs.toronto.edu

By shacharlovett

### TR22-169 | Extractors for Images of Varieties | Zeyu Guo, Ben Lee Volk, Akhil Jalan, David Zuckerman

from ECCC Papers

We construct explicit deterministic extractors for polynomial images of varieties, that is, distributions sampled by applying a low-degree polynomial map $f : \mathbb{F}_q^r \to \mathbb{F}_q^n$ to an element sampled uniformly at random from a $k$-dimensional variety $V \subseteq \mathbb{F}_q^r$. This class of sources generalizes both polynomial sources, studied by Dvir, Gabizon and Wigderson (FOCS 2007, Comput. Complex. 2009), and variety sources, studied by Dvir (CCC 2009, Comput. Complex. 2012). Assuming certain natural non-degeneracy conditions on the map $f$ and the variety $V$, which in particular ensure that the source has enough min-entropy, we extract almost all the min-entropy of the distribution. Unlike the Dvir-Gabizon-Wigderson and Dvir results, our construction works over large enough finite fields of arbitrary characteristic. One key part of our construction is an improved deterministic rank extractor for varieties. As a by-product, we obtain explicit Noether normalization lemmas for affine varieties and affine algebras. Additionally, we generalize a construction of affine extractors with exponentially small error due to Bourgain, Dvir and Leeman (Comput. Complex. 2016) by extending it to all finite prime fields of quasipolynomial size.
We construct explicit deterministic extractors for polynomial images of varieties, that is, distributions sampled by applying a low-degree polynomial map $f : \mathbb{F}_q^r \to \mathbb{F}_q^n$ to an element sampled uniformly at random from a $k$-dimensional variety $V \subseteq \mathbb{F}_q^r$. This class of sources generalizes both polynomial sources, studied by Dvir, Gabizon and Wigderson (FOCS 2007, Comput. Complex. 2009), and variety sources, studied by Dvir (CCC 2009, Comput. Complex. 2012). Assuming certain natural non-degeneracy conditions on the map $f$ and the variety $V$, which in particular ensure that the source has enough min-entropy, we extract almost all the min-entropy of the distribution. Unlike the Dvir-Gabizon-Wigderson and Dvir results, our construction works over large enough finite fields of arbitrary characteristic. One key part of our construction is an improved deterministic rank extractor for varieties. As a by-product, we obtain explicit Noether normalization lemmas for affine varieties and affine algebras. Additionally, we generalize a construction of affine extractors with exponentially small error due to Bourgain, Dvir and Leeman (Comput. Complex. 2016) by extending it to all finite prime fields of quasipolynomial size.

## Thursday, November 24

### Postdoc at National University of Singapore (apply by February 15, 2023)

from CCI: jobs

Postdoc positions hosted by Prashant Nalini Vasudevan. Looking for candidates with a strong background in theory interested in the foundations of cryptography, information-theoretic cryptography, or related areas of complexity theory and algorithms. See website for more details. Website: www.comp.nus.edu.sg/~prashant/ads.html Email: prashant@comp.nus.edu.sg

Postdoc positions hosted by Prashant Nalini Vasudevan. Looking for candidates with a strong background in theory interested in the foundations of cryptography, information-theoretic cryptography, or related areas of complexity theory and algorithms. See website for more details.

Email: prashant@comp.nus.edu.sg

By shacharlovett

### TR22-168 | A Proof of the Generalized Union-Closed Set Conjecture assuming the Union-Closed Set Conjecture | Zubayir Kazi

from ECCC Papers

Abstract The Union Closed Set Conjecture states that if a set system X\subseteq\mathcal{P}([n]) is closed under pairwise unions, then there exists a\in[n] in at least half of the sets of X. We show that there is a very natural generalization of the union closed set conjecture which gives a lower bound for k-set subsets of [n]. This a stronger version of a Conjecture of (Nagel, 2022). We then prove the Conjecture conditional on the Union Closed Set Conjecture using invariants of Union-Closed sets. Additionally, we prove that there exists a k-set in .38^{k}|F| sets of a union closed set X for every n\geq k>0 using the recent improvements in (Gilmer, 2022) and (Alweiss et al, 2022). We explain why our result suggests a lack of sharpness of the original conjecture.
Abstract The Union Closed Set Conjecture states that if a set system X\subseteq\mathcal{P}([n]) is closed under pairwise unions, then there exists a\in[n] in at least half of the sets of X. We show that there is a very natural generalization of the union closed set conjecture which gives a lower bound for k-set subsets of [n]. This a stronger version of a Conjecture of (Nagel, 2022). We then prove the Conjecture conditional on the Union Closed Set Conjecture using invariants of Union-Closed sets. Additionally, we prove that there exists a k-set in .38^{k}|F| sets of a union closed set X for every n\geq k>0 using the recent improvements in (Gilmer, 2022) and (Alweiss et al, 2022). We explain why our result suggests a lack of sharpness of the original conjecture.

### Two Round HotStuff

In the first part of this post we describe a single-shot variation of Two Round HotStuff (see the HotStuff v1 paper) using Locked Broadcast that follows a similar path as our previous post on Paxos and Linear PBFT. In the second part, we describe a fully pipelined multi-shot State Machine...
In the first part of this post we describe a single-shot variation of Two Round HotStuff (see the HotStuff v1 paper) using Locked Broadcast that follows a similar path as our previous post on Paxos and Linear PBFT. In the second part, we describe a fully pipelined multi-shot State Machine...