Theory of Computing Report

Thursday, June 08

Optimizing Sphere Valued Gaussian Noise Stability

from arXiv: Computational Complexity

Authors: Steven Heilman

We prove a vector-valued inequality for the Gaussian noise stability (i.e. we prove a vector-valued Borell inequality) for Euclidean functions taking values in the two-dimensional sphere, for all correlation parameters at most $1/10$ in absolute value. This inequality was conjectured (for all correlation parameters at most $1$ in absolute value) by Hwang, Neeman, Parekh, Thompson and Wright. Such an inequality is needed to prove sharp computational hardness of the product state Quantum MAX-CUT problem, assuming the Unique Games Conjecture.

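For context, the central quantity here is the vector-valued Gaussian noise stability (standard definition, paraphrased; the precise constraints and extremizers are as in the conjecture cited above). For $\rho\in[-1,1]$, let $(X,Y)$ be standard Gaussian vectors in $\mathbb{R}^n$ with $\mathbb{E}[X_i Y_j]=\rho\,\delta_{ij}$; for $f\colon\mathbb{R}^n\to S^2\subseteq\mathbb{R}^3$, its noise stability is

$$S_\rho(f) \;=\; \mathbb{E}\big[\langle f(X), f(Y)\rangle\big],$$

and a vector-valued Borell inequality identifies the sphere-valued $f$ extremizing $S_\rho(f)$ in the stated range of $\rho$.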

Hardness of Deceptive Certificate Selection

from arXiv: Computational Complexity

Authors: Stephan Wäldchen

Recent progress towards theoretical interpretability guarantees for AI has been made with classifiers that are based on interactive proof systems. A prover selects a certificate from the datapoint and sends it to a verifier who decides the class. In the context of machine learning, such a certificate can be a feature that is informative of the class. For a setup with high soundness and completeness, the exchanged certificates must have high mutual information with the true class of the datapoint. However, this guarantee relies on a bound on the Asymmetric Feature Correlation (AFC) of the dataset, a property that so far is difficult to estimate for high-dimensional data. It was conjectured in Wäldchen et al. that it is computationally hard to exploit the AFC, which is what we prove here.

We consider a malicious prover-verifier duo that aims to exploit the AFC to achieve high completeness and soundness while using uninformative certificates. We show that this task is $\mathsf{NP}$-hard and cannot be approximated better than $\mathcal{O}(m^{1/8 - \epsilon})$, where $m$ is the number of possible certificates, for any $\epsilon>0$ under the Dense-vs-Random conjecture. This is some evidence that the AFC should not prevent the use of interactive classification for real-world tasks, as it is computationally hard to exploit.

Querying Circumscribed Description Logic Knowledge Bases

from arXiv: Computational Complexity

Authors: Carsten Lutz, Quentin Manière, Robin Nolte

Circumscription is one of the main approaches for defining non-monotonic description logics (DLs). While the decidability and complexity of traditional reasoning tasks such as satisfiability of circumscribed DL knowledge bases (KBs) is well understood, for evaluating conjunctive queries (CQs) and unions thereof (UCQs), not even decidability had been established. In this paper, we prove decidability of (U)CQ evaluation on circumscribed DL KBs and obtain a rather complete picture of both the combined complexity and the data complexity, for DLs ranging from ALCHIO via EL to various versions of DL-Lite. We also study the much simpler atomic queries (AQs).

Recognition of Seifert fibered spaces with boundary is in NP

from arXiv: Computational Complexity

Authors: Adele Jackson

We show that the decision problem of recognising whether a triangulated 3-manifold admits a Seifert fibered structure with non-empty boundary is in NP. We also show that the problem of producing Seifert data for a triangulation of such a manifold is in the complexity class FNP. We do this by proving that in any triangulation of a Seifert fibered space with boundary there is both a fundamental horizontal surface of small degree and a complete collection of normal vertical annuli whose total weight is bounded by an exponential in the square of the triangulation size.

Optimal Transport Model Distributional Robustness

from arXiv: Computational Geometry

Authors: Van-Anh Nguyen, Trung Le, Anh Tuan Bui, Thanh-Toan Do, Dinh Phung

Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in data space. In this work, we explore an optimal transport-based distributional robustness framework on model spaces. Specifically, we examine a model distribution in a Wasserstein ball of a given center model distribution that maximizes the loss. We have developed theories that allow us to learn the optimal robust center model distribution. Interestingly, through our developed theories, we can flexibly incorporate the concept of sharpness awareness into training a single model, ensemble models, and Bayesian Neural Networks by considering specific forms of the center model distribution, such as a Dirac delta distribution over a single model, a uniform distribution over several models, and a general Bayesian Neural Network. Furthermore, we demonstrate that sharpness-aware minimization (SAM) is a specific case of our framework when using a Dirac delta distribution over a single model, while our framework can be viewed as a probabilistic extension of SAM. We conduct extensive experiments to demonstrate the usefulness of our framework in the aforementioned settings, and the results show remarkable improvements of our approaches over the baselines.

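Since the abstract singles out sharpness-aware minimization (SAM) as the Dirac-delta special case, here is a minimal NumPy sketch of a generic SAM update on a toy quadratic loss (the loss, learning rate and perturbation radius are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Toy loss: L(w) = 0.5 * w^T A w, with gradient A w (illustrative only).
A = np.diag([10.0, 1.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

def sam_step(w, lr=0.05, rho=0.1):
    """One sharpness-aware minimization (SAM) update:
    1. perturb w towards the (approximately) worst point in a rho-ball,
    2. descend using the gradient taken at that perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, scaled to the ball radius
    g_sharp = grad(w + eps)                      # gradient at the perturbed weights
    return w - lr * g_sharp

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
print(loss(w))  # small: SAM hovers near the minimum of the toy loss
```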

Point in polygon calculation using vector geometric methods with application to geospatial data

from arXiv: Computational Geometry

Authors: Eyram Schwinger, Ralph Twum, Thomas Katsekpor, Gladys Schwinger

In this work, we designed algorithms for the point in polygon problem based on the ray casting algorithm, using equations from vector geometry. The algorithms were implemented in the Python programming language. We tested the algorithm against the point in polygon algorithms used by the shapely (and by extension geopandas) library and the OpenCV library, using points from the Google Open Buildings project. Our algorithm in pure Python performed much better than the shapely implementation. It also performed better than the OpenCV implementation when combined with the Numba optimization library. We also performed simulations to verify that our algorithm's performance was of order O(n).

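For concreteness, here is a minimal pure-Python ray-casting (even-odd) test of the kind the abstract builds on; this is the generic crossing-count formulation, not the authors' vector-geometric variant:

```python
def point_in_polygon(px, py, polygon):
    """Even-odd ray-casting test.

    polygon: list of (x, y) vertices in order; returns True if (px, py)
    is inside. Points exactly on an edge may go either way."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line y = py?
        if (y1 > py) != (y2 > py):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:      # crossing lies to the right of the query point
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))   # True
print(point_in_polygon(5, 2, square))   # False
```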

Linear Time Algorithms for NP-hard Problems restricted to GaTEx Graphs

from arXiv: Data Structures and Algorithms

Authors: Marc Hellmuth, Guillaume E. Scholz

The class of Galled-Tree Explainable (GaTEx) graphs has just recently been discovered as a natural generalization of cographs. Cographs are precisely those graphs that can be uniquely represented by a rooted tree where the leaves of the tree correspond to the vertices of the graph. As a generalization, GaTEx graphs are precisely those graphs that can be uniquely represented by a particular rooted directed acyclic graph (called a galled-tree).

We consider here four prominent problems that are, in general, NP-hard: computing the size $\omega(G)$ of a maximum clique, the size $\chi(G)$ of an optimal vertex-coloring and the size $\alpha(G)$ of a maximum independent set of a given graph $G$ as well as determining whether a graph is perfectly orderable. We show here that $\omega(G)$, $\chi(G)$, $\alpha(G)$ can be computed in linear-time for GaTEx graphs $G$. The crucial idea for the linear-time algorithms is to avoid working on the GaTEx graphs $G$ directly, but to use the galled-trees that explain $G$ as a guide for the algorithms to compute these invariants. In particular, we show first how to employ the galled-tree structure to compute a perfect ordering of GaTEx graphs in linear-time, which is then used to determine $\omega(G)$, $\chi(G)$, $\alpha(G)$.

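To make the "tree as a guide" idea concrete, here is the classical cotree dynamic program for the cograph special case that GaTEx graphs generalize; the galled-tree version in the paper is more involved, so this is only an illustrative sketch:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    name: str

@dataclass
class Node:
    op: str                      # "union" (0-node) or "join" (1-node)
    children: List["Cotree"]

Cotree = Union[Leaf, Node]

def invariants(t: Cotree):
    """Return (omega, alpha, chi) of the cograph represented by cotree t."""
    if isinstance(t, Leaf):
        return 1, 1, 1
    vals = [invariants(c) for c in t.children]
    omegas, alphas, chis = zip(*vals)
    if t.op == "union":          # disjoint union of the children's graphs
        return max(omegas), sum(alphas), max(chis)
    else:                        # join: all edges between different children are added
        return sum(omegas), max(alphas), sum(chis)

# Example: the join of an edge (K2) and two isolated vertices (2K1).
t = Node("join", [Node("join", [Leaf("a"), Leaf("b")]),
                  Node("union", [Leaf("c"), Leaf("d")])])
print(invariants(t))   # (3, 2, 3): omega=3, alpha=2, chi=3
```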

One-sided Matrix Completion from Two Observations Per Row

from arXiv: Data Structures and Algorithms

Authors: Steven Cao, Percy Liang, Gregory Valiant

Given only a few observed entries from a low-rank matrix $X$, matrix completion is the problem of imputing the missing entries, and it formalizes a wide range of real-world settings that involve estimating missing data. However, when there are too few observed entries to complete the matrix, what other aspects of the underlying matrix can be reliably recovered? We study one such problem setting, that of "one-sided" matrix completion, where our goal is to recover the right singular vectors of $X$, even in the regime where recovering the left singular vectors is impossible, which arises when there are more rows than columns and very few observations. We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns. We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing. In these settings, our algorithm substantially outperforms standard matrix completion and a variety of direct factorization methods.

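A minimal NumPy sketch of the setting: with two uniformly random observations per row, one can form a simple unbiased estimate of $X^TX$ by reweighting the observed products. This is an illustration of why $X^TX$ is recoverable from so few entries, not necessarily the estimator analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 20000, 10, 2
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))   # rank-r matrix

# Observe exactly two uniformly random entries per row.
obs = np.array([rng.choice(d, size=2, replace=False) for _ in range(n)])

G = np.zeros((d, d))                     # running estimate of X^T X
for i, (j, k) in enumerate(obs):
    xj, xk = X[i, j], X[i, k]
    # Off-diagonal: the pair {j, k} is seen with probability 2/(d(d-1)) per row.
    G[j, k] += xj * xk * d * (d - 1) / 2
    G[k, j] += xj * xk * d * (d - 1) / 2
    # Diagonal: column j is seen with probability 2/d per row.
    G[j, j] += xj ** 2 * d / 2
    G[k, k] += xk ** 2 * d / 2

rel_err = np.linalg.norm(G - X.T @ X) / np.linalg.norm(X.T @ X)
print(f"relative error: {rel_err:.3f}")   # modest, and shrinks as n grows
```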

Quantum Distance Calculation for $\epsilon$-Graph Construction

from arXiv: Data Structures and Algorithms

Authors: Naomi Mona Chmielewski (EDF R&D OSIRIS, L2S), Nina Amini (CNRS, L2S), Paulin Jacquot (EDF R&D OSIRIS), Joseph Mikael (EDF R&D OSIRIS)

In machine learning and particularly in topological data analysis, $\epsilon$-graphs are important tools but are generally hard to compute, as the distance calculation between $n$ points takes time $O(n^2)$ classically. Recently, quantum approaches for calculating distances between $n$ quantum states have been proposed, taking advantage of quantum superposition and entanglement. We investigate the potential for quantum advantage in the case of quantum distance calculation for computing $\epsilon$-graphs. We show that, relying on existing quantum multi-state SWAP-test-based algorithms, the query complexity for correctly identifying (with a given probability) that two points are not $\epsilon$-neighbours is at least $O(n^3 / \ln n)$, showing that this approach, if used directly for $\epsilon$-graph construction, does not bring a computational advantage when compared to a classical approach.

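For reference, the classical $O(n^2)$ baseline mentioned above, i.e. the direct $\epsilon$-graph construction by pairwise distance computation (generic code, not from the paper):

```python
import numpy as np

def epsilon_graph(points, eps):
    """Return the edge list of the epsilon-graph on the given points:
    an edge {i, j} whenever the Euclidean distance is at most eps.
    All n(n-1)/2 pairs are examined, hence O(n^2) distance computations."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    n = len(pts)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i, j] <= eps]

pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0)]
print(epsilon_graph(pts, eps=1.0))   # [(0, 1)]
```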

Matroid-Constrained Vertex Cover

from arXiv: Data Structures and Algorithms

Authors: Chien-Chung Huang, François Sellier

In this paper, we introduce the problem of Matroid-Constrained Vertex Cover: given a graph with weights on the edges and a matroid imposed on the vertices, our problem is to choose a subset of vertices that is independent in the matroid, with the objective of maximizing the total weight of covered edges. This problem is a generalization of the much studied max $k$-vertex cover problem, in which the matroid is the simple uniform matroid, and it is also a special case of the problem of maximizing a monotone submodular function under a matroid constraint.

First, we give a Fixed-Parameter Tractable Approximation Scheme (FPT-AS) when the given matroid is a partition matroid, a laminar matroid, or a transversal matroid. Precisely, if $k$ is the rank of the matroid, we obtain $(1 - \varepsilon)$ approximation using $(1/\varepsilon)^{O(k)}n^{O(1)}$ time for partition and laminar matroids and using $(1/\varepsilon+k)^{O(k)}n^{O(1)}$ time for transversal matroids. This extends a result of Manurangsi for uniform matroids [Manurangsi, 2018]. We also show that these ideas can be applied in the context of (single-pass) streaming algorithms. Besides, our FPT-AS introduces a new technique based on matroid union, which may be of independent interest in extremal combinatorics.

In the second part, we consider general matroids. We propose a simple local search algorithm that guarantees $2/3 \approx 0.66$ approximation. For the more general problem where two matroids are imposed on the vertices and a feasible solution must be a common independent set, we show that a local search algorithm gives a $2/3 \cdot (1 - 1/(p+1))$ approximation in $n^{O(p)}$ time, for any integer $p$. We also provide some evidence to show that with the constraint of one or two matroids, the approximation ratio of $2/3$ is likely the best possible, using the currently known techniques of local search.

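As an illustration of the swap-based local search, here is a sketch for the uniform-matroid special case (max $k$-vertex cover): keep $k$ vertices and apply improving single-vertex swaps until none exists. The paper's $2/3$ guarantee and the handling of general matroids are not reproduced here.

```python
import itertools

def covered_weight(S, edges):
    """Total weight of edges with at least one endpoint in S."""
    return sum(w for (u, v, w) in edges if u in S or v in S)

def local_search_max_k_cover(vertices, edges, k):
    """Swap-based local search: start from any k vertices and, while some
    single swap (drop one chosen vertex, add one unchosen vertex) improves
    the covered weight, perform it."""
    S = set(list(vertices)[:k])
    improved = True
    while improved:
        improved = False
        for u, v in itertools.product(list(S), [x for x in vertices if x not in S]):
            T = (S - {u}) | {v}
            if covered_weight(T, edges) > covered_weight(S, edges):
                S, improved = T, True
                break
    return S

edges = [(0, 1, 3.0), (1, 2, 2.0), (2, 3, 5.0), (3, 0, 1.0)]
print(local_search_max_k_cover(range(4), edges, k=2))   # {0, 2} covers all edges
```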

On Computing Optimal Tree Ensembles

from arXiv: Data Structures and Algorithms

Authors: Christian Komusiewicz, Pascal Kunz, Frank Sommer, Manuel Sorge

Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression. Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. Mainly, we provide two novel algorithms and corresponding lower bounds. First, we are able to carry over and substantially improve on tractability results for decision trees, obtaining a $(6\delta D S)^S \cdot \mathrm{poly}$-time algorithm, where $S$ is the number of cuts in the tree ensemble, $D$ the largest domain size, and $\delta$ the largest number of features in which two examples differ. To achieve this, we introduce the witness-tree technique, which also seems promising for practice. Second, we show that dynamic programming, which has been successful for decision trees, may also be viable for tree ensembles, providing an $\ell^n \cdot \mathrm{poly}$-time algorithm, where $\ell$ is the number of trees and $n$ the number of examples. Finally, we compare the number of cuts necessary to classify training data sets for decision trees and tree ensembles, showing that ensembles may need exponentially fewer cuts for an increasing number of trees.

Maintaining the cycle structure of dynamic permutations

from arXiv: Data Structures and Algorithms

Authors: Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro

We present a new data structure for maintaining dynamic permutations, which we call a $\textit{forest of splay trees (FST)}$. The FST allows one to efficiently maintain the cycle structure of a permutation $\pi$ when the allowed updates are transpositions. The structure stores one conceptual splay tree for each cycle of $\pi$, using the position within the cycle as the key. Updating $\pi$ to $\tau\cdot\pi$, for a transposition $\tau$, takes $\mathcal{O}(\log n)$ amortized time, where $n$ is the size of $\pi$. The FST computes any $\pi(i)$, $\pi^{-1}(i)$, $\pi^k(i)$ and $\pi^{-k}(i)$ in $\mathcal{O}(\log n)$ amortized time. Further, it supports cycle-specific queries such as determining whether two elements belong to the same cycle, flipping a segment of a cycle, and others, again within $\mathcal{O}(\log n)$ amortized time.

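To make the supported interface concrete, here is a naive array-based baseline offering the same operations (transposition updates, $\pi^k(i)$, same-cycle queries); it needs up to $O(n)$ time per cycle query, which is exactly what the FST improves to $\mathcal{O}(\log n)$ amortized:

```python
class NaivePermutation:
    """Maintains a permutation pi of {0..n-1} and its inverse under
    transposition updates pi <- tau * pi. Queries that the FST answers in
    O(log n) amortized time are answered here naively (O(k) or O(n))."""

    def __init__(self, n):
        self.pi = list(range(n))
        self.inv = list(range(n))

    def apply_transposition(self, a, b):
        """Update pi to tau * pi, where tau swaps the values a and b."""
        ia, ib = self.inv[a], self.inv[b]
        self.pi[ia], self.pi[ib] = b, a
        self.inv[a], self.inv[b] = ib, ia

    def power(self, i, k):
        """Return pi^k(i); k may be negative."""
        f = self.pi if k >= 0 else self.inv
        for _ in range(abs(k)):
            i = f[i]
        return i

    def same_cycle(self, i, j):
        """Walk the cycle containing i and look for j."""
        cur = self.pi[i]
        while cur != i:
            if cur == j:
                return True
            cur = self.pi[cur]
        return i == j

p = NaivePermutation(5)
p.apply_transposition(0, 1)
p.apply_transposition(1, 2)      # pi is now the 3-cycle (0 2 1)
print(p.power(0, 2), p.same_cycle(0, 2), p.same_cycle(0, 3))   # 1 True False
```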

Wednesday, June 07

On the complexity of isomorphism problems for tensors, groups, and polynomials III: actions by classical groups

from arXiv: Computational Complexity

Authors: Zhili Chen, Joshua A. Grochow, Youming Qiao, Gang Tang, Chuanqi Zhang

We study the complexity of isomorphism problems for d-way arrays, or tensors, under natural actions by classical groups such as orthogonal, unitary, and symplectic groups. Such problems arise naturally in statistical data analysis and quantum information. We study two types of complexity-theoretic questions. First, for a fixed action type (isomorphism, conjugacy, etc.), we relate the complexity of the isomorphism problem over a classical group to that over the general linear group. Second, for a fixed group type (orthogonal, unitary, or symplectic), we compare the complexity of the decision problems for different actions.

Our main results are as follows. First, for orthogonal and symplectic groups acting on 3-way arrays, the isomorphism problems reduce to the corresponding problem over the general linear group. Second, for orthogonal and unitary groups, the isomorphism problems of five natural actions on 3-way arrays are polynomial-time equivalent, and the d-tensor isomorphism problem reduces to the 3-tensor isomorphism problem for any fixed d>3. For unitary groups, the preceding result implies that LOCC classification of tripartite quantum states is at least as difficult as LOCC classification of d-partite quantum states for any d. Lastly, we also show that the graph isomorphism problem reduces to the tensor isomorphism problem over orthogonal and unitary groups.

On the Role of Entanglement and Statistics in Learning

from arXiv: Computational Complexity

Authors: Srinivasan Arunachalam, Vojtech Havlicek, Louis Schatzki

In this work we make progress in understanding the relationship between learning models with access to entangled, separable and statistical measurements in the quantum statistical query (QSQ) model. To this end, we show the following results.

$\textbf{Entangled versus separable measurements.}$ The goal here is to learn an unknown $f$ from the concept class $C\subseteq \{f:\{0,1\}^n\rightarrow [k]\}$ given copies of $\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$. We show that, if $T$ copies suffice to learn $f$ using entangled measurements, then $O(nT^2)$ copies suffice to learn $f$ using just separable measurements.

$\textbf{Entangled versus statistical measurements.}$ The goal here is to learn a function $f \in C$ given access to separable measurements and statistical measurements. We exhibit a class $C$ that gives an exponential separation between QSQ learning and quantum learning with entangled measurements (even in the presence of noise). This proves the "quantum analogue" of the seminal result of Blum et al. [BKW'03], which separates classical SQ and PAC learning with classification noise.

$\textbf{QSQ lower bounds for learning states.}$ We introduce a quantum statistical query dimension (QSD), which we use to give lower bounds on QSQ learning. With this we prove superpolynomial QSQ lower bounds for testing purity, shadow tomography, the Abelian hidden subgroup problem, degree-$2$ functions, planted bi-clique states and output states of Clifford circuits of depth $\textsf{polylog}(n)$.

$\textbf{Further applications.}$ We give an $\textit{unconditional}$ separation between weak and strong error mitigation and prove lower bounds for learning distributions in the QSQ model. Prior works by Quek et al. [QFK+'22], Hinsche et al. [HIN+'22], and Nietner et al. [NIS+'23] proved the analogous results $\textit{assuming}$ diagonal measurements, and our work removes this assumption.

Three Candidate Plurality is Stablest for Correlations at most 1/11

from arXiv: Computational Complexity

Authors: Steven Heilman

We prove the three candidate Plurality is Stablest Conjecture of Khot-Kindler-Mossel-O'Donnell from 2005 for correlations $\rho$ satisfying $-1/36<\rho<1/11$: the Plurality function is the most noise stable three candidate election method with small influences, when the corrupted votes have correlation $-1/36<\rho<1/11$ with the original votes. The previous best result of this type only achieved positive correlations at most $10^{-10^{10}}$. Our result follows by solving the three set Standard Simplex Conjecture of Isaksson-Mossel from 2011 for all correlations $-1/36<\rho<1/11$.

The Gaussian Double Bubble Problem corresponds to the case $\rho\to1^{-}$, so in some sense, our result is a generalization of the Gaussian Double Bubble Problem. Our result is also notable since it is the first result for any $\rho<0$, which is the only relevant case for computational hardness of MAX-3-CUT. As an additional corollary, we conclude that three candidate Borda Count is stablest for all $-1/36<\rho<1/11$.

Complexity of Anchored Crossing Number and Crossing Number of Almost Planar Graphs

from arXiv: Computational Geometry

Authors: Petr Hliněný

In this paper we deal with the problem of computing the exact crossing number of almost planar graphs and the closely related problem of computing the exact anchored crossing number of a pair of planar graphs. It was shown by [Cabello and Mohar, 2013] that both problems are NP-hard, although they required an unbounded number of high-degree vertices (in the first problem) or an unbounded number of anchors (in the second problem) to prove their result. Somewhat surprisingly, only three vertices of degree greater than 3, or only three anchors, are sufficient to maintain hardness of these problems, as we prove here. The new result also improves the previous result on hardness of the joint crossing number on surfaces by [Hliněný and Salazar, 2015]. Our result is best possible in the anchored case, since the anchored crossing number of a pair of planar graphs with two anchors each is trivial, and close to being best possible in the almost planar case, since the crossing number is efficiently computable for almost planar graphs of maximum degree 3 [Riskin 1996, Cabello and Mohar 2011].

Tight Complexity Bounds for Counting Generalized Dominating Sets in Bounded-Treewidth Graphs Part II: Hardness Results

from arXiv: Data Structures and Algorithms

Authors: Jacob Focke, Dániel Marx, Fionn Mc Inerney, Daniel Neuen, Govind S. Sankar, Philipp Schepper, Philip Wellnitz

For a well-studied family of domination-type problems, in bounded-treewidth graphs, we investigate whether it is possible to find faster algorithms. For sets $\sigma,\rho$ of non-negative integers, a $(\sigma,\rho)$-set of a graph $G$ is a set $S$ of vertices such that $|N(u)\cap S|\in \sigma$ for every $u\in S$, and $|N(v)\cap S|\in \rho$ for every $v\not\in S$. The problem of finding a $(\sigma,\rho)$-set (of a certain size) unifies common problems like $\text{Independent Set}$, $\text{Dominating Set}$, $\text{Independent Dominating Set}$, and many others.

In an accompanying paper, it is proven that, for all pairs of finite or cofinite sets $(\sigma,\rho)$, there is an algorithm that counts $(\sigma,\rho)$-sets in time $(c_{\sigma,\rho})^{\text{tw}}\cdot n^{O(1)}$ (if a tree decomposition of width $\text{tw}$ is given in the input). Here, $c_{\sigma,\rho}$ is a constant with an intricate dependency on $\sigma$ and $\rho$. Despite this intricacy, we show that the algorithms in the accompanying paper are most likely optimal, i.e., for any pair $(\sigma, \rho)$ of finite or cofinite sets where the problem is non-trivial, and any $\varepsilon>0$, a $(c_{\sigma,\rho}-\varepsilon)^{\text{tw}}\cdot n^{O(1)}$-algorithm counting the number of $(\sigma,\rho)$-sets would violate the Counting Strong Exponential-Time Hypothesis ($\#$SETH). For finite sets $\sigma$ and $\rho$, our lower bounds also extend to the decision version, showing that those algorithms are optimal in this setting as well.

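The defining condition is easy to state as a check; here is a small sketch, with the standard encodings of Independent Set and Dominating Set shown as illustrative comments:

```python
def is_sigma_rho_set(adj, S, sigma, rho):
    """Check whether S is a (sigma, rho)-set of the graph given by the
    adjacency dict adj: every vertex in S must have a number of neighbours
    inside S that lies in sigma, and every vertex outside S a number of
    neighbours inside S that lies in rho."""
    S = set(S)
    for v, nbrs in adj.items():
        deg_in_S = sum(1 for u in nbrs if u in S)
        allowed = sigma if v in S else rho
        if deg_in_S not in allowed:
            return False
    return True

# A 4-cycle 0-1-2-3-0.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
# Independent Set: sigma = {0}, rho = all non-negative integers (here: enough of them).
print(is_sigma_rho_set(adj, {0, 2}, sigma={0}, rho=set(range(5))))             # True
# Dominating Set: sigma = all, rho = {1, 2, ...}; S = {0} fails since vertex 2 sees no one in S.
print(is_sigma_rho_set(adj, {0}, sigma=set(range(5)), rho=set(range(1, 5))))   # False
```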

On the Parameterized Complexity of Computing $st$-Orientations with Few Transitive Edges

from arXiv: Data Structures and Algorithms

Authors: Carla Binucci, Giuseppe Liotta, Fabrizio Montecchiani, Giacomo Ortali, Tommaso Piselli

Orienting the edges of an undirected graph such that the resulting digraph satisfies some given constraints is a classical problem in graph theory, with multiple algorithmic applications. In particular, an $st$-orientation orients each edge of the input graph such that the resulting digraph is acyclic, and it contains a single source $s$ and a single sink $t$. Computing an $st$-orientation of a graph can be done efficiently, and it finds notable applications in graph algorithms and in particular in graph drawing. On the other hand, finding an $st$-orientation with at most $k$ transitive edges is more challenging and it was recently proven to be NP-hard already when $k=0$. We strengthen this result by showing that the problem remains NP-hard even for graphs of bounded diameter, and for graphs of bounded vertex degree. These computational lower bounds naturally raise the question about which structural parameters can lead to tractable parameterizations of the problem. Our main result is a fixed-parameter tractable algorithm parameterized by treewidth.

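A small sketch of the objects involved: given a vertex numbering with $s$ first and $t$ last (assumed to be supplied here; computing one that yields few transitive edges is the hard problem studied in the paper), orient each edge from the lower to the higher number and count the transitive edges of the resulting DAG.

```python
import networkx as nx

def orient_and_count_transitive(G, numbering):
    """Orient every edge of G from the endpoint with the smaller number to
    the larger one (an st-orientation when `numbering` is an st-numbering),
    then count transitive edges: arcs (u, v) such that v is still reachable
    from u after removing that arc."""
    D = nx.DiGraph()
    D.add_nodes_from(G.nodes)
    for u, v in G.edges:
        if numbering[u] < numbering[v]:
            D.add_edge(u, v)
        else:
            D.add_edge(v, u)
    transitive = 0
    for u, v in list(D.edges):
        D.remove_edge(u, v)
        if nx.has_path(D, u, v):
            transitive += 1
        D.add_edge(u, v)
    return D, transitive

G = nx.cycle_graph(4)          # 4-cycle with s = 0 and t = 2
G.add_edge(0, 2)               # plus the chord {0, 2}
D, t = orient_and_count_transitive(G, {0: 0, 1: 1, 3: 2, 2: 3})
print(t)                       # 1: the chord 0 -> 2 is the unique transitive edge
```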

Accelerating Range Minimum Queries with Ray Tracing Cores

from arXiv: Data Structures and Algorithms

Authors: Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada, Felipe A. Quezada

During the last decade, GPU technology has shifted from pure general-purpose computation to the inclusion of application-specific integrated circuits (ASICs), such as Tensor Cores and Ray Tracing (RT) cores. Although these special-purpose GPU cores were designed to further accelerate specific fields such as AI and real-time rendering, recent research has managed to exploit them to accelerate other tasks that typically used regular GPU computing. In this work we present RTXRMQ, a new approach that can compute range minimum queries (RMQs) with RT cores. The main contribution is the proposal of a geometric solution for RMQ, where elements become triangles that are placed and shaped according to the element's value and position in the array, respectively, such that the closest hit of a ray launched from a point given by the query parameters corresponds to the result of that query. Experimental results show that RTXRMQ is currently best suited for small query ranges relative to the problem size, achieving speedups of up to $5\times$ and $2.3\times$ over state-of-the-art CPU (HRMQ) and GPU (LCA) approaches, respectively. Although for medium and large query ranges RTXRMQ is currently surpassed by LCA, it is still competitive, being $2.5\times$ and $4\times$ faster than HRMQ, which is a highly parallel CPU approach. Furthermore, performance scaling experiments across the latest RTX GPU architectures show that if the current RT scaling trend continues, then RTXRMQ's performance would scale at a higher rate than HRMQ and LCA, making the approach even more relevant for future high performance applications that employ batches of RMQs.

Tracking Evolving labels using Cone based Oracles

from arXiv: Data Structures and Algorithms

Authors: Aditya Acharya, David Mount

The evolving data framework was first proposed by Anagnostopoulos et al., where an evolver makes small changes to a structure behind the scenes. Instead of taking a single input and producing a single output, an algorithm judiciously probes the current state of the structure and attempts to continuously maintain a sketch of the structure that is as close as possible to its actual state. A number of problems have been studied in the evolving framework, including our own work on labeled trees. We were motivated by the problem of maintaining a labeling in the plane, where updating the labels requires physically moving them. Applications involve tracking evolving disease hot-spots via mobile testing units, and tracking unmanned aerial vehicles. To be specific, we consider the problem of tracking labeled nodes in the plane, where an evolver continuously swaps labels of any two nearby nodes in the background, unknown to us. We are tasked with maintaining a hypothesis, an approximate sketch of the locations of these labels, which we can only update by physically moving them over a sparse graph. We assume the existence of an Oracle, which, when suitably probed, guides us in fixing our hypothesis.

A Combinatorial Certifying Algorithm for Linear Programming Problems with Gainfree Leontief Substitution Systems

from arXiv: Data Structures and Algorithms

Authors: Kei Kimura, Kazuhisa Makino

Linear programming (LP) problems with gainfree Leontief substitution systems have been intensively studied in economics and operations research, and include the feasibility problem of a class of Horn systems, which arises in, e.g., polyhedral combinatorics and logic. This subclass of LP problems admits a strongly polynomial time algorithm, whereas devising such an algorithm for general LP problems is one of the major theoretical open questions in mathematical optimization and computer science. Recently, much attention has been paid to devising certifying algorithms in software engineering, since those algorithms enable one to confirm the correctness of outputs of programs with simple computations. In this paper, we provide the first combinatorial (and strongly polynomial time) certifying algorithm for LP problems with gainfree Leontief substitution systems. As a by-product, we affirmatively answer the open question of whether the feasibility problem of this class of Horn systems admits a combinatorial certifying algorithm.

Rigorous Runtime Analysis of MOEA/D for Solving Multi-Objective Minimum Weight Base Problems

from arXiv: Data Structures and Algorithms

Authors: Anh Viet Do, Aneta Neumann, Frank Neumann, Andrew M. Sutton

We study the multi-objective minimum weight base problem, an abstraction of classical NP-hard combinatorial problems such as the multi-objective minimum spanning tree problem. We prove some important properties of the convex hull of the non-dominated front, such as its approximation quality and an upper bound on the number of extreme points. Using these properties, we give the first runtime analysis of the MOEA/D algorithm for this problem, an evolutionary algorithm that effectively optimizes by decomposing the objectives into single-objective components. We show that MOEA/D, given an appropriate decomposition setting, finds all extreme points within expected fixed-parameter polynomial time in the oracle model, the parameter being the number of objectives. Experiments are conducted on random bi-objective minimum spanning tree instances, and the results agree with our theoretical findings. Furthermore, compared with GSEMO, a previously studied evolutionary algorithm for the problem, MOEA/D finds all extreme points much faster across all instances.

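The decomposition idea can be illustrated deterministically: each weight vector turns the bi-objective minimum spanning tree problem into a single-objective MST instance, and the scalarized optima are supported points of the non-dominated front (which include the extreme points as the weights vary). The sketch below replaces MOEA/D's evolutionary loop with exact MST calls, purely for illustration:

```python
import networkx as nx
import numpy as np

def supported_points_bi_mst(G, num_weights=11):
    """Scalarize the two edge costs with weights (lam, 1 - lam), solve each
    single-objective MST exactly, and collect the resulting objective
    vectors; these are supported points on the convex hull of the front."""
    points = set()
    for lam in np.linspace(0.0, 1.0, num_weights):
        for u, v, d in G.edges(data=True):
            d["scal"] = lam * d["c1"] + (1.0 - lam) * d["c2"]
        T = nx.minimum_spanning_tree(G, weight="scal")
        c1 = sum(d["c1"] for _, _, d in T.edges(data=True))
        c2 = sum(d["c2"] for _, _, d in T.edges(data=True))
        points.add((c1, c2))
    return sorted(points)

G = nx.complete_graph(5)
rng = np.random.default_rng(1)
for u, v, d in G.edges(data=True):
    d["c1"], d["c2"] = float(rng.integers(1, 10)), float(rng.integers(1, 10))
print(supported_points_bi_mst(G))
```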

Minimizing Hitting Time between Disparate Groups with Shortcut Edges

from arXiv: Data Structures and Algorithms

Authors: Florian Adriaens, Honglian Wang, Aristides Gionis

Structural bias or segregation of networks refers to situations where two or more disparate groups are present in the network, so that the groups are highly connected internally, but loosely connected to each other. In many cases it is of interest to increase the connectivity of disparate groups so as to, e.g., minimize social friction, or expose individuals to diverse viewpoints. A commonly-used mechanism for increasing the network connectivity is to add edge shortcuts between pairs of nodes. In many applications of interest, edge shortcuts typically translate to recommendations, e.g., what video to watch, or what news article to read next. The problem of reducing structural bias or segregation via edge shortcuts has recently been studied in the literature, and random walks have been an essential tool for modeling navigation and connectivity in the underlying networks. Existing methods, however, either do not offer approximation guarantees, or engineer the objective so that it satisfies certain desirable properties that simplify the optimization task. In this paper we address the problem of adding a given number of shortcut edges in the network so as to directly minimize the average hitting time and the maximum hitting time between two disparate groups. Our algorithm for minimizing the average hitting time is a greedy bicriteria algorithm that relies on supermodularity. In contrast, maximum hitting time is not supermodular. Despite this, we develop an approximation algorithm for that objective as well, by leveraging connections with average hitting time and the asymmetric k-center problem.

Representative set statements for delta-matroids and the Mader delta-matroid

from arXiv: Data Structures and Algorithms

Authors: Magnus Wahlström

We present representative sets-style statements for linear delta-matroids, which are set systems that generalize matroids, with important connections to matching theory and graph embeddings. Furthermore, our proof uses a new approach of sieving polynomial families, which generalizes the linear algebra approach of the representative sets lemma to a setting of bounded-degree polynomials. The representative sets statements for linear delta-matroids then follow by analyzing the Pfaffian of the skew-symmetric matrix representing the delta-matroid. Applying the same framework to the determinant instead of the Pfaffian recovers the representative sets lemma for linear matroids. Altogether, this significantly extends the toolbox available for kernelization.

As an application, we show an exact sparsification result for Mader networks: Let $G=(V,E)$ be a graph and $\mathcal{T}$ a partition of a set of terminals $T \subseteq V(G)$, $|T|=k$. A $\mathcal{T}$-path in $G$ is a path with endpoints in distinct parts of $\mathcal{T}$ and internal vertices disjoint from $T$. In polynomial time, we can derive a graph $G'=(V',E')$ with $T \subseteq V(G')$, such that for every subset $S \subseteq T$ there is a packing of $\mathcal{T}$-paths with endpoints $S$ in $G$ if and only if there is one in $G'$, and $|V(G')|=O(k^3)$. This generalizes the (undirected version of the) cut-covering lemma, which corresponds to the case that $\mathcal{T}$ contains only two blocks.

To prove the Mader network sparsification result, we furthermore define the class of Mader delta-matroids, and show that they have linear representations. This should be of independent interest.

Buying Information for Stochastic Optimization

from arXiv: Data Structures and Algorithms

Authors: Mingchen Ma, Christos Tzamos

Stochastic optimization is one of the central problems in Machine Learning and Theoretical Computer Science. In the standard model, the algorithm is given a fixed distribution known in advance. In practice, though, one may acquire extra information at a cost to make better decisions. In this paper, we study how to buy information for stochastic optimization and formulate this question as an online learning problem. Assuming the learner has an oracle for the original optimization problem, we design a $2$-competitive deterministic algorithm and an $e/(e-1)$-competitive randomized algorithm for buying information. We show that this ratio is tight, as the problem is equivalent to a robust generalization of the ski-rental problem, which we call super-martingale stopping.

We also consider an adaptive setting where the learner can choose to buy information after taking some actions for the underlying optimization problem. We focus on the classic optimization problem, Min-Sum Set Cover, where the goal is to quickly find an action that covers a given request drawn from a known distribution. We provide an $8$-competitive algorithm running in polynomial time that chooses actions and decides when to buy information about the underlying request.
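
Since the competitive ratios are tied to a robust generalization of ski-rental, the following minimal sketch shows the classic deterministic break-even strategy that achieves the $2$-competitive baseline (the textbook algorithm the comparison builds on, not the paper's super-martingale stopping procedure).

    def ski_rental(buy_cost: int, ski_days: int) -> int:
        """Classic break-even rule: rent until cumulative rent would reach the buy cost, then buy.

        Pays at most twice the offline optimum min(ski_days, buy_cost),
        which is the deterministic 2-competitive guarantee.
        """
        cost = 0
        for day in range(1, ski_days + 1):
            if day < buy_cost:
                cost += 1          # rent for one more day
            else:
                cost += buy_cost   # buy once cumulative rent would match the price
                break
        return cost

    for days in (3, 10, 100):
        alg, opt = ski_rental(buy_cost=10, ski_days=days), min(days, 10)
        print(f"days={days:3d}  ALG={alg:3d}  OPT={opt:3d}  ratio={alg/opt:.2f}")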

Constant Sequence Extension for Fast Search Using Weighted Hamming Distance

from arXiv: Data Structures and Algorithms

Authors: Zhenyu Weng, Huiping Zhuang, Haizhou Li, Zhiping Lin

Representing visual data using compact binary codes is attracting increasing attention, as binary codes can be used as direct indices into hash table(s) for fast non-exhaustive search. Recent methods show that ranking binary codes using weighted Hamming distance (WHD) rather than Hamming distance (HD), by generating query-adaptive weights for each bit, can better retrieve query-related items. However, search using WHD is slower than that using HD. One main challenge is that, for existing methods, the complexity of extending a monotone increasing sequence using WHD to probe buckets in hash table(s) is at least proportional to the square of the sequence length, while that using HD is proportional to the sequence length. To overcome this challenge, we propose a novel fast non-exhaustive search method using WHD. The key idea is to design a constant sequence extension algorithm that performs each sequence extension in constant computational complexity, so that the total complexity is proportional to the sequence length; this is justified by theoretical analysis. Experimental results show that our method is faster than other WHD-based search methods. Also, compared with the HD-based non-exhaustive search method, our method has comparable efficiency but retrieves more query-related items for datasets of up to one billion items.
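
For readers unfamiliar with the metric, the snippet below contrasts Hamming distance with query-adaptive weighted Hamming distance on toy 8-bit codes (a generic illustration of the ranking criterion only; the constant sequence extension algorithm itself is not reproduced here).

    def hamming(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    def weighted_hamming(a: int, b: int, weights) -> float:
        """Sum the per-bit weight of every position where the codes differ."""
        diff = a ^ b
        return sum(w for i, w in enumerate(weights) if diff >> i & 1)

    # Hypothetical 8-bit database codes and a query with query-adaptive bit weights.
    database = [0b10110010, 0b10110011, 0b01110010, 0b10010010]
    query = 0b10110110
    weights = [0.9, 0.1, 0.2, 1.5, 0.3, 0.8, 0.4, 0.6]  # one weight per bit

    by_hd  = sorted(database, key=lambda c: hamming(query, c))
    by_whd = sorted(database, key=lambda c: weighted_hamming(query, c, weights))
    print("HD ranking :", [f"{c:08b}" for c in by_hd])
    print("WHD ranking:", [f"{c:08b}" for c in by_whd])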

Efficient Centrality Maximization with Rademacher Averages

from arXiv: Data Structures and Algorithms

Authors: Leonardo Pellegrina

The identification of the set of k most central nodes of a graph, or centrality maximization, is a key task in network analysis, with various applications ranging from finding communities in social and biological networks to understanding which seed nodes are important to diffuse information in a graph. As the exact computation of centrality measures does not scale to modern-sized networks, the most practical solution is to resort to rigorous, but efficiently computable, randomized approximations. In this work we present CentRA, the first algorithm based on progressive sampling to compute high-quality approximations of the set of k most central nodes. CentRA is based on a novel approach to efficiently estimate Monte Carlo Rademacher Averages, a powerful tool from statistical learning theory to compute sharp data-dependent approximation bounds. Then, we study the sample complexity of centrality maximization using the VC-dimension, a key concept from statistical learning theory. We show that the number of random samples required to compute high-quality approximations scales with finer characteristics of the graph, such as its vertex diameter, or of the centrality of interest, significantly improving looser bounds derived from standard techniques. We apply CentRA to analyze large real-world networks, showing that it significantly outperforms the state-of-the-art approximation algorithm in terms of number of samples, running times, and accuracy.
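
As a toy illustration of the progressive-sampling idea (using a crude Hoeffding-style union bound in place of the Monte Carlo Rademacher averages that CentRA actually relies on), the sketch below estimates harmonic centrality from sampled sources and doubles the sample size until the additive error bound is small enough to report a top-$k$ set.

    import math
    import random
    from collections import deque

    def bfs_dists(adj, s):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def estimate_centrality(adj, sample):
        """Average of 1/d(s, v) over sampled sources s (harmonic centrality, values in [0, 1])."""
        est = {v: 0.0 for v in adj}
        for s in sample:
            dist = bfs_dists(adj, s)
            for v, d in dist.items():
                if v != s:
                    est[v] += 1.0 / d
        return {v: x / len(sample) for v, x in est.items()}

    def progressive_top_k(adj, k, eps=0.1, delta=0.1):
        nodes, m = list(adj), 16
        while True:
            sample = [random.choice(nodes) for _ in range(m)]
            est = estimate_centrality(adj, sample)
            # Hoeffding + union bound over all nodes: additive error eps with prob. 1 - delta.
            err = math.sqrt(math.log(2 * len(nodes) / delta) / (2 * m))
            if err <= eps:
                return sorted(nodes, key=est.get, reverse=True)[:k], err
            m *= 2  # progressive sampling: double the sample and try again

    # Hypothetical small graph: a star centered at node 0 with a short path attached.
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4, 6], 6: [5]}
    top, err = progressive_top_k(adj, k=2)
    print("estimated top-2 central nodes:", top, "(additive error <=", round(err, 3), ")")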

Tuesday, June 06

A survey of approximation algorithms for capacitated vehicle routing problems

from arXiv: Data Structures and Algorithms

Authors: Yongyu Chen

Finding the shortest travelling tour of vehicles with capacity $k$ from the depot to the customers is called the capacitated vehicle routing problem (CVRP). CVRP occupies an essential position in logistics systems and is among the most intensively studied problems in combinatorial optimization. In terms of complexity, CVRP with $k \ge 3$ is NP-hard and also APX-hard, and it is already known that it cannot be approximated arbitrarily well in general metric spaces. Moreover, it is the first problem to resist Arora's famous approximation framework. So whether there is a polynomial-time $(1+\epsilon)$-approximation for the Euclidean CVRP for any $\epsilon>0$ is still an open problem. This paper summarizes the research progress from its history to up-to-date developments. The survey will be updated periodically.

Revisiting Garg's 2-Approximation Algorithm for the k-MST Problem in Graphs

from arXiv: Data Structures and Algorithms

Authors: Emmett Breen, Renee Mirka, Zichen Wang, David P. Williamson

This paper revisits the 2-approximation algorithm for $k$-MST presented by Garg in light of a recent paper of Paul et al. In the $k$-MST problem, the goal is to return a tree spanning $k$ vertices of minimum total edge cost. Paul et al. extend Garg's primal-dual subroutine to improve the approximation ratios for the budgeted prize-collecting traveling salesman and minimum spanning tree problems. We follow their algorithm and analysis to provide a cleaner version of Garg's result. Additionally, we introduce the novel concept of a kernel, which allows an easier visualization of the stages of the algorithm and a clearer understanding of the pruning phase. Other notable updates include presenting a linear programming formulation of the $k$-MST problem, including pseudocode, replacing the coloring scheme used by Garg with the simpler concept of neutral sets, and providing an explicit potential function.

Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization

from arXiv: Data Structures and Algorithms

Authors: Ameya Velingker, Maximilian Vötsch, David P. Woodruff, Samson Zhou

We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$ and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was the approximation algorithm of Kumar et al. [ICML 2019], which achieves a $C$-approximation for some constant $C\ge 576$. We give the first $(1+\varepsilon)$-approximation algorithm running in time singly exponential in $k$, where $k$ is typically a small integer. Our techniques generalize to other common variants of the BMF problem, admitting bicriteria $(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the setting where matrix operations are performed in $\mathbb{F}_2$. Our approach can be implemented in standard big data models, such as the streaming or distributed models.
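
To make the objective concrete, here is a tiny exhaustive reference (exponential in $nk$ and $kd$, so only for toy sizes, and in no way the paper's $(1+\varepsilon)$-approximation algorithm) that enumerates binary factors and reports the minimizer of the Frobenius loss.

    import itertools
    import numpy as np

    def best_bmf(A: np.ndarray, k: int):
        """Exhaustively minimize ||U V - A||_F^2 over binary U (n x k) and V (k x d).

        Products are taken over the integers (the Frobenius-loss variant, not the F_2 one).
        """
        n, d = A.shape
        best = (float("inf"), None, None)
        for u_bits in itertools.product([0, 1], repeat=n * k):
            U = np.array(u_bits).reshape(n, k)
            for v_bits in itertools.product([0, 1], repeat=k * d):
                V = np.array(v_bits).reshape(k, d)
                loss = float(((U @ V - A) ** 2).sum())
                if loss < best[0]:
                    best = (loss, U, V)
        return best

    A = np.array([[1, 0, 1],
                  [1, 0, 1],
                  [0, 1, 0]])
    loss, U, V = best_bmf(A, k=2)
    print("optimal Frobenius loss:", loss)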

Auditable data structures: theory and applications

from arXiv: Data Structures and Algorithms

Authors: Andrea Canciani, Claudio Felicioli, Fabio Severino, Domenico Tortola

Every digital process needs to consume some data in order to work properly. It is very common for applications to use external data in their processes, obtaining it from sources such as external APIs. Trusting the received data therefore becomes crucial in such scenarios: if the data is not self-produced by the consumer, trust in the external data source, or in the data that the source produces, cannot always be taken for granted. The most common approach to generating trust in an external source is based on authenticated data structures, which are able to authenticate the source when queried through the generation of proofs. Such proofs are useful for assessing authenticity or integrity; however, an external user could also be interested in verifying the data history and its consistency. This problem seems to be unaddressed by the current literature, which proposes approaches aimed at audits executed by internal actors with prior knowledge of the data structures. In this paper, we address the scenario of an external auditor with no prior knowledge of the data who wants to verify the consistency of the data history. We analyze the terminology and the current state of the art of auditable data structures, and then propose a general framework to support audits by both internal and external users.
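
As background on the proof mechanism (a generic Merkle-tree sketch of an authenticated data structure, not the audit framework proposed in the paper), the snippet below commits to a data history with a single root hash and produces a membership proof that an external party can verify.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(leaves):
        level = [h(x) for x in leaves]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])          # duplicate last node on odd levels
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    def merkle_proof(leaves, index):
        """Sibling hashes from the leaf up to the root."""
        level = [h(x) for x in leaves]
        proof = []
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            sib = index ^ 1
            proof.append((level[sib], sib > index))
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            index //= 2
        return proof

    def verify(leaf, proof, root):
        node = h(leaf)
        for sibling, sibling_is_right in proof:
            node = h(node + sibling) if sibling_is_right else h(sibling + node)
        return node == root

    history = [b"record-0", b"record-1", b"record-2", b"record-3"]
    root = merkle_root(history)
    proof = merkle_proof(history, 2)
    print("proof verifies:", verify(b"record-2", proof, root))   # True
    print("tampered leaf :", verify(b"record-X", proof, root))   # False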

Monday, June 05

The (local) unit of intelligence is FLOPs

from Windows on Theory

[Crossposting again on Lesswrong and Windowsontheory, with the hope I am not overstaying my welcome in LW.]


Wealth can be measured by dollars. This is not a perfect measurement: it’s hard to account for purchasing power and circumstances when comparing people across varying countries or time periods. However, within a particular place and time, one can measure wealth in the local currency. It still does not capture everything (e.g., future earnings, social connections). But generally, all else being roughly equal, the more dollars one has, the wealthier one is.

How do we measure intelligence? I am not interested in measuring the intelligence of individual humans or individual animals. Nor am I looking for a universal absolute scale of intelligence on which we could rank humans, elephants, and GPT4. (Indeed, it doesn’t seem that a one-dimensional comparison can be made; for example, we seem to be more intelligent than elephants on most dimensions, but they do have an impressive memory.)  Rather, I want to compare different species within the same genus or different models within the same general architecture (e.g., Transformers). 

I think it’s fair to say that the local unit of intelligence for animal species is neurons. While elephants have larger brains than humans, within the genus Homo, to a first approximation, the bigger the brain, the more intelligent the species. 

(Figure from Bolihus et al.)

I claim that within the current architectures and training frameworks of large language models, the local unit of intelligence is FLOPs. That is, as long as we follow the current paradigm of training transformer-based architectures within best practices of scaling compute and data, the more compute resources (FLOPs) invested in training the model, the more intelligent it is. This is an imperfect measurement, but probably a better one than trying to give models "IQ exams" that were designed for humans (and have dubious value even there). Another way to say this is that the intelligence of the model scales with the number of "load-bearing gradient steps" that have gone into training it.

So far, it might seem like a tautology, but as I claimed in the "intelligence forklift" post, this does have some implications. In particular, current general-purpose models such as ChatGPT are built in two phases. The first phase is a pretraining phase, in which the model is trained for a trillion or more gradient steps on the next-token prediction task. The second phase is the adaptation/fine-tuning phase, in which, whether through instruction-tuning, reinforcement learning from human feedback (RLHF), or other methods, the model is "fine-tuned" using fewer than a million gradient steps to be a better instruction-following or chatting agent. In other words, more than 99.9% (maybe as much as 99.9999%) of the FLOPs / gradient steps in training the model are invested during its pretraining phase. (One reason that the fine-tuning phase involves far fewer gradient steps is that, while the first phase can use any static data grabbed from the Internet, the second phase requires data that was specially collected for this task and often needs human labeling as well.)
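
To put rough numbers on this split, here is a back-of-the-envelope sketch using the common "about 6 FLOPs per parameter per training token" rule of thumb for dense transformers; the model size and token counts are illustrative assumptions, not figures from this post.

    def train_flops(params: float, tokens: float) -> float:
        """Rule-of-thumb training compute for a dense transformer: ~6 FLOPs per parameter per token."""
        return 6 * params * tokens

    params = 70e9                                   # hypothetical 70B-parameter model
    pretrain = train_flops(params, tokens=1.5e12)   # ~1.5T pretraining tokens (assumption)
    finetune = train_flops(params, tokens=50e6)     # ~50M fine-tuning tokens (assumption)

    print(f"pretraining FLOPs : {pretrain:.2e}")
    print(f"fine-tuning FLOPs : {finetune:.2e}")
    print(f"fine-tuning share : {finetune / (pretrain + finetune):.6%}")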

The adaptation phase can make a huge difference in the usefulness of the model. The chatbot arena doesn’t even contain non-fine-tuned models, and we can see that smaller but well-tuned models can put up a decent fight against ones that have at least 10 times the parameters (and so roughly at least 100 times the training compute). Unlike sashimi, language models should not be consumed raw.  


However, their "intelligence" is ultimately derived from the FLOPs invested in the base models. (See also this paper on the limitations of fine-tuning to close capability gaps.) Fine-tuning, whether using RL or not, is the proverbial "cherry on the cake," and the pre-trained model captures more than 99.9% of the intelligence of the model. That pretrained model is not an agent and does not have goals, though it can "play one on TV" in the sense of coming up with plans and proposed actions if prompted to do so. (In LW language, a simulator.) This is why a pretrained model can be modeled as an "intelligence forklift". Just like a forklift supplies strength but is useless without someone driving it, so does the pretrained model supply intelligence, but that intelligence needs to be directed via fine-tuning, conditioning on prompts, etc. Another way to think of the pre-trained model is as the bee colony and the adapter as the queen. (That is, if the queen bee were actually telling the bees what to do rather than just laying eggs.)

In that sense, while I agree with Gwern that agentic models are more useful and that “we don’t want low log-loss error on ImageNet, we want to refind a particular personal photo” , I disagree that “Agent AIs [will be] more intelligent than Tool AIs.” Intelligence and usefulness are not the same thing.

Implications for alignment

If the pre-trained model does not have goals, then there is no sense in “aligning” it. Rather, there is a separation of concerns, with a highly intelligent but goal-less pre-trained model (“forklift”) and a not-so-intelligent but goal-directed adaptor (“driver”). It is the latter one that we need to align:

The component of an AI system that needs to be aligned is not the component that accounts for its intelligence.

That is a hopeful lesson since the adaptor can be a much smaller (e.g. have drastically fewer parameters) and tractable object. However, it does not mean that the alignment problem is easy and that we are insulated from the complexities of the pretrained model:

A forklift with a speed of 1000mph might not be actively trying to kill you, but this could still be the end result.

In particular, we don't understand the biases the pre-trained model inherits from the data, nor the way that these may play out when we use the model in applications. However, it does seem that for a pretrained model to be as good at its job as possible, it should learn all the biases in its data but not be constrained to any of them. It should be able to adapt to any context, real or imagined, and be the "perfect actor" that can take on any character's personality.

The traditional "anthropomorphic" view of intelligence is as something that "belongs" to an individual or agent, and that this agent has some sort of preferences or goals (a.k.a. a utility function). Hence a potential future super-intelligent AI was thought of as an "alien" that pursues some goals. Under this viewpoint, we want to either "box" the alien to control its impact or "align" its goals to ours. Both of these options treat the AI system as a single component encompassing both goals and intelligence. However, if the goals and the intelligence correspond to different components, we may be able to "take the alien's brain for a ride" and build a variety of systems that share the same capabilities but have very different objectives and profiles.

To be clear, the "intelligence forklift" view does not preclude building an "anti-aligned" agent on top of a pre-trained model that is malicious, dishonest, and harmful. It just means that such an agent would not have an automatic intelligence advantage over other agents (including humans), since all of them can have access to a shared "intelligence engine" provided by the goal-less pretrained models. This is what I illustrated as "scenario 2" in a figure in my previous post.

What about “self play”?

The above assumes that the intelligence component of a model is obtained by executing gradient steps on static data, but what if this data is itself generated by the model? This is what happened with games such as Go and Chess. Originally models were trained by predicting the next move of human games scraped from the Internet, but to improve beyond the quality of these data, models needed to play against themselves and generate new games. They could then filter out only the most successful ones and hence generate data that is of higher quality than the original games they trained on. (Eventually, it turned out that with this approach you don’t need to start with any data for games such as Chess and Go, hence the “Zero” in AlphaZero.)  

Self-play makes a lot of sense in games where there is a very clear notion of winning and losing, but what would be the analog for language models? I don't know the answer to this in general, but in the realm of scientific literature, there is an analogous process. The model could play the roles of authors and reviewers alike, generate new papers, subject them to peer review, revise and resubmit, etc. At least in fields that don't require "wet labs", this could lead to the model simulating the scientific literature of 2024, then 2025, and so on and so forth. Models that manage to do this would be amazing and would speed up scientific progress tremendously. However, I believe they could still be (just more powerful) "intelligence forklifts". A model's outputs influencing its inputs can lead to a "positive feedback loop," so this is not certain. But I do not see an inherent reason why models could not be arbitrarily intelligent and still completely without goals. In the words of Scott Alexander, no matter how intelligent they are, models could still be "enlightened" and realize that

“once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN.”
 

By Boaz Barak

Mathematics of the impossible, draft of a book

from Emanuele Viola

I posted a first draft of the book, here. It has more material than the previous blog posts, including a chapter on communication complexity. I plan a major revision, including adding several chapters, but it seems that won’t happen right away, so I am releasing what I have for now. Any comments are appreciated, either on this blog or via email.

By Manu

Tenure-track faculty at The Australian National University (apply by May 31, 2024)

from CCI: jobs

Tenure-track faculty members in the School of Computing at the Australian National University. Please see the link below for more information.

Website: https://jobs.anu.edu.au/jobs/tenure-track-lecturer-senior-lecturer-associate-professor-school-of-computing-canberra-act-act-australia
Email: ahadn.zehmakan@anu.edu.au

By shacharlovett

TR23-085 | Average-Case PAC-Learning from Nisan's Natural Proofs | Ari Karchmer

from ECCC Papers

Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds imply algorithms for learning circuits with membership queries over the uniform distribution. Indeed, they exercised this implication to obtain a quasi-polynomial time learning algorithm for ${AC}^0[p]$ circuits, for any prime $p$, by leveraging the existing natural proofs from Razborov (1987) and Smolensky (1987). This achievement raises a logical question: can existing natural proofs be adapted into learning algorithms that utilize random examples and learn over unknown, arbitrary example distributions? In this work, we show that natural circuit lower bounds proven by specific communication complexity arguments (e.g., Nisan (1994)) witness a ``yes'' answer to this question, under the one limitation of average-case learning. Our primary technical contribution demonstrates a connection between the complexity of learning a concept class in the average-case, and the randomized communication complexity of an evaluation game associated with the class. We apply this finding to derive polynomial time average-case PAC-learning algorithms that use only random examples from arbitrary and unknown distributions, for any concept class that may be evaluated by (for instance) a majority vote of linear threshold functions. Additionally, our work contributes to a better understanding of the optimal parameters in XOR lemmas for communication complexity. We address a question posed by Viola and Wigderson (2007) by demonstrating that certain enhancements of parameters in their XOR lemmas are false, assuming the existence of one-way functions.

Quantifiers: To Parenthesize or not to Parenthesize? Matrix of Formula: To Bracket or not to Bracket?

from Computational Complexity

 For the book 

Computational  Intractability: A Guide to Algorithmic Lower Bounds

by Demaine-Gasarch-Hajiaghayi 

(See  here for a link to a first draft.) 

we had to make some choices about which notation to use. One of the least important ones was the following: 

When defining NP, and in a few other places, should we use:

                                                  (\exists x)(\forall y)[B(x,y)]

or 

                                                   \exists x : \forall y : B(x,y)

or 

                                                    something else.

We ended up doing it the second way. But I wondered which, if either, is standard. So I looked through many math and theoretical CS books for places where they used quantifiers. Here is what I found:

a) Most papers and books really don't use quantifiers at all!  This surprised me. 

b) When quantifiers are used, they are used in definitions, not theorems. 

c) One exception is in logic, when formulas are treated as objects unto themselves. For example, the inductive definition of a formula will have a step:

                       If f(x_1,...,x_n) is a formula then (\exists x_i)[f(x_1,...,x_n)] is a formula. 

d) Here is a list of the few places I saw quantifiers used and whether they used parentheses or not. I say whether it has parentheses (abbreviated Parens) or not, and whether the matrix of the formula is in square brackets, no brackets, or something else.

Cook's classic paper. Page 154 Parens, no Brackets (1971)

Stockmeyer's paper where he defines PH.  Page 6 Parens and Brackets  (1976)

Computers and Intractability by Garey & Johnson. Page 164. Parens and Brackets (1979)

Morass-like construction of aleph_2 trees in L by Devlin.  Page 2 Parens and matrix in Parens (1979)

Descriptive Complexity by Immerman. Page 38 Parens no Brackets (1999) 

Bounded Queries in Recursion Theory by Gasarch and Martin. Parens and Brackets  Throughout the book.  (1999)

Complexity Theory from Godel to Feynman by Rudich. No Parens, No Brackets in Def of PH. (2003) 

Parameterized Complexity Theory by Flum & Grohe. Page 81 no Parens and no Brackets. 

Computational Complexity: A Modern Approach by Arora & Barak. Page 40. No Parens No Brackets.(2007) 

Computational Complexity: A Conceptual Perspective by Goldreich. Page 114 no parens, no brackets (2008)

On Quantifier Rank Equivalence between linear orders by Siders. On page 417 they use quantifiers to state a theorem, which is unusual. Parens, no brackets.

Presburger arithmetic, Rational Generating Functions, and quasi polynomials by Woods. Parens no  Brackets. (2012) 

Who Witnesses the Witness by Abel et al. On page 69 (which the pointer takes you to): No Parens, no brackets. Colons between quantifiers (2018).

e) What to make of all this?

First off- the RARITY of the use of quantifiers really surprised me. The only place I saw them used a lot was my book, co-authored with Georgia Martin, Bounded Queries in Recursion Theory. Perhaps it would have sold better if I didn't use so many quantifiers. Oh well. 

Second off- later works don't use parens and brackets. This is most clear if you just look at complexity theory books:

Garey & Johnson - 1979 - parens and brackets

Flum & Grohe - 1998 - no parens and no brackets

Immerman - 1999 - parens but no brackets (this is the one exception)

Arora & Barak - 2007 - no parens and no brackets

Goldreich - 2008 - no parens and no brackets

If you have a complexity theory book around that is not on this list, look up the definition of NP and the definition of the Poly Hierarchy and see (a) if they use parens around the quantifiers, and (b) if they use square brackets, no brackets, or something else. Please leave a comment about it so I can test the conjecture that parentheses are just so 1979.

By gasarch

Complexity of Motion Planning of Arbitrarily Many Robots: Gadgets, Petri Nets, and Counter Machines

from arXiv: Computational Complexity

Authors: Joshua Ani, Michael Coulombe, Erik D. Demaine, Yevhenii Diomidov, Timothy Gomez, Dylan Hendrickson, Jayson Lynch

We extend the motion-planning-through-gadgets framework to several new scenarios involving various numbers of robots/agents, and analyze the complexity of the resulting motion-planning problems. While past work considers just one robot or one robot per player, most of our models allow for one or more locations to spawn new robots in each time step, leading to arbitrarily many robots. In the 0-player context, where all motion is deterministically forced, we prove that deciding whether any robot ever reaches a specified location is undecidable, by representing a counter machine. In the 1-player context, where the player can choose how to move the robots, we prove equivalence to Petri nets, EXPSPACE-completeness for reaching a specified location, PSPACE-completeness for reconfiguration, and ACKERMANN-completeness for reconfiguration when robots can be destroyed in addition to spawned. Finally, we consider a variation on the standard 2-player context where, instead of one robot per player, we have one robot shared by the players, along with a ko rule to prevent immediately undoing the previous move. We prove this impartial 2-player game EXPTIME-complete.

Trade-offs between Entanglement and Communication

from arXiv: Computational Complexity

Authors: Srinivasan Arunachalam, Uma Girish

We study the advantages of quantum communication models over classical communication models that are equipped with a limited number of qubits of entanglement. In this direction, we give explicit partial functions on $n$ bits for which reducing the entanglement increases the classical communication complexity exponentially. Our separations are as follows. For every $k\ge 1$:

$Q\|^*$ versus $R2^*$: We show that quantum simultaneous protocols with $\tilde{\Theta}(k^5 \log^3 n)$ qubits of entanglement can exponentially outperform two-way randomized protocols with $O(k)$ qubits of entanglement. This resolves an open problem from [Gav08] and improves the state-of-the-art separations between quantum simultaneous protocols with entanglement and two-way randomized protocols without entanglement [Gav19, GRT22].

$R\|^*$ versus $Q\|^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k \log n)$ qubits of entanglement can exponentially outperform quantum simultaneous protocols with $O(k)$ qubits of entanglement, resolving an open question from [GKRW06, Gav19]. The best result prior to our work was a relational separation against protocols without entanglement [GKRW06].

$R\|^*$ versus $R1^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k\log n)$ qubits of entanglement can exponentially outperform randomized one-way protocols with $O(k)$ qubits of entanglement. Prior to our work, only a relational separation was known [Gav08].

Discreteness of asymptotic tensor ranks

from arXiv: Computational Complexity

Authors: Jop Briët, Matthias Christandl, Itai Leigh, Amir Shpilka, Jeroen Zuiddam

Tensor parameters that are amortized or regularized over large tensor powers, often called "asymptotic" tensor parameters, play a central role in several areas including algebraic complexity theory (constructing fast matrix multiplication algorithms), quantum information (entanglement cost and distillable entanglement), and additive combinatorics (bounds on cap sets, sunflower-free sets, etc.). Examples are the asymptotic tensor rank, asymptotic slice rank and asymptotic subrank. Recent works (Costa-Dalai, Blatter-Draisma-Rupniewski, Christandl-Gesmundo-Zuiddam) have investigated notions of discreteness (no accumulation points) or "gaps" in the values of such tensor parameters.

We prove a general discreteness theorem for asymptotic tensor parameters of order-three tensors and use this to prove that (1) over any finite field, the asymptotic subrank and the asymptotic slice rank have no accumulation points, and (2) over the complex numbers, the asymptotic slice rank has no accumulation points.

Central to our approach are two new general lower bounds on the asymptotic subrank of tensors, which measures how much a tensor can be diagonalized. The first lower bound says that the asymptotic subrank of any concise three-tensor is at least the cube-root of the smallest dimension. The second lower bound says that any three-tensor that is "narrow enough" (has one dimension much smaller than the other two) has maximal asymptotic subrank.

Our proofs rely on new lower bounds on the maximum rank in matrix subspaces that are obtained by slicing a three-tensor in the three different directions. We prove that for any concise tensor the product of any two such maximum ranks must be large, and as a consequence there are always two distinct directions with large max-rank.

Efficient Quantum State Synthesis with One Query

from arXiv: Computational Complexity

Authors: Gregory Rosenthal

We present a polynomial-time quantum algorithm making a single query (in superposition) to a classical oracle, such that for every state $|\psi\rangle$ there exists a choice of oracle that makes the algorithm construct an exponentially close approximation of $|\psi\rangle$. Previous algorithms for this problem either used a linear number of queries and polynomial time [arXiv:1607.05256], or a constant number of queries and polynomially many ancillae but no nontrivial bound on the runtime [arXiv:2111.02999]. As corollaries we do the following:

- We simplify the proof that statePSPACE $\subseteq$ stateQIP [arXiv:2108.07192] (a quantum state analogue of PSPACE $\subseteq$ IP) and show that a constant number of rounds of interaction suffices.

- We show that QAC$\mathsf{_f^0}$ lower bounds for constructing explicit states would imply breakthrough circuit lower bounds for computing explicit boolean functions.

- We prove that every $n$-qubit state can be constructed to within 0.01 error by an $O(2^n/n)$-size circuit over an appropriate finite gate set. More generally we give a size-error tradeoff which, by a counting argument, is optimal for any finite gate set.

Does it pay to optimize AUC?

from arXiv: Computational Geometry

Authors: Baojian Zhou, Steven Skiena

The Area Under the ROC Curve (AUC) is an important metric for evaluating binary classifiers, and many algorithms have been proposed to optimize AUC approximately. This raises the question of whether the generally insignificant gains observed in previous studies are due to inherent limitations of the metric or to inadequate optimization.

To better understand the value of optimizing for AUC, we present an efficient algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier in $\mathbb{R}^2$, which runs in $\mathcal{O}(n_+ n_- \log (n_+ n_-))$ time, where $n_+$ and $n_-$ are the number of positive and negative samples respectively. Furthermore, it can be naturally extended to $\mathbb{R}^d$ in $\mathcal{O}((n_+n_-)^{d-1}\log (n_+n_-))$ time by recursively calling AUC-opt in lower-dimensional spaces. We prove that the problem is NP-complete when $d$ is not fixed, by a reduction from the open hemisphere problem.

Experiments show that, compared with other methods, AUC-opt achieves statistically significant improvements on between 17 and 40 of the 50 t-SNE training datasets in $\mathbb{R}^2$, and on between 4 and 42 of them in $\mathbb{R}^3$. However, the gain generally proves insignificant on most testing datasets compared to the best standard classifiers. Similar observations hold for nonlinear AUC methods on real-world datasets.
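
To make the $\mathbb{R}^2$ setting concrete, here is a small brute-force baseline (an illustrative sketch, not the paper's AUC-opt algorithm): the ordering of the 1-D projections onto a direction $w$ can only change when $w$ crosses a direction orthogonal to some difference of a positive and a negative point, so testing a direction just on either side of every such critical angle covers all possible orderings.

import numpy as np

def auc(pos, neg):
    # empirical AUC: P(positive score above negative score), ties count 1/2
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def best_linear_auc_2d(P, N, eps=1e-9):
    # Critical angles: directions orthogonal to some P[i] - N[j], in both orientations.
    diffs = (P[:, None, :] - N[None, :, :]).reshape(-1, 2)
    crit = np.arctan2(diffs[:, 1], diffs[:, 0]) + np.pi / 2
    crit = np.concatenate([crit, crit + np.pi])
    best_score, best_w = -1.0, None
    for theta in np.concatenate([crit - eps, crit + eps]):
        w = np.array([np.cos(theta), np.sin(theta)])
        score = auc(P @ w, N @ w)
        if score > best_score:
            best_score, best_w = score, w
    return best_score, best_w

rng = np.random.default_rng(1)
P = rng.standard_normal((30, 2)) + 0.5   # positive samples, shifted
N = rng.standard_normal((40, 2))         # negative samples
print(best_linear_auc_2d(P, N)[0])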

No-dimensional Tverberg Partitions Revisited

from arXiv: Computational Geometry

Authors: Sariel Har-Peled, Eliot W. Robson

Given a set $P \subset \mathbb{R}^d$ of $n$ points, with diameter $\Delta$, and a parameter $\delta \in (0,1)$, it is known that there is a partition of $P$ into sets $P_1, \ldots, P_t$, each of size $O(1/\delta^2)$, such that their convex hulls all intersect a common ball of radius $\delta \Delta$. We prove that a random partition, with a simple alteration step, yields the desired partition, resulting in a linear time algorithm. Previous proofs were either existential (i.e., at least exponential time), or required much bigger sets. In addition, the algorithm and its proof of correctness are significantly simpler than previous work, and the constants are slightly better.

In addition, we provide a linear time algorithm for computing a "fuzzy" centerpoint. We also prove a no-dimensional weak $\varepsilon$-net theorem with an improved constant.
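
A quick empirical illustration of why a plain random partition already comes close (a sketch only; the paper adds an alteration step and centers the ball at a fuzzy centerpoint rather than the centroid): groups of size $\Theta(1/\delta^2)$ have centroids, and hence convex hulls, within roughly $\delta\Delta$ of the global centroid.

import numpy as np

rng = np.random.default_rng(2)
n, d, delta = 500, 5, 0.2
P = rng.standard_normal((n, d))
diam = np.max(np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1))

group_size = int(np.ceil(9 / delta**2))   # c / delta^2 with an arbitrary constant c = 9
perm = rng.permutation(n)
groups = [perm[i:i + group_size] for i in range(0, n, group_size)]

center = P.mean(axis=0)                   # stand-in for the center of the common ball
dists = [np.linalg.norm(P[g].mean(axis=0) - center) for g in groups]
print(max(dists), delta * diam)           # each group centroid (a point of its hull) lies well inside radius delta * diam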

The Maximum Matrix Contraction Problem

from arXiv: Data Structures and Algorithms

Authors: Dimitri Watel (ENSIIE, CEDRIC - OC), Pierre-Louis Poirion (CEDRIC - OC)

In this paper, we introduce the Maximum Matrix Contraction problem, in which we aim to contract a binary matrix as much as possible in order to maximize its density. We study the complexity and the polynomial approximability of the problem. In particular, we prove the problem to be NP-complete and show that every algorithm solving it is at most a $2\sqrt{n}$-approximation algorithm, where $n$ is the number of ones in the matrix. We then focus on efficient algorithms to solve the problem: an integer linear program and three heuristics.

Improved Algorithms for Distance Selection and Related Problems

from arXiv: Data Structures and Algorithms

Authors: Haitao Wang, Yiming Zhao

In this paper, we propose new techniques for solving geometric optimization problems involving interpoint distances of a point set in the plane. Given a set $P$ of $n$ points in the plane and an integer $1 \leq k \leq \binom{n}{2}$, the distance selection problem is to find the $k$-th smallest interpoint distance among all pairs of points of $P$. The previously best deterministic algorithm solves the problem in $O(n^{4/3} \log^2 n)$ time [Katz and Sharir, SIAM J. Comput. 1997 and SoCG 1993]. In this paper, we improve their algorithm to $O(n^{4/3} \log n)$ time. Using similar techniques, we also give improved algorithms on both the two-sided and the one-sided discrete Fr\'{e}chet distance with shortcuts problem for two point sets in the plane. For the two-sided problem (resp., one-sided problem), we improve the previous work [Avraham, Filtser, Kaplan, Katz, and Sharir, ACM Trans. Algorithms 2015 and SoCG 2014] by a factor of roughly $\log^2(m+n)$ (resp., $(m+n)^{\epsilon}$), where $m$ and $n$ are the sizes of the two input point sets, respectively. Other problems whose solutions can be improved by our techniques include the reverse shortest path problems for unit-disk graphs. Our techniques are quite general and we believe they will find many other applications in future.
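
For reference, the naive quadratic baseline for distance selection is only a few lines (an illustrative sketch; the point of the paper is to beat this with an $O(n^{4/3}\log n)$ algorithm).

import numpy as np

def kth_smallest_distance(P, k):
    # enumerate all pairwise distances and select the k-th smallest, 1 <= k <= n(n-1)/2
    n = len(P)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    dists = D[np.triu_indices(n, 1)]
    return np.partition(dists, k - 1)[k - 1]

P = np.random.default_rng(3).standard_normal((200, 2))
print(kth_smallest_distance(P, 10))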

Labeled Interleaving Distance for Reeb Graphs

from arXiv: Data Structures and Algorithms

Authors: Fangfei Lan, Salman Parsa, Bei Wang

Merge trees, contour trees, and Reeb graphs are graph-based topological descriptors that capture topological changes of (sub)level sets of scalar fields. Comparing scalar fields using their topological descriptors has many applications in topological data analysis and visualization of scientific data. Recently, Munch and Stefanou introduced a labeled interleaving distance for comparing two labeled merge trees, which enjoys a number of theoretical and algorithmic properties. In particular, the labeled interleaving distance between merge trees can be computed in polynomial time. In this work, we define the labeled interleaving distance for labeled Reeb graphs. We then prove that the (ordinary) interleaving distance between Reeb graphs equals the minimum of the labeled interleaving distance over all labelings. We also provide an efficient algorithm for computing the labeled interleaving distance between two labeled contour trees (which are special types of Reeb graphs that arise from simply-connected domains). In the case of merge trees, the notion of the labeled interleaving distance was used by Gasparovic et al. to prove that the (ordinary) interleaving distance on the set of (unlabeled) merge trees is intrinsic. As our final contribution, we present counterexamples showing that, on the contrary, the (ordinary) interleaving distance on (unlabeled) Reeb graphs (and contour trees) is not intrinsic. It turns out that, under mild conditions on the labelings, the labeled interleaving distance is a metric on isomorphism classes of Reeb graphs, analogous to the ordinary interleaving distance. This provides new metrics on large classes of Reeb graphs.

Fast Matrix Multiplication Without Tears: A Constraint Programming Approach

from arXiv: Data Structures and Algorithms

Authors: Arnaud Deza, Chang Liu, Pashootan Vaezipoor, Elias B. Khalil

It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests. The most famous instance of this is Strassen's algorithm for multiplying two $2\times 2$ matrices in 7 instead of 8 multiplications. This gives rise to the constraint satisfaction problem of fast matrix multiplication, where a set of $R < NMP$ multiplication terms must be chosen and combined such that they satisfy correctness constraints on the output matrix. Despite its highly combinatorial nature, this problem has not been exhaustively examined from that perspective, as evidenced for example by the recent deep reinforcement learning approach of AlphaTensor. In this work, we propose a simple yet novel Constraint Programming approach to find non-commutative algorithms for fast matrix multiplication or provide proof of infeasibility otherwise. We propose a set of symmetry-breaking constraints and valid inequalities that are particularly helpful in proving infeasibility. On the feasible side, we find that exploiting solver performance variability in conjunction with a sparsity-based problem decomposition enables finding solutions for larger (feasible) instances of fast matrix multiplication. Our experimental results using CP Optimizer demonstrate that we can find fast matrix multiplication algorithms for matrices up to $3\times 3$ in a short amount of time.
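
Strassen's scheme is exactly such a feasible point of the constraint satisfaction problem (with $R = 7 < 8 = NMP$ for $N{=}M{=}P{=}2$); the following quick check, an illustration rather than code from the paper, verifies that its seven products satisfy the correctness constraints on random integer inputs.

import numpy as np

def strassen_2x2(A, B):
    # Strassen's 7 bilinear products for the 2x2 case
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

rng = np.random.default_rng(0)
A, B = rng.integers(-5, 6, (2, 2)), rng.integers(-5, 6, (2, 2))
assert np.array_equal(strassen_2x2(A, B), A @ B)  # the correctness constraints hold
print("Strassen scheme verified")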

Parameterized Complexity of Broadcasting in Graphs

from arXiv: Data Structures and Algorithms

Authors: Fedor V. Fomin, Pierre Fraigniaud, Petr A. Golovach

The task of the broadcast problem is, given a graph G and a source vertex s, to compute the minimum number of rounds required to disseminate a piece of information from s to all vertices in the graph. It is assumed that, at each round, an informed vertex can transmit the information to at most one of its neighbors. The broadcast problem is known to be NP-hard. We show that the problem is FPT when parameterized by the size k of a feedback edge-set, or by the size k of a vertex cover, or by k=n-t where t is the input deadline for the broadcast protocol to complete.
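
To make the model concrete, here is an exponential-time brute force for tiny graphs (an illustrative sketch, unrelated to the paper's FPT algorithms): breadth-first search over sets of informed vertices, where in each round every informed vertex may pass the information to at most one uninformed neighbour.

from itertools import product
from collections import deque

def broadcast_time(adj, s):
    # adj: dict vertex -> list of neighbours; returns the minimum number of rounds,
    # or None if some vertex is unreachable.  Exponential time, for tiny graphs only.
    all_v = frozenset(adj)
    start = frozenset([s])
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        informed, rounds = queue.popleft()
        if informed == all_v:
            return rounds
        # each informed vertex may inform at most one currently uninformed neighbour
        choices = [[None] + [u for u in adj[v] if u not in informed] for v in informed]
        for pick in product(*choices):
            nxt = informed | {u for u in pick if u is not None}
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, rounds + 1))
    return None

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(broadcast_time(path, 0))  # 4: the information must travel along the path
print(broadcast_time(star, 0))  # 4: the centre informs one leaf per round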

Sunday, June 04

Soddy’s quadlet

from David Eppstein

Soddy’s hexlet is a famous system of nine spheres in three-dimensional Euclidean space, consisting of a ring of six spheres, tangent in consecutive pairs, and a ring of three spheres, tangent in pairs, with every sphere in one ring tangent to every sphere of the other ring. The easy way to construct it is to observe that its properties are invariant under Möbius transformations, as long as you count planes as spheres, and parallel planes as tangent spheres. If you take two of the three spheres in the three-sphere ring to be parallel planes, the rest have to form seven congruent spheres, six of them in a ring around the seventh. Any other form of the hexlet can be obtained by a Möbius transformation from this one.

Soddy's hexlet, in the form of seven congruent spheres between two parallel planes

But there’s another way of constructing the same shape, coming from four-dimensional polyhedral geometry, which can also be used to construct another related pair of interlocking rings of spheres.

By analogy, consider a cube in 3d, and start growing a sphere with its center at the center of the cube. As you grow the sphere, it will start to bulge out through the faces of the cube, which cut it in six circles. Initially, those circles will be small and disjoint from each other, but as they grow larger they will eventually touch at the midpoint of a cube edge, and then cross each other. At the time when any two of the growing circles touch each other, all six will touch four others, by the symmetries of the cube. In this way, we have generated a configuration of six circles, on the surface of a sphere, each touching four others with the same touching pattern as the square faces of a cube.

A cube and its midsphere

Now do the same thing in 4d with the (3,6)-duoprism, the Cartesian product of an equilateral triangle and a regular hexagon. The resulting 4-dimensional polytope has facets of two types: six triangular prisms (attached to each other on their triangle faces) and three hexagonal prisms (attached to each other on their hexagon faces). Rectangular faces connect triangular prisms to a hexagonal prism. There is a choice for how big to make the triangle edges relative to the hexagon edges. You want them to be in the proportion \(\sqrt3:1\), so that both kinds of prism have an inscribed sphere touching all their faces. Instead, the only figures I could find of the duoprism show it with edges in the wrong proportion \(1:1\), generating square faces. But I want the rectangular one.

Skeleton of the square (3,6)-duoprism
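
Here is the quick calculation behind that proportion (a check added here, not part of the original post): a prism has an inscribed sphere exactly when its height equals twice the inradius of its base. With triangle edge \(a\) and hexagon edge \(b\), the triangular prisms of the duoprism have height \(b\) and the hexagonal prisms have height \(a\), so

\[ b = 2\cdot\frac{a}{2\sqrt3} = \frac{a}{\sqrt3} \qquad\text{and}\qquad a = 2\cdot\frac{\sqrt3\,b}{2} = \sqrt3\,b, \]

and the two conditions coincide, giving \(a:b=\sqrt3:1\).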

Grow a hypersphere from the center of the duoprism. As it grows, it will intersect the prism facets of the duoprism in spheres, centered at the centers of the prisms. Initially small and disjoint, these spheres will grow until they become the inscribed spheres of the prisms, touching each other at the center of each triangle, hexagon, or rectangle of the duoprism. You have created a hexlet, simultaneously drawn on a hypersphere and inscribed in the faces of a duoprism! You can map it into the usual hexlet of three-dimensional Euclidean geometry (instead of three-dimensional spherical geometry) by a stereographic projection from the hypersphere to a flat three-dimensional space.

Almost all the other duoprisms do not have this coincidence, that when you adjust the proportions to make one kind of prism have inscribed spheres, the other one does the same thing with the same proportions. There’s only one other duoprism for which this works: the (4,4)-duoprism, better known as the 4-hypercube or tesseract. In this case, all the facets are the same, and all the 2-faces are the same. If we grow a hypersphere, centered at the center of the hypercube, it will cross the hypercube facets (which are cubes) in spheres. When these spheres grow to the size where they are inscribed in each cube facet, they will be tangent to each other at the centers of the square two-dimensional faces of the hypercube. At this point, you will have formed two rings of four tangent spheres, tangent in consecutive pairs, with every sphere in one ring tangent to every sphere of the other ring. We could call it the “quadlet”.

Now that you’ve constructed a quadlet on a hypersphere in 4d, you can apply a stereographic projection to get the same quadlet as a collection of ordinary spheres in 3-dimensional Euclidean space. One of the more symmetric ways of doing this projection takes one of the two 4-sphere rings to two unit spheres sandwiched between two parallel planes at distance 4 from each other. The four spheres of the other ring all have radius 2, and wrap around the central two unit spheres. It’s not obvious to me that these two parallel planes, two unit spheres, and four radius-2 spheres can all be tangent in this pattern, unless we calculate the coordinates of their tangencies or use reasoning based on the spheres inscribed on the facets of a hypercube, for which the same pattern of tangencies is obvious.

Soddy's quadlet, in the form of four radius-2 spheres and two radius-1 spheres between two parallel planes, top and side view
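
Here is a short numerical check of that tangency pattern (a sanity check added here, not part of the original post), using one convenient normalization: planes \(z=0\) and \(z=4\), unit spheres centered on the axis at heights 1 and 3, and radius-2 spheres centered at height 2 and distance \(2\sqrt2\) from the axis.

import numpy as np

unit_spheres = [(np.array([0.0, 0.0, 1.0]), 1.0), (np.array([0.0, 0.0, 3.0]), 1.0)]
big_spheres = [(np.array([2 * np.sqrt(2) * np.cos(k * np.pi / 2),
                          2 * np.sqrt(2) * np.sin(k * np.pi / 2), 2.0]), 2.0)
               for k in range(4)]

def tangent(s1, s2):
    (c1, r1), (c2, r2) = s1, s2
    return np.isclose(np.linalg.norm(c1 - c2), r1 + r2)

def tangent_to_both_planes(s):            # the planes z = 0 and z = 4
    c, r = s
    return np.isclose(c[2], r) and np.isclose(4 - c[2], r)

# every radius-2 sphere touches both planes and both unit spheres
assert all(tangent_to_both_planes(b) for b in big_spheres)
assert all(tangent(b, u) for b in big_spheres for u in unit_spheres)
# the radius-2 spheres touch in consecutive pairs only
assert all(tangent(big_spheres[k], big_spheres[(k + 1) % 4]) for k in range(4))
assert not tangent(big_spheres[0], big_spheres[2])
# the unit spheres touch each other, and each touches one of the planes
assert tangent(*unit_spheres)
assert np.isclose(unit_spheres[0][0][2], 1.0) and np.isclose(4 - unit_spheres[1][0][2], 1.0)
print("all quadlet tangencies verified")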

You can rotate the four radius-2 spheres around the axis formed by the central unit spheres arbitrarily without changing the pattern of tangencies. This is more or less analogous to the fact that with the hexlet, you can start the ring of six tangent spheres at any sphere tangent to all three spheres of the other ring, and then add spheres to the ring one by one, keeping each sphere tangent to its predecessor in the same ring and to all the spheres of the other ring. It will always close up after six spheres to form a hexlet. You can also fix in place the six-sphere ring, choose any sphere tangent to all of them to start the three-sphere ring, and it will always close up after three spheres to form a hexlet. And once you have one of the four-sphere rings of a quadlet, you can choose any sphere tangent to all four to start the other ring, and it will always close up after four spheres to form a quadlet. For the hexlet, this becomes obvious after we do a Möbius transformation to take it into the form with two parallel planes and seven congruent spheres. For the quadlet, it is similarly obvious by doing a Möbius transformation to take it into a form with two parallel planes in one ring and four congruent spheres in the other. The only way for the remaining two spheres to complete the first ring is for them to fill the hole between the four congruent spheres, one directly on top of the other. They might not be the same size as each other but one more Möbius transformation makes them so. So just like the hexlet, all quadlets are Möbius-equivalent.

Incidentally, the fact that you can get systems of three-dimensional tangent spheres from four-dimensional polytopes is not particularly new. I used it long ago in my paper with Kuperberg and Ziegler, “Fat 4-polytopes and fatter 3-spheres”, to get a finite set of spheres with high kissing number from the 120-cell and its relatives. For more on the connection between sphere packings and 4-polytopes, including the construction of the hexlet from the duoprism, see Hao Chen’s papers and especially “Apollonian ball packings and stacked polytopes”.

(Discuss on Mastodon)

By David Eppstein

TR23-084 | Time-Space Lower Bounds for Bounded-Error Computation in the Random-Query Model | Itai Dinur

from ECCC Papers

The random-query model was introduced by Raz and Zhan at ITCS 2020 as a new model of space-bounded computation. In this model, a branching program of length $T$ and width $2^{S}$ attempts to compute a function $f:\{0,1\}^n \rightarrow \{0,1 \}$. However, instead of receiving direct access to the input bits $(x_1,\ldots,x_n)$, the input is given in pairs of the form $(i_j, x_{i_j}) \in \{1,\ldots,n\} \times \{0,1\}$ for $j = 1,2,\ldots,T$, where the indices $i_1,\ldots,i_T$ are chosen at random from a pre-fixed distribution.

Raz and Zhan proved that any branching program in the random-query model with the independent distribution (where $\{i_j\}_{j = 1,\ldots,T}$ are uniform and independent) that computes a function $f$ with sensitivity $k$ satisfies $T \cdot (S + \log n) \geq \Omega(n \cdot k)$. This gives a quadratic time-space lower bound for many natural functions which have sensitivity $\Omega(n)$, such as XOR and Majority. The bound was proved in the zero-error regime, where for each input, the branching program is required to output a value with high probability, and given that a value is output, it must be correct with probability $1$. Furthermore, Raz and Zhan conjectured that (up to logarithmic factors in $n$) a quadratic time-space lower bound still holds for the XOR function in the more conventional bounded-error regime, where for each input, the output must be correct with high probability.

In this paper, we prove this conjecture. More generally, let $f:\{0,1\}^n \rightarrow \{0,1 \}$ have average sensitivity (or total influence) $\mathrm{I}[f]$. We prove that any branching program in the random-query model with the independent distribution that computes $f$ in the bounded-error regime satisfies $T \cdot S \geq \tilde{\Omega}(n) \cdot \mathrm{I}[f]$ (where $\tilde{\Omega}$ hides logarithmic factors in $n$). Moreover, we prove a quadratic time-space lower bound for the Majority function, even though its total influence is $\Theta(\sqrt{n})$. Our proof is based on a reduction from a communication complexity problem.
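
For concreteness (these two instantiations are not quoted from the abstract), the general theorem recovers the conjectured bound for XOR but not, by itself, the quadratic bound for Majority:

\[
\mathrm{I}[\mathrm{XOR}_n] = n \;\Longrightarrow\; T\cdot S \ge \tilde{\Omega}(n^{2}),
\qquad
\mathrm{I}[\mathrm{Maj}_n] = \Theta(\sqrt{n}) \;\Longrightarrow\; T\cdot S \ge \tilde{\Omega}(n^{3/2}),
\]

which is why the quadratic Majority lower bound requires a separate argument.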

TR23-083 | Trade-offs between Entanglement and Communication | Srinivasan A, Uma Girish

from ECCC Papers

We study the advantages of quantum communication models over classical communication models that are equipped with a limited number of qubits of entanglement. In this direction, we give explicit partial functions on $n$ bits for which reducing the entanglement increases the classical communication complexity exponentially. Our separations are as follows, for every $k\ge 1$:

- $Q\|^*$ versus $R2^*$: We show that quantum simultaneous protocols with $\tilde{\Theta}(k^5 \log^3 n)$ qubits of entanglement can exponentially outperform two-way randomized protocols with $O(k)$ qubits of entanglement. This resolves an open problem from [Gav08] and improves the state-of-the-art separations between quantum simultaneous protocols with entanglement and two-way randomized protocols without entanglement [Gav19, GRT22].

- $R\|^*$ versus $Q\|^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k \log n)$ qubits of entanglement can exponentially outperform quantum simultaneous protocols with $O(k)$ qubits of entanglement, resolving an open question from [GKRW06, Gav19]. The best result prior to our work was a relational separation against protocols without entanglement [GKRW06].

- $R\|^*$ versus $R1^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k\log n)$ qubits of entanglement can exponentially outperform randomized one-way protocols with $O(k)$ qubits of entanglement. Prior to our work, only a relational separation was known [Gav08].