We prove a vector-valued inequality for the Gaussian noise stability (i.e. we
prove a vector-valued Borell inequality) for Euclidean functions taking values
in the two-dimensional sphere, for all correlation parameters at most $1/10$ in
absolute value. This inequality was conjectured (for all correlation parameters
at most $1$ in absolute value) by Hwang, Neeman, Parekh, Thompson and Wright.
Such an inequality is needed to prove sharp computational hardness of the
product state Quantum MAX-CUT problem, assuming the Unique Games Conjecture.

We prove a vector-valued inequality for the Gaussian noise stability (i.e. we
prove a vector-valued Borell inequality) for Euclidean functions taking values
in the two-dimensional sphere, for all correlation parameters at most $1/10$ in
absolute value. This inequality was conjectured (for all correlation parameters
at most $1$ in absolute value) by Hwang, Neeman, Parekh, Thompson and Wright.
Such an inequality is needed to prove sharp computational hardness of the
product state Quantum MAX-CUT problem, assuming the Unique Games Conjecture.

Recent progress towards theoretical interpretability guarantees for AI has
been made with classifiers that are based on interactive proof systems. A
prover selects a certificate from the datapoint and sends it to a verifier who
decides the class. In the context of machine learning, such a certificate can
be a feature that is informative of the class. For a setup with high soundness
and completeness, the exchanged certificates must have a high mutual
information with the true class of the datapoint. However, this guarantee
relies on a bound on the Asymmetric Feature Correlation of the dataset, a
property that so far is difficult to estimate for high-dimensional data. It was
conjectured in W\"aldchen et al. that it is computationally hard to exploit the
AFC, which is what we prove here.

We consider a malicious prover-verifier duo that aims to exploit the AFC to
achieve high completeness and soundness while using uninformative certificates.
We show that this task is $\mathsf{NP}$-hard and cannot be approximated better
than $\mathcal{O}(m^{1/8 - \epsilon})$, where $m$ is the number of possible
certificates, for $\epsilon>0$ under the Dense-vs-Random conjecture. This is
some evidence that AFC should not prevent the use of interactive classification
for real-world tasks, as it is computationally hard to be exploited.

Recent progress towards theoretical interpretability guarantees for AI has
been made with classifiers that are based on interactive proof systems. A
prover selects a certificate from the datapoint and sends it to a verifier who
decides the class. In the context of machine learning, such a certificate can
be a feature that is informative of the class. For a setup with high soundness
and completeness, the exchanged certificates must have a high mutual
information with the true class of the datapoint. However, this guarantee
relies on a bound on the Asymmetric Feature Correlation of the dataset, a
property that so far is difficult to estimate for high-dimensional data. It was
conjectured in W\"aldchen et al. that it is computationally hard to exploit the
AFC, which is what we prove here.

We consider a malicious prover-verifier duo that aims to exploit the AFC to
achieve high completeness and soundness while using uninformative certificates.
We show that this task is $\mathsf{NP}$-hard and cannot be approximated better
than $\mathcal{O}(m^{1/8 - \epsilon})$, where $m$ is the number of possible
certificates, for $\epsilon>0$ under the Dense-vs-Random conjecture. This is
some evidence that AFC should not prevent the use of interactive classification
for real-world tasks, as it is computationally hard to be exploited.

Authors: Carsten Lutz, Quentin Manière, Robin Nolte

Circumscription is one of the main approaches for defining non-monotonic
description logics (DLs). While the decidability and complexity of traditional
reasoning tasks such as satisfiability of circumscribed DL knowledge bases
(KBs) is well understood, for evaluating conjunctive queries (CQs) and unions
thereof (UCQs), not even decidability had been established. In this paper, we
prove decidability of (U)CQ evaluation on circumscribed DL KBs and obtain a
rather complete picture of both the combined complexity and the data
complexity, for DLs ranging from ALCHIO via EL to various versions of DL-Lite.
We also study the much simpler atomic queries (AQs).

Circumscription is one of the main approaches for defining non-monotonic
description logics (DLs). While the decidability and complexity of traditional
reasoning tasks such as satisfiability of circumscribed DL knowledge bases
(KBs) is well understood, for evaluating conjunctive queries (CQs) and unions
thereof (UCQs), not even decidability had been established. In this paper, we
prove decidability of (U)CQ evaluation on circumscribed DL KBs and obtain a
rather complete picture of both the combined complexity and the data
complexity, for DLs ranging from ALCHIO via EL to various versions of DL-Lite.
We also study the much simpler atomic queries (AQs).

We show that the decision problem of recognising whether a triangulated
3-manifold admits a Seifert fibered structure with non-empty boundary is in NP.
We also show that the problem of producing Seifert data for a triangulation of
such a manifold is in the complexity class FNP. We do this by proving that in
any triangulation of a Seifert fibered space with boundary there is both a
fundamental horizontal surface of small degree and a complete collection of
normal vertical annuli whose total weight is bounded by an exponential in the
square of the triangulation size.

We show that the decision problem of recognising whether a triangulated
3-manifold admits a Seifert fibered structure with non-empty boundary is in NP.
We also show that the problem of producing Seifert data for a triangulation of
such a manifold is in the complexity class FNP. We do this by proving that in
any triangulation of a Seifert fibered space with boundary there is both a
fundamental horizontal surface of small degree and a complete collection of
normal vertical annuli whose total weight is bounded by an exponential in the
square of the triangulation size.

Authors: Van-Anh Nguyen, Trung Le, Anh Tuan Bui, Thanh-Toan Do, Dinh Phung

Distributional robustness is a promising framework for training deep learning
models that are less vulnerable to adversarial examples and data distribution
shifts. Previous works have mainly focused on exploiting distributional
robustness in data space. In this work, we explore an optimal transport-based
distributional robustness framework on model spaces. Specifically, we examine a
model distribution in a Wasserstein ball of a given center model distribution
that maximizes the loss. We have developed theories that allow us to learn the
optimal robust center model distribution. Interestingly, through our developed
theories, we can flexibly incorporate the concept of sharpness awareness into
training a single model, ensemble models, and Bayesian Neural Networks by
considering specific forms of the center model distribution, such as a Dirac
delta distribution over a single model, a uniform distribution over several
models, and a general Bayesian Neural Network. Furthermore, we demonstrate that
sharpness-aware minimization (SAM) is a specific case of our framework when
using a Dirac delta distribution over a single model, while our framework can
be viewed as a probabilistic extension of SAM. We conduct extensive experiments
to demonstrate the usefulness of our framework in the aforementioned settings,
and the results show remarkable improvements in our approaches to the
baselines.

Distributional robustness is a promising framework for training deep learning
models that are less vulnerable to adversarial examples and data distribution
shifts. Previous works have mainly focused on exploiting distributional
robustness in data space. In this work, we explore an optimal transport-based
distributional robustness framework on model spaces. Specifically, we examine a
model distribution in a Wasserstein ball of a given center model distribution
that maximizes the loss. We have developed theories that allow us to learn the
optimal robust center model distribution. Interestingly, through our developed
theories, we can flexibly incorporate the concept of sharpness awareness into
training a single model, ensemble models, and Bayesian Neural Networks by
considering specific forms of the center model distribution, such as a Dirac
delta distribution over a single model, a uniform distribution over several
models, and a general Bayesian Neural Network. Furthermore, we demonstrate that
sharpness-aware minimization (SAM) is a specific case of our framework when
using a Dirac delta distribution over a single model, while our framework can
be viewed as a probabilistic extension of SAM. We conduct extensive experiments
to demonstrate the usefulness of our framework in the aforementioned settings,
and the results show remarkable improvements in our approaches to the
baselines.

Authors: Eyram Schwinger, Ralph Twum, Thomas Katsekpor, Gladys Schwinger

In this work, we designed algorithms for the point in polygon problem based
on the ray casting algorithm using equations from vector geometry. The
algorithms were implemented using the python programming language. We tested
the algorithm against the point in polygon algorithms used by the shapely (and
by extension geopandas) library and the OpenCV library using points from the
google Open Buildings project. Our algorithm in pure python performed much
better than the shapely implementation. It also performed better than the
OpenCV implementation when combined with the Numba optimization library. We
also performed simulations to verify that our algorithm performance was of the
order O(n).

In this work, we designed algorithms for the point in polygon problem based
on the ray casting algorithm using equations from vector geometry. The
algorithms were implemented using the python programming language. We tested
the algorithm against the point in polygon algorithms used by the shapely (and
by extension geopandas) library and the OpenCV library using points from the
google Open Buildings project. Our algorithm in pure python performed much
better than the shapely implementation. It also performed better than the
OpenCV implementation when combined with the Numba optimization library. We
also performed simulations to verify that our algorithm performance was of the
order O(n).

The class of Galled-Tree Explainable (GaTEx) graphs has just recently been
discovered as a natural generalization of cographs. Cographs are precisely
those graphs that can be uniquely represented by a rooted tree where the leaves
of the tree correspond to the vertices of the graph. As a generalization, GaTEx
graphs are precisely those graphs that can be uniquely represented by a
particular rooted directed acyclic graph (called galled-tree).

We consider here four prominent problems that are, in general, NP-hard:
computing the size $\omega(G)$ of a maximum clique, the size $\chi(G)$ of an
optimal vertex-coloring and the size $\alpha(G)$ of a maximum independent set
of a given graph $G$ as well as determining whether a graph is perfectly
orderable. We show here that $\omega(G)$, $\chi(G)$, $\alpha(G)$ can be
computed in linear-time for GaTEx graphs $G$. The crucial idea for the
linear-time algorithms is to avoid working on the GaTEx graphs $G$ directly,
but to use the the galled-trees that explain $G$ as a guide for the algorithms
to compute these invariants. In particular, we show first how to employ the
galled-tree structure to compute a perfect ordering of GaTEx graphs in
linear-time which is then used to determine $\omega(G)$, $\chi(G)$,
$\alpha(G)$.

The class of Galled-Tree Explainable (GaTEx) graphs has just recently been
discovered as a natural generalization of cographs. Cographs are precisely
those graphs that can be uniquely represented by a rooted tree where the leaves
of the tree correspond to the vertices of the graph. As a generalization, GaTEx
graphs are precisely those graphs that can be uniquely represented by a
particular rooted directed acyclic graph (called galled-tree).

We consider here four prominent problems that are, in general, NP-hard:
computing the size $\omega(G)$ of a maximum clique, the size $\chi(G)$ of an
optimal vertex-coloring and the size $\alpha(G)$ of a maximum independent set
of a given graph $G$ as well as determining whether a graph is perfectly
orderable. We show here that $\omega(G)$, $\chi(G)$, $\alpha(G)$ can be
computed in linear-time for GaTEx graphs $G$. The crucial idea for the
linear-time algorithms is to avoid working on the GaTEx graphs $G$ directly,
but to use the the galled-trees that explain $G$ as a guide for the algorithms
to compute these invariants. In particular, we show first how to employ the
galled-tree structure to compute a perfect ordering of GaTEx graphs in
linear-time which is then used to determine $\omega(G)$, $\chi(G)$,
$\alpha(G)$.

Given only a few observed entries from a low-rank matrix $X$, matrix
completion is the problem of imputing the missing entries, and it formalizes a
wide range of real-world settings that involve estimating missing data.
However, when there are too few observed entries to complete the matrix, what
other aspects of the underlying matrix can be reliably recovered? We study one
such problem setting, that of "one-sided" matrix completion, where our goal is
to recover the right singular vectors of $X$, even in the regime where
recovering the left singular vectors is impossible, which arises when there are
more rows than columns and very few observations. We propose a natural
algorithm that involves imputing the missing values of the matrix $X^TX$ and
show that even with only two observations per row in $X$, we can provably
recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where
$r$ is the rank and $d$ is the number of columns. We evaluate our algorithm on
one-sided recovery of synthetic data and low-coverage genome sequencing. In
these settings, our algorithm substantially outperforms standard matrix
completion and a variety of direct factorization methods.

Given only a few observed entries from a low-rank matrix $X$, matrix
completion is the problem of imputing the missing entries, and it formalizes a
wide range of real-world settings that involve estimating missing data.
However, when there are too few observed entries to complete the matrix, what
other aspects of the underlying matrix can be reliably recovered? We study one
such problem setting, that of "one-sided" matrix completion, where our goal is
to recover the right singular vectors of $X$, even in the regime where
recovering the left singular vectors is impossible, which arises when there are
more rows than columns and very few observations. We propose a natural
algorithm that involves imputing the missing values of the matrix $X^TX$ and
show that even with only two observations per row in $X$, we can provably
recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where
$r$ is the rank and $d$ is the number of columns. We evaluate our algorithm on
one-sided recovery of synthetic data and low-coverage genome sequencing. In
these settings, our algorithm substantially outperforms standard matrix
completion and a variety of direct factorization methods.

In machine learning and particularly in topological data analysis,
$\epsilon$-graphs are important tools but are generally hard to compute as the
distance calculation between n points takes time O(n^2) classically. Recently,
quantum approaches for calculating distances between n quantum states have been
proposed, taking advantage of quantum superposition and entanglement. We
investigate the potential for quantum advantage in the case of quantum distance
calculation for computing $\epsilon$-graphs. We show that, relying on existing
quantum multi-state SWAP test based algorithms, the query complexity for
correctly identifying (with a given probability) that two points are not
$\epsilon$-neighbours is at least O(n^3 / ln n), showing that this approach, if
used directly for $\epsilon$-graph construction, does not bring a computational
advantage when compared to a classical approach.

In machine learning and particularly in topological data analysis,
$\epsilon$-graphs are important tools but are generally hard to compute as the
distance calculation between n points takes time O(n^2) classically. Recently,
quantum approaches for calculating distances between n quantum states have been
proposed, taking advantage of quantum superposition and entanglement. We
investigate the potential for quantum advantage in the case of quantum distance
calculation for computing $\epsilon$-graphs. We show that, relying on existing
quantum multi-state SWAP test based algorithms, the query complexity for
correctly identifying (with a given probability) that two points are not
$\epsilon$-neighbours is at least O(n^3 / ln n), showing that this approach, if
used directly for $\epsilon$-graph construction, does not bring a computational
advantage when compared to a classical approach.

In this paper, we introduce the problem of Matroid-Constrained Vertex Cover:
given a graph with weights on the edges and a matroid imposed on the vertices,
our problem is to choose a subset of vertices that is independent in the
matroid, with the objective of maximizing the total weight of covered edges.
This problem is a generalization of the much studied max $k$-vertex cover
problem, in which the matroid is the simple uniform matroid, and it is also a
special case of the problem of maximizing a monotone submodular function under
a matroid constraint.

First, we give a Fixed-Parameter Tractable Approximation Scheme (FPT-AS) when
the given matroid is a partition matroid, a laminar matroid, or a transversal
matroid. Precisely, if $k$ is the rank of the matroid, we obtain $(1 -
\varepsilon)$ approximation using $(1/\varepsilon)^{O(k)}n^{O(1)}$ time for
partition and laminar matroids and using $(1/\varepsilon+k)^{O(k)}n^{O(1)}$
time for transversal matroids. This extends a result of Manurangsi for uniform
matroids [Manurangsi, 2018]. We also show that these ideas can be applied in
the context of (single-pass) streaming algorithms. Besides, our FPT-AS
introduces a new technique based on matroid union, which may be of independent
interest in extremal combinatorics.

In the second part, we consider general matroids. We propose a simple local
search algorithm that guarantees $2/3 \approx 0.66$ approximation. For the more
general problem where two matroids are imposed on the vertices and a feasible
solution must be a common independent set, we show that a local search
algorithm gives a $2/3 \cdot (1 - 1/(p+1))$ approximation in $n^{O(p)}$ time,
for any integer $p$. We also provide some evidence to show that with the
constraint of one or two matroids, the approximation ratio of $2/3$ is likely
the best possible, using the currently known techniques of local search.

In this paper, we introduce the problem of Matroid-Constrained Vertex Cover:
given a graph with weights on the edges and a matroid imposed on the vertices,
our problem is to choose a subset of vertices that is independent in the
matroid, with the objective of maximizing the total weight of covered edges.
This problem is a generalization of the much studied max $k$-vertex cover
problem, in which the matroid is the simple uniform matroid, and it is also a
special case of the problem of maximizing a monotone submodular function under
a matroid constraint.

First, we give a Fixed-Parameter Tractable Approximation Scheme (FPT-AS) when
the given matroid is a partition matroid, a laminar matroid, or a transversal
matroid. Precisely, if $k$ is the rank of the matroid, we obtain $(1 -
\varepsilon)$ approximation using $(1/\varepsilon)^{O(k)}n^{O(1)}$ time for
partition and laminar matroids and using $(1/\varepsilon+k)^{O(k)}n^{O(1)}$
time for transversal matroids. This extends a result of Manurangsi for uniform
matroids [Manurangsi, 2018]. We also show that these ideas can be applied in
the context of (single-pass) streaming algorithms. Besides, our FPT-AS
introduces a new technique based on matroid union, which may be of independent
interest in extremal combinatorics.

In the second part, we consider general matroids. We propose a simple local
search algorithm that guarantees $2/3 \approx 0.66$ approximation. For the more
general problem where two matroids are imposed on the vertices and a feasible
solution must be a common independent set, we show that a local search
algorithm gives a $2/3 \cdot (1 - 1/(p+1))$ approximation in $n^{O(p)}$ time,
for any integer $p$. We also provide some evidence to show that with the
constraint of one or two matroids, the approximation ratio of $2/3$ is likely
the best possible, using the currently known techniques of local search.

Authors: Christian Komusiewicz, Pascal Kunz, Frank Sommer, Manuel Sorge

Random forests and, more generally, (decision\nobreakdash-)tree ensembles are
widely used methods for classification and regression. Recent algorithmic
advances allow to compute decision trees that are optimal for various measures
such as their size or depth. We are not aware of such research for tree
ensembles and aim to contribute to this area. Mainly, we provide two novel
algorithms and corresponding lower bounds. First, we are able to carry over and
substantially improve on tractability results for decision trees, obtaining a
$(6\delta D S)^S \cdot poly$-time algorithm, where $S$ is the number of cuts in
the tree ensemble, $D$ the largest domain size, and $\delta$ is the largest
number of features in which two examples differ. To achieve this, we introduce
the witness-tree technique which also seems promising for practice. Second, we
show that dynamic programming, which has been successful for decision trees,
may also be viable for tree ensembles, providing an $\ell^n \cdot poly$-time
algorithm, where $\ell$ is the number of trees and $n$ the number of examples.
Finally, we compare the number of cuts necessary to classify training data sets
for decision trees and tree ensembles, showing that ensembles may need
exponentially fewer cuts for increasing number of trees.

Random forests and, more generally, (decision\nobreakdash-)tree ensembles are
widely used methods for classification and regression. Recent algorithmic
advances allow to compute decision trees that are optimal for various measures
such as their size or depth. We are not aware of such research for tree
ensembles and aim to contribute to this area. Mainly, we provide two novel
algorithms and corresponding lower bounds. First, we are able to carry over and
substantially improve on tractability results for decision trees, obtaining a
$(6\delta D S)^S \cdot poly$-time algorithm, where $S$ is the number of cuts in
the tree ensemble, $D$ the largest domain size, and $\delta$ is the largest
number of features in which two examples differ. To achieve this, we introduce
the witness-tree technique which also seems promising for practice. Second, we
show that dynamic programming, which has been successful for decision trees,
may also be viable for tree ensembles, providing an $\ell^n \cdot poly$-time
algorithm, where $\ell$ is the number of trees and $n$ the number of examples.
Finally, we compare the number of cuts necessary to classify training data sets
for decision trees and tree ensembles, showing that ensembles may need
exponentially fewer cuts for increasing number of trees.

Authors: Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro

We present a new data structure for maintaining dynamic permutations, which
we call a $\textit{forest of splay trees (FST)}$. The FST allows one to
efficiently maintain the cycle structure of a permutation $\pi$ when the
allowed updates are transpositions. The structure stores one conceptual splay
tree for each cycle of $\pi$, using the position within the cycle as the key.
Updating $\pi$ to $\tau\cdot\pi$, for a transposition $\tau$, takes
$\mathcal{O}(\log n)$ amortized time, where $n$ is the size of $\pi$. The FST
computes any $\pi(i)$, $\pi^{-1}(i)$, $\pi^k(i)$ and $\pi^{-k}(i)$, in
$\mathcal{O}(\log n)$ amortized time. Further, it supports cycle-specific
queries such as determining whether two elements belong to the same cycle, flip
a segment of a cycle, and others, again within $\mathcal{O}(\log n)$ amortized
time.

We present a new data structure for maintaining dynamic permutations, which
we call a $\textit{forest of splay trees (FST)}$. The FST allows one to
efficiently maintain the cycle structure of a permutation $\pi$ when the
allowed updates are transpositions. The structure stores one conceptual splay
tree for each cycle of $\pi$, using the position within the cycle as the key.
Updating $\pi$ to $\tau\cdot\pi$, for a transposition $\tau$, takes
$\mathcal{O}(\log n)$ amortized time, where $n$ is the size of $\pi$. The FST
computes any $\pi(i)$, $\pi^{-1}(i)$, $\pi^k(i)$ and $\pi^{-k}(i)$, in
$\mathcal{O}(\log n)$ amortized time. Further, it supports cycle-specific
queries such as determining whether two elements belong to the same cycle, flip
a segment of a cycle, and others, again within $\mathcal{O}(\log n)$ amortized
time.

Authors: Zhili Chen, Joshua A. Grochow, Youming Qiao, Gang Tang, Chuanqi Zhang

We study the complexity of isomorphism problems for d-way arrays, or tensors,
under natural actions by classical groups such as orthogonal, unitary, and
symplectic groups. Such problems arise naturally in statistical data analysis
and quantum information. We study two types of complexity-theoretic questions.
First, for a fixed action type (isomorphism, conjugacy, etc.), we relate the
complexity of the isomorphism problem over a classical group to that over the
general linear group. Second, for a fixed group type (orthogonal, unitary, or
symplectic), we compare the complexity of the decision problems for different
actions.

Our main results are as follows. First, for orthogonal and symplectic groups
acting on 3-way arrays, the isomorphism problems reduce to the corresponding
problem over the general linear group. Second, for orthogonal and unitary
groups, the isomorphism problems of five natural actions on 3-way arrays are
polynomial-time equivalent, and the d-tensor isomorphism problem reduces to the
3-tensor isomorphism problem for any fixed d>3. For unitary groups, the
preceding result implies that LOCC classification of tripartite quantum states
is at least as difficult as LOCC classification of d-partite quantum states for
any d. Lastly, we also show that the graph isomorphism problem reduces to the
tensor isomorphism problem over orthogonal and unitary groups.

We study the complexity of isomorphism problems for d-way arrays, or tensors,
under natural actions by classical groups such as orthogonal, unitary, and
symplectic groups. Such problems arise naturally in statistical data analysis
and quantum information. We study two types of complexity-theoretic questions.
First, for a fixed action type (isomorphism, conjugacy, etc.), we relate the
complexity of the isomorphism problem over a classical group to that over the
general linear group. Second, for a fixed group type (orthogonal, unitary, or
symplectic), we compare the complexity of the decision problems for different
actions.

Our main results are as follows. First, for orthogonal and symplectic groups
acting on 3-way arrays, the isomorphism problems reduce to the corresponding
problem over the general linear group. Second, for orthogonal and unitary
groups, the isomorphism problems of five natural actions on 3-way arrays are
polynomial-time equivalent, and the d-tensor isomorphism problem reduces to the
3-tensor isomorphism problem for any fixed d>3. For unitary groups, the
preceding result implies that LOCC classification of tripartite quantum states
is at least as difficult as LOCC classification of d-partite quantum states for
any d. Lastly, we also show that the graph isomorphism problem reduces to the
tensor isomorphism problem over orthogonal and unitary groups.

Authors: Srinivasan Arunachalam, Vojtech Havlicek, Louis Schatzki

In this work we make progress in understanding the relationship between
learning models with access to entangled, separable and statistical
measurements in the quantum statistical query (QSQ) model. To this end, we show
the following results.

$\textbf{Entangled versus separable measurements.}$ The goal here is to learn
an unknown $f$ from the concept class $C\subseteq \{f:\{0,1\}^n\rightarrow
[k]\}$ given copies of $\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$. We
show that, if $T$ copies suffice to learn $f$ using entangled measurements,
then $O(nT^2)$ copies suffice to learn $f$ using just separable measurements.

$\textbf{Entangled versus statistical measurements}$ The goal here is to
learn a function $f \in C$ given access to separable measurements and
statistical measurements. We exhibit a class $C$ that gives an exponential
separation between QSQ learning and quantum learning with entangled
measurements (even in the presence of noise). This proves the "quantum
analogue" of the seminal result of Blum et al. [BKW'03]. that separates
classical SQ and PAC learning with classification noise.

$\textbf{QSQ lower bounds for learning states.}$ We introduce a quantum
statistical query dimension (QSD), which we use to give lower bounds on the QSQ
learning. With this we prove superpolynomial QSQ lower bounds for testing
purity, shadow tomography, Abelian hidden subgroup problem, degree-$2$
functions, planted bi-clique states and output states of Clifford circuits of
depth $\textsf{polylog}(n)$.

$\textbf{Further applications.}$ We give and $\textit{unconditional}$
separation between weak and strong error mitigation and prove lower bounds for
learning distributions in the QSQ model. Prior works by Quek et al. [QFK+'22],
Hinsche et al. [HIN+'22], and Nietner et al. [NIS+'23] proved the analogous
results $\textit{assuming}$ diagonal measurements and our work removes this
assumption.

In this work we make progress in understanding the relationship between
learning models with access to entangled, separable and statistical
measurements in the quantum statistical query (QSQ) model. To this end, we show
the following results.

$\textbf{Entangled versus separable measurements.}$ The goal here is to learn
an unknown $f$ from the concept class $C\subseteq \{f:\{0,1\}^n\rightarrow
[k]\}$ given copies of $\frac{1}{\sqrt{2^n}}\sum_x \vert x,f(x)\rangle$. We
show that, if $T$ copies suffice to learn $f$ using entangled measurements,
then $O(nT^2)$ copies suffice to learn $f$ using just separable measurements.

$\textbf{Entangled versus statistical measurements}$ The goal here is to
learn a function $f \in C$ given access to separable measurements and
statistical measurements. We exhibit a class $C$ that gives an exponential
separation between QSQ learning and quantum learning with entangled
measurements (even in the presence of noise). This proves the "quantum
analogue" of the seminal result of Blum et al. [BKW'03]. that separates
classical SQ and PAC learning with classification noise.

$\textbf{QSQ lower bounds for learning states.}$ We introduce a quantum
statistical query dimension (QSD), which we use to give lower bounds on the QSQ
learning. With this we prove superpolynomial QSQ lower bounds for testing
purity, shadow tomography, Abelian hidden subgroup problem, degree-$2$
functions, planted bi-clique states and output states of Clifford circuits of
depth $\textsf{polylog}(n)$.

$\textbf{Further applications.}$ We give and $\textit{unconditional}$
separation between weak and strong error mitigation and prove lower bounds for
learning distributions in the QSQ model. Prior works by Quek et al. [QFK+'22],
Hinsche et al. [HIN+'22], and Nietner et al. [NIS+'23] proved the analogous
results $\textit{assuming}$ diagonal measurements and our work removes this
assumption.

We prove the three candidate Plurality is Stablest Conjecture of
Khot-Kindler-Mossel-O'Donnell from 2005 for correlations $\rho$ satisfying
$-1/36<\rho<1/11$: the Plurality function is the most noise stable three
candidate election method with small influences, when the corrupted votes have
correlation $-1/36<\rho<1/11$ with the original votes. The previous best result
of this type only achieved positive correlations at most $10^{-10^{10}}$. Our
result follows by solving the three set Standard Simplex Conjecture of
Isaksson-Mossel from 2011 for all correlations $-1/36<\rho<1/11$.

The Gaussian Double Bubble Problem corresponds to the case $\rho\to1^{-}$, so
in some sense, our result is a generalization of the Gaussian Double Bubble
Problem. Our result is also notable since it is the first result for any
$\rho<0$, which is the only relevant case for computational hardness of
MAX-3-CUT. As an additional corollary, we conclude that three candidate Borda
Count is stablest for all $-1/36<\rho<1/11$.

We prove the three candidate Plurality is Stablest Conjecture of
Khot-Kindler-Mossel-O'Donnell from 2005 for correlations $\rho$ satisfying
$-1/36<\rho<1/11$: the Plurality function is the most noise stable three
candidate election method with small influences, when the corrupted votes have
correlation $-1/36<\rho<1/11$ with the original votes. The previous best result
of this type only achieved positive correlations at most $10^{-10^{10}}$. Our
result follows by solving the three set Standard Simplex Conjecture of
Isaksson-Mossel from 2011 for all correlations $-1/36<\rho<1/11$.

The Gaussian Double Bubble Problem corresponds to the case $\rho\to1^{-}$, so
in some sense, our result is a generalization of the Gaussian Double Bubble
Problem. Our result is also notable since it is the first result for any
$\rho<0$, which is the only relevant case for computational hardness of
MAX-3-CUT. As an additional corollary, we conclude that three candidate Borda
Count is stablest for all $-1/36<\rho<1/11$.

In this paper we deal with the problem of computing the exact crossing number
of almost planar graphs and the closely related problem of computing the exact
anchored crossing number of a pair of planar graphs. It was shown by [Cabello
and Mohar, 2013] that both problems are NP-hard; although they required an
unbounded number of high-degree vertices (in the first problem) or an unbounded
number of anchors (in the second problem) to prove their result. Somehow
surprisingly, only three vertices of degree greater than 3, or only three
anchors, are sufficient to maintain hardness of these problems, as we prove
here. The new result also improves the previous result on hardness of joint
crossing number on surfaces by [Hlin\v{e}n\'y and Salazar, 2015]. Our result is
best possible in the anchored case since the anchored crossing number of a pair
of planar graphs with two anchors each is trivial, and close to being best
possible in the almost planar case since the crossing number is efficiently
computable for almost planar graphs of maximum degree 3 [Riskin 1996, Cabello
and Mohar 2011].

In this paper we deal with the problem of computing the exact crossing number
of almost planar graphs and the closely related problem of computing the exact
anchored crossing number of a pair of planar graphs. It was shown by [Cabello
and Mohar, 2013] that both problems are NP-hard; although they required an
unbounded number of high-degree vertices (in the first problem) or an unbounded
number of anchors (in the second problem) to prove their result. Somehow
surprisingly, only three vertices of degree greater than 3, or only three
anchors, are sufficient to maintain hardness of these problems, as we prove
here. The new result also improves the previous result on hardness of joint
crossing number on surfaces by [Hlin\v{e}n\'y and Salazar, 2015]. Our result is
best possible in the anchored case since the anchored crossing number of a pair
of planar graphs with two anchors each is trivial, and close to being best
possible in the almost planar case since the crossing number is efficiently
computable for almost planar graphs of maximum degree 3 [Riskin 1996, Cabello
and Mohar 2011].

Authors: Jacob Focke, Dániel Marx, Fionn Mc Inerney, Daniel Neuen, Govind S. Sankar, Philipp Schepper, Philip Wellnitz

For a well-studied family of domination-type problems, in bounded-treewidth
graphs, we investigate whether it is possible to find faster algorithms. For
sets $\sigma,\rho$ of non-negative integers, a $(\sigma,\rho)$-set of a graph
$G$ is a set $S$ of vertices such that $|N(u)\cap S|\in \sigma$ for every $u\in
S$, and $|N(v)\cap S|\in \rho$ for every $v\not\in S$. The problem of finding a
$(\sigma,\rho)$-set (of a certain size) unifies common problems like
$\text{Independent Set}$, $\text{Dominating Set}$, $\text{Independent
Dominating Set}$, and many others.

In an accompanying paper, it is proven that, for all pairs of finite or
cofinite sets $(\sigma,\rho)$, there is an algorithm that counts
$(\sigma,\rho)$-sets in time $(c_{\sigma,\rho})^{\text{tw}}\cdot n^{O(1)}$ (if
a tree decomposition of width $\text{tw}$ is given in the input). Here,
$c_{\sigma,\rho}$ is a constant with an intricate dependency on $\sigma$ and
$\rho$. Despite this intricacy, we show that the algorithms in the accompanying
paper are most likely optimal, i.e., for any pair $(\sigma, \rho)$ of finite or
cofinite sets where the problem is non-trivial, and any $\varepsilon>0$, a
$(c_{\sigma,\rho}-\varepsilon)^{\text{tw}}\cdot n^{O(1)}$-algorithm counting
the number of $(\sigma,\rho)$-sets would violate the Counting Strong
Exponential-Time Hypothesis ($\#$SETH). For finite sets $\sigma$ and $\rho$,
our lower bounds also extend to the decision version, showing that those
algorithms are optimal in this setting as well.

For a well-studied family of domination-type problems, in bounded-treewidth
graphs, we investigate whether it is possible to find faster algorithms. For
sets $\sigma,\rho$ of non-negative integers, a $(\sigma,\rho)$-set of a graph
$G$ is a set $S$ of vertices such that $|N(u)\cap S|\in \sigma$ for every $u\in
S$, and $|N(v)\cap S|\in \rho$ for every $v\not\in S$. The problem of finding a
$(\sigma,\rho)$-set (of a certain size) unifies common problems like
$\text{Independent Set}$, $\text{Dominating Set}$, $\text{Independent
Dominating Set}$, and many others.

In an accompanying paper, it is proven that, for all pairs of finite or
cofinite sets $(\sigma,\rho)$, there is an algorithm that counts
$(\sigma,\rho)$-sets in time $(c_{\sigma,\rho})^{\text{tw}}\cdot n^{O(1)}$ (if
a tree decomposition of width $\text{tw}$ is given in the input). Here,
$c_{\sigma,\rho}$ is a constant with an intricate dependency on $\sigma$ and
$\rho$. Despite this intricacy, we show that the algorithms in the accompanying
paper are most likely optimal, i.e., for any pair $(\sigma, \rho)$ of finite or
cofinite sets where the problem is non-trivial, and any $\varepsilon>0$, a
$(c_{\sigma,\rho}-\varepsilon)^{\text{tw}}\cdot n^{O(1)}$-algorithm counting
the number of $(\sigma,\rho)$-sets would violate the Counting Strong
Exponential-Time Hypothesis ($\#$SETH). For finite sets $\sigma$ and $\rho$,
our lower bounds also extend to the decision version, showing that those
algorithms are optimal in this setting as well.

Orienting the edges of an undirected graph such that the resulting digraph
satisfies some given constraints is a classical problem in graph theory, with
multiple algorithmic applications. In particular, an $st$-orientation orients
each edge of the input graph such that the resulting digraph is acyclic, and it
contains a single source $s$ and a single sink $t$. Computing an
$st$-orientation of a graph can be done efficiently, and it finds notable
applications in graph algorithms and in particular in graph drawing. On the
other hand, finding an $st$-orientation with at most $k$ transitive edges is
more challenging and it was recently proven to be NP-hard already when $k=0$.
We strengthen this result by showing that the problem remains NP-hard even for
graphs of bounded diameter, and for graphs of bounded vertex degree. These
computational lower bounds naturally raise the question about which structural
parameters can lead to tractable parameterizations of the problem. Our main
result is a fixed-parameter tractable algorithm parameterized by treewidth.

Orienting the edges of an undirected graph such that the resulting digraph
satisfies some given constraints is a classical problem in graph theory, with
multiple algorithmic applications. In particular, an $st$-orientation orients
each edge of the input graph such that the resulting digraph is acyclic, and it
contains a single source $s$ and a single sink $t$. Computing an
$st$-orientation of a graph can be done efficiently, and it finds notable
applications in graph algorithms and in particular in graph drawing. On the
other hand, finding an $st$-orientation with at most $k$ transitive edges is
more challenging and it was recently proven to be NP-hard already when $k=0$.
We strengthen this result by showing that the problem remains NP-hard even for
graphs of bounded diameter, and for graphs of bounded vertex degree. These
computational lower bounds naturally raise the question about which structural
parameters can lead to tractable parameterizations of the problem. Our main
result is a fixed-parameter tractable algorithm parameterized by treewidth.

Authors: Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada, Felipe A. Quezada

During the last decade GPU technology has shifted from pure general purpose
computation to the inclusion of application specific integrated circuits
(ASICs), such as Tensor Cores and Ray Tracing (RT) cores. Although these
special purpose GPU cores were designed to further accelerate specific fields
such as AI and real-time rendering, recent research has managed to exploit them
to further accelerate other tasks that typically used regular GPU computing. In
this work we present RTXRMQ, a new approach that can compute range minimum
queries (RMQs) with RT cores. The main contribution is the proposal of a
geometric solution for RMQ, where elements become triangles that are placed and
shaped according to the element's value and position in the array,
respectively, such that the closest hit of a ray launched from a point given by
the query parameters corresponds to the result of that query. Experimental
results show that RTXRMQ is currently best suited for small query ranges
relative to the problem size, achieving up to $5\times$ and $2.3\times$ of
speedup over state of the art CPU (HRMQ) and GPU (LCA) approaches,
respectively. Although for medium and large query ranges RTXRMQ is currently
surpassed by LCA, it is still competitive by being $2.5\times$ and $4\times$
faster than HRMQ which is a highly parallel CPU approach. Furthermore,
performance scaling experiments across the latest RTX GPU architectures show
that if the current RT scaling trend continues, then RTXRMQ's performance would
scale at a higher rate than HRMQ and LCA, making the approach even more
relevant for future high performance applications that employ batches of RMQs.

During the last decade GPU technology has shifted from pure general purpose
computation to the inclusion of application specific integrated circuits
(ASICs), such as Tensor Cores and Ray Tracing (RT) cores. Although these
special purpose GPU cores were designed to further accelerate specific fields
such as AI and real-time rendering, recent research has managed to exploit them
to further accelerate other tasks that typically used regular GPU computing. In
this work we present RTXRMQ, a new approach that can compute range minimum
queries (RMQs) with RT cores. The main contribution is the proposal of a
geometric solution for RMQ, where elements become triangles that are placed and
shaped according to the element's value and position in the array,
respectively, such that the closest hit of a ray launched from a point given by
the query parameters corresponds to the result of that query. Experimental
results show that RTXRMQ is currently best suited for small query ranges
relative to the problem size, achieving up to $5\times$ and $2.3\times$ of
speedup over state of the art CPU (HRMQ) and GPU (LCA) approaches,
respectively. Although for medium and large query ranges RTXRMQ is currently
surpassed by LCA, it is still competitive by being $2.5\times$ and $4\times$
faster than HRMQ which is a highly parallel CPU approach. Furthermore,
performance scaling experiments across the latest RTX GPU architectures show
that if the current RT scaling trend continues, then RTXRMQ's performance would
scale at a higher rate than HRMQ and LCA, making the approach even more
relevant for future high performance applications that employ batches of RMQs.

The evolving data framework was first proposed by Anagnostopoulos et al.,
where an evolver makes small changes to a structure behind the scenes. Instead
of taking a single input and producing a single output, an algorithm
judiciously probes the current state of the structure and attempts to
continuously maintain a sketch of the structure that is as close as possible to
its actual state. There have been a number of problems that have been studied
in the evolving framework including our own work on labeled trees. We were
motivated by the problem of maintaining a labeling in the plane, where updating
the labels require physically moving them. Applications involve tracking
evolving disease hot-spots via mobile testing units , and tracking unmanned
aerial vehicles. To be specific, we consider the problem of tracking labeled
nodes in the plane, where an evolver continuously swaps labels of any two
nearby nodes in the background unknown to us. We are tasked with maintaining a
hypothesis, an approximate sketch of the locations of these labels, which we
can only update by physically moving them over a sparse graph. We assume the
existence of an Oracle, which when suitably probed, guides us in fixing our
hypothesis.

The evolving data framework was first proposed by Anagnostopoulos et al.,
where an evolver makes small changes to a structure behind the scenes. Instead
of taking a single input and producing a single output, an algorithm
judiciously probes the current state of the structure and attempts to
continuously maintain a sketch of the structure that is as close as possible to
its actual state. There have been a number of problems that have been studied
in the evolving framework including our own work on labeled trees. We were
motivated by the problem of maintaining a labeling in the plane, where updating
the labels require physically moving them. Applications involve tracking
evolving disease hot-spots via mobile testing units , and tracking unmanned
aerial vehicles. To be specific, we consider the problem of tracking labeled
nodes in the plane, where an evolver continuously swaps labels of any two
nearby nodes in the background unknown to us. We are tasked with maintaining a
hypothesis, an approximate sketch of the locations of these labels, which we
can only update by physically moving them over a sparse graph. We assume the
existence of an Oracle, which when suitably probed, guides us in fixing our
hypothesis.

Linear programming (LP) problems with gainfree Leontief substitution systems
have been intensively studied in economics and operations research, and include
the feasibility problem of a class of Horn systems, which arises in, e.g.,
polyhedral combinatorics and logic. This subclass of LP problems admits a
strongly polynomial time algorithm, where devising such an algorithm for
general LP problems is one of the major theoretical open questions in
mathematical optimization and computer science. Recently, much attention has
been paid to devising certifying algorithms in software engineering, since
those algorithms enable one to confirm the correctness of outputs of programs
with simple computations. In this paper, we provide the first combinatorial
(and strongly polynomial time) certifying algorithm for LP problems with
gainfree Leontief substitution systems. As a by-product, we answer
affirmatively an open question whether the feasibility problem of the class of
Horn systems admits a combinatorial certifying algorithm.

Linear programming (LP) problems with gainfree Leontief substitution systems
have been intensively studied in economics and operations research, and include
the feasibility problem of a class of Horn systems, which arises in, e.g.,
polyhedral combinatorics and logic. This subclass of LP problems admits a
strongly polynomial time algorithm, where devising such an algorithm for
general LP problems is one of the major theoretical open questions in
mathematical optimization and computer science. Recently, much attention has
been paid to devising certifying algorithms in software engineering, since
those algorithms enable one to confirm the correctness of outputs of programs
with simple computations. In this paper, we provide the first combinatorial
(and strongly polynomial time) certifying algorithm for LP problems with
gainfree Leontief substitution systems. As a by-product, we answer
affirmatively an open question whether the feasibility problem of the class of
Horn systems admits a combinatorial certifying algorithm.

Authors: Anh Viet Do, Aneta Neumann, Frank Neumann, Andrew M. Sutton

We study the multi-objective minimum weight base problem, an abstraction of
classical NP-hard combinatorial problems such as the multi-objective minimum
spanning tree problem. We prove some important properties of the convex hull of
the non-dominated front, such as its approximation quality and an upper bound
on the number of extreme points. Using these properties, we give the first
run-time analysis of the MOEA/D algorithm for this problem, an evolutionary
algorithm that effectively optimizes by decomposing the objectives into
single-objective components. We show that the MOEA/D, given an appropriate
decomposition setting, finds all extreme points within expected fixed-parameter
polynomial time in the oracle model, the parameter being the number of
objectives. Experiments are conducted on random bi-objective minimum spanning
tree instances, and the results agree with our theoretical findings.
Furthermore, compared with a previously studied evolutionary algorithm for the
problem GSEMO, MOEA/D finds all extreme points much faster across all
instances.

We study the multi-objective minimum weight base problem, an abstraction of
classical NP-hard combinatorial problems such as the multi-objective minimum
spanning tree problem. We prove some important properties of the convex hull of
the non-dominated front, such as its approximation quality and an upper bound
on the number of extreme points. Using these properties, we give the first
run-time analysis of the MOEA/D algorithm for this problem, an evolutionary
algorithm that effectively optimizes by decomposing the objectives into
single-objective components. We show that the MOEA/D, given an appropriate
decomposition setting, finds all extreme points within expected fixed-parameter
polynomial time in the oracle model, the parameter being the number of
objectives. Experiments are conducted on random bi-objective minimum spanning
tree instances, and the results agree with our theoretical findings.
Furthermore, compared with a previously studied evolutionary algorithm for the
problem GSEMO, MOEA/D finds all extreme points much faster across all
instances.

Structural bias or segregation of networks refers to situations where two or
more disparate groups are present in the network, so that the groups are highly
connected internally, but loosely connected to each other. In many cases it is
of interest to increase the connectivity of disparate groups so as to, e.g.,
minimize social friction, or expose individuals to diverse viewpoints. A
commonly-used mechanism for increasing the network connectivity is to add edge
shortcuts between pairs of nodes. In many applications of interest, edge
shortcuts typically translate to recommendations, e.g., what video to watch, or
what news article to read next. The problem of reducing structural bias or
segregation via edge shortcuts has recently been studied in the literature, and
random walks have been an essential tool for modeling navigation and
connectivity in the underlying networks. Existing methods, however, either do
not offer approximation guarantees, or engineer the objective so that it
satisfies certain desirable properties that simplify the optimization~task. In
this paper we address the problem of adding a given number of shortcut edges in
the network so as to directly minimize the average hitting time and the maximum
hitting time between two disparate groups. Our algorithm for minimizing average
hitting time is a greedy bicriteria that relies on supermodularity. In
contrast, maximum hitting time is not supermodular. Despite, we develop an
approximation algorithm for that objective as well, by leveraging connections
with average hitting time and the asymmetric k-center problem.

Structural bias or segregation of networks refers to situations where two or
more disparate groups are present in the network, so that the groups are highly
connected internally, but loosely connected to each other. In many cases it is
of interest to increase the connectivity of disparate groups so as to, e.g.,
minimize social friction, or expose individuals to diverse viewpoints. A
commonly-used mechanism for increasing the network connectivity is to add edge
shortcuts between pairs of nodes. In many applications of interest, edge
shortcuts typically translate to recommendations, e.g., what video to watch, or
what news article to read next. The problem of reducing structural bias or
segregation via edge shortcuts has recently been studied in the literature, and
random walks have been an essential tool for modeling navigation and
connectivity in the underlying networks. Existing methods, however, either do
not offer approximation guarantees, or engineer the objective so that it
satisfies certain desirable properties that simplify the optimization~task. In
this paper we address the problem of adding a given number of shortcut edges in
the network so as to directly minimize the average hitting time and the maximum
hitting time between two disparate groups. Our algorithm for minimizing average
hitting time is a greedy bicriteria that relies on supermodularity. In
contrast, maximum hitting time is not supermodular. Despite, we develop an
approximation algorithm for that objective as well, by leveraging connections
with average hitting time and the asymmetric k-center problem.

We present representative sets-style statements for linear delta-matroids,
which are set systems that generalize matroids, with important connections to
matching theory and graph embeddings. Furthermore, our proof uses a new
approach of sieving polynomial families, which generalizes the linear algebra
approach of the representative sets lemma to a setting of bounded-degree
polynomials. The representative sets statements for linear delta-matroids then
follow by analyzing the Pfaffian of the skew-symmetric matrix representing the
delta-matroid. Applying the same framework to the determinant instead of the
Pfaffian recovers the representative sets lemma for linear matroids.
Altogether, this significantly extends the toolbox available for kernelization.

As an application, we show an exact sparsification result for Mader networks:
Let $G=(V,E)$ be a graph and $\mathcal{T}$ a partition of a set of terminals $T
\subseteq V(G)$, $|T|=k$. A $\mathcal{T}$-path in $G$ is a path with endpoints
in distinct parts of $\mathcal{T}$ and internal vertices disjoint from $T$. In
polynomial time, we can derive a graph $G'=(V',E')$ with $T \subseteq V(G')$,
such that for every subset $S \subseteq T$ there is a packing of
$\mathcal{T}$-paths with endpoints $S$ in $G$ if and only if there is one in
$G'$, and $|V(G')|=O(k^3)$. This generalizes the (undirected version of the)
cut-covering lemma, which corresponds to the case that $\mathcal{T}$ contains
only two blocks.

To prove the Mader network sparsification result, we furthermore define the
class of Mader delta-matroids, and show that they have linear representations.
This should be of independent interest.

We present representative sets-style statements for linear delta-matroids,
which are set systems that generalize matroids, with important connections to
matching theory and graph embeddings. Furthermore, our proof uses a new
approach of sieving polynomial families, which generalizes the linear algebra
approach of the representative sets lemma to a setting of bounded-degree
polynomials. The representative sets statements for linear delta-matroids then
follow by analyzing the Pfaffian of the skew-symmetric matrix representing the
delta-matroid. Applying the same framework to the determinant instead of the
Pfaffian recovers the representative sets lemma for linear matroids.
Altogether, this significantly extends the toolbox available for kernelization.

As an application, we show an exact sparsification result for Mader networks:
Let $G=(V,E)$ be a graph and $\mathcal{T}$ a partition of a set of terminals $T
\subseteq V(G)$, $|T|=k$. A $\mathcal{T}$-path in $G$ is a path with endpoints
in distinct parts of $\mathcal{T}$ and internal vertices disjoint from $T$. In
polynomial time, we can derive a graph $G'=(V',E')$ with $T \subseteq V(G')$,
such that for every subset $S \subseteq T$ there is a packing of
$\mathcal{T}$-paths with endpoints $S$ in $G$ if and only if there is one in
$G'$, and $|V(G')|=O(k^3)$. This generalizes the (undirected version of the)
cut-covering lemma, which corresponds to the case that $\mathcal{T}$ contains
only two blocks.

To prove the Mader network sparsification result, we furthermore define the
class of Mader delta-matroids, and show that they have linear representations.
This should be of independent interest.

Stochastic optimization is one of the central problems in Machine Learning
and Theoretical Computer Science. In the standard model, the algorithm is given
a fixed distribution known in advance. In practice though, one may acquire at a
cost extra information to make better decisions. In this paper, we study how to
buy information for stochastic optimization and formulate this question as an
online learning problem. Assuming the learner has an oracle for the original
optimization problem, we design a $2$-competitive deterministic algorithm and a
$e/(e-1)$-competitive randomized algorithm for buying information. We show that
this ratio is tight as the problem is equivalent to a robust generalization of
the ski-rental problem, which we call super-martingale stopping.

We also consider an adaptive setting where the learner can choose to buy
information after taking some actions for the underlying optimization problem.
We focus on the classic optimization problem, Min-Sum Set Cover, where the goal
is to quickly find an action that covers a given request drawn from a known
distribution. We provide an $8$-competitive algorithm running in polynomial
time that chooses actions and decides when to buy information about the
underlying request.

Stochastic optimization is one of the central problems in Machine Learning
and Theoretical Computer Science. In the standard model, the algorithm is given
a fixed distribution known in advance. In practice though, one may acquire at a
cost extra information to make better decisions. In this paper, we study how to
buy information for stochastic optimization and formulate this question as an
online learning problem. Assuming the learner has an oracle for the original
optimization problem, we design a $2$-competitive deterministic algorithm and a
$e/(e-1)$-competitive randomized algorithm for buying information. We show that
this ratio is tight as the problem is equivalent to a robust generalization of
the ski-rental problem, which we call super-martingale stopping.

We also consider an adaptive setting where the learner can choose to buy
information after taking some actions for the underlying optimization problem.
We focus on the classic optimization problem, Min-Sum Set Cover, where the goal
is to quickly find an action that covers a given request drawn from a known
distribution. We provide an $8$-competitive algorithm running in polynomial
time that chooses actions and decides when to buy information about the
underlying request.

Authors: Zhenyu Weng, Huiping Zhuang, Haizhou Li, Zhiping Lin

Representing visual data using compact binary codes is attracting increasing
attention as binary codes are used as direct indices into hash table(s) for
fast non-exhaustive search. Recent methods show that ranking binary codes using
weighted Hamming distance (WHD) rather than Hamming distance (HD) by generating
query-adaptive weights for each bit can better retrieve query-related items.
However, search using WHD is slower than that using HD. One main challenge is
that the complexity of extending a monotone increasing sequence using WHD to
probe buckets in hash table(s) for existing methods is at least proportional to
the square of the sequence length, while that using HD is proportional to the
sequence length. To overcome this challenge, we propose a novel fast
non-exhaustive search method using WHD. The key idea is to design a constant
sequence extension algorithm to perform each sequence extension in constant
computational complexity and the total complexity is proportional to the
sequence length, which is justified by theoretical analysis. Experimental
results show that our method is faster than other WHD-based search methods.
Also, compared with the HD-based non-exhaustive search method, our method has
comparable efficiency but retrieves more query-related items for the dataset of
up to one billion items.

Representing visual data using compact binary codes is attracting increasing
attention as binary codes are used as direct indices into hash table(s) for
fast non-exhaustive search. Recent methods show that ranking binary codes using
weighted Hamming distance (WHD) rather than Hamming distance (HD) by generating
query-adaptive weights for each bit can better retrieve query-related items.
However, search using WHD is slower than that using HD. One main challenge is
that the complexity of extending a monotone increasing sequence using WHD to
probe buckets in hash table(s) for existing methods is at least proportional to
the square of the sequence length, while that using HD is proportional to the
sequence length. To overcome this challenge, we propose a novel fast
non-exhaustive search method using WHD. The key idea is to design a constant
sequence extension algorithm to perform each sequence extension in constant
computational complexity and the total complexity is proportional to the
sequence length, which is justified by theoretical analysis. Experimental
results show that our method is faster than other WHD-based search methods.
Also, compared with the HD-based non-exhaustive search method, our method has
comparable efficiency but retrieves more query-related items for the dataset of
up to one billion items.

The identification of the set of k most central nodes of a graph, or
centrality maximization, is a key task in network analysis, with various
applications ranging from finding communities in social and biological networks
to understanding which seed nodes are important to diffuse information in a
graph. As the exact computation of centrality measures does not scale to
modern-sized networks, the most practical solution is to resort to rigorous,
but efficiently computable, randomized approximations. In this work we present
CentRA, the first algorithm based on progressive sampling to compute
high-quality approximations of the set of k most central nodes. CentRA is based
on a novel approach to efficiently estimate Monte Carlo Rademacher Averages, a
powerful tool from statistical learning theory to compute sharp data-dependent
approximation bounds. Then, we study the sample complexity of centrality
maximization using the VC-dimension, a key concept from statistical learning
theory. We show that the number of random samples required to compute
high-quality approximations scales with finer characteristics of the graph,
such as its vertex diameter, or of the centrality of interest, significantly
improving looser bounds derived from standard techniques. We apply CentRA to
analyze large real-world networks, showing that it significantly outperforms
the state-of-the-art approximation algorithm in terms of number of samples,
running times, and accuracy.

The identification of the set of k most central nodes of a graph, or
centrality maximization, is a key task in network analysis, with various
applications ranging from finding communities in social and biological networks
to understanding which seed nodes are important to diffuse information in a
graph. As the exact computation of centrality measures does not scale to
modern-sized networks, the most practical solution is to resort to rigorous,
but efficiently computable, randomized approximations. In this work we present
CentRA, the first algorithm based on progressive sampling to compute
high-quality approximations of the set of k most central nodes. CentRA is based
on a novel approach to efficiently estimate Monte Carlo Rademacher Averages, a
powerful tool from statistical learning theory to compute sharp data-dependent
approximation bounds. Then, we study the sample complexity of centrality
maximization using the VC-dimension, a key concept from statistical learning
theory. We show that the number of random samples required to compute
high-quality approximations scales with finer characteristics of the graph,
such as its vertex diameter, or of the centrality of interest, significantly
improving looser bounds derived from standard techniques. We apply CentRA to
analyze large real-world networks, showing that it significantly outperforms
the state-of-the-art approximation algorithm in terms of number of samples,
running times, and accuracy.

Finding the shortest travelling tour of vehicles with capacity k from the
depot to the customers is called the Capacity vehicle routing problem (CVRP).
CVRP plays an essential position in logistics systems, and it is the most
intensively studied problem in combinatorial optimization. In complexity, CVRP
with k $\ge$ 3 is an NP-hard problem, and it is APX-hard as well. We already
knew that it could not be approximated in metric space. Moreover, it is the
first problem resisting Arora's famous approximation framework. So, whether
there is, a polynomial-time (1+$\epsilon$)-approximation for the Euclidean CVRP
for any $\epsilon>0$ is still an open problem. This paper will summarize the
research progress from history to up-to-date developments. The survey will be
updated periodically.

Finding the shortest travelling tour of vehicles with capacity k from the
depot to the customers is called the Capacity vehicle routing problem (CVRP).
CVRP plays an essential position in logistics systems, and it is the most
intensively studied problem in combinatorial optimization. In complexity, CVRP
with k $\ge$ 3 is an NP-hard problem, and it is APX-hard as well. We already
knew that it could not be approximated in metric space. Moreover, it is the
first problem resisting Arora's famous approximation framework. So, whether
there is, a polynomial-time (1+$\epsilon$)-approximation for the Euclidean CVRP
for any $\epsilon>0$ is still an open problem. This paper will summarize the
research progress from history to up-to-date developments. The survey will be
updated periodically.

Authors: Emmett Breen, Renee Mirka, Zichen Wang, David P. Williamson

This paper revisits the 2-approximation algorithm for $k$-MST presented by
Garg in light of a recent paper of Paul et al.. In the $k$-MST problem, the
goal is to return a tree spanning $k$ vertices of minimum total edge cost. Paul
et al. extend Garg's primal-dual subroutine to improve the approximation ratios
for the budgeted prize-collecting traveling salesman and minimum spanning tree
problems. We follow their algorithm and analysis to provide a cleaner version
of Garg's result. Additionally, we introduce the novel concept of a kernel
which allows an easier visualization of the stages of the algorithm and a
clearer understanding of the pruning phase. Other notable updates include
presenting a linear programming formulation of the $k$-MST problem, including
pseudocode, replacing the coloring scheme used by Garg with the simpler concept
of neutral sets, and providing an explicit potential function.

This paper revisits the 2-approximation algorithm for $k$-MST presented by
Garg in light of a recent paper of Paul et al.. In the $k$-MST problem, the
goal is to return a tree spanning $k$ vertices of minimum total edge cost. Paul
et al. extend Garg's primal-dual subroutine to improve the approximation ratios
for the budgeted prize-collecting traveling salesman and minimum spanning tree
problems. We follow their algorithm and analysis to provide a cleaner version
of Garg's result. Additionally, we introduce the novel concept of a kernel
which allows an easier visualization of the stages of the algorithm and a
clearer understanding of the pruning phase. Other notable updates include
presenting a linear programming formulation of the $k$-MST problem, including
pseudocode, replacing the coloring scheme used by Garg with the simpler concept
of neutral sets, and providing an explicit potential function.

Authors: Ameya Velingker, Maximilian Vötsch, David P. Woodruff, Samson Zhou

We introduce efficient $(1+\varepsilon)$-approximation algorithms for the
binary matrix factorization (BMF) problem, where the inputs are a matrix
$\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an
accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$
as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and
$\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$
and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} -
\mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was
the approximation algorithm of Kumar et. al. [ICML 2019], which achieves a
$C$-approximation for some constant $C\ge 576$. We give the first
$(1+\varepsilon)$-approximation algorithm using running time singly exponential
in $k$, where $k$ is typically a small integer. Our techniques generalize to
other common variants of the BMF problem, admitting bicriteria
$(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the
setting where matrix operations are performed in $\mathbb{F}_2$. Our approach
can be implemented in standard big data models, such as the streaming or
distributed models.

We introduce efficient $(1+\varepsilon)$-approximation algorithms for the
binary matrix factorization (BMF) problem, where the inputs are a matrix
$\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an
accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$
as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and
$\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$
and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} -
\mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was
the approximation algorithm of Kumar et. al. [ICML 2019], which achieves a
$C$-approximation for some constant $C\ge 576$. We give the first
$(1+\varepsilon)$-approximation algorithm using running time singly exponential
in $k$, where $k$ is typically a small integer. Our techniques generalize to
other common variants of the BMF problem, admitting bicriteria
$(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the
setting where matrix operations are performed in $\mathbb{F}_2$. Our approach
can be implemented in standard big data models, such as the streaming or
distributed models.

Authors: Andrea Canciani, Claudio Felicioli, Fabio Severino, Domenico Tortola

Every digital process needs to consume some data in order to work properly.
It is very common for applications to use some external data in their
processes, getting them by sources such as external APIs. Therefore, trusting
the received data becomes crucial in such scenarios, considering that if the
data are not self-produced by the consumer, the trust in the external data
source, or in the data that the source produces, can not always be taken for
granted. The most used approach to generate trust in the external source is
based on authenticated data structures, that are able to authenticate the
source when queried through the generation of proofs. Such proofs are useful to
assess authenticity or integrity, however, an external user could also be
interested in verifying the data history and its consistency. This problem
seems to be unaddressed by current literature, which proposes some approaches
aimed at executing audits by internal actors with prior knowledge about the
data structures. In this paper, we address the scenario of an external auditor
with no data knowledge that wants to verify the data history consistency. We
analyze the terminology and the current state of the art of the auditable data
structures, then we will propose a general framework to support external audits
from both internal and external users.

Every digital process needs to consume some data in order to work properly.
It is very common for applications to use some external data in their
processes, getting them by sources such as external APIs. Therefore, trusting
the received data becomes crucial in such scenarios, considering that if the
data are not self-produced by the consumer, the trust in the external data
source, or in the data that the source produces, can not always be taken for
granted. The most used approach to generate trust in the external source is
based on authenticated data structures, that are able to authenticate the
source when queried through the generation of proofs. Such proofs are useful to
assess authenticity or integrity, however, an external user could also be
interested in verifying the data history and its consistency. This problem
seems to be unaddressed by current literature, which proposes some approaches
aimed at executing audits by internal actors with prior knowledge about the
data structures. In this paper, we address the scenario of an external auditor
with no data knowledge that wants to verify the data history consistency. We
analyze the terminology and the current state of the art of the auditable data
structures, then we will propose a general framework to support external audits
from both internal and external users.

[Crossposting again on Lesswrong and Windowsontheory, with the hope I am not overstaying my welcome in LW.] Wealth can be measured by dollars. This is not a perfect measurement: it’s hard to account for purchasing power and circumstances when comparing people across varying countries or time periods. However, within a particular place and time, one can measure … Continue reading The (local) unit of intelligence is FLOPs

[Crossposting again on Lesswrong and Windowsontheory, with the hope I am not overstaying my welcome in LW.]

Wealth can be measured by dollars. This is not a perfect measurement: it’s hard to account for purchasing power and circumstances when comparing people across varying countries or time periods. However, within a particular place and time, one can measure wealth in the local currency. It still does not capture everything (e.g., future earnings, social connections). But generally, all else being roughly equal, the more dollars one has, the wealthier one is.

How do we measure intelligence? I am not interested in measuring the intelligence of individual humans or individual animals. Nor am I looking for a universal absolute scale of intelligence on which we could rank humans, elephants, and GPT4. (Indeed, it doesn’t seem that a one-dimensional comparison can be made; for example, we seem to be more intelligent than elephants on most dimensions, but they do have an impressive memory.) Rather, I want to compare different species within the same genus or different models within the same general architecture (e.g., Transformers).

I think it’s fair to say that the local unit of intelligence for animal species is neurons. While elephants have larger brains than humans, within the genus Homo, to a first approximation, the bigger the brain, the more intelligent the species.

I claim that within the current architectures and training frameworks of large language models, the local unit of intelligence is FLOPs. That is, as long as we follow the current paradigm of training transformer-based architectures within best practices of scaling compute and data, the more compute resources (FLOPs) invested in training the model, the more intelligent it is. This is an imperfect measurement, but probably one that is better than trying to give models “IQ exams” that were designed for humans (and even there have dubious value). Another way to say this is that the intelligence of the model scales with the number of “load-bearing gradient steps” that have gone into training it.

So far, it might seem like a tautology, but as I claimed in the “intelligence forklift” post, this does have some implications. In particular, current general-purpose models such as ChatGPT are built in two phases. The first phase is a pretraining phase, in which the model is trained in a Trillion or more gradient steps on the next-token prediction task. The second phase is the adaptation/fine-tuning phase, in which, whether through instruction-tuning, reinforcement learning on human feedback (RLHF) or other methods, the model is “fine tuned” using fewer than a million gradient steps to be a better instruction-following or chatting agent. In other words, more than 99.9% (maybe as much as 99.9999%) of the FLOPs / gradient steps in training the model are invested during its pretraining phase. (One reason that the fine-tuning phase involves much fewer gradient steps is that, while the first phase can use any static data grabbed from the Internet, the second phase requires data that was especially collected for this task and often needs human labeling as well.)

The adaptation phase can make a huge difference in the usefulness of the model. The chatbot arena doesn’t even contain non-fine-tuned models, and we can see that smaller but well-tuned models can put up a decent fight against ones that have at least 10 times the parameters (and so roughly at least 100 times the training compute). Unlike sashimi, language models should not be consumed raw.

However, their “intelligence” is ultimately derived from the FLOPs invested in the base models. (See also this paper on the limitations of fine-tuning to close capability gaps.) Fine-tuning, whether using RL or not, is the proverbial “cherry on the cake” and the pre-trained model captures more than 99.9% of the intelligence of the model. That pretrained model is not an agent and does not have goals though it can “play one on TV” in the sense of coming up with plans and proposed actions if prompted to do so. (In LW language, a simulator.) This is why a pretrained model can be modeled as an “intelligence forklift”. Just like a forklift supplies strength but is useless without someone driving it, so does the pretrained model supply intelligence, but that intelligence needs to be directed via fine-tuning, conditioning on prompts, etc. Another way to think of the pre-trained model is as the bee colony and the adapter as the queen. (That is, if the queen bee was actually telling bees what to do rather than just laying eggs.)

In that sense, while I agree with Gwern that agentic models are more useful and that “we don’t want low log-loss error on ImageNet, we want to refind a particular personal photo” , I disagree that “Agent AIs [will be] more intelligent than Tool AIs.” Intelligence and usefulness are not the same thing.

Implications for alignment

If the pre-trained model does not have goals, then there is no sense in “aligning” it. Rather, there is a separation of concerns, with a highly intelligent but goal-less pre-trained model (“forklift”) and a not-so-intelligent but goal-directed adaptor (“driver”). It is the latter one that we need to align:

The component of an AI system that needs to be aligned is not the component that accounts for its intelligence.

That is a hopeful lesson since the adaptor can be a much smaller (e.g. have drastically fewer parameters) and tractable object. However, it does not mean that the alignment problem is easy and that we are insulated from the complexities of the pretrained model:

A forklift with a speed of 1000mph might not be actively trying to kill you, but this could still be the end result.

In particular, we don’t understand the biases the pre-trained model inherits from the data, nor the way that these may play out when we use the model in applications. However, it does seem that for a pretrained model to be as good at its job as possible, it should learn all the biases in its data but not be constrained to any of them. It should be able to adapt to any context real or imagined and be the “perfect actor” that can take on any character’s personality.

The traditional “anthropomorphic” view of intelligence is as something that “belongs” to an individual or agent and that this agent has some sort of preferences or goals (a.k.a a utility function). Hence a potential future super-intelligent AI was thought of as an “alien” that pursues some goals. Under this viewpoint, we want to either “box” the alien to control its impact or “align” its goals to ours. Both of these options treat the AI system as a single component encompassing both goals and intelligence. However, if goals and intelligence parts correspond to different components, we may be able to “take the alien’s brain for a ride” and build a variety of systems that share the same capabilities but have very different objectives and profiles.

To be clear, the “intelligence forklift” view does not preclude building an “anti-aligned” agent on top of a pre-trained model that is malicious, dishonest, and harmful. It just means that such an agent would not have an automatic intelligence advantage over other agents (including humans) since all of them can have access to a shared “intelligence engine” provided by the goal-less pretrained models. This is what I illustrated as “scenario 2” in this figure (taken from my previous post):

What about “self play”?

The above assumes that the intelligence component of a model is obtained by executing gradient steps on static data, but what if this data is itself generated by the model? This is what happened with games such as Go and Chess. Originally models were trained by predicting the next move of human games scraped from the Internet, but to improve beyond the quality of these data, models needed to play against themselves and generate new games. They could then filter out only the most successful ones and hence generate data that is of higher quality than the original games they trained on. (Eventually, it turned out that with this approach you don’t need to start with any data for games such as Chess and Go, hence the “Zero” in AlphaZero.)

Self-play makes a lot of sense in games where there is a very clear notion of winning and losing, but what would be the analog for language models? I don’t know the answer to this in general, but in the realm of scientific literature, there is an analogous process. The model could play the roles of authors and reviewers alike, generate new papers, subject them to peer review, revise and resubmit, etc. At least in fields that don’t require “wet labs”, this could lead to the model simulating the scientific literature of 2024, then 2025, and so on and so forth. Models that manage to do this would be amazing and would speed up scientific progress tremendously. However, I believe they could still be (just more powerful) “intelligence forklifts”. Model outputs influencing its inputs can lead to a “positive feedback loop,” and so this is not certain. But I do not see an inherent reason why models could not be arbitrarily intelligent and still completely without goals. In the words of Scott Alexander, no matter how intelligent they are, models could still be “enlightened” and realize that

“once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN.”

I posted a first draft of the book, here. It has more material than the previous blog posts, including a chapter on communication complexity. I plan a major revision, including adding several chapters, but it seems that won’t happen right away, so I am releasing what I have for now. Any comments are appreciated, either […]

I posted a first draft of the book, here. It has more material than the previous blog posts, including a chapter on communication complexity. I plan a major revision, including adding several chapters, but it seems that won’t happen right away, so I am releasing what I have for now. Any comments are appreciated, either on this blog or via email.

Tenure-track faculty members in the School of Computing at the Australian National University. Please see the link below for more information. Website: jobs.anu.edu.au/jobs/tenure-track-lecturer-senior-lecturer-associate-professor-school-of-computing-canberra-act-act-australia Email: ahadn.zehmakan@anu.edu.au

Tenure-track faculty members in the School of Computing at the Australian National University. Please see the link below for more information.

Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds imply algorithms for learning circuits with membership queries over the uniform distribution. Indeed, they exercised this implication to obtain a quasi-polynomial time learning algorithm for ${AC}^0[p]$ circuits, for any prime $p$, by leveraging the existing natural proofs from Razborov (1987) and Smolensky (1987). This achievement raises a logical question: can existing natural proofs be adapted into learning algorithms that utilize random examples and learn over unknown, arbitrary example distributions?
In this work, we show that natural circuit lower bounds proven by specific communication complexity arguments (e.g., Nisan (1994)) witness a ``yes'' answer to this question, under the one limitation of average-case learning. Our primary technical contribution demonstrates a connection between the complexity of learning a concept class in the average-case, and the randomized communication complexity of an evaluation game associated with the class. We apply this finding to derive polynomial time average-case PAC-learning algorithms that use only random examples from arbitrary and unknown distributions, for any concept class that may be evaluated by (for instance) a majority vote of linear threshold functions.
Additionally, our work contributes to a better understanding of the optimal parameters in XOR lemmas for communication complexity. We address a question posed by Viola and Wigderson (2007) by demonstrating that certain enhancements of parameters in their XOR lemmas are false, assuming the existence of one-way functions.

Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds imply algorithms for learning circuits with membership queries over the uniform distribution. Indeed, they exercised this implication to obtain a quasi-polynomial time learning algorithm for ${AC}^0[p]$ circuits, for any prime $p$, by leveraging the existing natural proofs from Razborov (1987) and Smolensky (1987). This achievement raises a logical question: can existing natural proofs be adapted into learning algorithms that utilize random examples and learn over unknown, arbitrary example distributions?
In this work, we show that natural circuit lower bounds proven by specific communication complexity arguments (e.g., Nisan (1994)) witness a ``yes'' answer to this question, under the one limitation of average-case learning. Our primary technical contribution demonstrates a connection between the complexity of learning a concept class in the average-case, and the randomized communication complexity of an evaluation game associated with the class. We apply this finding to derive polynomial time average-case PAC-learning algorithms that use only random examples from arbitrary and unknown distributions, for any concept class that may be evaluated by (for instance) a majority vote of linear threshold functions.
Additionally, our work contributes to a better understanding of the optimal parameters in XOR lemmas for communication complexity. We address a question posed by Viola and Wigderson (2007) by demonstrating that certain enhancements of parameters in their XOR lemmas are false, assuming the existence of one-way functions.

Computational Intractability: A Guide to Algorithmic Lower Bounds

by Demaine-Gasarch-Hajiaghayi

(See here for a link to a first draft.)

we had to make some choices about which notation to use. One of the least important ones was the following:

When defining NP, and in a few other places should we use:

(\exists y)(\forall y)[B(x,y)]

or

\exists x : \forall y : B(x,y)

or

something else.

We ended up doing it the second way. But I wondered, which, if either, is standard. So I looked in many math and theoretical CS books looking for places they used quantifiers. Here is what I found

a) Most papers and books really don't use quantifiers at all! This surprised me.

b) When quantifiers are used, they are used in definitions, not theorems.

c) One exception is in logic when they deal with formulas as objects onto themselves. For example, the inductive definition of a formula will have a step:

If f(x_1,...,x_n) is a formula then (\exists x_i)[f(x_1,...,x_n)] is a formula.

d) Here is a list of the few places I saw quantifiers used and if they used parenthesis or not. I say if it has parenthesis (abbreviated Parens) or not, and if the matrix of the formula is in square brackets, no brackets, or something ese.

Cook's classic paper . Page 154 Parens, no Brackets (1971)

Stockmeyer's paper where he defines PH. Page 6 Parens and Brackets (1976)

Computers and Intractability by Garey & Johnson. Page 164. Parens and Brackets (1979)

Morass-like construction of aleph_2 trees in L by Devlin. Page 2 Parens and matrix in Parens (1979)

Descriptive Complexity by Immerman. Page 38 Parens no Brackets (1999)

Bounded Queries in Recursion Theory by Gasarch and Martin. Parens and Brackets Throughout the book. (1999)

Complexity Theory from Godel to Feynman by Rudich. No Parens, No Brackets in Def of PH. (2003)

Parameterized Complexity Theory by Flum & Grohe. Page 81 no Parens and no Brackets.

Computational Complexity: A Modern Approach by Arora & Barak. Page 40. No Parens No Brackets.(2007)

Computational Complexity: A Conceptual Prospective by Goldreich. Page 114 no parents, no brackets (2008)

On Quantifer Rank Equivalence between linear orders by Siders. On page 417 they use quantifiers to state a theorem, which is unusual. Parens no brackets.

Presburger arithmetic, Rational Generating Functions, and quasi polynomials by Woods. Parens no Brackets. (2012)

Who witness's the Witness by Abel et al. On Page 69 (which the pointer takes you to) No Parens, no brackets. Colons between quantifiers (2018).

e) What to make of all this?

First off- the RARITY of the use of quantifiers really surprised me. The only place I saw them used a lot was my book, co-authored with Georgie Martin, Bounded Queries in Recursion Theory. Perhaps it would have sold better if I didn't use so many quantifiers. Oh well.

Second off- Later works don't use parens and brackets. This is most clear if you just look at Complexity Theory Books

Garey & Johnson - 1979- parens and brackets

Flun & Grohe- 1998- no parens and no brackts

Immerman- 1999 - parens but no brackets (this is the one exception)

Arora & Barack- 2007 no parens and no brackets

Goldreich-2008- no parens and no brackets

If you have a complexity theory book around that is not on this list, look up the definition of NP and the definition of the Poly Hierarchy and see (a) if they use parens around the quantifiers, and (b) if they use square brackets or no brackets of something else. Please leave a comment about it so I test the conjecture that parenthesis are just so 1979.

By gasarch

For the book

Computational Intractability: A Guide to Algorithmic Lower Bounds

we had to make some choices about which notation to use. One of the least important ones was the following:

When defining NP, and in a few other places should we use:

(\exists y)(\forall y)[B(x,y)]

or

\exists x : \forall y : B(x,y)

or

something else.

We ended up doing it the second way. But I wondered, which, if either, is standard. So I looked in many math and theoretical CS books looking for places they used quantifiers. Here is what I found

a) Most papers and books really don't use quantifiers at all! This surprised me.

b) When quantifiers are used, they are used in definitions, not theorems.

c) One exception is in logic when they deal with formulas as objects onto themselves. For example, the inductive definition of a formula will have a step:

If f(x_1,...,x_n) is a formula then (\exists x_i)[f(x_1,...,x_n)] is a formula.

d) Here is a list of the few places I saw quantifiers used and if they used parenthesis or not. I say if it has parenthesis (abbreviated Parens) or not, and if the matrix of the formula is in square brackets, no brackets, or something ese.

First off- the RARITY of the use of quantifiers really surprised me. The only place I saw them used a lot was my book, co-authored with Georgie Martin, Bounded Queries in Recursion Theory. Perhaps it would have sold better if I didn't use so many quantifiers. Oh well.

Second off- Later works don't use parens and brackets. This is most clear if you just look at Complexity Theory Books

Garey & Johnson - 1979- parens and brackets

Flun & Grohe- 1998- no parens and no brackts

Immerman- 1999 - parens but no brackets (this is the one exception)

Arora & Barack- 2007 no parens and no brackets

Goldreich-2008- no parens and no brackets

If you have a complexity theory book around that is not on this list, look up the definition of NP and the definition of the Poly Hierarchy and see (a) if they use parens around the quantifiers, and (b) if they use square brackets or no brackets of something else. Please leave a comment about it so I test the conjecture that parenthesis are just so 1979.

Authors: Joshua Ani, Michael Coulombe, Erik D. Demaine, Yevhenii Diomidov, Timothy Gomez, Dylan Hendrickson, Jayson Lynch

We extend the motion-planning-through-gadgets framework to several new
scenarios involving various numbers of robots/agents, and analyze the
complexity of the resulting motion-planning problems. While past work considers
just one robot or one robot per player, most of our models allow for one or
more locations to spawn new robots in each time step, leading to arbitrarily
many robots. In the 0-player context, where all motion is deterministically
forced, we prove that deciding whether any robot ever reaches a specified
location is undecidable, by representing a counter machine. In the 1-player
context, where the player can choose how to move the robots, we prove
equivalence to Petri nets, EXPSPACE-completeness for reaching a specified
location, PSPACE-completeness for reconfiguration, and ACKERMANN-completeness
for reconfiguration when robots can be destroyed in addition to spawned.
Finally, we consider a variation on the standard 2-player context where,
instead of one robot per player, we have one robot shared by the players, along
with a ko rule to prevent immediately undoing the previous move. We prove this
impartial 2-player game EXPTIME-complete.

We extend the motion-planning-through-gadgets framework to several new
scenarios involving various numbers of robots/agents, and analyze the
complexity of the resulting motion-planning problems. While past work considers
just one robot or one robot per player, most of our models allow for one or
more locations to spawn new robots in each time step, leading to arbitrarily
many robots. In the 0-player context, where all motion is deterministically
forced, we prove that deciding whether any robot ever reaches a specified
location is undecidable, by representing a counter machine. In the 1-player
context, where the player can choose how to move the robots, we prove
equivalence to Petri nets, EXPSPACE-completeness for reaching a specified
location, PSPACE-completeness for reconfiguration, and ACKERMANN-completeness
for reconfiguration when robots can be destroyed in addition to spawned.
Finally, we consider a variation on the standard 2-player context where,
instead of one robot per player, we have one robot shared by the players, along
with a ko rule to prevent immediately undoing the previous move. We prove this
impartial 2-player game EXPTIME-complete.

We study the advantages of quantum communication models over classical
communication models that are equipped with a limited number of qubits of
entanglement. In this direction, we give explicit partial functions on $n$ bits
for which reducing the entanglement increases the classical communication
complexity exponentially. Our separations are as follows. For every $k\ge 1$:

$Q\|^*$ versus $R2^*$: We show that quantum simultaneous protocols with
$\tilde{\Theta}(k^5 \log^3 n)$ qubits of entanglement can exponentially
outperform two-way randomized protocols with $O(k)$ qubits of entanglement.
This resolves an open problem from [Gav08] and improves the state-of-the-art
separations between quantum simultaneous protocols with entanglement and
two-way randomized protocols without entanglement [Gav19, GRT22].

$R\|^*$ versus $Q\|^*$: We show that classical simultaneous protocols with
$\tilde{\Theta}(k \log n)$ qubits of entanglement can exponentially outperform
quantum simultaneous protocols with $O(k)$ qubits of entanglement, resolving an
open question from [GKRW06, Gav19]. The best result prior to our work was a
relational separation against protocols without entanglement [GKRW06].

$R\|^*$ versus $R1^*$: We show that classical simultaneous protocols with
$\tilde{\Theta}(k\log n)$ qubits of entanglement can exponentially outperform
randomized one-way protocols with $O(k)$ qubits of entanglement. Prior to our
work, only a relational separation was known [Gav08].

We study the advantages of quantum communication models over classical
communication models that are equipped with a limited number of qubits of
entanglement. In this direction, we give explicit partial functions on $n$ bits
for which reducing the entanglement increases the classical communication
complexity exponentially. Our separations are as follows. For every $k\ge 1$:

$Q\|^*$ versus $R2^*$: We show that quantum simultaneous protocols with
$\tilde{\Theta}(k^5 \log^3 n)$ qubits of entanglement can exponentially
outperform two-way randomized protocols with $O(k)$ qubits of entanglement.
This resolves an open problem from [Gav08] and improves the state-of-the-art
separations between quantum simultaneous protocols with entanglement and
two-way randomized protocols without entanglement [Gav19, GRT22].

$R\|^*$ versus $Q\|^*$: We show that classical simultaneous protocols with
$\tilde{\Theta}(k \log n)$ qubits of entanglement can exponentially outperform
quantum simultaneous protocols with $O(k)$ qubits of entanglement, resolving an
open question from [GKRW06, Gav19]. The best result prior to our work was a
relational separation against protocols without entanglement [GKRW06].

$R\|^*$ versus $R1^*$: We show that classical simultaneous protocols with
$\tilde{\Theta}(k\log n)$ qubits of entanglement can exponentially outperform
randomized one-way protocols with $O(k)$ qubits of entanglement. Prior to our
work, only a relational separation was known [Gav08].

Tensor parameters that are amortized or regularized over large tensor powers,
often called "asymptotic" tensor parameters, play a central role in several
areas including algebraic complexity theory (constructing fast matrix
multiplication algorithms), quantum information (entanglement cost and
distillable entanglement), and additive combinatorics (bounds on cap sets,
sunflower-free sets, etc.). Examples are the asymptotic tensor rank, asymptotic
slice rank and asymptotic subrank. Recent works (Costa-Dalai,
Blatter-Draisma-Rupniewski, Christandl-Gesmundo-Zuiddam) have investigated
notions of discreteness (no accumulation points) or "gaps" in the values of
such tensor parameters.

We prove a general discreteness theorem for asymptotic tensor parameters of
order-three tensors and use this to prove that (1) over any finite field, the
asymptotic subrank and the asymptotic slice rank have no accumulation points,
and (2) over the complex numbers, the asymptotic slice rank has no accumulation
points.

Central to our approach are two new general lower bounds on the asymptotic
subrank of tensors, which measures how much a tensor can be diagonalized. The
first lower bound says that the asymptotic subrank of any concise three-tensor
is at least the cube-root of the smallest dimension. The second lower bound
says that any three-tensor that is "narrow enough" (has one dimension much
smaller than the other two) has maximal asymptotic subrank.

Our proofs rely on new lower bounds on the maximum rank in matrix subspaces
that are obtained by slicing a three-tensor in the three different directions.
We prove that for any concise tensor the product of any two such maximum ranks
must be large, and as a consequence there are always two distinct directions
with large max-rank.

Tensor parameters that are amortized or regularized over large tensor powers,
often called "asymptotic" tensor parameters, play a central role in several
areas including algebraic complexity theory (constructing fast matrix
multiplication algorithms), quantum information (entanglement cost and
distillable entanglement), and additive combinatorics (bounds on cap sets,
sunflower-free sets, etc.). Examples are the asymptotic tensor rank, asymptotic
slice rank and asymptotic subrank. Recent works (Costa-Dalai,
Blatter-Draisma-Rupniewski, Christandl-Gesmundo-Zuiddam) have investigated
notions of discreteness (no accumulation points) or "gaps" in the values of
such tensor parameters.

We prove a general discreteness theorem for asymptotic tensor parameters of
order-three tensors and use this to prove that (1) over any finite field, the
asymptotic subrank and the asymptotic slice rank have no accumulation points,
and (2) over the complex numbers, the asymptotic slice rank has no accumulation
points.

Central to our approach are two new general lower bounds on the asymptotic
subrank of tensors, which measures how much a tensor can be diagonalized. The
first lower bound says that the asymptotic subrank of any concise three-tensor
is at least the cube-root of the smallest dimension. The second lower bound
says that any three-tensor that is "narrow enough" (has one dimension much
smaller than the other two) has maximal asymptotic subrank.

Our proofs rely on new lower bounds on the maximum rank in matrix subspaces
that are obtained by slicing a three-tensor in the three different directions.
We prove that for any concise tensor the product of any two such maximum ranks
must be large, and as a consequence there are always two distinct directions
with large max-rank.

We present a polynomial-time quantum algorithm making a single query (in
superposition) to a classical oracle, such that for every state $|\psi\rangle$
there exists a choice of oracle that makes the algorithm construct an
exponentially close approximation of $|\psi\rangle$. Previous algorithms for
this problem either used a linear number of queries and polynomial time
[arXiv:1607.05256], or a constant number of queries and polynomially many
ancillae but no nontrivial bound on the runtime [arXiv:2111.02999]. As
corollaries we do the following:

- We simplify the proof that statePSPACE $\subseteq$ stateQIP
[arXiv:2108.07192] (a quantum state analogue of PSPACE $\subseteq$ IP) and show
that a constant number of rounds of interaction suffices.

- We show that QAC$\mathsf{_f^0}$ lower bounds for constructing explicit
states would imply breakthrough circuit lower bounds for computing explicit
boolean functions.

- We prove that every $n$-qubit state can be constructed to within 0.01 error
by an $O(2^n/n)$-size circuit over an appropriate finite gate set. More
generally we give a size-error tradeoff which, by a counting argument, is
optimal for any finite gate set.

We present a polynomial-time quantum algorithm making a single query (in
superposition) to a classical oracle, such that for every state $|\psi\rangle$
there exists a choice of oracle that makes the algorithm construct an
exponentially close approximation of $|\psi\rangle$. Previous algorithms for
this problem either used a linear number of queries and polynomial time
[arXiv:1607.05256], or a constant number of queries and polynomially many
ancillae but no nontrivial bound on the runtime [arXiv:2111.02999]. As
corollaries we do the following:

- We simplify the proof that statePSPACE $\subseteq$ stateQIP
[arXiv:2108.07192] (a quantum state analogue of PSPACE $\subseteq$ IP) and show
that a constant number of rounds of interaction suffices.

- We show that QAC$\mathsf{_f^0}$ lower bounds for constructing explicit
states would imply breakthrough circuit lower bounds for computing explicit
boolean functions.

- We prove that every $n$-qubit state can be constructed to within 0.01 error
by an $O(2^n/n)$-size circuit over an appropriate finite gate set. More
generally we give a size-error tradeoff which, by a counting argument, is
optimal for any finite gate set.

The Area Under the ROC Curve (AUC) is an important model metric for
evaluating binary classifiers, and many algorithms have been proposed to
optimize AUC approximately. It raises the question of whether the generally
insignificant gains observed by previous studies are due to inherent
limitations of the metric or the inadequate quality of optimization.

To better understand the value of optimizing for AUC, we present an efficient
algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier
in $\mathbb{R}^2$, which runs in $\mathcal{O}(n_+ n_- \log (n_+ n_-))$ where
$n_+$ and $n_-$ are the number of positive and negative samples respectively.
Furthermore, it can be naturally extended to $\mathbb{R}^d$ in
$\mathcal{O}((n_+n_-)^{d-1}\log (n_+n_-))$ by calling AUC-opt in
lower-dimensional spaces recursively. We prove the problem is NP-complete when
$d$ is not fixed, reducing from the \textit{open hemisphere problem}.

Experiments show that compared with other methods, AUC-opt achieves
statistically significant improvements on between 17 to 40 in $\mathbb{R}^2$
and between 4 to 42 in $\mathbb{R}^3$ of 50 t-SNE training datasets. However,
generally the gain proves insignificant on most testing datasets compared to
the best standard classifiers. Similar observations are found for nonlinear AUC
methods under real-world datasets.

The Area Under the ROC Curve (AUC) is an important model metric for
evaluating binary classifiers, and many algorithms have been proposed to
optimize AUC approximately. It raises the question of whether the generally
insignificant gains observed by previous studies are due to inherent
limitations of the metric or the inadequate quality of optimization.

To better understand the value of optimizing for AUC, we present an efficient
algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier
in $\mathbb{R}^2$, which runs in $\mathcal{O}(n_+ n_- \log (n_+ n_-))$ where
$n_+$ and $n_-$ are the number of positive and negative samples respectively.
Furthermore, it can be naturally extended to $\mathbb{R}^d$ in
$\mathcal{O}((n_+n_-)^{d-1}\log (n_+n_-))$ by calling AUC-opt in
lower-dimensional spaces recursively. We prove the problem is NP-complete when
$d$ is not fixed, reducing from the \textit{open hemisphere problem}.

Experiments show that compared with other methods, AUC-opt achieves
statistically significant improvements on between 17 to 40 in $\mathbb{R}^2$
and between 4 to 42 in $\mathbb{R}^3$ of 50 t-SNE training datasets. However,
generally the gain proves insignificant on most testing datasets compared to
the best standard classifiers. Similar observations are found for nonlinear AUC
methods under real-world datasets.

$ \newcommand{\epsA}{\Mh{\delta}} \newcommand{\Re}{\mathbb{R}}
\newcommand{\reals}{\mathbb{R}} \newcommand{\SetX}{\mathsf{X}}
\newcommand{\diam}{\Delta} \newcommand{\Mh}[1]{#1} \newcommand{\query}{q}
\newcommand{\eps}{\varepsilon} \newcommand{\VorX}[1]{\mathcal{V} \pth{#1}}
\newcommand{\IntRange}[1]{[ #1 ]} \newcommand{\Space}{\overline{\mathsf{m}}}
\newcommand{\pth}[2][\!]{#1\left({#2}\right)}
\newcommand{\polylog}{\mathrm{polylog}} \newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z} \newcommand{\pt}{p} \newcommand{\distY}[2]{\left\|
{#1} - {#2} \right\|} \newcommand{\PP}{P} \newcommand{\ptq}{q}
\newcommand{\pts}{s}$ Given a set $\PP \subset \Re^d$ of $n$ points, with
diameter $\diam$, and a parameter $\epsA \in (0,1)$, it is known that there is
a partition of $\PP$ into sets $\PP_1, \ldots, \PP_t$, each of size
$O(1/\epsA^2)$, such that their convex-hulls all intersect a common ball of
radius $\epsA \diam$. We prove that a random partition, with a simple
alteration step, yields the desired partition, resulting in a linear time
algorithm. Previous proofs were either existential (i.e., at least exponential
time), or required much bigger sets. In addition, the algorithm and its proof
of correctness are significantly simpler than previous work, and the constants
are slightly better.

In addition, we provide a linear time algorithm for computing a ``fuzzy''
centerpoint. We also prove a no-dimensional weak $\eps$-net theorem with an
improved constant.

$ \newcommand{\epsA}{\Mh{\delta}} \newcommand{\Re}{\mathbb{R}}
\newcommand{\reals}{\mathbb{R}} \newcommand{\SetX}{\mathsf{X}}
\newcommand{\diam}{\Delta} \newcommand{\Mh}[1]{#1} \newcommand{\query}{q}
\newcommand{\eps}{\varepsilon} \newcommand{\VorX}[1]{\mathcal{V} \pth{#1}}
\newcommand{\IntRange}[1]{[ #1 ]} \newcommand{\Space}{\overline{\mathsf{m}}}
\newcommand{\pth}[2][\!]{#1\left({#2}\right)}
\newcommand{\polylog}{\mathrm{polylog}} \newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z} \newcommand{\pt}{p} \newcommand{\distY}[2]{\left\|
{#1} - {#2} \right\|} \newcommand{\PP}{P} \newcommand{\ptq}{q}
\newcommand{\pts}{s}$ Given a set $\PP \subset \Re^d$ of $n$ points, with
diameter $\diam$, and a parameter $\epsA \in (0,1)$, it is known that there is
a partition of $\PP$ into sets $\PP_1, \ldots, \PP_t$, each of size
$O(1/\epsA^2)$, such that their convex-hulls all intersect a common ball of
radius $\epsA \diam$. We prove that a random partition, with a simple
alteration step, yields the desired partition, resulting in a linear time
algorithm. Previous proofs were either existential (i.e., at least exponential
time), or required much bigger sets. In addition, the algorithm and its proof
of correctness are significantly simpler than previous work, and the constants
are slightly better.

In addition, we provide a linear time algorithm for computing a ``fuzzy''
centerpoint. We also prove a no-dimensional weak $\eps$-net theorem with an
improved constant.

In this paper, we introduce the Maximum Matrix Contraction problem, where we
aim to contract as much as possible a binary matrix in order to maximize its
density. We study the complexity and the polynomial approximability of the
problem. Especially, we prove this problem to be NP-Complete and that every
algorithm solving this problem is at most a $2\sqrt{n}$-approximation algorithm
where n is the number of ones in the matrix. We then focus on efficient
algorithms to solve the problem: an integer linear program and three
heuristics.

In this paper, we introduce the Maximum Matrix Contraction problem, where we
aim to contract as much as possible a binary matrix in order to maximize its
density. We study the complexity and the polynomial approximability of the
problem. Especially, we prove this problem to be NP-Complete and that every
algorithm solving this problem is at most a $2\sqrt{n}$-approximation algorithm
where n is the number of ones in the matrix. We then focus on efficient
algorithms to solve the problem: an integer linear program and three
heuristics.

In this paper, we propose new techniques for solving geometric optimization
problems involving interpoint distances of a point set in the plane. Given a
set $P$ of $n$ points in the plane and an integer $1 \leq k \leq \binom{n}{2}$,
the distance selection problem is to find the $k$-th smallest interpoint
distance among all pairs of points of $P$. The previously best deterministic
algorithm solves the problem in $O(n^{4/3} \log^2 n)$ time [Katz and Sharir,
SIAM J. Comput. 1997 and SoCG 1993]. In this paper, we improve their algorithm
to $O(n^{4/3} \log n)$ time. Using similar techniques, we also give improved
algorithms on both the two-sided and the one-sided discrete Fr\'{e}chet
distance with shortcuts problem for two point sets in the plane. For the
two-sided problem (resp., one-sided problem), we improve the previous work
[Avraham, Filtser, Kaplan, Katz, and Sharir, ACM Trans. Algorithms 2015 and
SoCG 2014] by a factor of roughly $\log^2(m+n)$ (resp., $(m+n)^{\epsilon}$),
where $m$ and $n$ are the sizes of the two input point sets, respectively.
Other problems whose solutions can be improved by our techniques include the
reverse shortest path problems for unit-disk graphs. Our techniques are quite
general and we believe they will find many other applications in future.

In this paper, we propose new techniques for solving geometric optimization
problems involving interpoint distances of a point set in the plane. Given a
set $P$ of $n$ points in the plane and an integer $1 \leq k \leq \binom{n}{2}$,
the distance selection problem is to find the $k$-th smallest interpoint
distance among all pairs of points of $P$. The previously best deterministic
algorithm solves the problem in $O(n^{4/3} \log^2 n)$ time [Katz and Sharir,
SIAM J. Comput. 1997 and SoCG 1993]. In this paper, we improve their algorithm
to $O(n^{4/3} \log n)$ time. Using similar techniques, we also give improved
algorithms on both the two-sided and the one-sided discrete Fr\'{e}chet
distance with shortcuts problem for two point sets in the plane. For the
two-sided problem (resp., one-sided problem), we improve the previous work
[Avraham, Filtser, Kaplan, Katz, and Sharir, ACM Trans. Algorithms 2015 and
SoCG 2014] by a factor of roughly $\log^2(m+n)$ (resp., $(m+n)^{\epsilon}$),
where $m$ and $n$ are the sizes of the two input point sets, respectively.
Other problems whose solutions can be improved by our techniques include the
reverse shortest path problems for unit-disk graphs. Our techniques are quite
general and we believe they will find many other applications in future.

Merge trees, contour trees, and Reeb graphs are graph-based topological
descriptors that capture topological changes of (sub)level sets of scalar
fields. Comparing scalar fields using their topological descriptors has many
applications in topological data analysis and visualization of scientific data.
Recently, Munch and Stefanou introduced a labeled interleaving distance for
comparing two labeled merge trees, which enjoys a number of theoretical and
algorithmic properties. In particular, the labeled interleaving distance
between merge trees can be computed in polynomial time. In this work, we define
the labeled interleaving distance for labeled Reeb graphs. We then prove that
the (ordinary) interleaving distance between Reeb graphs equals the minimum of
the labeled interleaving distance over all labelings. We also provide an
efficient algorithm for computing the labeled interleaving distance between two
labeled contour trees (which are special types of Reeb graphs that arise from
simply-connected domains). In the case of merge trees, the notion of the
labeled interleaving distance was used by Gasparovic et al. to prove that the
(ordinary) interleaving distance on the set of (unlabeled) merge trees is
intrinsic. As our final contribution, we present counterexamples showing that,
on the contrary, the (ordinary) interleaving distance on (unlabeled) Reeb
graphs (and contour trees) is not intrinsic. It turns out that, under mild
conditions on the labelings, the labeled interleaving distance is a metric on
isomorphism classes of Reeb graphs, analogous to the ordinary interleaving
distance. This provides new metrics on large classes of Reeb graphs.

Merge trees, contour trees, and Reeb graphs are graph-based topological
descriptors that capture topological changes of (sub)level sets of scalar
fields. Comparing scalar fields using their topological descriptors has many
applications in topological data analysis and visualization of scientific data.
Recently, Munch and Stefanou introduced a labeled interleaving distance for
comparing two labeled merge trees, which enjoys a number of theoretical and
algorithmic properties. In particular, the labeled interleaving distance
between merge trees can be computed in polynomial time. In this work, we define
the labeled interleaving distance for labeled Reeb graphs. We then prove that
the (ordinary) interleaving distance between Reeb graphs equals the minimum of
the labeled interleaving distance over all labelings. We also provide an
efficient algorithm for computing the labeled interleaving distance between two
labeled contour trees (which are special types of Reeb graphs that arise from
simply-connected domains). In the case of merge trees, the notion of the
labeled interleaving distance was used by Gasparovic et al. to prove that the
(ordinary) interleaving distance on the set of (unlabeled) merge trees is
intrinsic. As our final contribution, we present counterexamples showing that,
on the contrary, the (ordinary) interleaving distance on (unlabeled) Reeb
graphs (and contour trees) is not intrinsic. It turns out that, under mild
conditions on the labelings, the labeled interleaving distance is a metric on
isomorphism classes of Reeb graphs, analogous to the ordinary interleaving
distance. This provides new metrics on large classes of Reeb graphs.

Authors: Arnaud Deza, Chang Liu, Pashootan Vaezipoor, Elias B. Khalil

It is known that the multiplication of an $N \times M$ matrix with an $M
\times P$ matrix can be performed using fewer multiplications than what the
naive $NMP$ approach suggests. The most famous instance of this is Strassen's
algorithm for multiplying two $2\times 2$ matrices in 7 instead of 8
multiplications. This gives rise to the constraint satisfaction problem of fast
matrix multiplication, where a set of $R < NMP$ multiplication terms must be
chosen and combined such that they satisfy correctness constraints on the
output matrix. Despite its highly combinatorial nature, this problem has not
been exhaustively examined from that perspective, as evidenced for example by
the recent deep reinforcement learning approach of AlphaTensor. In this work,
we propose a simple yet novel Constraint Programming approach to find
non-commutative algorithms for fast matrix multiplication or provide proof of
infeasibility otherwise. We propose a set of symmetry-breaking constraints and
valid inequalities that are particularly helpful in proving infeasibility. On
the feasible side, we find that exploiting solver performance variability in
conjunction with a sparsity-based problem decomposition enables finding
solutions for larger (feasible) instances of fast matrix multiplication. Our
experimental results using CP Optimizer demonstrate that we can find fast
matrix multiplication algorithms for matrices up to $3\times 3$ in a short
amount of time.

It is known that the multiplication of an $N \times M$ matrix with an $M
\times P$ matrix can be performed using fewer multiplications than what the
naive $NMP$ approach suggests. The most famous instance of this is Strassen's
algorithm for multiplying two $2\times 2$ matrices in 7 instead of 8
multiplications. This gives rise to the constraint satisfaction problem of fast
matrix multiplication, where a set of $R < NMP$ multiplication terms must be
chosen and combined such that they satisfy correctness constraints on the
output matrix. Despite its highly combinatorial nature, this problem has not
been exhaustively examined from that perspective, as evidenced for example by
the recent deep reinforcement learning approach of AlphaTensor. In this work,
we propose a simple yet novel Constraint Programming approach to find
non-commutative algorithms for fast matrix multiplication or provide proof of
infeasibility otherwise. We propose a set of symmetry-breaking constraints and
valid inequalities that are particularly helpful in proving infeasibility. On
the feasible side, we find that exploiting solver performance variability in
conjunction with a sparsity-based problem decomposition enables finding
solutions for larger (feasible) instances of fast matrix multiplication. Our
experimental results using CP Optimizer demonstrate that we can find fast
matrix multiplication algorithms for matrices up to $3\times 3$ in a short
amount of time.

Authors: Fedor V. Fomin, Pierre Fraigniaud, Petr A. Golovach

The task of the broadcast problem is, given a graph G and a source vertex s,
to compute the minimum number of rounds required to disseminate a piece of
information from s to all vertices in the graph. It is assumed that, at each
round, an informed vertex can transmit the information to at most one of its
neighbors. The broadcast problem is known to NP-hard. We show that the problem
is FPT when parametrized by the size k of a feedback edge-set, or by the size k
of a vertex-cover, or by k=n-t where t is the input deadline for the broadcast
protocol to complete.

The task of the broadcast problem is, given a graph G and a source vertex s,
to compute the minimum number of rounds required to disseminate a piece of
information from s to all vertices in the graph. It is assumed that, at each
round, an informed vertex can transmit the information to at most one of its
neighbors. The broadcast problem is known to NP-hard. We show that the problem
is FPT when parametrized by the size k of a feedback edge-set, or by the size k
of a vertex-cover, or by k=n-t where t is the input deadline for the broadcast
protocol to complete.

Soddy’s hexlet is a famous system of nine spheres in three-dimensional Euclidean space, consisting of a ring of six spheres, tangent in consecutive pairs, and a ring of three spheres, tangent in pairs, with every sphere in one ring tangent to every sphere of the other ring. The easy way to construct it is to observe that its properties are invariant under Möbius transformations, as long as you count planes as spheres, and parallel planes as tangent spheres. If you take two of the three spheres in the three-sphere ring to be parallel planes, the rest have to form seven congruent spheres, six of them in a ring around the seventh. Any other form of the hexlet can be obtained by a Möbius transformation from this one.

Soddy’s hexlet is a famous system of nine spheres in three-dimensional Euclidean space, consisting of a ring of six spheres, tangent in consecutive pairs, and a ring of three spheres, tangent in pairs, with every sphere in one ring tangent to every sphere of the other ring. The easy way to construct it is to observe that its properties are invariant under Möbius transformations, as long as you count planes as spheres, and parallel planes as tangent spheres. If you take two of the three spheres in the three-sphere ring to be parallel planes, the rest have to form seven congruent spheres, six of them in a ring around the seventh. Any other form of the hexlet can be obtained by a Möbius transformation from this one.

But there’s another way of constructing the same shape, coming from four-dimensional polyhedral geometry, which can also be used to construct another related pair of interlocking rings of spheres.

By analogy, consider a cube in 3d, and start growing a sphere with its center at the center of the cube. As you grow the sphere, it will start to bulge out through the faces of the cube, which cut it in six circles. Initially, those circles will be small and disjoint from each other, but as they grow larger they will eventually touch at the midpoint of a cube edge, and then cross each other. At the time when any two of the growing circles touch each other, all six will touch four others, by the symmetries of the cube. In this way, we have generated a configuration of six circles, on the surface of a sphere, each touching four others with the same touching pattern as the square faces of a cube.

Now do the same thing in 4d with the (3,6)-duoprism, the Cartesian product of an equilateral triangle and a regular hexagon. The resulting 4-dimensional polytope has facets of two type: six triangular prisms (attached to each other on their triangle faces) and three hexagonal prisms (attached to each other on their hexagon faces). Rectangular faces connect triangular prisms to a hexagonal prism. There is a choice for how big to make the triangle edges relative to the hexagon edges. You want them to be in the proportion \(\sqrt3:1\), so that both kinds of prism have an inscribed sphere touching all their faces. Instead, the only figures I could find of the duoprism show it with edges in the wrong proportion \(1:1\), generating square faces. But I want the rectangular one.

Grow a hypersphere from the center of the duoprism. As it grows, it will intersect the prism facets of the duosphere in spheres, centered at the centers of the prisms. Initially small and disjoint, these spheres will grow until they become the inscribed spheres of the prisms, touching each other at the center of each triangle, hexagon, or rectangle of the duoprism. You have created a hexlet, simultaneously drawn on a sphere and inscribed in the faces of a duoprism! You can map it into the usual hexlet of three-dimensional Euclidean geometry (instead of three-dimensional spherical geometry) by a stereographic projection from the hypersphere to a flat three-dimensional space.

Almost all the other duoprisms do not have this coincidence, that when you adjust the proportions to make one kind of prism have inscribed spheres, the other one does the same thing with the same proportions. There’s only one other duoprism for which this works: the (4,4)-duoprism, better known as the 4-hypercube or tesseract. In this case, all the facets are the same, and all the 2-faces are the same. If we grow a hypersphere, centered at the center of the hypercube, it will cross the hypercube facets (which are cubes) in spheres. When these spheres grow to the size where they are inscribed in each cube facet, they will be tangent to each other at the centers of the square two-dimensional faces of the hypercube. At this point, you will have formed two rings of four tangent spheres, tangent in consecutive pairs, with every sphere in one ring tangent to every sphere of the other ring. We could call it the “quadlet”.

Now that you’ve constructed a quadlet on a hypersphere in 4d, you can apply a stereographic projection to get the same quadlet as a collection of ordinary spheres in 3-dimensional Euclidean space. One of the more symmetric ways of doing this projection takes one of the two 4-sphere rings to two unit spheres sandwiched between two parallel planes at distance 4 from each other. The four spheres of the other ring all have radius 2, and wrap around the central two unit spheres. It’s not obvious to me that these two parallel planes, two unit spheres, and four radius-2 spheres can all be tangent in this pattern, unless we calculate the coordinates of their tangencies or use reasoning based on the spheres inscribed on the facets of a hypercube, for which the same pattern of tangencies is obvious.

You can rotate the four radius-2 spheres around the axis formed by the central unit spheres arbitrarily without changing the pattern of tangencies. This is more or less analogous to the fact that with the hexlet, you can start the ring of six tangent spheres at any sphere tangent to all three spheres of the other ring, and then add spheres to the ring one by one, keeping each sphere tangent to its predecessor in the same ring and to all the spheres of the other ring. It will always close up after six spheres to form a hexlet. You can also fix in place the six-sphere ring, choose any sphere tangent to all of them to start the three-sphere ring, and it will always close up after three spheres to form a hexlet. And once you have one of the four-sphere rings of a quadlet, you can choose any sphere tangent to all four to start the other ring, and it will always close up after four spheres to form a quadlet. For the hexlet, this becomes obvious after we do a Möbius transformation to take it into the form with two parallel planes and seven congruent spheres. For the quadlet, it is similarly obvious by doing a Möbius transformation to take it into a form with two parallel planes in one ring and four congruent spheres in the other. The only way for the remaining two spheres to complete the first ring is for them to fill the hole between the four congruent spheres, one directly on top of each other. They might not be the same size as each other but one more Möbius transformation makes them so. So just like the hexlet, all quadlets are Möbius-equivalent.

Incidentally, the fact that you can get systems of three-dimensional tangent spheres from four-dimensional polytopes is not particularly new. I used it long ago in my paper with Kuperberg and Ziegler, “Fat 4-polytopes and fatter 3-spheres”, to get a finite set of spheres with high kissing number from the 120-cell and its relatives. For more on the connection between sphere packings and 4-polytopes, including the construction of the hexlet from the duoprism, see Hao Chen’s papers and especially “Apollonian ball packings and stacked polytopes”.

The random-query model was introduced by Raz and Zhan at ITCS 2020 as a new model of space-bounded computation. In this model, a branching program of length $T$ and width $2^{S}$ attempts to compute a function $f:\{0,1\}^n \rightarrow \{0,1 \}$. However, instead of receiving direct access to the input bits $(x_1,\ldots,x_n)$, the input is given in pairs of the form $(i_j, x_{i_j}) \in \{1,\ldots,n\} \times \{0,1\}$ for $j = 1,2,\ldots,T$, where the indices $i_1,\ldots,i_T$ are chosen at random from a pre-fixed distribution.
Raz and Zhan proved that any branching program in the random-query model with the independent distribution (where $\{i_j\}_{j = 1,\ldots,T}$ are uniform and independent) that computes a function $f$ with sensitivity $k$ satisfies $T \cdot (S + \log n) \geq \Omega(n \cdot k)$.
This gives a quadratic time-space lower bound for many natural functions which have sensitivity $\Omega(n)$, such as XOR and Majority. The bound was proved in the zero-error regime, where for each input, the branching program is required to output a value with high probability, and given that a value is output, it must be correct with probability $1$.
Furthermore, Raz and Zhan conjectured that (up to logarithmic factors in $n$) a quadratic time-space lower bound still holds for the XOR function in the more conventional bounded-error regime, where for each input, the output must be correct with high probability.
In this paper, we prove this conjecture. More generally, let $f:\{0,1\}^n \rightarrow \{0,1 \}$ have average sensitivity (or total influence) $\mathrm{I}[f]$. We prove that any branching program in the random-query model with the independent distribution that computes $f$ in the bounded-error regime satisfies $T \cdot S \geq \tilde{\Omega}(n) \cdot \mathrm{I}[f]$ (where $\tilde{\Omega}$ hides logarithmic factors in $n$). Moreover, we prove a quadratic time-space lower bound for the Majority function, even though its total influence is $\Theta(\sqrt{n})$.
Our proof is based on a reduction from a communication complexity problem.

The random-query model was introduced by Raz and Zhan at ITCS 2020 as a new model of space-bounded computation. In this model, a branching program of length $T$ and width $2^{S}$ attempts to compute a function $f:\{0,1\}^n \rightarrow \{0,1 \}$. However, instead of receiving direct access to the input bits $(x_1,\ldots,x_n)$, the input is given in pairs of the form $(i_j, x_{i_j}) \in \{1,\ldots,n\} \times \{0,1\}$ for $j = 1,2,\ldots,T$, where the indices $i_1,\ldots,i_T$ are chosen at random from a pre-fixed distribution.
Raz and Zhan proved that any branching program in the random-query model with the independent distribution (where $\{i_j\}_{j = 1,\ldots,T}$ are uniform and independent) that computes a function $f$ with sensitivity $k$ satisfies $T \cdot (S + \log n) \geq \Omega(n \cdot k)$.
This gives a quadratic time-space lower bound for many natural functions which have sensitivity $\Omega(n)$, such as XOR and Majority. The bound was proved in the zero-error regime, where for each input, the branching program is required to output a value with high probability, and given that a value is output, it must be correct with probability $1$.
Furthermore, Raz and Zhan conjectured that (up to logarithmic factors in $n$) a quadratic time-space lower bound still holds for the XOR function in the more conventional bounded-error regime, where for each input, the output must be correct with high probability.
In this paper, we prove this conjecture. More generally, let $f:\{0,1\}^n \rightarrow \{0,1 \}$ have average sensitivity (or total influence) $\mathrm{I}[f]$. We prove that any branching program in the random-query model with the independent distribution that computes $f$ in the bounded-error regime satisfies $T \cdot S \geq \tilde{\Omega}(n) \cdot \mathrm{I}[f]$ (where $\tilde{\Omega}$ hides logarithmic factors in $n$). Moreover, we prove a quadratic time-space lower bound for the Majority function, even though its total influence is $\Theta(\sqrt{n})$.
Our proof is based on a reduction from a communication complexity problem.

We study the advantages of quantum communication models over classical communication models that are equipped with a limited number of qubits of entanglement. In this direction, we give explicit partial functions on $n$ bits for which reducing the entanglement increases the classical communication complexity exponentially. Our separations are as follows. For every $k\ge 1$:
$Q\|^*$ versus $R2^*$: We show that quantum simultaneous protocols with $\tilde{\Theta}(k^5 \log^3 n)$ qubits of entanglement can exponentially outperform two-way randomized protocols with $O(k)$ qubits of entanglement. This resolves an open problem from [Gav08] and improves the state-of-the-art separations between quantum simultaneous protocols with entanglement and two-way randomized protocols without entanglement [Gav19, GRT22].
$R\|^*$ versus $Q\|^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k \log n)$ qubits of entanglement can exponentially outperform quantum simultaneous protocols with $O(k)$ qubits of entanglement, resolving an open question from [GKRW06, Gav19]. The best result prior to our work was a relational separation against protocols without entanglement [GKRW06].
$R\|^*$ versus $R1^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k\log n)$ qubits of entanglement can exponentially outperform randomized one-way protocols with $O(k)$ qubits of entanglement. Prior to our work, only a relational separation was known [Gav08].

We study the advantages of quantum communication models over classical communication models that are equipped with a limited number of qubits of entanglement. In this direction, we give explicit partial functions on $n$ bits for which reducing the entanglement increases the classical communication complexity exponentially. Our separations are as follows. For every $k\ge 1$:
$Q\|^*$ versus $R2^*$: We show that quantum simultaneous protocols with $\tilde{\Theta}(k^5 \log^3 n)$ qubits of entanglement can exponentially outperform two-way randomized protocols with $O(k)$ qubits of entanglement. This resolves an open problem from [Gav08] and improves the state-of-the-art separations between quantum simultaneous protocols with entanglement and two-way randomized protocols without entanglement [Gav19, GRT22].
$R\|^*$ versus $Q\|^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k \log n)$ qubits of entanglement can exponentially outperform quantum simultaneous protocols with $O(k)$ qubits of entanglement, resolving an open question from [GKRW06, Gav19]. The best result prior to our work was a relational separation against protocols without entanglement [GKRW06].
$R\|^*$ versus $R1^*$: We show that classical simultaneous protocols with $\tilde{\Theta}(k\log n)$ qubits of entanglement can exponentially outperform randomized one-way protocols with $O(k)$ qubits of entanglement. Prior to our work, only a relational separation was known [Gav08].