
Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.


Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Friday, March 21

Dowling Fellowship at Cambridge (apply by April 7, 2025)

from CCI: jobs

Tom Gur and Prakash Murali are seeking to jointly recruit a postdoctoral researcher (Dowling Fellow) at Cambridge, to conduct research focused on quantum algorithms, complexity, error correction, and architecture. The position has funding for up to 5 years.

Website: http://www.jobs.cam.ac.uk/job/50485/
Email: tom.gur@cl.cam.ac.uk

By shacharlovett

TR25-033 | Boolean Circuit Complexity and Two-Dimensional Cover Problems | Bruno Pasqualotto Cavalar, Igor Oliveira

from ECCC Papers

We reduce the problem of proving deterministic and nondeterministic Boolean circuit size lower bounds to the analysis of certain two-dimensional combinatorial cover problems. This is obtained by combining results of Razborov (1989), Karchmer (1993), and Wigderson (1993) in the context of the fusion method for circuit lower bounds with the graph complexity framework of Pudlák, Rödl, and Savický (1988). For convenience, we formalize these ideas in the more general setting of “discrete complexity”, i.e., the natural set-theoretic formulation of circuit complexity, variants of communication complexity, graph complexity, and other measures. We show that random graphs have linear graph cover complexity, and that explicit super-logarithmic graph cover complexity lower bounds would have significant consequences in circuit complexity. We then use discrete complexity, the fusion method, and a result of Karchmer and Wigderson (1993) to introduce nondeterministic graph complexity. This allows us to establish a connection between graph complexity and nondeterministic circuit complexity. Finally, complementing these results, we describe an exact characterization of the power of the fusion method in discrete complexity. This is obtained via an adaptation of a result of Nakayama and Maruoka (1995) that connects the fusion method to the complexity of “cyclic” Boolean circuits, which generalize the computation of a circuit by allowing cycles in its specification.

TR25-032 | Graph Colouring Is Hard on Average for Polynomial Calculus and Nullstellensatz | Jonas Conneryd, Susanna F. de Rezende, Jakob Nordström, Shuo Pang, Kilian Risse

from ECCC Papers

We prove that polynomial calculus (and hence also Nullstellensatz) over any field requires linear degree to refute that sparse random regular graphs, as well as sparse Erdős-Rényi random graphs, are 3-colourable. Using the known relation between size and degree for polynomial calculus proofs, this implies strongly exponential lower bounds on proof size.

Recognizing and Realizing Temporal Reachability Graphs

from arXiv: Computational Complexity

Authors: Thomas Erlebach, Othon Michail, Nils Morawietz

A temporal graph $\mathcal{G}=(G,\lambda)$ can be represented by an underlying graph $G=(V,E)$ together with a function $\lambda$ that assigns to each edge $e\in E$ the set of time steps during which $e$ is present. The reachability graph of $\mathcal{G}$ is the directed graph $D=(V,A)$ with $(u,v)\in A$ if and only if there is a temporal path from $u$ to $v$. We study the Reachability Graph Realizability (RGR) problem that asks whether a given directed graph $D=(V,A)$ is the reachability graph of some temporal graph. The question can be asked for undirected or directed temporal graphs, for reachability defined via strict or non-strict temporal paths, and with or without restrictions on $\lambda$ (proper, simple, or happy). Answering an open question posed by Casteigts et al. (Theoretical Computer Science 991 (2024)), we show that all variants of the problem are NP-complete, except for two variants that become trivial in the directed case. For undirected temporal graphs, we consider the complexity of the problem with respect to the solid graph, that is, the graph containing all edges that could potentially receive a label in any realization. We show that the RGR problem is polynomial-time solvable if the solid graph is a tree and fixed-parameter tractable with respect to the feedback edge set number of the solid graph. As we show, the latter parameter can presumably not be replaced by smaller parameters like feedback vertex set or treedepth, since the problem is W[2]-hard with respect to these parameters.
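To make the central object concrete, here is a small illustrative sketch (not from the paper) that computes the reachability graph $D=(V,A)$ of an undirected temporal graph by earliest-arrival search; the `strict` flag switches between strict and non-strict temporal paths:

```python
from collections import defaultdict

def reachability_graph(n, labels, strict=True):
    """Reachability graph D = (V, A) of an undirected temporal graph.

    n      -- number of vertices 0..n-1
    labels -- dict mapping an undirected edge (u, v) to a set of time steps
    strict -- whether times along a temporal path must strictly increase

    (u, v) is an arc of D iff some temporal path goes from u to v.
    """
    adj = defaultdict(list)  # vertex -> list of (neighbour, time)
    for (u, v), times in labels.items():
        for t in times:
            adj[u].append((v, t))
            adj[v].append((u, t))

    arcs = set()
    for s in range(n):
        # Earliest-arrival search: a smaller arrival time at v can only
        # allow more continuations, so it dominates for reachability.
        earliest = {s: float("-inf")}
        stack = [(s, float("-inf"))]
        while stack:
            u, t_u = stack.pop()
            for v, t in adj[u]:
                if (t > t_u if strict else t >= t_u):
                    if v not in earliest or t < earliest[v]:
                        earliest[v] = t
                        stack.append((v, t))
        arcs.update((s, v) for v in earliest if v != s)
    return arcs
```

On a path 0-1-2-3 with labels 1, 2, 1, for instance, vertex 0 reaches vertex 2 (via times 1 then 2) but not vertex 3 under strict reachability, since the last edge is only present at time 1.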

Query-Efficient Fixpoints of $\ell_p$-Contractions

from arXiv: Computational Geometry

Authors: Sebastian Haslebacher, Jonas Lill, Patrick Schnider, Simon Weber

We prove that an $\epsilon$-approximate fixpoint of a map $f:[0,1]^d\rightarrow [0,1]^d$ can be found with $\mathcal{O}(d^2(\log\frac{1}{\epsilon} + \log\frac{1}{1-\lambda}))$ queries to $f$ if $f$ is $\lambda$-contracting with respect to an $\ell_p$-metric for some $p\in [1,\infty)\cup\{\infty\}$. This generalizes a recent result of Chen, Li, and Yannakakis [STOC'24] from the $\ell_\infty$-case to all $\ell_p$-metrics. Previously, all query upper bounds for $p\in [1,\infty) \setminus \{2\}$ were either exponential in $d$, $\log\frac{1}{\epsilon}$, or $\log\frac{1}{1-\lambda}$. Chen, Li, and Yannakakis also show how to ensure that all queries to $f$ lie on a discrete grid of limited granularity in the $\ell_\infty$-case. We provide such a rounding for the $\ell_1$-case, placing an appropriately defined version of the $\ell_1$-case in $\textsf{FP}^{dt}$. To prove our results, we introduce the notion of $\ell_p$-halfspaces and generalize the classical centerpoint theorem from discrete geometry: for any $p \in [1, \infty) \cup \{\infty\}$ and any mass distribution (or point set), we prove that there exists a centerpoint $c$ such that every $\ell_p$-halfspace defined by $c$ and a normal vector contains at least a $\frac{1}{d+1}$-fraction of the mass (or points).
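For contrast with the query bound above: the naive baseline, plain Banach fixed-point iteration, uses roughly $\log\frac{1}{\epsilon}/\log\frac{1}{\lambda}$ queries, which for $\lambda$ near 1 is polynomial in $\frac{1}{1-\lambda}$ rather than logarithmic as in the paper's bound. A minimal sketch of that baseline (illustrative only, not the paper's algorithm):

```python
import numpy as np

def approx_fixpoint(f, d, eps, p=2):
    """Plain Banach iteration: return x in [0,1]^d with ||f(x) - x||_p <= eps.

    Assumes f is a lambda-contraction in the l_p metric for some lambda < 1;
    each loop iteration costs one query to f. Illustrative baseline only.
    """
    x = np.full(d, 0.5)
    fx = f(x)
    while np.linalg.norm(fx - x, ord=p) > eps:
        x = fx
        fx = f(x)
    return x
```

For a $\lambda$-contraction, $\|f(x)-x\|_p\le\epsilon$ also pins $x$ within $\epsilon/(1-\lambda)$ of the true fixpoint in the same metric, so this stopping rule yields an $\epsilon$-approximate fixpoint up to that factor.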

Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning

from arXiv: Computational Geometry

Authors: Luc McCutcheon, Bahman Gharesifard, Saber Fallah

Control Lyapunov functions are traditionally used to design a controller which ensures convergence to a desired state, yet deriving these functions for nonlinear systems remains a complex challenge. This paper presents a novel, sample-efficient method for neural approximation of nonlinear Lyapunov functions, leveraging self-supervised Reinforcement Learning (RL) to enhance training data generation, particularly for inaccurately represented regions of the state space. The proposed approach employs a data-driven World Model to train Lyapunov functions from off-policy trajectories. The method is validated on both standard and goal-conditioned robotic tasks, demonstrating faster convergence and higher approximation accuracy compared to the state-of-the-art neural Lyapunov approximation baseline. The code is available at: https://github.com/CAV-Research-Lab/SACLA.git

A parallel algorithm for the odd two-face shortest k-disjoint path problem

from arXiv: Data Structures and Algorithms

Authors: Srijan Chakraborty, Samir Datta

The Shortest Disjoint Path Problem (SDPP) requires us to find pairwise vertex-disjoint paths between k designated pairs of terminal vertices such that the sum of the path lengths is minimum. The focus here is on SDPP restricted to planar graphs where all terminals are arbitrarily partitioned over two distinct faces with the additional restriction that each face is required to contain an odd number of terminals. We call this problem the Odd two-face planar SDPP. It is shown that this problem is solvable in randomized polynomial time and even in RNC. This is the first parallel (or even polynomial time) solution for the problem. Our algorithm combines ideas from the randomized solution for 2-SDPP by Björklund and Husfeldt with its parallelization by Datta and Jaiswal, along with the deterministic algorithm for One-face planar SDPP by Datta, Iyer, Kulkarni and Mukherjee. The proof uses a combination of two involutions to reduce a system of linear equations modulo a power of 2 to a system of triangular form that is, therefore, invertible. This, in turn, is proved by showing that the matrix of the equations can be interpreted as (the adjacency matrix of) a directed acyclic graph (DAG). While our algorithm is primarily algebraic, the proof remains combinatorial. We also give a parallel algorithm for the (A + B)-SDPP introduced by Hirai and Namba.

Near-Linear Runtime for a Classical Matrix Preconditioning Algorithm

from arXiv: Data Structures and Algorithms

Authors: Xufeng Cai, Jason M. Altschuler, Jelena Diakonikolas

In 1960, Osborne proposed a simple iterative algorithm for matrix balancing with outstanding numerical performance. Today, it is the default preconditioning procedure before eigenvalue computation and other linear algebra subroutines in mainstream software packages such as Python, Julia, MATLAB, EISPACK, LAPACK, and more. Despite its widespread usage, Osborne's algorithm has long resisted theoretical guarantees for its runtime: the first polynomial-time guarantees were obtained only in the past decade, and recent near-linear runtimes remain confined to variants of Osborne's algorithm with important differences that make them simpler to analyze but empirically slower. In this paper, we address this longstanding gap between theory and practice by proving that Osborne's original algorithm -- the de facto preconditioner in practice -- in fact has a near-linear runtime. This runtime guarantee (1) is optimal in the input size up to at most a single logarithm, (2) is the first runtime for Osborne's algorithm that does not dominate the runtime of downstream tasks like eigenvalue computation, and (3) improves upon the theoretical runtimes for all other variants of Osborne's algorithm.
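For readers unfamiliar with the procedure, here is a minimal illustrative sketch of Osborne's balancing iteration (coordinate-wise scaling in the 2-norm); this is a sketch of the idea, not the library routine, and real implementations such as LAPACK's `gebal` differ in details:

```python
import numpy as np

def osborne(A, sweeps=100):
    """Sketch of Osborne's iteration: for each index i, rescale so the
    off-diagonal 2-norms of row i and column i match. The result is
    D A D^{-1} for a diagonal D, so eigenvalues are preserved.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(sweeps):
        for i in range(n):
            r = np.sqrt(np.sum(A[i, :] ** 2) - A[i, i] ** 2)  # off-diag row norm
            c = np.sqrt(np.sum(A[:, i] ** 2) - A[i, i] ** 2)  # off-diag col norm
            if r > 0 and c > 0:
                s = np.sqrt(c / r)  # after scaling: s*r == c/s == sqrt(r*c)
                A[i, :] *= s        # row i of D A D^{-1}
                A[:, i] /= s        # column i (diagonal entry unchanged)
    return A
```

For example, balancing `[[0, 2], [8, 0]]` yields `[[0, 4], [4, 0]]` after a single sweep: row and column norms agree at every index, and the eigenvalues (here ±4) are untouched because the update is a diagonal similarity transform.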

Dispersion is (Almost) Optimal under (A)synchrony

from arXiv: Data Structures and Algorithms

Authors: Ajay D. Kshemkalyani, Manish Kumar, Anisur Rahaman Molla, Gokarna Sharma

The dispersion problem has received much attention recently in the distributed computing literature. In this problem, $k\leq n$ agents placed initially arbitrarily on the nodes of an $n$-node, $m$-edge anonymous graph of maximum degree $\Delta$ have to reposition autonomously to reach a configuration in which each agent is on a distinct node of the graph. Dispersion is interesting as well as important due to its connections to many fundamental coordination problems by mobile agents on graphs, such as exploration, scattering, load balancing, relocation of self-driven electric cars (robots) to recharge stations (nodes), etc. The objective has been to provide a solution that optimizes simultaneously time and memory complexities. There exist graphs for which the lower bound on time complexity is $\Omega(k)$. Memory complexity is $\Omega(\log k)$ per agent independent of graph topology. The state-of-the-art algorithms have (i) time complexity $O(k\log^2k)$ and memory complexity $O(\log(k+\Delta))$ under the synchronous setting [DISC'24] and (ii) time complexity $O(\min\{m,k\Delta\})$ and memory complexity $O(\log(k+\Delta))$ under the asynchronous setting [OPODIS'21]. In this paper, we improve substantially on this state-of-the-art. Under the synchronous setting as in [DISC'24], we present the first optimal $O(k)$ time algorithm keeping memory complexity $O(\log (k+\Delta))$. Under the asynchronous setting as in [OPODIS'21], we present the first algorithm with time complexity $O(k\log k)$ keeping memory complexity $O(\log (k+\Delta))$, which is time-optimal within an $O(\log k)$ factor despite asynchrony. Both results were obtained through novel techniques to quickly find empty nodes to settle agents, which may be of independent interest.

Thursday, March 20

Stochastic Coherence

from Ben Recht

Deriving the laws of probability through superforecasting

In Dawid’s “Well-Calibrated Bayesian,” he nonchalantly throws around the word “coherent” without definition.

“a coherent Bayesian expects to be well calibrated.”

“...probability forecasting fits neatly into the general Bayesian world-view as conceived by de Finetti (1975). The coherent subjectivist Bayesian can be shown to have a joint probability distribution over all conceivably observable quantities.”

“the coherent sequential forecaster believes that he will be empirically well calibrated.”

What does coherent mean in these quotations? It turns out there are many definitions that are not precisely equivalent. One, in particular, is tied to forecasting and provides an argument for why any forecaster should be a subjective Bayesian. Let me present that argument, and you can tell me if you buy it.

To begin, we have to talk about logical probability again. One of the main interpretations of probability is a numerical quantification of belief. The ideal rational thinker is supposed to be able to assign a number to any metalinguistic statement, with 0 meaning they believe it’s false and 1 meaning they believe it’s true. Let’s call these numbers credences. A numerical system of credences is coherent0 if it assigns a credence of 1 to logical tautologies and if, for mutually exclusive statements A and B, the credence of “A or B” is equal to the credence of A plus the credence of B. In other words, a set of credences is coherent0 if it obeys the axioms of probability.

Bruno de Finetti thought there should be a way to derive these axioms of probability as an inevitable consequence of being rational. De Finetti initially proposed a different property of credences: credences were the values of bets a person would be willing to take about the truths of logical statements. A set of credences was coherent1 if there was no series of bets where a person was guaranteed to lose money when wagering on their beliefs no matter what the truth value of the propositions was. That is, a system was coherent1 if it was impossible to create a set of bets that a person thought had positive expected value but would always lose no matter what.1 Via a relatively simple argument, you can show that any system that is coherent1 also obeys the axioms of probability. If you are willing to bet against logical tautologies or for negative probabilities, you lose money. A coherent1 system is coherent0.
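A tiny numerical illustration of coherence1 (my example, using the standard convention that a credence p is the price you would pay for a ticket worth 1 if the statement is true): if your credences on a statement and its negation sum to more than 1, buying both tickets at your own prices loses money in every possible world.

```python
def net_payoff(cred_A, cred_not_A):
    """Buy a 1-unit ticket on A and a 1-unit ticket on not-A, each priced
    at the corresponding credence. Exactly one ticket pays out in any
    world, so the net payoff is the same whether A is true or false."""
    cost = cred_A + cred_not_A
    return (1 - cost, 1 - cost)  # (payoff if A true, payoff if A false)

# Incoherent1 credences: 0.7 on A and 0.4 on not-A (sum 1.1 > 1).
# Both payoffs come out negative: a guaranteed loss in every world.
win_if_true, win_if_false = net_payoff(0.7, 0.4)
```

The symmetric case works too: credences summing to less than 1 are exploited by selling both tickets instead of buying them.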

Despite its origins at the dice table, I’ve always found the motivation of probability through gambling distasteful. Especially once probability and rationality became inextricably linked after World War II. This mindset assumes that people reason only to wager and that gambling is somehow the purpose of life. The terminus of this mindset leads to a world of ubiquitous boom-and-bust gambling meme scams. It's a world where everybody thinks their cleverness will get them to outwager the other guy, never admitting that almost everyone loses while an oligarchical house wins and wins. Looking out my window in the East Bay, let me tell you that that world sucks.

Lucky for me, it turns out that coherence can be motivated in lots of different ways, and a particularly appropriate one for this blog is in terms of prediction.

Instead of our degenerate rational gambler, let’s envision a hypothetical rational forecaster who wants to assign numerical values to their beliefs of outcomes. They will be evaluated in terms of the accuracy of their predictions. The evaluation will let them express their confidence by providing numbers between 0 and 1. One such potential evaluation is through the Brier Score.

A set of credences is coherent2 if there is no set of events where there is another forecast that achieves a lower Brier score no matter the outcome of the events.

Let me give an example of an incoherent2 forecast: predicting the probability of rain tomorrow to be 70% and the probability of no rain being 40%. In this case, the Brier score if it rains is

(1-.7)^2 + (0-.4)^2 = 0.25

If it doesn’t rain, the score is

(0-.7)^2 + (1-.4)^2 = 0.85

However, the competing forecast of 65% for rain and 35% for no rain has Brier score

(1-.65)^2 + (0-.35)^2 = 0.245

if it rains and

(0-.65)^2 + (1-.35)^2 = 0.845

if it doesn’t rain. These scores are lower than the incoherent2 forecast, whether it rains or not. De Finetti says that the second forecast dominates the first forecast as it receives a better score no matter what the future brings.
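The arithmetic above can be checked directly; a quick sketch, where (0.7, 0.4) is the incoherent2 forecast from the example and (0.65, 0.35) the dominating one:

```python
def brier(forecast, outcome):
    """Brier score: squared gaps between forecast probabilities and the
    0/1 outcome indicators, summed over the events (rain, no rain)."""
    return sum((f - o) ** 2 for f, o in zip(forecast, outcome))

incoherent = (0.70, 0.40)   # P(rain), P(no rain): sums to 1.1
dominating = (0.65, 0.35)

rain, no_rain = (1, 0), (0, 1)
for outcome in (rain, no_rain):
    assert brier(dominating, outcome) < brier(incoherent, outcome)
```

The dominating forecast scores 0.245 versus 0.25 if it rains and 0.845 versus 0.85 if it doesn't, matching the numbers above.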

I like this formulation a lot. Perhaps unsurprisingly, a coherent2 forecast obeys the axioms of probability. A coherent2 forecast must assign 1 to events that must happen. The example I gave above is of a more subtle and interesting property. Coherent2 forecasts obey the law of additivity: for any two mutually exclusive events A and B, a coherent2 forecast for “A or B” equals the forecast for A plus the forecast for B. Additivity is a bit trickier, and de Finetti proves it via a beautiful convex analysis argument. Essentially, if you have a forecast function that violates additivity, you can project it onto the convex hull of the indicator vectors of the events. This projection will have a lower Brier score no matter what values the events take. This is how I came up with the (.65,.35) forecast to dominate the (.7,.4) forecast.

The following picture, which I recreated from some great slides of Teddy Seidenfeld, illustrates why additivity must hold.

For mutually exclusive, exhaustive events, there are only two potential outcomes: (0,1) and (1,0). The line in this figure draws the convex hull between (0,1) and (1,0). Any point in this convex hull will be of the form (p,1-p) for some p in between 0 and 1. If your forecast is outside the convex hull, then the projection of that forecast onto the convex hull will be closer to both (0,1) and (1,0). This means the projected forecast must have a lower Brier Score no matter what the future holds.
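For two exhaustive events, this projection is just the Euclidean projection onto the line p + q = 1; a sketch of how a dominating forecast like (.65, .35) arises from (.7, .4) (clipping to [0,1] is omitted since it isn't needed here):

```python
def project(forecast):
    """Euclidean projection of a two-event forecast onto the segment
    between the outcomes (1, 0) and (0, 1), i.e. the line p + q = 1.
    Shifts both coordinates equally, by half the additivity violation."""
    p, q = forecast
    shift = (p + q - 1) / 2
    return (p - shift, q - shift)

proj = project((0.7, 0.4))   # approximately (0.65, 0.35)
```

Because the projected point is closer to both corners (0, 1) and (1, 0) than the original forecast, its squared distance to the realized corner, which is exactly the Brier score, is smaller no matter which outcome occurs.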

While de Finetti initially proposed Brier scoring, the scoring rule itself isn’t essential. Building on the work of Lindley, Predd et al. showed that you can replace the Brier score with any continuous strictly proper scoring rule. A coherent2 forecast is never dominated by another forecast under any continuous strictly proper scoring rule. This is because the geometric argument I sketched above holds not only for the Euclidean distance but for any Bregman divergence. Math is fun.

If we pick one of the scoring rules from Monday’s post, Predd et al.’s result means that coherent2 contains coherent1. This shouldn’t be surprising, as bets are also predictions. But you don’t need to equate evaluation with potential gambling returns. What de Finetti’s argument implies is that no matter what you do, if your predictions will be strictly properly scored, you are always guaranteed a better score if you make probabilistic predictions.

Lindley takes this a step further to argue against fuzzy logic and frequentism.2 He writes, “only probabilistic descriptions of uncertainty are reasonable.”

Lindley gets over his skis. The precise statement is that uncertain forecasts of logical outcomes need to be probabilistic if they are evaluated using systems that favor probabilities. Is this circular reasoning? A little!

Probabilistic descriptions are reasonable if your evaluation is a proper scoring rule. Machines scored by machines must speak in probabilities. But might we evaluate the value of forecasts and beliefs differently? Might they be assessed in nonquantifiable ways? Might they be evaluated singularly with respect to context? If so, it might be just fine to be incoherent.

1. Series of bets where the incoherent1 person always loses are called “Dutch Books” in philosophy. I’m pretty sure its connotation is racist.

2. In the acknowledgments, Lindley thanks Lotfi Zadeh, who invited him to present a seminar at Berkeley on the “relationship between probability and the ideas of fuzzy logic.” Zadeh challenged Lindley to find a scoring rule that would favor the laws of fuzzy logic. Lindley’s paper shows no such rule exists.

By Ben Recht

2 PhD positions at KTH Royal Institute of Technology (apply by April 3, 2025)

from CCI: jobs

The students will work in the group of assistant professor Ioana-Oriana Bercea. The research team has a focus on randomized algorithms and probabilistic data structures such as data sketches, Bloom filters, and hash functions. We are also interested in implementing and developing efficient data structures for subroutines that appear in data science algorithms.

Website: https://www.kth.se/lediga-jobb/803006?l=en
Email: bercea@kth.se

By shacharlovett

Unique Hard Attention: A Tale of Two Sides

from arXiv: Computational Complexity

Authors: Selim Jerad, Anej Svete, Jiaoda Li, Ryan Cotterell

Understanding the expressive power of transformers has recently attracted attention, as it offers insights into their abilities and limitations. Many studies analyze unique hard attention transformers, where attention selects a single position that maximizes the attention scores. When multiple positions achieve the maximum score, either the rightmost or the leftmost of those is chosen. In this paper, we highlight the importance of this seeming triviality. Recently, finite-precision transformers with both leftmost- and rightmost-hard attention were shown to be equivalent to Linear Temporal Logic (LTL). We show that this no longer holds with only leftmost-hard attention -- in that case, they correspond to a \emph{strictly weaker} fragment of LTL. Furthermore, we show that models with leftmost-hard attention are equivalent to \emph{soft} attention, suggesting they may better approximate real-world transformers than right-attention models. These findings refine the landscape of transformer expressivity and underscore the role of attention directionality.

A Space-Efficient Algorithm for Longest Common Almost Increasing Subsequence of Two Sequences

from arXiv: Data Structures and Algorithms

Authors: Md Tanzeem Rahat, Md. Manzurul Hasan, Debajyoti Mondal

Let $A$ and $B$ be two number sequences of length $n$ and $m$, respectively, where $m\le n$. Given a positive number $\delta$, a common almost increasing sequence $s_1\ldots s_k$ is a common subsequence for both $A$ and $B$ such that for all $2\le i\le k$, $s_i+\delta > \max_{1\le j < i} s_j$. The LCaIS problem seeks to find the longest common almost increasing subsequence (LCaIS) of $A$ and $B$. An LCaIS can be computed in $O(nm\ell)$ time and $O(nm)$ space [Ta, Shieh, Lu (TCS 2021)], where $\ell$ is the length of the LCaIS of $A$ and $B$. In this paper we first give an $O(nm\ell)$-time and $O(n+m\ell)$-space algorithm to find LCaIS, which improves the space complexity. We then design an $O((n+m)\log n +\mathcal{M}\log \mathcal{M} + \mathcal{C}\ell)$-time and $O(\mathcal{M}(\ell+\log \mathcal{M}))$-space algorithm, which is faster when the number of matching pairs $\mathcal{M}$ and the number of compatible matching pairs $\mathcal{C}$ are in $o(nm/\log m)$.
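
To make the definition concrete, here is a hypothetical brute-force reference implementation of the LCaIS condition; it runs in exponential time and is emphatically not the paper's $O(nm\ell)$ dynamic program.

```python
from itertools import combinations

# Hypothetical brute-force reference (exponential time), only to make the
# definition concrete -- not the algorithms of the paper.

def is_almost_increasing(s, delta):
    """Check s_i + delta > max(s_1..s_{i-1}) for every i >= 2."""
    running_max = float("-inf")
    for i, x in enumerate(s):
        if i > 0 and not (x + delta > running_max):
            return False
        running_max = max(running_max, x)
    return True

def is_subsequence(s, seq):
    """Check whether s occurs in seq in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(any(x == y for y in it) for x in s)

def lcais_bruteforce(A, B, delta):
    """Longest common almost increasing subsequence by plain enumeration."""
    for k in range(min(len(A), len(B)), 0, -1):
        for idx in combinations(range(len(A)), k):
            s = [A[i] for i in idx]
            if is_almost_increasing(s, delta) and is_subsequence(s, B):
                return s
    return []
```

For example, `lcais_bruteforce([1, 3, 2, 4], [1, 2, 4], 0)` returns `[1, 2, 4]`, the longest strictly increasing common subsequence, since `delta = 0` reduces the condition to strict increase past the running maximum.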

Online Matching under KIID: Enhanced Competitive Analysis through Ordinary Differential Equation Systems

from arXiv: Data Structures and Algorithms

Authors: Pan Xu

We consider the (offline) vertex-weighted Online Matching problem under Known Identical and Independent Distributions (KIID) with integral arrival rates. We propose a meta-algorithm, denoted as $\mathsf{RTB}$, featuring Real-Time Boosting, where the core idea is as follows. Consider a bipartite graph $G=(I,J,E)$, where $I$ and $J$ represent the sets of offline and online nodes, respectively. Let $\mathbf{x}=(x_{ij}) \in [0,1]^{|E|}$, where $x_{ij}$ for $(i,j) \in E$ represents the probability that edge $(i,j)$ is matched in an offline optimal policy (a.k.a. a clairvoyant optimal policy), typically obtained by solving a benchmark linear program (LP). Upon the arrival of an online node $j$ at some time $t \in [0,1]$, $\mathsf{RTB}$ samples a safe (available) neighbor $i \in I_{j,t}$ with probability $x_{ij}/\sum_{i' \in I_{j,t}} x_{i'j}$ and matches it to $j$, where $I_{j,t}$ denotes the set of safe offline neighbors of $j$. In this paper, we showcase the power of Real-Time Boosting by demonstrating that $\mathsf{RTB}$, when fed with $\mathbf{X}^*$, achieves a competitive ratio of $(2e^4 - 8e^2 + 21e - 27) / (2e^4) \approx 0.7341$, where $\mathbf{X}^* \in \{0,1/3,2/3\}^{|E|}$ is a random vector obtained by applying a customized dependent rounding technique due to Brubach et al. (Algorithmica, 2020). Our result improves upon the state-of-the-art ratios of 0.7299 by Brubach et al. (Algorithmica, 2020) and 0.725 by Jaillet and Lu (Mathematics of Operations Research, 2013). Notably, this improvement does not stem from the algorithm itself but from a new competitive analysis methodology: We introduce an Ordinary Differential Equation (ODE) system-based approach that enables a {holistic} analysis of $\mathsf{RTB}$. We anticipate that utilizing other well-structured vectors from more advanced rounding techniques could potentially yield further improvements in the competitiveness.
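
The sampling step the abstract describes can be sketched as follows. The LP solution `x` and the graph encoding here are illustrative assumptions, not the author's code: on arrival of online node $j$, a still-available offline neighbor $i$ is drawn with probability $x_{ij}/\sum_{i' \in I_{j,t}} x_{i'j}$.

```python
import random

# Sketch of the RTB sampling step from the abstract: on arrival of online
# node j, pick a still-available offline neighbor i with probability
# proportional to x[(i, j)]. The fractional solution x is assumed given
# (e.g., from the benchmark LP); all names here are illustrative.

def rtb_match(j, neighbors, available, x, rng=random.random):
    """Return the offline node matched to j, or None if no safe neighbor."""
    safe = [i for i in neighbors[j] if i in available]
    total = sum(x[(i, j)] for i in safe)
    if not safe or total == 0:
        return None
    r = rng() * total
    for i in safe:
        r -= x[(i, j)]
        if r <= 0:
            available.remove(i)
            return i
    available.remove(safe[-1])  # guard against floating-point leftovers
    return safe[-1]
```

Feeding the rounded vector $\mathbf{X}^*$ instead of a generic $\mathbf{x}$ is the paper's actual setup; the sketch only captures the proportional-sampling rule.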

Fine-Grained Complexity of Computing Degree-Constrained Spanning Trees

from arXiv: Data Structures and Algorithms

Authors: Narek Bojikian, Alexander Firbas, Robert Ganian, Hung P. Hoang, Krisztina Szilágyi

We investigate the computation of minimum-cost spanning trees satisfying prescribed vertex degree constraints: Given a graph $G$ and a constraint function $D$, we ask for a (minimum-cost) spanning tree $T$ such that for each vertex $v$, $T$ achieves a degree specified by $D(v)$. Specifically, we consider three kinds of constraint functions ordered by their generality -- $D$ may either assign each vertex to a list of admissible degrees, an upper bound on the degrees, or a specific degree. Using a combination of novel techniques and state-of-the-art machinery, we obtain an almost-complete overview of the fine-grained complexity of these problems taking into account the most classical graph parameters of the input graph $G$. In particular, we present SETH-tight upper and lower bounds for these problems when parameterized by the pathwidth and cutwidth, an ETH-tight algorithm parameterized by the cliquewidth, and a nearly SETH-tight algorithm parameterized by treewidth.
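
A hypothetical brute-force check makes the problem statement concrete for the most general (list) variant of $D$. This enumerates all spanning trees and is unrelated to the SETH/ETH-tight parameterized algorithms of the paper.

```python
from itertools import combinations

# Hypothetical brute-force check (exponential time), only to make the
# problem statement concrete -- not the paper's algorithms.

def spanning_trees(n, edges):
    """Yield (n-1)-edge subsets that form a spanning tree on vertices 0..n-1."""
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:  # n-1 edges and no cycle => spanning tree
            yield subset

def has_degree_constrained_tree(n, edges, D):
    """D maps each vertex to its set of admissible degrees (list variant)."""
    for tree in spanning_trees(n, edges):
        deg = [0] * n
        for u, v in tree:
            deg[u] += 1
            deg[v] += 1
        if all(deg[v] in D[v] for v in range(n)):
            return True
    return False
```

On a 4-cycle, every spanning tree is a path, so degrees `{1, 2}` are achievable for all vertices but no vertex can be forced to degree 3.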

On $G^p$-unimodality of radius functions in graphs: structure and algorithms

from arXiv: Data Structures and Algorithms

Authors: Jérémie Chalopin, Victor Chepoi, Feodor Dragan, Guillaume Ducoffe, Yann Vaxès

For every weight assignment $\pi$ to the vertices in a graph $G$, the radius function $r_\pi$ maps every vertex of $G$ to its largest weighted distance to the other vertices. The center problem asks to find a center, i.e., a vertex of $G$ that minimizes $r_\pi$. We here study some local properties of radius functions in graphs, and their algorithmic implications; our work is inspired by the nice property that in Euclidean spaces every local minimum of every radius function $r_\pi$ is a center. We study a discrete analogue of this property for graphs, which we name $G^p$-unimodality: specifically, every vertex that minimizes the radius function in its ball of radius $p$ must be a central vertex. While it has long been known since Dragan (1989) that graphs with $G$-unimodal radius functions $r_\pi$ are exactly the Helly graphs, the class of graphs with $G^2$-unimodal radius functions has not been studied insofar. We prove the latter class to be much larger than the Helly graphs, since it also comprises (weakly) bridged graphs, graphs with convex balls, and bipartite Helly graphs. Recently, using the $G$-unimodality of radius functions $r_\pi$, a randomized $\widetilde{\mathcal{O}}(\sqrt{n}m)$-time local search algorithm for the center problem on Helly graphs was proposed by Ducoffe (2023). Assuming the Hitting Set Conjecture (Abboud et al., 2016), we prove that a similar result for the class of graphs with $G^2$-unimodal radius functions is unlikely. However, we design local search algorithms (randomized or deterministic) for the center problem on many of its important subclasses.
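
For intuition, a small sketch of the unweighted special case (all weights equal, so $r_\pi$ is the usual eccentricity): compute the radius function by BFS and list the vertices that minimize it within their ball of radius $p$. Function and variable names are illustrative, not the paper's.

```python
from collections import deque

# Illustrative sketch for the unweighted case: r(v) = eccentricity of v,
# and a "local minimum at scale p" is a vertex minimizing r in its ball
# of radius p. On nice graph classes these coincide with true centers.

def distances(adj, s):
    """BFS distances from s in the graph given as an adjacency dict."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def radius_function(adj):
    return {v: max(distances(adj, v).values()) for v in adj}

def local_minima(adj, r, p):
    """Vertices that minimize r within their ball of radius p."""
    out = []
    for v in adj:
        ball = [u for u, d in distances(adj, v).items() if d <= p]
        if all(r[v] <= r[u] for u in ball):
            out.append(v)
    return out
```

On the path 0-1-2-3-4, the eccentricities are 4, 3, 2, 3, 4, and the unique local minimum at $p = 1$ is the true center, vertex 2.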

A Variational-Calculus Approach to Online Algorithm Design and Analysis

from arXiv: Data Structures and Algorithms

Authors: Pan Xu

Factor-revealing linear programs (LPs) and policy-revealing LPs arise in various contexts of algorithm design and analysis. They are commonly used techniques for analyzing the performance of approximation and online algorithms, especially when direct performance evaluation is challenging. The main idea is to characterize the worst-case performance as a family of LPs parameterized by an integer $n \ge 1$, representing the size of the input instance. To obtain the best possible bounds on the target ratio (e.g., approximation or competitive ratios), we often need to determine the optimal objective value (and the corresponding optimal solution) of a family of LPs as $n \to \infty$. One common method, called the Primal-Dual approach, involves examining the constraint structure in the primal and dual programs, then developing feasible analytical solutions to both that achieve equal or nearly equal objective values. Another approach, known as \emph{strongly factor-revealing LPs}, similarly requires careful investigation of the constraint structure in the primal program. In summary, both methods rely on \emph{instance-specific techniques}, which is difficult to generalize from one instance to another. In this paper, we introduce a general variational-calculus approach that enables us to analytically study the optimal value and solution to a family of LPs as their size approaches infinity. The main idea is to first reformulate the LP in the limit, as its size grows infinitely large, as a variational-calculus instance and then apply existing methods, such as the Euler-Lagrange equation and Lagrange multipliers, to solve it. We demonstrate the power of our approach through three case studies of online optimization problems and anticipate broader applications of this method.
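
For readers unfamiliar with the tools named above, the standard calculus-of-variations machinery looks like this (textbook facts, not the paper's derivation): in the limit, a family of LPs over variables $x_1,\dots,x_n$ becomes an optimization problem over a function $y : [0,1] \to \mathbb{R}$.

```latex
% Limiting problem: minimize a functional subject to an integral constraint.
\min_{y} \; J[y] = \int_0^1 L\bigl(s, y(s), y'(s)\bigr)\, ds
\quad \text{s.t.} \quad \int_0^1 g\bigl(s, y(s)\bigr)\, ds = c .
% A smooth minimizer of the unconstrained problem satisfies the
% Euler-Lagrange equation
\frac{\partial L}{\partial y} - \frac{d}{ds}\,\frac{\partial L}{\partial y'} = 0 ,
% and the constraint is absorbed with a Lagrange multiplier \lambda by
% applying the same equation to the augmented integrand L + \lambda g.
```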

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data

from arXiv: Data Structures and Algorithms

Authors: Maryam Aliakbarpour, Arnav Burudgunte, Clément Canonne, Ronitt Rubinfeld

We extend the framework of augmented distribution testing (Aliakbarpour, Indyk, Rubinfeld, and Silwal, NeurIPS 2024) to the differentially private setting. This captures scenarios where a data analyst must perform hypothesis testing tasks on sensitive data, but is able to leverage prior knowledge (public, but possibly erroneous or untrusted) about the data distribution. We design private algorithms in this augmented setting for three flagship distribution testing tasks, uniformity, identity, and closeness testing, whose sample complexity smoothly scales with the claimed quality of the auxiliary information. We complement our algorithms with information-theoretic lower bounds, showing that their sample complexity is optimal (up to logarithmic factors).

Wednesday, March 19

A Failure to Communicate

from Computational Complexity

With care you can explain major ideas and results in computational complexity to the general public, like the P v NP problem, zero-knowledge proofs, the PCP theorem and Shor's factoring algorithm, in a way that a curious non-scientist can find interesting. Quanta magazine keeps coming back to complexity, because we have an inherently interesting field.

So why am I having such a difficult time with the new Ryan Williams result, that time can be simulated in nearly quadratically less memory, or more precisely DTIME(\(t(n)\)) \(\subseteq\) DSPACE(\(\sqrt{t(n)\log t(n)}\)), based on the Cook-Mertz space-efficient tree evaluation algorithm?

Many results in complexity are quite specialized and technical, but this shouldn't be one of them. Ryan's result involves a major new relationship between time and memory, the two most basic resource measures, whose study goes back to the 1960s. Everybody understands time--waiting while their computer or phone is spinning. They know about memory, at least that you have to pay more for an iPhone with more of it. Yet I'm still challenged to explain this result.

Maybe it's because for historical reasons we use SPACE instead of MEMORY, or the pesky \(\log t(n)\) factor. Someone said they got lost in the logarithms. But we can talk about the gist of the result without mentioning space or logarithms.

It might be the \(t(n)\). We take for granted that we measure resources as a function of the problem size, but outsiders could find it a challenging concept. For P v NP I just talk about efficient computation; here you can't avoid more specific running times.

Or maybe people don't think about time and memory. Most operations on a computer happen close to instantaneously. Nobody thinks about memory unless they run out of it. 

Or because we don't have good real-world examples. When would you prefer a space \(\sqrt{t(n)}\) time \(2^\sqrt{t(n)}\) procedure to a space \(t(n)\) time \(t(n)\) algorithm? 
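
To put rough numbers on that tradeoff, here is a back-of-envelope sketch (mine, not from the post) of how \(\sqrt{t \log t}\) compares to \(t\) for a few running times:

```python
import math

# Back-of-envelope illustration (not from the post): the space bound
# sqrt(t * log t) versus the naive bound of roughly t, for sample values
# of t. The log base only changes constants; base 2 is used here.

def space_bound(t):
    return math.sqrt(t * math.log2(t))

for t in (10**6, 10**9, 10**12):
    print(f"t = {t:.0e}: naive space ~ {t:.1e}, new bound ~ {space_bound(t):.1e}")
```

For \(t = 10^6\) the bound is under \(10^4\): the "nearly quadratic" savings dominate the logarithmic factor people get lost in.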

AI is supposed to be good at this. OK, Claude take it away.

Ryan Williams' breakthrough paper shows that computers can solve complex problems using far less memory than previously thought possible. For 50 years, computer scientists believed that problems requiring a certain amount of time (t) needed at least t divided by a small factor of memory. Williams proved that these same problems can actually be solved using only about the square root of t memory. He achieved this by connecting these problems to the Tree Evaluation problem and leveraging a recent algorithm by Cook and Mertz. This result makes significant progress toward resolving the longstanding P versus PSPACE problem, suggesting that many algorithms could potentially run efficiently on devices with much less memory than we currently use.

Imagine a navigation app trying to find the optimal route through a complex city network. Before this discovery, engineers believed that calculating detailed routes required either substantial memory or accepting slower performance on memory-limited devices. Williams' theorem suggests these calculations could run using dramatically less memory—potentially reducing requirements from 100 MB to just 10 KB (roughly the square root). This breakthrough could enable sophisticated navigation features on devices with severe memory constraints, such as smart watches, older phones, or embedded car systems, allowing them to handle complex routing problems with multiple stops or constraints without performance degradation.

By Lance Fortnow

TR25-031 | Error-Correction of Matrix Multiplication Algorithms | Shuichi Hirahara, Nobutaka Shimizu

from ECCC Papers

Given an efficient algorithm that correctly computes a tiny fraction of the entries of the matrix multiplication of a small fraction of two matrices, can one design an efficient algorithm that computes matrix multiplication exactly for all the matrices? In this paper, we present such ``worst-case exact to average-case approximate'' reductions that transform any algorithm that correctly computes a tiny fraction of the entries of the multiplication of two uniformly random matrices over a finite field into a randomized worst-case algorithm that computes matrix multiplication for all the matrices. Under non-uniform reductions, we present an optimal reduction that error-corrects an algorithm whose output has expected Hamming distance $1 - \frac{1}{p} - \varepsilon$ to the multiplication of two random matrices over a finite field of size $p$ for any positive constant $\varepsilon > 0$. Under uniform reductions, we present efficient reductions that correct a $(1 - \varepsilon)$-fraction of errors over a field of size $p$ for all $\varepsilon > 0$ and for all sufficiently large $p$. We also present an optimal uniform reduction for the Online Matrix-Vector Multiplication problem. The non-uniform reduction is based on a new and simple proof of Yao's XOR lemma for multi-output functions, whose complexity overhead is independent of the length of the output.

Boolean Circuit Complexity and Two-Dimensional Cover Problems

from arXiv: Computational Complexity

Authors: Bruno P. Cavalar, Igor C. Oliveira

We reduce the problem of proving deterministic and nondeterministic Boolean circuit size lower bounds to the analysis of certain two-dimensional combinatorial cover problems. This is obtained by combining results of Razborov (1989), Karchmer (1993), and Wigderson (1993) in the context of the fusion method for circuit lower bounds with the graph complexity framework of Pudl\'ak, R\"odl, and Savick\'y (1988). For convenience, we formalize these ideas in the more general setting of "discrete complexity", i.e., the natural set-theoretic formulation of circuit complexity, variants of communication complexity, graph complexity, and other measures. We show that random graphs have linear graph cover complexity, and that explicit super-logarithmic graph cover complexity lower bounds would have significant consequences in circuit complexity. We then use discrete complexity, the fusion method, and a result of Karchmer and Wigderson (1993) to introduce nondeterministic graph complexity. This allows us to establish a connection between graph complexity and nondeterministic circuit complexity. Finally, complementing these results, we describe an exact characterization of the power of the fusion method in discrete complexity. This is obtained via an adaptation of a result of Nakayama and Maruoka (1995) that connects the fusion method to the complexity of "cyclic" Boolean circuits, which generalize the computation of a circuit by allowing cycles in its specification.

Monoidal Rips: Stable Multiparameter Filtrations of Directed Networks

from arXiv: Computational Geometry

Authors: Nello Blaser, Morten Brun, Odin Hoff Gardaa, Lars M. Salbu

We introduce the monoidal Rips filtration, a filtered simplicial set for weighted directed graphs and other lattice-valued networks. Our construction generalizes the Vietoris-Rips filtration for metric spaces by replacing the maximum operator, determining the filtration values, with a more general monoidal product. We establish interleaving guarantees for the monoidal Rips persistent homology, capturing existing stability results for real-valued networks. When the lattice is a product of totally ordered sets, we are in the setting of multiparameter persistence. Here, the interleaving distance is bounded in terms of a generalized network distance. We use this to prove a novel stability result for the sublevel Rips bifiltration. Our experimental results show that our method performs better than flagser in a graph regression task, and that combining different monoidal products in point cloud classification can improve performance.

Foam: A Tool for Spherical Approximation of Robot Geometry

from arXiv: Computational Geometry

Authors: Sai Coumar, Gilbert Chang, Nihar Kodkani, Zachary Kingston

Many applications in robotics require primitive spherical geometry, especially in cases where efficient distance queries are necessary. Manual creation of spherical models is time-consuming and prone to errors. This paper presents Foam, a tool to generate spherical approximations of robot geometry from an input Universal Robot Description Format (URDF) file. Foam provides a robust preprocessing pipeline to handle mesh defects and a number of configuration parameters to control the level and approximation of the spherization, and generates an output URDF with collision geometry specified only by spheres. We demonstrate Foam on a number of standard robot models on common tasks, and demonstrate improved collision checking and distance query performance with only a minor loss in fidelity compared to the true collision geometry. We release our tool as an open source Python library and containerized command-line application to facilitate adoption across the robotics community.

Color-Constrained Arborescences in Edge-Colored Digraphs

from arXiv: Data Structures and Algorithms

Authors: P. S. Ardra, Jasine Babu, R. Krithika, Deepak Rajendraprasad

Given a multigraph $G$ whose edges are colored from the set $[q]:=\{1,2,\ldots,q\}$ (\emph{$q$-colored graph}), and a vector $\alpha=(\alpha_1,\ldots,\alpha_{q}) \in \mathbb{N}^{q}$ (\emph{color-constraint}), a subgraph $H$ of $G$ is called \emph{$\alpha$-colored}, if $H$ has exactly $\alpha_i$ edges of color $i$ for each $i \in[q]$. In this paper, we focus on $\alpha$-colored arborescences (spanning out-trees) in $q$-colored multidigraphs. We study the decision, counting and search versions of this problem. It is known that the decision and search problems are polynomial-time solvable when $q=2$ and that the decision problem is NP-complete when $q$ is arbitrary. However the complexity status of the problem for fixed $q$ was open for $q > 2$. We show that, for a $q$-colored digraph $G$ and a vertex $s$ in $G$, the number of $\alpha$-colored arborescences in $G$ rooted at $s$ for all color-constraints $\alpha \in \mathbb{N}^q$ can be read from the determinant of a symbolic matrix in $q-1$ indeterminates. This result extends Tutte's matrix-tree theorem for directed graphs and gives a polynomial-time algorithm for the counting and decision problems for fixed $q$. We also use it to design an algorithm that finds an $\alpha$-colored arborescence when one exists. Finally, we study the weighted variant of the problem and give a polynomial-time algorithm (when $q$ is fixed) which finds a minimum weight solution.

Efficient Greedy Discrete Subtrajectory Clustering

from arXiv: Data Structures and Algorithms

Authors: Ivor van der Hoog, Lara Ost, Eva Rotenberg, Daniel Rutschmann

We cluster a set of trajectories T using subtrajectories of T. Clustering quality may be measured by the number of clusters, the number of vertices of T that are absent from the clustering, and by the Fréchet distance between subtrajectories in a cluster. A $\Delta$-cluster of T is a cluster ${\mathcal{P}}$ of subtrajectories of T with a centre $P \in {\mathcal{P}}$ with complexity $\ell$, where all subtrajectories in ${\mathcal{P}}$ have Fréchet distance at most $\Delta$ to $P$. Buchin, Buchin, Gudmundsson, Löffler and Luo present two $O(n^2 + n m \ell)$-time algorithms: SC($\max$, $\ell$, $\Delta$, T) computes a single $\Delta$-cluster where $P$ has at least $\ell$ vertices and maximises the cardinality $m$ of ${\mathcal{P}}$. SC($m$, $\max$, $\Delta$, T) computes a single $\Delta$-cluster where ${\mathcal{P}}$ has cardinality $m$ and maximises the complexity $\ell$ of $P$. We use such maximum-cardinality clusters in a greedy clustering algorithm. We provide an efficient implementation of SC($\max$, $\ell$, $\Delta$, T) and SC($m$, $\max$, $\Delta$, T) that significantly outperforms previous implementations. We use these functions as a subroutine in a greedy clustering algorithm, which performs well when compared to existing subtrajectory clustering algorithms on real-world data. Finally, we observe that, for fixed $\Delta$ and T, these two functions always output a point on the Pareto front of some bivariate function $\theta(\ell, m)$. We design a new algorithm PSC($\Delta$, T) that in $O( n^2 \log^4 n)$ time computes a $2$-approximation of this Pareto front. This yields a broader set of candidate clusters, with comparable quality. We show that using PSC($\Delta$, T) as a subroutine improves the clustering quality and performance even further.

Streaming and Massively Parallel Algorithms for Euclidean Max-Cut

from arXiv: Data Structures and Algorithms

Authors: Nicolas Menand, Erik Waingarten

Given a set of vectors $X = \{ x_1,\dots, x_n \} \subset \mathbb{R}^d$, the Euclidean max-cut problem asks to partition the vectors into two parts so as to maximize the sum of Euclidean distances which cross the partition. We design new algorithms for Euclidean max-cut in models for massive datasets: $\bullet$ We give a fully-scalable constant-round MPC algorithm using $O(nd) + n \cdot \text{poly}( \log(n) / \epsilon)$ total space which gives a $(1+\epsilon)$-approximate Euclidean max-cut. $\bullet$ We give a dynamic streaming algorithm using $\text{poly}(d \log \Delta / \epsilon)$ space when $X \subseteq [\Delta]^d$, which provides oracle access to a $(1+\epsilon)$-approximate Euclidean max-cut. Recently, Chen, Jiang, and Krauthgamer $[\text{STOC}~'23]$ gave a dynamic streaming algorithm with space $\text{poly}(d\log\Delta/\epsilon)$ to approximate the value of the Euclidean max-cut, but could not provide oracle access to an approximately optimal cut. This was left open in that work, and we resolve it here. Both algorithms follow from the same framework, which analyzes a "parallel" and "subsampled" (Euclidean) version of a greedy algorithm of Mathieu and Schudy $[\text{SODA}~'08]$ for dense max-cut.
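As a concrete reference for the objective being approximated, here is a minimal brute-force sketch (my own illustration, not an algorithm from the paper) that computes the exact Euclidean max-cut value of a tiny point set by trying every 2-partition:

```python
from math import dist

def euclidean_max_cut(points):
    """Exhaustively search all 2-partitions of `points` and return the
    maximum total Euclidean distance crossing the partition.
    Exponential in n -- usable only as a ground truth on tiny inputs."""
    n = len(points)
    best = 0.0
    # Fix point 0 on one side so each cut is enumerated exactly once.
    for mask in range(2 ** (n - 1)):
        side = {0} | {i + 1 for i in range(n - 1) if mask >> i & 1}
        value = sum(dist(points[i], points[j])
                    for i in side for j in range(n) if j not in side)
        best = max(best, value)
    return best

# Four unit-square corners: the best cut splits along an edge pair,
# giving 2 + 2*sqrt(2) (about 4.828).
pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(euclidean_max_cut(pts))
```

Such an oracle is only useful for sanity-checking approximation algorithms on very small instances, since the search is over $2^{n-1}$ cuts.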

Branch Prediction Analysis of Morris-Pratt and Knuth-Morris-Pratt Algorithms

from arXiv: Data Structures and Algorithms

Authors: Cyril Nicaud, Carine Pivoteau, Stéphane Vialette

We analyze the classical Morris-Pratt and Knuth-Morris-Pratt pattern matching algorithms through the lens of computer architecture, investigating the impact of incorporating a simple branch prediction mechanism into the model of computation. Assuming a fixed pattern and a random text, we derive precise estimates of the number of mispredictions these algorithms produce using local predictors. Our approach is based on automata theory and Markov chains, providing a foundation for the theoretical analysis of other text algorithms and more advanced branch prediction strategies.

Quantum EigenGame for excited state calculation

from arXiv: Data Structures and Algorithms

Authors: David Quiroga, Jason Han, Anastasios Kyrillidis

Computing the excited states of a given Hamiltonian is computationally hard for large systems, but methods that do so using quantum computers scale tractably. This problem is equivalent to the PCA problem where we are interested in decomposing a matrix into a collection of principal components. Classically, PCA is a well-studied problem setting, for which both centralized and distributed approaches have been developed. On the distributed side, one recent approach is that of EigenGame, a game-theoretic approach to finding eigenvectors where each eigenvector reaches a Nash equilibrium either sequentially or in parallel. With this work, we extend the EigenGame algorithm for both a $0^\text{th}$-order approach and for quantum computers, and harness the framework that quantum computing provides in computing excited states. Results show that using the Quantum EigenGame allows us to converge to excited states of a given Hamiltonian without the need of a deflation step. We also develop theory on error accumulation for finite-differences and parameterized approaches.

Optimal Non-Oblivious Open Addressing

from arXiv: Data Structures and Algorithms

Authors: Michael A. Bender, William Kuszmaul, Renfei Zhou

A hash table is said to be open-addressed (or non-obliviously open-addressed) if it stores elements (and free slots) in an array with no additional metadata. Intuitively, open-addressed hash tables must incur a space-time tradeoff: The higher the load factor at which the hash table operates, the longer insertions/deletions/queries should take. In this paper, we show that no such tradeoff exists: It is possible to construct an open-addressed hash table that supports constant-time operations even when the hash table is entirely full. In fact, it is even possible to construct a version of this data structure that: (1) is dynamically resized so that the number of slots in memory that it uses, at any given moment, is the same as the number of elements it contains; (2) supports $O(1)$-time operations, not just in expectation, but with high probability; and (3) requires external access to just $O(1)$ hash functions that are each just $O(1)$-wise independent. Our results complement a recent lower bound by Bender, Kuszmaul, and Zhou showing that oblivious open-addressed hash tables must incur $\Omega(\log \log \varepsilon^{-1})$-time operations. The hash tables in this paper are non-oblivious, which is why they are able to bypass the previous lower bound.

Tuesday, March 18

I think it's gonna rain...

from Ben Recht

The paradoxes of calibrated forecasting.

In the last post, I described an interpretation of scoring rules in terms of utility maximization, arguing that risk-averse stochastic optimization demanded accurate forecasts of the future. In this decision-theoretic framework, the forecast errors were measured in terms of the particularities of the decision problem, and forecasters needed to tune their predictions to the particular formulation of utility.

However, some aspects of forecasting should be independent of their use. Are there parts of forecasting that are essential no matter what the task at hand is? Calibration is one such potentially universal component.

Calibration is not asking for much. A forecast is calibrated if whenever the forecaster says something should happen P percent of the time, it happens P percent of the time. The weather service wants to ensure that when they predict a 50% chance of rain, it rains 50% of the time. This seems like a very reasonable property of a forecast. Honestly, it would be odd if some prognosticator got such rates wrong. If you told me something happened with probability 0.05, but I did my back-of-the-envelope calculation and saw that this thing happened half of the time, I’d stop asking you for predictions.
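The rate check described above is easy to state in code. Here's a small sketch (the binning scheme and names are my own, not from any particular forecasting library): group forecasts into bins and compare each bin's average predicted probability with the observed frequency.

```python
from collections import defaultdict

def calibration_report(forecasts, outcomes, n_bins=10):
    """For each forecast bin, compare the average predicted probability
    with the observed frequency of the event (1 = happened, 0 = not)."""
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        # Bin index: forecasts in [0.0, 0.1) -> 0, ..., [0.9, 1.0] -> 9.
        b = min(int(p * n_bins), n_bins - 1)
        bins[b].append((p, y))
    report = {}
    for b, pairs in sorted(bins.items()):
        predicted = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        report[b] = (predicted, observed)
    return report

# A forecaster who says 50% on days it rains half the time is calibrated.
forecasts = [0.5, 0.5, 0.5, 0.5]
outcomes = [1, 0, 1, 0]
print(calibration_report(forecasts, outcomes))  # {5: (0.5, 0.5)}
```

A calibrated forecaster's report has `predicted` close to `observed` in every bin; large gaps in any bin are exactly the miscalibration the post is talking about.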

So how can you make well-calibrated predictions? You might think that a good Bayesian will always be calibrated. Bayes Rule is all you need, after all. Why would you need to correct a perfect epistemic framework? A classic argument by Phil Dawid shows this simply isn’t the case. Suppose you are a Bayesian forecaster. Given everything you’ve seen before, you decide the probability of rain today is 10%. Then, the events “it rains” and “I am going to check your accuracy at 10%” are conditionally independent, given everything observed by the Bayesian thus far. That is, before we see if it rains today, there’s no reason to think that checking the calibration of a prediction has any impact on the prediction. But this means that your empirical estimate of calibration is always unbiased. If you keep your subjective probabilities updated using Bayes rule, you will never think you are miscalibrated. However, if I were a malicious god, I could ask for your prediction of rain and decide to have the weather be whatever will make you miscalibrated. A subjective Bayesian will always believe they are calibrated, while an adversarial nature can ensure they are miscalibrated.

Probability is weird, folks. There’s a paradox lurking around every corner.

On the other hand, it’s very easy to cook up an algorithm that produces calibrated forecasts with absolutely no knowledge of what it's supposed to be forecasting. Dean Foster and Rakesh Vohra originally demonstrated this back in 1998. More recently, in a 2021 paper with Sergiu Hart, Dean came up with a super simple method for generating calibrated forecasts. You look at your frequency bins and find one where you have predicted a rate of outcomes greater than or equal to what you have observed. (e.g., suppose that when you have predicted 10%, the event has happened only 5% of the time thus far). You then choose your forecast randomly, predicting either the overconfident bin or the adjacent bin. Foster and Hart give a closed-form formula for the probability of picking between the two. That’s it. Effectively, you look for places where you seem to be overpredicting and forecast that bin or the adjacent bin to push the rate down. Since you randomly choose between two forecasts, an adversarial god can’t know which bin you’ll pick and, hence, can’t muck up your calibration. This is the joy of mixed strategies.1
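To make the flavor of this concrete, here is a toy sketch of such a randomized rule. To be clear, this is not Foster and Hart's closed-form procedure; it is my own illustration of the shape of the idea: scan for a bin whose observed rate does not exceed its nominal value, then randomize between that bin and the one above it so the adversary cannot anticipate the forecast.

```python
import random

def next_forecast(counts, n_bins=10, rng=random.random):
    """Toy sketch of randomized forecasting for calibration (NOT the
    exact Foster-Hart rule).  `counts[b] = (num_predictions, num_events)`
    for the bin with nominal forecast value b / n_bins.  Find a bin
    whose observed rate is at most its nominal value, then randomize
    between that bin and the one above it."""
    for b in range(n_bins + 1):
        preds, events = counts.get(b, (0, 0))
        rate = events / preds if preds else 0.0
        if rate <= b / n_bins:
            # A 50/50 mix stands in for Foster-Hart's closed-form
            # mixing probability, which the post does not reproduce.
            if b < n_bins and rng() < 0.5:
                return (b + 1) / n_bins
            return b / n_bins
    return 1.0
```

The essential feature survives the simplification: the forecast is a random mix of two adjacent bins, so no deterministic adversary can commit to an outcome that ruins the empirical rates of both.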

Though the algorithm here uses randomness, it gives you calibrated answers even if nature is completely deterministic. It’s just computing rates, and these rates will eventually look as good as if you had seen the entire course of history and the future and calculated the rates from that data. Probability doesn’t need to exist to compute calibrated forecasts. You’ve probably forgotten already, but I was going on about this earlier in the semester. As long as your estimates don’t affect the future, you can compute accurate rates of arbitrary sequences. This fact is tremendously underappreciated.

So it’s interesting: a subjective Bayesian always believes they are calibrated. An adversarial agent can drive them to not be calibrated. And yet, a randomized algorithm can produce a calibrated forecast without any knowledge of what’s happening. Very odd!

This is why you shouldn’t put too much faith in forecasters who brag about their calibration. They should be calibrated! It’s a minimal ask for them to be calibrated. But calibration doesn’t mean they have any particularly clever or novel insights.

If calibration is so easy to achieve, why bother with it? Foster and Hart turn this around: if calibration is so easy to achieve, every forecaster should calibrate! Because calibration makes your forecasts legible to everyone else. Calibration imbues a kind of meaning to your probabilities. People appreciate the connections between your predicted rates and your subjective beliefs.

If you are a pragmatic statistician, you would like to use the language of probability to communicate uncertainty, but you want your language to be useful to others. Even without probabilistic arguments, if frequencies in the past are the same as frequencies in the future, then just getting the frequencies right is often enough to solve stochastic optimization problems. A calibrated forecast ensures the team making downstream decisions can understand what your probabilities mean. Calibration is not “all you need,” but it’s a helpful way to communicate what you mean by probability.

1. IMHO, a truly omniscient adversary would know your random seed, but whatever, it’s a math model.

By Ben Recht

On Some Fundamental Problems for Multi-Agent Systems Over Multilayer Networks

from arXiv: Computational Complexity

Authors: Daniel J. Rosenkrantz, Madhav V. Marathe, Zirou Qiu, S. S. Ravi, Richard E. Stearns

Many researchers have considered multi-agent systems over single-layer networks as models for studying diffusion phenomena. Since real-world networks involve connections between agents with different semantics (e.g., family member, friend, colleague), the study of multi-agent systems over multilayer networks has assumed importance. Our focus is on one class of multi-agent system models over multilayer networks, namely multilayer synchronous dynamical systems (MSyDSs). We study several fundamental problems for this model. We establish properties of the phase spaces of MSyDSs and bring out interesting differences between single-layer and multilayer dynamical systems. We show that, in general, the problem of determining whether two given MSyDSs are inequivalent is NP-complete. This hardness result holds even when the only difference between the two systems is the local function at just one node in one layer. We also present efficient algorithms for the equivalence problem for restricted versions of MSyDSs (e.g., systems where each local function is a bounded-threshold function, systems where the number of layers is fixed and each local function is symmetric). In addition, we investigate the expressive power of MSyDSs based on the number of layers. In particular, we examine conditions under which a system with k >= 2 layers has an equivalent system with k-1 or fewer layers.

Deciding Connectivity in Symmetric Semi-Algebraic Sets

from arXiv: Computational Geometry

Authors: Cordian Riener, Robin Schabert, Thi Xuan Vu

A semi-algebraic set is a subset of $\mathbb{R}^n$ defined by a finite collection of polynomial equations and inequalities. In this paper, we investigate the problem of determining whether two points in such a set belong to the same connected component. We focus on the case where the defining equations and inequalities are invariant under the natural action of the symmetric group and where each polynomial has degree at most \( d \), with \( d < n \) (where \( n \) denotes the number of variables). Exploiting this symmetry, we develop and analyze algorithms for two key tasks. First, we present an algorithm that determines whether the orbits of two given points are connected. Second, we provide an algorithm that decides connectivity between arbitrary points in the set. Both algorithms run in polynomial time with respect to \( n \).

Deciding if a DAG is Interesting is Hard

from arXiv: Data Structures and Algorithms

Authors: Jean-Lou De Carufel, Anil Maheshwari, Saeed Odak, Bodhayan Roy, Michiel Smid, Marc Vicuna

The \emph{interestingness score} of a directed path $\Pi = e_1, e_2, e_3, \dots, e_\ell$ in an edge-weighted directed graph $G$ is defined as $\texttt{score}(\Pi) := \sum_{i=1}^\ell w(e_i) \cdot \log{(i+1)}$, where $w(e_i)$ is the weight of the edge $e_i$. We consider two optimization problems that arise in the analysis of Mapper graphs, which is a powerful tool in topological data analysis. In the IP problem, the objective is to find a collection $\mathcal{P}$ of edge-disjoint paths in $G$ with the maximum total interestingness score. For $k \in \mathbb{N}$, the $k$-IP problem is a variant of the IP problem with the extra constraint that each path in $\mathcal{P}$ must have exactly $k$ edges. Kalyanaraman, Kamruzzaman, and Krishnamoorthy (Journal of Computational Geometry, 2019) claim that both IP and $k$-IP (for $k \geq 3$) are NP-complete. We point out some inaccuracies in their proofs. Furthermore, we show that both problems are NP-hard in directed acyclic graphs.
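The score reads directly off the definition as a weighted sum over a path's edges in order. A minimal sketch (the abstract does not fix the base of the logarithm, so base 2 is an assumption here):

```python
from math import log2

def interestingness(weights):
    """Interestingness score of a directed path given its edge weights
    in path order: sum of w(e_i) * log(i + 1), edges indexed from i = 1.
    The base of the logarithm (2 here) is an assumption; the definition
    above leaves it unspecified."""
    return sum(w * log2(i + 1) for i, w in enumerate(weights, start=1))

# A 3-edge path with unit weights scores log2(2) + log2(3) + log2(4).
print(interestingness([1, 1, 1]))
```

Note that the $\log(i+1)$ factor rewards weight placed late in the path, which is why the score depends on edge order and not just the multiset of weights.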

Semi-Streaming Algorithms for Graph Property Certification

from arXiv: Data Structures and Algorithms

Authors: Avinandan Das, Pierre Fraigniaud, Ami Paz, Adi Rosen

We introduce the {\em certification} of solutions to graph problems when access to the input is restricted. This topic has received a lot of attention in the distributed computing setting, and we introduce it here in the context of \emph{streaming} algorithms, where the input is too large to be stored in memory. Given a graph property $\mbox{P}$, a \emph{streaming certification scheme} for $\mbox{P}$ is a \emph{prover-verifier} pair where the prover is a computationally unlimited but untrusted oracle, and the verifier is a streaming algorithm. For any input graph, the prover provides the verifier with a \emph{certificate}. The verifier then receives the input graph as a stream of edges in an adversarial order, and must check whether the certificate is indeed a \emph{proof} that the input graph satisfies $\mbox{P}$. The main complexity measure for a streaming certification scheme is its \emph{space complexity}, defined as the sum of the size of the certificate provided by the oracle, and of the memory space required by the verifier. We give streaming certification schemes for several graph properties, including maximum matching, diameter, degeneracy, and coloring, with space complexity matching the requirement of \emph{semi-streaming}, i.e., with space complexity $O(n\,\mbox{polylog}\, n)$ for $n$-node graphs. None of these problems admits a semi-streaming algorithm, showing that, as with $NP$, certification can sometimes be easier than computation in the (semi-)streaming setting as well. For each of these properties, we provide upper and lower bounds on the space complexity of the corresponding certification schemes, many being tight up to logarithmic multiplicative factors. We also show that some graph properties are hard for streaming certification, in the sense that they cannot be certified in semi-streaming, as they require $\Omega(n^2)$-bit certificates.

Constant Approximation of Fréchet Distance in Strongly Subquadratic Time

from arXiv: Data Structures and Algorithms

Authors: Siu-Wing Cheng, Haoqiang Huang, Shuo Zhang

Let $\tau$ and $\sigma$ be two polygonal curves in $\mathbb{R}^d$ for any fixed $d$. Suppose that $\tau$ and $\sigma$ have $n$ and $m$ vertices, respectively, and $m\le n$. While conditional lower bounds prevent approximating the Fr\'echet distance between $\tau$ and $\sigma$ within a factor of 3 in strongly subquadratic time, the current best approximation algorithm attains a ratio of $n^c$ in strongly subquadratic time, for some constant $c\in(0,1)$. We present a randomized algorithm with running time $O(nm^{0.99}\log(n/\varepsilon))$ that approximates the Fr\'echet distance within a factor of $7+\varepsilon$, with a success probability at least $1-1/n^6$. We also adapt our techniques to develop a randomized algorithm that approximates the \emph{discrete} Fr\'echet distance within a factor of $7+\varepsilon$ in strongly subquadratic time. They are the first algorithms to approximate the Fr\'echet distance and the discrete Fr\'echet distance within constant factors in strongly subquadratic time.
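As a reference point for what the subquadratic algorithms approximate, the exact discrete Fréchet distance admits a classical $O(nm)$ dynamic program; the sketch below is that textbook baseline, not the authors' algorithm:

```python
import math

def discrete_frechet(P, Q):
    """Exact discrete Frechet distance between point sequences P and Q,
    via the standard O(nm) dynamic program: d[i][j] is the optimal
    coupling width between prefixes P[:i+1] and Q[:j+1]."""
    n, m = len(P), len(Q)
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = math.dist(P[i], Q[j])  # Euclidean distance of the pair
            if i == 0 and j == 0:
                d[i][j] = c
            elif i == 0:
                d[i][j] = max(d[0][j - 1], c)
            elif j == 0:
                d[i][j] = max(d[i - 1][0], c)
            else:
                d[i][j] = max(min(d[i - 1][j], d[i - 1][j - 1], d[i][j - 1]), c)
    return d[n - 1][m - 1]
```

The quadratic table is exactly the cost the paper's randomized algorithms avoid, at the price of a $7+\varepsilon$ approximation factor.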

A $(1+ε)$-Approximation for Ultrametric Embedding in Subquadratic Time

from arXiv: Data Structures and Algorithms

Authors: Gabriel Bathie, Guillaume Lagarde

Efficiently computing accurate representations of high-dimensional data is essential for data analysis and unsupervised learning. Dendrograms, also known as ultrametrics, are widely used representations that preserve hierarchical relationships within the data. However, popular methods for computing them, such as linkage algorithms, suffer from quadratic time and space complexity, making them impractical for large datasets. The "best ultrametric embedding" (a.k.a. "best ultrametric fit") problem, which aims to find the ultrametric that best preserves the distances between points in the original data, is known to require at least quadratic time for an exact solution. Recent work has focused on improving scalability by approximating optimal solutions in subquadratic time, resulting in a $(\sqrt{2} + \epsilon)$-approximation (Cohen-Addad, de Joannis de Verclos and Lagarde, 2021). In this paper, we present the first subquadratic algorithm that achieves arbitrarily precise approximations of the optimal ultrametric embedding. Specifically, we provide an algorithm that, for any $c \geq 1$, outputs a $c$-approximation of the best ultrametric in time $\tilde{O}(n^{1 + 1/c})$. In particular, for any fixed $\epsilon > 0$, the algorithm computes a $(1+\epsilon)$-approximation in time $\tilde{O}(n^{2 - \epsilon + o(\epsilon^2)})$. Experimental results show that our algorithm improves upon previous methods in terms of approximation quality while maintaining comparable running times.
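As background on the object being fitted: the classical single-linkage construction yields the subdominant ultrametric, i.e., the largest ultrametric pointwise below the input metric, in quadratic time. The sketch below illustrates that baseline (not the authors' subquadratic algorithm): the ultrametric distance between two points is the height at which their clusters first merge under Kruskal-style single linkage.

```python
import math
from itertools import combinations

def single_linkage_ultrametric(points):
    """Subdominant ultrametric of a Euclidean point set: u[a][b] is the
    merge height of a and b under single linkage (Kruskal on the
    complete distance graph, with union-find)."""
    n = len(points)
    parent = list(range(n))
    members = [{i} for i in range(n)]
    u = [[0.0] * n for _ in range(n)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # clusters of i and j merge at height d
            for a in members[ri]:
                for b in members[rj]:
                    u[a][b] = u[b][a] = d
            parent[rj] = ri
            members[ri] |= members[rj]
    return u
```

The output satisfies the strong triangle inequality $u(x,z) \le \max(u(x,y), u(y,z))$ by construction, since merge heights are non-decreasing along the Kruskal sweep.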

Follow-the-Regularized-Leader with Adversarial Constraints

from arXiv: Data Structures and Algorithms

Authors: Ricardo N. Ferreira, Cláudia Soares

Constrained Online Convex Optimization (COCO) can be seen as a generalization of the standard Online Convex Optimization (OCO) framework. At each round, a cost function and constraint function are revealed after a learner chooses an action. The goal is to minimize both the regret and cumulative constraint violation (CCV) against an adaptive adversary. We show for the first time that it is possible to obtain the optimal $O(\sqrt{T})$ bound on both regret and CCV, improving the best known bounds of $O \left( \sqrt{T} \right)$ and $\tilde{O} \left( \sqrt{T} \right)$ for the regret and CCV, respectively.
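As a reminder of the underlying template, Follow-the-Regularized-Leader with a Euclidean regularizer over a bounded domain has a closed-form iterate; the unconstrained-then-project sketch below is illustrative of plain FTRL only, not the constrained algorithm of the paper:

```python
import math

def ftrl_l2(grads, eta, radius=1.0):
    """FTRL with L2 regularizer over an L2 ball of the given radius:
    x_{t+1} = argmin_{||x|| <= radius} <G_t, x> + ||x||^2 / (2*eta),
    where G_t is the sum of gradients seen so far. The minimizer is
    -eta * G_t, projected (scaled) back onto the ball if needed."""
    dim = len(grads[0])
    G = [0.0] * dim
    iterates = []
    for g in grads:
        x = [-eta * gi for gi in G]              # unconstrained minimizer
        norm = math.sqrt(sum(v * v for v in x))
        if norm > radius:                        # project onto the ball
            x = [v * radius / norm for v in x]
        iterates.append(x)
        G = [a + b for a, b in zip(G, g)]        # accumulate gradients
    return iterates
```

The COCO setting layers constraint-violation control on top of this regret-minimizing update; the paper's contribution is achieving $O(\sqrt{T})$ for both objectives simultaneously.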

Hierarchical Multicriteria Shortest Path Search

from arXiv: Data Structures and Algorithms

Authors: Temirlan Kurbanov, Linxiao Miao, Jiří Vokřínek

This paper presents a novel multicriteria shortest path search algorithm called Hierarchical MLS. The distinguishing feature of the algorithm is the multilayered structure of compressed k-Path-Cover graphs it operates on. In addition to providing significant improvements in terms of time and memory consumption, the algorithm is notable for several other features. Due to the preprocessing phase requiring only several seconds, the algorithm can be successfully applied to scenarios with dynamic prices. Moreover, the algorithm does not employ bidirectional search, and can thus work on time-dependent metrics. We test the algorithm on multiple graphs and analyze its performance in terms of time and memory efficiency. The results show Hierarchical MLS to be at least 2 times faster than its direct alternatives in query runtime and at least 20 times faster in preprocessing.

Parallel Minimum Cost Flow in Near-Linear Work and Square Root Depth for Dense Instances

from arXiv: Data Structures and Algorithms

Authors: Jan van den Brand, Hossein Gholizadeh, Yonggang Jiang, Tijn de Vos

For $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities, we provide a randomized parallel algorithm for the minimum cost flow problem with $\tilde O(m+n^ {1.5})$ work and $\tilde O(\sqrt{n})$ depth. On moderately dense graphs ($m>n^{1.5}$), our algorithm is the first one to achieve both near-linear work and sub-linear depth. Previous algorithms either achieve almost optimal work but are highly sequential [Chen, Kyng, Liu, Peng, Gutenberg, Sachdev, FOCS'22], or achieve sub-linear depth but use super-linear work [Lee, Sidford, FOCS'14], [Orlin, Stein, Oper. Res. Lett.'93]. Our result also leads to improvements for the special cases of max flow, bipartite maximum matching, shortest paths, and reachability. Notably, the previous algorithms achieving near-linear work for shortest paths and reachability all have depth $n^{o(1)}\cdot \sqrt{n}$ [Fischer, Haeupler, Latypov, Roeyskoe, Sulser, SOSA'25], [Liu, Jambulapati, Sidford, FOCS'19]. Our algorithm consists of a parallel implementation of [van den Brand, Lee, Liu, Saranurak, Sidford, Song, Wang, STOC'21]. One important building block is a \emph{dynamic} parallel expander decomposition, which we show how to obtain from the recent parallel expander decomposition of [Chen, Meierhans, Probst Gutenberg, Saranurak, SODA'25].

Connected Partitions via Connected Dominating Sets

from arXiv: Data Structures and Algorithms

Authors: Aikaterini Niklanovits, Kirill Simonov, Shaily Verma, Ziena Zeif

The classical theorem due to Gy\H{o}ri and Lov\'{a}sz states that any $k$-connected graph $G$ admits a partition into $k$ connected subgraphs, where each subgraph has a prescribed size and contains a prescribed vertex, as soon as the total size of target subgraphs is equal to the size of $G$. However, this result is notoriously evasive in terms of efficient constructions, and it is still unknown whether such a partition can be computed in polynomial time, even for $k = 5$. We make progress towards an efficient constructive version of the Gy\H{o}ri--Lov\'{a}sz theorem by considering a natural weakening of the $k$-connectivity requirement. Specifically, we show that the desired connected partition can be found in polynomial time, if $G$ contains $k$ disjoint connected dominating sets. As a consequence of this result, we give several efficient approximate and exact constructive versions of the original Gy\H{o}ri--Lov\'{a}sz theorem: 1. On general graphs, a Gy\H{o}ri--Lov\'{a}sz partition with $k$ parts can be computed in polynomial time when the input graph has connectivity $\Omega(k \cdot \log^2 n)$; 2. On convex bipartite graphs, connectivity of $4k$ is sufficient; 3. On biconvex graphs and interval graphs, connectivity of $k$ is sufficient, meaning that our algorithm gives a ``true'' constructive version of the theorem on these graph classes.

Impact of Knowledge on the Cost of Treasure Hunt in Trees

from arXiv: Data Structures and Algorithms

Authors: Sébastien Bouchard, Arnaud Labourel, Andrzej Pelc

A mobile agent has to find an inert target in some environment that can be a graph or a terrain in the plane. This task is known as treasure hunt. We consider deterministic algorithms for treasure hunt in trees. Our goal is to establish the impact of different kinds of initial knowledge given to the agent on the cost of treasure hunt, defined as the total number of edge traversals until the agent reaches the treasure hidden in some node of the tree. The agent can be initially given either a complete map of the tree rooted at its starting node, with all port numbers marked, or a blind map of the tree rooted at its starting node but without port numbers. It may also be given, or not, the distance from the root to the treasure. This yields four different knowledge types that are partially ordered by their precision. (For example knowing the blind map and the distance is less precise than knowing the complete map and the distance). The penalty of a less precise knowledge type ${\cal T}_2$ over a more precise knowledge type ${\cal T}_1$ measures intuitively the worst-case ratio of the cost of an algorithm supplied with knowledge of type ${\cal T}_2$ over the cost of an algorithm supplied with knowledge of type ${\cal T}_1$. Our main results establish penalties for comparable knowledge types in this partial order. For knowledge types with known distance, the penalty for having a blind map over a complete map turns out to be very large. By contrast, for unknown distance, the penalty of having a blind map over having a complete map is small. When a map is provided (either complete or blind), the penalty of not knowing the distance over knowing it is medium.

Optimal mass estimation in the conditional sampling model

from arXiv: Data Structures and Algorithms

Authors: Tomer Adar, Eldar Fischer, Amit Levi

The conditional sampling model, introduced by Canonne, Ron and Servedio (SODA 2014, SIAM J.\ Comput.\ 2015) and independently by Chakraborty, Fischer, Goldhirsh and Matsliah (ITCS 2013, SIAM J.\ Comput.\ 2016), is a common framework for a number of studies (and variant models) into strengthened models of distribution testing. A core task in these investigations is that of estimating the mass of individual elements. The above works, as well as the improvement of Kumar, Meel and Pote (AISTATS 2025), have all yielded polylogarithmic algorithms. In this work we shatter the polylogarithmic barrier, and provide an estimator for individual elements that uses only $O(\log \log N) + O(\mathrm{poly}(1/\varepsilon))$ conditional samples. This in particular provides an improvement (and in some cases a unifying framework) for a number of related tasks, such as testing by learning of any label-invariant property, and distance estimation between two (unknown) distributions. The work of Chakraborty, Chakraborty and Kumar (SODA 2024) contains lower bounds for some of the above tasks. We derive from their work a nearly matching lower bound of $\tilde\Omega(\log\log N)$ for the estimation task. We also show that the full power of the conditional model is indeed required for the double-logarithmic bound. For the testing of label-invariant properties, we exponentially improve the previous lower bound from double-logarithmic to $\Omega(\log N)$ conditional samples, whereas our testing by learning algorithm provides an upper bound of $O(\mathrm{poly}(1/\varepsilon)\cdot\log N \log \log N)$.
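The basic primitive in this model is pairwise conditioning: restricting the distribution to a two-element set lets one compare the masses of the two elements directly. A hedged sketch of that primitive (`cond_sample` is an assumed oracle interface standing in for the model's conditional-sample access, not pseudocode from the paper):

```python
import random

def compare_mass(cond_sample, x, y, trials=20000):
    """Estimate p(x) / (p(x) + p(y)) using pairwise conditional access:
    cond_sample(S) draws one sample from the distribution conditioned
    on landing in the set S. (Assumed oracle interface.)"""
    hits = sum(cond_sample({x, y}) == x for _ in range(trials))
    return hits / trials
```

Estimating the absolute mass $p(x)$, rather than such ratios, is the task for which the paper obtains the $O(\log\log N)$ bound.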

Enhanced Approximation Algorithms for the Capacitated Location Routing Problem

from arXiv: Data Structures and Algorithms

Authors: Jingyang Zhao, Mingyu Xiao, Shunwang Wang

The Capacitated Location Routing Problem is an important planning and routing problem in logistics, which generalizes the capacitated vehicle routing problem and the uncapacitated facility location problem. In this problem, we are given a set of depots and a set of customers where each depot has an opening cost and each customer has a demand. The goal is to open some depots and route capacitated vehicles from the opened depots to satisfy all customers' demand, while minimizing the total cost. In this paper, we propose a $4.169$-approximation algorithm for this problem, improving the best-known $4.38$-approximation ratio. Moreover, if the demand of each customer is allowed to be delivered by multiple tours, we propose a more refined $4.091$-approximation algorithm. Experimental study on benchmark instances shows that the quality of our computed solutions is better than that of the previous algorithm and is also much closer to optimality than the provable approximation factor.

Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs

from arXiv: Data Structures and Algorithms

Authors: Nir Ailon, Akhiad Bercovich, Omri Weinstein

We propose a cheaper alternative bilinear operator to matrix-multiplication in deep neural networks (DNNs). Unlike many stubborn attempts to accelerate MatMuls in DNN inference, this operator is supported by capabilities of existing GPU hardware, most notably NVIDIA TensorCores. To our knowledge, this is the first GPU-native acceleration technique which \emph{does not decrease} (in fact, increases) the number of trainable parameters of the network, mitigating the accuracy-loss of compression-based techniques. Hence, this operator is at the same time more expressive than MatMul, yet requires substantially \emph{fewer} FLOPs to evaluate. We term this new operator \emph{Strassen-Tile} (STL). The main idea behind STL$(X,W)$ is a \emph{local} change-of-basis (learnable encoder) on weights and activation \emph{tiles}, after which we perform batched \emph{elementwise} products between tiles, and a final decoding transformation (inspired by algebraic pipelines from fast matrix and polynomial multiplication). We compare STL against two benchmarks. The first one is SoTA T2T-ViT on ImageNet-1K. Here we show that replacing \emph{all} linear layers with STL and training from scratch results in a factor x2.7 reduction in FLOPs with a 0.5 \emph{accuracy improvement}. Our second speed-accuracy comparison benchmark for pretrained LLMs is the most practical GPU-acceleration technique, 2:4 structured sparsity. Finetuning TinyLlama \cite{tinyllama24} with STL layers on the Slim Pajama dataset achieves similar accuracy to 2:4, with x2.2 FLOP speedup compared to x1.7 of the latter. Finally, we discuss a group-theoretic approach for discovering \emph{universal} encoders for STL, which could lead to fast \emph{black-box} acceleration via approximate matrix-multiplication (AMM).
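Strassen's 2x2 algorithm is the canonical instance of this encode / elementwise-multiply / decode shape: fixed linear encoders map each flattened tile to $r = 7$ values, which are multiplied pointwise and decoded back into a tile. The sketch below instantiates that shape with Strassen's fixed encoders (so it reproduces exact 2x2 MatMul); STL instead learns the encoders, which is an assumption the abstract states, not code from the paper.

```python
# Strassen's bilinear form in encode/elementwise/decode shape.
# Rows of E_A / E_B encode vec(tile) = [a11, a12, a21, a22] into the
# 7 Strassen products; D decodes the 7 products back to a 2x2 tile.
E_A = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
       [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
E_B = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
       [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
D = [[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]]

def _apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def stl_2x2(A, B):
    a = [A[0][0], A[0][1], A[1][0], A[1][1]]   # vectorize the tiles
    b = [B[0][0], B[0][1], B[1][0], B[1][1]]
    m = [x * y for x, y in zip(_apply(E_A, a), _apply(E_B, b))]  # 7 products
    c = _apply(D, m)                           # decode back to a tile
    return [[c[0], c[1]], [c[2], c[3]]]
```

With these fixed encoders the operator computes an exact 2x2 product with 7 multiplications instead of 8; STL keeps the pipeline but trains $E_A$, $E_B$, $D$, trading exactness for expressiveness and FLOPs.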

PREAMBLE: Private and Efficient Aggregation of Block Sparse Vectors and Applications

from arXiv: Data Structures and Algorithms

Authors: Hilal Asi, Vitaly Feldman, Hannah Keller, Guy N. Rothblum, Kunal Talwar

We revisit the problem of secure aggregation of high-dimensional vectors in a two-server system such as Prio. These systems are typically used to aggregate vectors such as gradients in private federated learning, where the aggregate itself is protected via noise addition to ensure differential privacy. Existing approaches require communication scaling with the dimensionality, and thus limit the dimensionality of vectors one can efficiently process in this setup. We propose PREAMBLE: Private Efficient Aggregation Mechanism for BLock-sparse Euclidean Vectors. PREAMBLE is a novel extension of distributed point functions that enables communication- and computation-efficient aggregation of block-sparse vectors, which are sparse vectors where the non-zero entries occur in a small number of clusters of consecutive coordinates. We then show that PREAMBLE can be combined with random sampling and privacy amplification by sampling results, to allow asymptotically optimal privacy-utility trade-offs for vector aggregation, at a fraction of the communication cost. When coupled with recent advances in numerical privacy accounting, our approach incurs a negligible overhead in noise variance, compared to the Gaussian mechanism used with Prio.
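PREAMBLE's distributed point functions are beyond a short sketch, but the two-server aggregation pattern it builds on can be illustrated with plain additive secret sharing: each client splits its vector into two random shares, one per server, so neither server alone learns anything, yet the summed shares reveal the aggregate. All names here are hypothetical and this is not the paper's construction:

```python
import numpy as np

MOD = 2**31 - 1  # public modulus for additive sharing

def share(vec, rng):
    """Split an integer vector into two additive shares mod MOD."""
    r = rng.integers(0, MOD, size=vec.shape)
    return r, (vec - r) % MOD

def aggregate(client_vecs, rng):
    """Each client sends one share to each server; servers sum locally.

    Only the combination of the two server totals reveals the aggregate;
    each server's total is individually uniformly random.
    """
    s0 = np.zeros_like(client_vecs[0])
    s1 = np.zeros_like(client_vecs[0])
    for v in client_vecs:
        a, b = share(v, rng)
        s0 = (s0 + a) % MOD
        s1 = (s1 + b) % MOD
    return (s0 + s1) % MOD
```

The point of PREAMBLE is that for block-sparse vectors this communication need not scale with the full dimensionality, which the naive sharing above does.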

Local Pan-Privacy for Federated Analytics

from arXiv: Data Structures and Algorithms

Authors: Vitaly Feldman, Audra McMillan, Guy N. Rothblum, Kunal Talwar

Pan-privacy was proposed by Dwork et al. as an approach to designing a private analytics system that retains its privacy properties in the face of intrusions that expose the system's internal state. Motivated by federated telemetry applications, we study local pan-privacy, where privacy should be retained under repeated unannounced intrusions on the local state. We consider the problem of monitoring the count of an event in a federated system, where event occurrences on a local device should be hidden even from an intruder on that device. We show that under reasonable constraints, the goal of providing information-theoretic differential privacy under intrusion is incompatible with collecting telemetry information. We then show that this problem can be solved in a scalable way using standard cryptographic primitives.

Graph Discovery and Source Detection in Temporal Graphs

from arXiv: Data Structures and Algorithms

Authors: Ben Bals

Researchers, policy makers, and engineers need to make sense of data on spreading processes as diverse as viral infections, water contamination, and misinformation in social networks. Classical questions include predicting infection behavior in a given network or deducing the structure of a network from infection data. We study two central problems in this area. In graph discovery, we aim to fully reconstruct the structure of a graph from infection data. In source detection, we observe a limited subset of the infections and aim to deduce the source of the infection chain. These questions have received considerable attention and have been analyzed in many settings (e.g., under different models of spreading processes), yet all previous work shares the assumption that the network has the same structure at every point in time. For example, if we consider how a disease spreads, it is unrealistic to assume that two people can either never or always infect each other, rather such an infection is possible precisely when they meet. Temporal graphs, in which connections change over time, have recently been used as a more realistic graph model to study infections. Despite this recent attention, we are the first to study graph discovery or source detection in temporal graphs. We propose models for temporal graph discovery and source detection that are consistent with previous work on static graphs and extend it to embrace the stronger expressiveness of temporal graphs. For this, we employ the standard susceptible-infected-resistant model of spreading processes, which is particularly often used to study diseases. We provide algorithms, lower bounds, and some experimental evaluation.
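The susceptible-infected-resistant process on a temporal graph can be sketched in a few lines. This is a toy discrete-time simulation in our own notation, not the paper's model; it shows why edge timing matters: an infection can only traverse edges in temporally increasing order.

```python
import random

def sir_temporal(edges_by_time, source, p_infect, t_recover, rng):
    """Discrete-time SIR process on a temporal graph.

    edges_by_time[t] is the set of undirected edges present at step t;
    an edge can transmit infection only while it is present.
    """
    infected = {source: 0}   # node -> time of infection
    recovered = set()
    for t, edges in enumerate(edges_by_time):
        # nodes become resistant t_recover steps after infection
        for v, t0 in list(infected.items()):
            if t - t0 >= t_recover:
                recovered.add(v)
                del infected[v]
        newly = set()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a in infected and b not in infected and b not in recovered:
                    if rng.random() < p_infect:
                        newly.add(b)
        for b in newly:
            infected[b] = t
    return infected, recovered
```

For example, with certain transmission the source reaches node 2 over the edge sequence (0,1) then (1,2), but not over the reversed sequence (1,2) then (0,1).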

Monday, March 17

Gambling on the Richter Scale

from Ben Recht

Risk aversion when evaluating risk scores

In the last post, I described Leonard Savage’s argument that we could score predictions based on their utility. If a forecast will be used to compute expected returns, there is no “incentive” to report probabilities that the forecaster didn’t believe in.

I worked out the abstract case for this last week, but today let me make it concrete as it will let me finally make “rigorous” a puzzling quote by Frank Ramsey:

“The old-established way of measuring a person's belief is to propose a bet, and see what are the lowest odds which he will accept.”

Since I’m going to be “rigorous,” this post will have too much math. But it will also feature degenerate gambling. So hopefully something for everyone.

Let’s suppose we have a single event we’re betting on. There are four possible outcomes: if you bet a dollar that the event happens and the event happens, you receive B dollars. If you bet a dollar that the event won’t happen and the event doesn’t happen, you receive 1/B dollars. If you bet incorrectly, you lose your dollar. Up to the bookie costs, most sports betting works in this fashion. The product of the fractional payout for betting on a team winning or losing is always close to one.1

Suppose you have a model predicting that the event happens with probability Q. How much of your pot should you bet? You could just maximize the expected value of your return. Suppose you wager the fraction A of your total money (so A is between 0 and 1). Then, if you bet on the event happening, the expected return on your bet is

Q A B - (1-Q) A

That is, if the event happens (with probability Q), you win AB dollars. If the event doesn’t happen (probability 1-Q), you lose A dollars. Using the same reasoning, the expected return if you bet against the event is

-Q A + (1-Q) A/B.

Under your probabilistic model of the future, only one of these bets can give a positive expected value, and that’s the one you’ll choose. Specifically, you should bet on the event if the odds the model places on the event happening exceed 1/B. In math, if

Q/(1-Q) > 1/B

you should bet on the event, and you should bet against otherwise.
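The decision rule can be checked directly by computing both expected returns and picking the larger one; a minimal sketch (the function names are ours, not from the post):

```python
def expected_returns(q, b, a=1.0):
    """Expected returns of wagering fraction a for and against the event,
    when a winning 'for' bet pays b per unit and 'against' pays 1/b per unit."""
    bet_for = q * a * b - (1 - q) * a
    bet_against = -q * a + (1 - q) * a / b
    return bet_for, bet_against

def should_bet_for(q, b):
    """True when betting on the event has the higher expected return,
    which is equivalent to the odds condition q/(1-q) > 1/b."""
    bet_for, bet_against = expected_returns(q, b)
    return bet_for > bet_against
```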

Now, if you want to maximize your expected value, you’ll put all of your money on your chosen bet. This seems, well, degenerate, but let’s see what we can learn from the degens. Certainly, a degenerate bet gives us a way to evaluate a forecaster’s probabilistic predictions. If the actual probability of the event is P, then the actual expected return on the bet is

S(Q,P) = P B - (1-P)   if Q/(1-Q) > 1/B,   and   S(Q,P) = (1-P)/B - P   otherwise.

This is the expected value with respect to P of the return with the policy πQ that was designed using Q.

Note that a forecaster who correctly estimates the probability of the event as P will have not only a positive expected return but the maximal possible expected return. But there are many Qs that get the same maximal return.

Let me summarize the situation so far. Eliciting degenerate bets yields a proper scoring rule for probability distributions. A scoring rule is a function that takes a forecaster’s distribution Q and compares it to the probabilistic nature of reality P.2 For every event Y, a scoring rule assigns a number evaluating the quality of the forecast Q. If Y happens, the score is R(Q,Y). The scoring rule is then

S(Q,P) = E_{Y ~ P}[ R(Q,Y) ]

A scoring rule S(Q,P) is proper if S(Q,P) is less than or equal to S(P,P) for all Q and P. That is, the score function is maximized by the “true distribution” of nature. A strictly proper scoring rule is one where S(Q,P) is strictly less than S(P,P) for all Q that are not equal to P. Strictly proper rules are preferable because only a forecaster who reports the true distribution achieves the optimal score.

In our degenerate gambling setup, we have derived a proper scoring rule but not a strictly proper scoring rule. Interestingly, we can get strictly proper scoring rules back by adding a bit of risk aversion to the utility maximization problem.

How should we gamble if we don’t want to be degens? A gambling strategy that dates back to Bernoulli suggests that you should maximize the rate of return over several rounds of gambling and not just try to win it all in one hand. This turns out to be equivalent to maximizing the geometric mean of your returns.

This model assumes a hypothetical infinite sequence of bets on events whose outcome has probability Q. The utility is the long-run average rate of return for always allocating a bet on the event, whose size is equal to the fraction A of your total assets.

If you prefer to think in terms of logarithms, this utility maximization problem is equivalent to

maximize over A:   Q log(1 + A B) + (1-Q) log(1 - A)

The optimal solution of this problem is called the Kelly Criterion. You bet

A = Q - (1-Q)/B

on the event happening, provided that the odds of the event happening are greater than or equal to 1/B. If the odds are worse than 1/B, you bet against the event with allocation

A = (1-Q) - Q B
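The Kelly formulas above are easy to sanity-check numerically; a small sketch, with a grid search over allocations confirming the closed form:

```python
import math

def kelly_fraction(q, b):
    """Kelly allocation: ('for', A) or ('against', A), per the formulas above.
    Bet on the event when the odds q/(1-q) are at least 1/b."""
    if q / (1 - q) >= 1 / b:
        return 'for', q - (1 - q) / b
    return 'against', (1 - q) - q * b
```

For q = 0.7 and even payouts (b = 1) this gives the classic allocation 2q - 1 = 0.4, which also maximizes the expected log growth q log(1 + A) + (1-q) log(1 - A).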

The expected value of the Kelly bet, assuming the probability is P, has a fascinating form:

S(Q,P) = [ H(P,P) - H(P,Q) ] + S(P,P)

where H(P,Q) is the cross-entropy of Q relative to P. The second term is not affected by the choice of Q. It is the expected return of the policy derived from the true distribution P. We have

-D_KL(P || Q) = S(Q,P) - S(P,P)

Where the left-hand side is the negative Kullback-Leibler divergence between P and Q. What the KL divergence means is the topic of another post. But for today, note that the KL divergence is always nonnegative and equals zero only if P=Q. That is, this particular utility maximization problem yields a strictly proper scoring rule. Moreover, the scoring rule is always morally equal to the KL divergence, no matter the relationship between the offered bet size and the probability of the event.
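Under the B-versus-1/B payout structure used throughout this post, the gap between the Kelly score of the truth P and of any report Q works out to exactly the KL divergence; a quick numeric check of that identity (our own helper names):

```python
import math

def kelly_score(q, p, b):
    """Expected log-growth, under true probability p, of the Kelly bet
    derived from a reported probability q (payout b for, 1/b against)."""
    if q / (1 - q) >= 1 / b:
        a = q - (1 - q) / b          # bet on the event
        return p * math.log(1 + a * b) + (1 - p) * math.log(1 - a)
    a = (1 - q) - q * b              # bet against the event
    return p * math.log(1 - a) + (1 - p) * math.log(1 + a / b)

def kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```

The check below confirms that the score is uniquely maximized at Q = P and that the shortfall is the KL divergence, i.e., the rule is strictly proper.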

You might guess (and you’d be correct) that you get the Brier Score by using a different risk-aversion model. If your utility were regularized by a quadratic term,

Q A B - (1-Q) A - A²/2,

then you get a proper scoring rule with a quadratic distance measured between P and Q. In the case of even payouts (B=1), the induced scoring rule is the Brier Score.

Though I never see these scores motivated this way in the recent literature, I find Savage’s appeal to Homo Economicus instructive. In all of the papers and articles I’ve read about proper scoring rules, the promoters often argue that forecasters should not lean too heavily on prediction accuracy. They advocate for calibration and appropriate “distributional sharpness.” I’ve tried but never understood why exactly these properties are the gold standard of what we should strive for.

The utility maximization framework is far more explicit. The scoring rule comes out of explicitly articulating our expectations of the forecasting system, and the penalty will be a direct measurement of the deviation of the expectations from reality. When we explicitly declare the evaluation rules, we get a scoring rule for free. It’s interesting to me that in order to get a strictly proper rule, we have to add an element of risk aversion to the utility maximization.

I think risk aversion is what the statisticians are after, too. Statisticians argue that scoring rules are supposed to enable decision-makers to hedge against uncertainty. For example, when Frank Harrell eloquently describes his advocacy for scoring rules, he explicitly argues for risk aversion. But if risk aversion is the intended goal, why not be explicit about it and precisely declare the purported value of a probabilistic forecast? Value is only clear if we articulate how we will evaluate.

Subscribe now

1

It’s always less than one because the house never loses.

2

I admit I am in pain typing out that last sentence, but let’s see where it takes us.

By Ben Recht

Workshop on Algorithms for Large Data

from CS Theory Events

April 14-16, 2025 Online https://waldo-workshop.github.io/2025.html On behalf of the organizers – Guy Blanc, Quanquan Liu, Shivam Nadimpalli, Samson Zhou – we would like to invite you to the Workshop on Algorithms for Large Data (Online) 2025, which will be from Monday, April 14th to Wednesday, April 16th. We are pleased to host a fantastic set … Continue reading Workshop on Algorithms for Large Data

By shacharlovett

Computational Complexity of Finding Subgroups of a Given Order

from arXiv: Computational Complexity

Authors: K. Lakshmanan

We study the problem of finding a subgroup of a given order in a finite group, where the group is represented by its Cayley table. We establish that this problem is NP-hard in the general case by providing a reduction from the Hamiltonian Cycle problem. Additionally, we analyze the complexity of the problem in the special case of abelian groups and present a linear-time algorithm for finding a subgroup of a given order when the input is given in the form of a Cayley table. To the best of our knowledge, no prior work has addressed the complexity of this problem under the Cayley table representation. Our results also provide insight into the computational difficulty of finding subgroups across different group representations.
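The abelian case admits a simple illustrative special case: in the cyclic group Z_n, a subgroup of order d exists precisely when d divides n, and it is generated by n/d. This is a textbook fact, not the paper's algorithm:

```python
def cyclic_subgroup_of_order(n, d):
    """In the cyclic group Z_n (addition mod n), return the unique
    subgroup of order d, or None when d does not divide n.

    The subgroup is the set of multiples of n // d, which has
    exactly d elements and is closed under addition mod n.
    """
    if d <= 0 or n % d != 0:
        return None
    return list(range(0, n, n // d))
```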
