Last Update

OPML feed of all feeds.

Subscribe to the Atom feed or RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Thursday, February 20

In defense of typing monkeys

from Ben Recht

We definitely are training on our test sets. This is fine.

I appreciate all of your feedback about why machine learning practice doesn’t seem to adaptively overfit to static test sets. One commenter proposed that Kaggle competitions wouldn’t tolerate people spamming the leaderboard. I agree that active Kaggle competitions wouldn’t accept querying a test set one million times, but on machine learning benchmarks, Google is just fine with it. Here’s a fun paper that evaluated the CIFAR10 test error three hundred thousand times for a single figure. I’m not exaggerating that number.

Since we query the test sets to death, the overwhelming answer to why running RL on the test set isn’t a problem seems to be that hyperparameter tuning isn’t powerful enough to overfit. I don’t buy this for so many reasons. First, hyperparameters are just vibes. What are the hyperparameters in a neural network?

  1. The number of units in each layer

  2. The number of layers

  3. The architecture

  4. The regularization

  5. The weight initialization

  6. The ADAM parameters

  7. The input encoding

  8. The length of the hamburger train

  9. The random seed

  10. The bugs in the code

As far as I can tell, the only parts of the neural network that are not hyperparameters are the weights after initialization.[1] We can tell ourselves stories that this list of ten things is somehow too weak to overfit. But if you let me run RL on the neural network weights, I can get the test error to zero. Why is it that if we optimize everything else, we don’t get the error to zero?
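
To make that concrete, here is a minimal sketch of what tuning on the test set looks like as a loop (all names and numbers are hypothetical, not from any paper): propose a setting of the knobs above, score it on the test set, keep the best, repeat. Note that the random seed sits in the search space on the same footing as the architecture, which is the point.

    import random

    def test_error(config):
        # Stand-in for "train a model with these hyperparameters and score
        # it on the fixed test set"; purely a hypothetical simulation.
        return 0.1 + 0.05 * random.Random(hash(config)).random()

    rng = random.Random(0)
    best_config, best_err = None, float("inf")
    for _ in range(100_000):              # 100K queries against one test set
        config = (rng.choice([2, 4, 8]),  # number of layers
                  rng.choice([64, 256]),  # units per layer
                  rng.random(),           # regularization strength
                  rng.randrange(2**31))   # the random seed is a knob too
        err = test_error(config)
        if err < best_err:                # hill climb on the test set itself
            best_config, best_err = config, err
    print(best_config, best_err)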

The Google Research paper I cited above, the one that queried the test set 300K times, was only messing with the architecture. The research community at large runs a giant reinforcement learning algorithm in parallel. Thousands of teams compare each other’s results in papers and drive their leaderboard errors down. In Kaggle competitions, competitors don’t communicate directly, but they all surf the web looking for strategies. On ML benchmarks, everyone tries to replicate everyone else’s tricks and find novel innovations to hill climb from there.

And what happens? The test error gets really small over time! It goes to zero. Let’s stick with CIFAR10 for a bit because it’s not only a data set that has been queried to death and has absurdly low test error, but it’s one where my group did a replication study.

Did this massive frenzy of test set optimization adapt to the labels and yield poor out-of-sample behavior? It turns out it didn’t. The models with the best error on the public test set ended up being the best ones on the private test set.

The top model in gold was a huge convolutional neural network. It was one of the most recent models we tested. Not only did it have the lowest error on the CIFAR10 data set (2.9% error), but it had the best test error on the replicated CIFAR10.1 data set. That is, it generalized the best. Community reinforcement learning found a model with low test error and low generalization error. The bias and variance were both minimized.

In 2021, a Google team reported a vision transformer model that achieved 0.5% error on CIFAR10. If anyone knows how it fared on CIFAR10.1, please let me know in the comments.

This clustering of models “on the line,” where benchmark test error predicts out-of-sample performance, is not just a CIFAR10 artifact. It is a robust, well-replicated phenomenon, and we should try to understand it better to make sense of benchmarks and evaluations.

My guess for why the gap between theory and practice is so stark is pretty mundane: it’s that most learning theory tells us nothing about data.

In the test-set attack construction from last time, I only needed to reason about the labels. I said nothing about the features at all. If the test labels are a fixed hidden bit string of length n, then with n queries I can recover them from the reported test error. It’s not hard to verify that you can get closer to the labels by boosting random guesses. As Moritz put it, this is “Competing in a data science contest without reading the data.” You can hill climb without looking at the features. You can hill climb without looking at the training labels!
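
Here is a minimal simulation of that attack (my own sketch, with a toy leaderboard oracle): query a baseline prediction once, then flip one position at a time. Whether the reported error moves down or up by 1/n reveals each hidden label, so n + 1 queries recover the whole string.

    import random

    rng = random.Random(42)
    n = 20
    secret = [rng.randint(0, 1) for _ in range(n)]   # hidden test labels

    def leaderboard(guess):
        # Oracle: the only thing the contest reports is the test error.
        return sum(g != s for g, s in zip(guess, secret)) / n

    base = [0] * n
    base_err = leaderboard(base)
    recovered = []
    for i in range(n):
        probe = base[:]
        probe[i] = 1                                 # flip one prediction
        # The error drops by 1/n exactly when the true label at i is 1.
        recovered.append(1 if leaderboard(probe) < base_err else 0)

    assert recovered == secret   # all labels recovered with n + 1 queries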

However, this is only part of the story. Sadly, most learning theory bounds are derived without reading the data. The standard Hoeffding bound, which argues that the generalization gap over K candidate predictors will scale no worse than

$\sqrt{\frac{\log K}{2n}},$

also never looks at the data. If you sample a random predictor, the mean test error is 1/2 and the variance is on the order of 1/n, where n is the number of test data points. Hence, if you sample K random predictors and don’t boost, your minimum test error will be roughly

$\frac{1}{2} - \sqrt{\frac{\log K}{2n}}.$

Now, let’s say you have a set of functions that can effectively be represented by K archetypical functions. For example, a class with VC dimension d can be represented by about 2^d functions. Plug this into my expression and, lo and behold, you have a learning theory bound. This bound holds regardless of whether the VC class is any good at classifying your data. Our large deviation inequalities only care about the number of things you test, not what you are actually testing.
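
A quick sanity check of this back-of-the-envelope calculation (my own sketch, not from the post): sample K random predictors against n hidden labels and compare the best observed test error with $\frac{1}{2} - \sqrt{\frac{\log K}{2n}}$. The two numbers should land close together.

    import math, random

    rng = random.Random(0)
    n, K = 10_000, 1_000
    labels = [rng.randint(0, 1) for _ in range(n)]

    best = min(
        sum(rng.randint(0, 1) != y for y in labels) / n  # one random predictor
        for _ in range(K)
    )
    predicted = 0.5 - math.sqrt(math.log(K) / (2 * n))
    print(f"best of {K} random predictors: {best:.4f}")
    print(f"1/2 - sqrt(log K / 2n):        {predicted:.4f}")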

Learning theory almost never says anything about data. This is because if we knew something about the data, we probably wouldn’t be using nonparametric prediction algorithms. But avoiding thinking about data is problematic. Our thinking derived from learning theory is wrong and misleading. Can we think about new theories that reflect the last four decades of fruitful training on the test set?

[1] It’s wild that this is the only thing you’re not allowed to optimize.

By Ben Recht

TR25-015 | An exposition of recent list-size bounds of FRS Codes | Abhibhav Garg, Prahladh Harsha, Mrinal Kumar, Ramprasad Saptharishi, Ashutosh Shankar

from ECCC Papers

In the last year, there have been some remarkable improvements in the combinatorial list-size bounds of Folded Reed-Solomon (FRS) codes and multiplicity codes. Starting from the work of Kopparty, Ron-Zewi, Saraf and Wootters (and subsequent simplifications due to Tamo), we have had dramatic improvements in the list-size bounds of FRS codes due to Srivastava and to Chen & Zhang. In this note, we give a short exposition of these three results (Tamo, Srivastava, and Chen-Zhang).

FAQ on Microsoft’s topological qubit thing

from Scott Aaronson

Q1. Did you see Microsoft’s announcement?
A. Yes, thanks, you can stop emailing to ask! Microsoft’s Chetan Nayak was even kind enough to give me a personal briefing a few weeks ago. Yesterday I did a brief interview on this for the BBC’s World Business Report, and I also commented for MIT Technology Review.

Q2. What is a topological qubit?
A. It’s a special kind of qubit built using nonabelian anyons, which are excitations that can exist in a two-dimensional medium, behaving neither as fermions nor as bosons. The idea grew out of seminal work by Alexei Kitaev, Michael Freedman, and others starting in the late 1990s. Topological qubits have proved harder to create and control than ordinary qubits.

Q3. Then why do people care about topological qubits?
A. The dream is that they could eventually be more resilient to decoherence than regular qubits, since an error, in order to matter, needs to change the topology of how the nonabelian anyons are braided around each other. So you’d have some robustness built in to the physics of your system, rather than having to engineer it laboriously at the software level (via quantum fault-tolerance).

Q4. Did Microsoft create the first topological qubit?
A. Well, they say they did! [Update: Commenters point out to me that buried in Nature’s review materials is the following striking passage: “The editorial team wishes to point out that the results in this manuscript do not represent evidence for the presence of Majorana zero modes in the reported devices. The work is published for introducing a device architecture that might enable fusion experiments using future Majorana zero modes.” So, the situation is that Microsoft is unambiguously claiming to have created a topological qubit, and they just published a relevant paper in Nature, but their claim to have created a topological qubit has not yet been accepted by Nature’s peer review.]

Q5. Didn’t Microsoft claim the experimental creation of Majorana zero modes—a building block of topological qubits—back in 2018, and didn’t they then need to retract that claim?
A. Yep. Certainly that history is making some experts cautious about the new claim. When I asked Chetan Nayak how confident I should be, his response was basically “look, we now have a topological qubit that’s behaving fully as a qubit; how much more do people want?”

Q6. Is this a big deal?
A. If the claim stands, I’d say it would be a scientific milestone for the field of topological quantum computing and physics beyond. The number of topological qubits manipulated in a single experiment would then have finally increased from 0 to 1, and depending on how you define things, arguably a “new state of matter” would even have been created, one that doesn’t appear in nature (but only in Nature).

Q7. Is this useful?
A. Not yet! If anyone claims that a single qubit, or even 30 qubits, are already useful for speeding up computation, you can ignore anything else that person says. (Certainly Microsoft makes no such claim.) On the question of what we believe quantum computers will or won’t eventually be useful for, see like half the archives of this blog over the past twenty years.

Q8. Does this announcement vindicate topological qubits as the way forward for quantum computing?
A. Think of it this way. If Microsoft’s claim stands, then topological qubits have finally reached some sort of parity with where more traditional qubits were 20-30 years ago. I.e., the non-topological approaches like superconducting, trapped-ion, and neutral-atom have an absolutely massive head start: there, Google, IBM, Quantinuum, QuEra, and other companies now routinely do experiments with dozens or even hundreds of entangled qubits, and thousands of two-qubit gates. Topological qubits can win if, and only if, they turn out to be so much more reliable that they leapfrog the earlier approaches—sort of like the transistor did to the vacuum tube and electromechanical relay. Whether that will happen is still an open question, to put it extremely mildly.

Q9. Are there other major experimental efforts to build topological qubits?
A. No, it’s pretty much just Microsoft. Purely as a scientist who likes to see things tried, I’m grateful that one player stuck with the topological approach even when it ended up being a long, painful slog.

Q10. Is Microsoft now on track to scale to a million topological qubits in the next few years?
A. In the world of corporate PR and pop-science headlines, sure, why not? As Bender from Futurama says, “I can guarantee anything you want!” In the world of reality, a “few years” certainly feels overly aggressive to me, but good luck to Microsoft and good luck to its competitors! I foresee exciting times ahead, provided we still have a functioning civilization in which to enjoy them.

By Scott

Asst./Assoc./Full Professor at University at Buffalo (apply by March 10, 2025)

from CCI: jobs

We are a bit late in the cycle but we have up to two positions at all levels in Theoretical Computer Science (broadly defined): we encourage applicants in both algorithms and complexity theory. (March 10 is a soft deadline.)

Website: https://www.ubjobs.buffalo.edu/postings/55930
Email: cse-theory-search@buffalo.edu

By shacharlovett

EHOP: A Dataset of Everyday NP-Hard Optimization Problems

from arXiv: Computational Complexity

Authors: Alex Duchnowski, Ellie Pavlick, Alexander Koller

We introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard optimization problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks, versions that are dressed up as problems that could arise in real life, and variants of well-known problems with inverted rules. We find that state-of-the-art LLMs, across multiple prompting strategies, systematically solve textbook problems more accurately than their real-life and inverted counterparts. We argue that this constitutes evidence that LLMs adapt solutions seen during training, rather than leveraging reasoning abilities that would enable them to generalize to novel problems.

Does there exist a quantum fingerprinting protocol without coherent measurements?

from arXiv: Computational Complexity

Authors: Atsuya Hasegawa, Srijita Kundu, François Le Gall, Harumichi Nishimura, Qisheng Wang

Buhrman, Cleve, Watrous, and de Wolf (PRL 2001) discovered the quantum fingerprinting protocol, a quantum SMP protocol with $O(\log n)$ qubits of communication for the equality problem. In the protocol, Alice and Bob create quantum fingerprints of their inputs, and the referee conducts SWAP tests on the quantum fingerprints. Since $\Omega(\sqrt{n})$ bits of communication are required in the classical SMP scheme for the equality problem, as first shown by Newman and Szegedy (STOC 1996), there is an exponential quantum advantage in the amount of communication. In this paper, we consider a setting in which the referee can do only incoherent measurements rather than coherent measurements such as the SWAP tests. We first show that, in the case of one-way LOCC measurements, $\Omega(\sqrt{n})$ qubits of communication are required. To prove the result, we derive a new method to replace quantum messages by classical messages and consider a reduction to the optimal lower bound in the hybrid SMP model where one message is quantum and the other is classical, which was first shown by Klauck and Podder (MFCS 2014). Our method uses the result of Oszmaniec, Guerini, Wittek, and Acín (PRL 2017), who showed that general POVM measurements can be simulated by randomized projective measurements with small ancilla qubits, and Newman's theorem. We further investigate the setting of quantum SMP protocols with two-way LOCC measurements, and derive a lower bound against some restricted two-way LOCC measurements. To prove it, we revisit the technique to replace quantum messages by classical deterministic messages introduced by Aaronson (ToC 2005) and generalized by Gavinsky, Regev, and de Wolf (CJTCS 2008), and show that, using the deterministic message, the referee can simulate the two-way LOCC measurements.
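
For context on the measurement being restricted here: the SWAP test accepts a pair of pure states $|\psi\rangle, |\phi\rangle$ with probability $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$, so identical fingerprints always pass while far-apart ones are rejected with constant probability. A small numeric illustration (mine, not from the paper), using real unit vectors:

    import math

    def swap_test_accept_prob(psi, phi):
        # Pr[accept] = 1/2 + |<psi|phi>|^2 / 2 for unit vectors psi, phi.
        inner = sum(a * b for a, b in zip(psi, phi))
        return 0.5 + 0.5 * inner ** 2

    psi = [1.0, 0.0]
    same = [1.0, 0.0]
    far = [math.cos(math.pi / 3), math.sin(math.pi / 3)]

    print(swap_test_accept_prob(psi, same))  # 1.0: equal fingerprints always pass
    print(swap_test_accept_prob(psi, far))   # 0.625: rejected with constant prob.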

Multi-Covering a Point Set by $m$ Disks with Minimum Total Area

from arXiv: Computational Geometry

Authors: Mariem Guitouni, Chek-Manh Loi, Sándor P. Fekete, Michael Perk, Aaron T. Becker

A common robotics sensing problem is to place sensors to robustly monitor a set of assets, where robustness is assured by requiring asset $p$ to be monitored by at least $\kappa(p)$ sensors. Given $n$ assets that must be observed by $m$ sensors, each with a disk-shaped sensing region, where should the sensors be placed to minimize the total area observed? We provide and analyze a fast heuristic for this problem. We then use the heuristic to initialize an exact Integer Programming solution. Subsequently, we enforce separation constraints between the sensors by modifying the integer program formulation and by changing the disk candidate set.

Fast Kd-trees for the Kullback--Leibler Divergence and other Decomposable Bregman Divergences

from arXiv: Computational Geometry

Authors: Tuyen Pham, Hubert Wagner

The contributions of the paper span theoretical and implementational results. First, we prove that Kd-trees can be extended to spaces in which the distance is measured with an arbitrary Bregman divergence. Perhaps surprisingly, this shows that the triangle inequality is not necessary for correct pruning in Kd-trees. Second, we offer an efficient algorithm and C++ implementation for nearest neighbour search for decomposable Bregman divergences. The implementation supports the Kullback--Leibler divergence (relative entropy) which is a popular distance between probability vectors and is commonly used in statistics and machine learning. This is a step toward broadening the usage of computational geometry algorithms. Our benchmarks show that our implementation efficiently handles both exact and approximate nearest neighbour queries. Compared to a naive approach, we achieve two orders of magnitude speedup for practical scenarios in dimension up to 100. Our solution is simpler and more efficient than competing methods.
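
To see what decomposability means here: the KL divergence splits into a sum of per-coordinate terms, which is the property that coordinate-wise Kd-tree pruning exploits. Below is a brute-force KL nearest-neighbour baseline of the kind such a tree accelerates (my sketch, not the paper's C++ implementation; it assumes strictly positive coordinates):

    import math

    def kl(p, q):
        # Kullback--Leibler divergence D(p || q); note it is a sum of
        # independent per-coordinate terms, i.e., decomposable.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def nearest(query, points):
        # Naive linear scan; the paper's Kd-trees answer this much faster.
        return min(points, key=lambda p: kl(query, p))

    data = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
    print(nearest([0.6, 0.3, 0.1], data))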

Optimal covering of rectangular grid graphs with tours of constrained length

from arXiv: Computational Geometry

Authors: Sergey Bereg, Jesús Capitán, José-Miguel Díaz-Bañez, José-Manuel Higes-López, Miguel-Angel Pérez-Cutiño, Vanesa Sánchez, Inmaculada Ventura

Given a rectangular grid graph with a special vertex at a corner called base station, we study the problem of covering the vertices of the entire graph with tours that start and end at the base station and whose lengths do not exceed a given threshold, while minimizing a quality measure. We consider two objective functions: minimizing the number of tours and minimizing the sum of their lengths. We present an algorithm that computes the optimal solution for both objectives in linear time with respect to the grid size.

A Query-Driven Approach to Space-Efficient Range Searching

from arXiv: Data Structures and Algorithms

Authors: Dimitris Fotakis, Andreas Kalavas, Ioannis Psarros

We initiate a study of a query-driven approach to designing partition trees for range-searching problems. Our model assumes that a data structure is to be built for an unknown query distribution that we can access through a sampling oracle, and must be selected such that it optimizes a meaningful performance parameter on expectation. Our first contribution is to show that a near-linear sample of queries allows the construction of a partition tree with a near-optimal expected number of nodes visited during querying. We enhance this approach by treating node processing as a classification problem, leveraging fast classifiers like shallow neural networks to obtain experimentally efficient query times. Our second contribution is to develop partition trees using sparse geometric separators. Our preprocessing algorithm, based on a sample of queries, builds a balanced tree with nodes associated with separators that minimize query stabs on expectation; this yields both fast processing of each node and a small number of visited nodes, significantly reducing query time.

Slant/Gokigen Naname is NP-complete, and Some Variations are in P

from arXiv: Data Structures and Algorithms

Authors: Jayson Lynch, Jack Spalding-Jamieson

In this paper we show that a generalized version of the Nikoli puzzle Slant is NP-complete. We also give polynomial time algorithms for versions of the puzzle where some constraints are omitted. These problems correspond to simultaneously satisfying connectivity and vertex degree constraints in a grid graph and its dual.

FPT algorithms over linear delta-matroids with applications

from arXiv: Data Structures and Algorithms

Authors: Eduard Eiben, Tomohiro Koana, Magnus Wahlström

Matroids, particularly linear ones, have been a powerful tool in parameterized complexity for algorithms and kernelization. They have sped up or replaced dynamic programming. Delta-matroids generalize matroids by encapsulating structures such as non-maximum matchings in general graphs and various path-packing and topological configurations. Linear delta-matroids (represented by skew-symmetric matrices) offer significant expressive power and enable powerful algorithms. We investigate parameterized complexity aspects of problems defined over linear delta-matroids or with delta-matroid constraints. Our analysis of basic intersection and packing problems reveals a different complexity landscape compared to the familiar matroid case. In particular, there is a stark contrast between the cardinality parameter $k$ and the rank parameter $r$. For example, finding an intersection of size $k$ of three linear delta-matroids is W[1]-hard when parameterized by $k$, while more general problems (e.g., finding a set packing of size $k$ feasible in a linear delta-matroid) are FPT when parameterized by $r$. We extend the recent determinantal sieving procedure of Eiben, Koana and Wahlström (SODA 2024) to sieve a polynomial for a monomial whose support is feasible in a linear delta-matroid by $r$. Second, we investigate a class of problems that remains FPT when parameterized by $k$, even on delta-matroids of unbounded rank. We begin with Delta-matroid Triangle Cover - finding a feasible set of size $k$ that can be covered by a vertex-disjoint packing of triangles (sets of size 3) from a given collection. This approach allows us to find a packing of $K_3$'s and $K_2$'s in a graph with a maximum number of edges, parameterized above the matching number. As applications, we settle questions on the FPT status of Cluster Subgraph and Strong Triadic Closure parameterized above the matching number.

Semi-Streaming Algorithms for Hypergraph Matching

from arXiv: Data Structures and Algorithms

Authors: Henrik Reinstädtler, S M Ferdous, Alex Pothen, Bora Uçar, Christian Schulz

We propose two one-pass streaming algorithms for the NP-hard hypergraph matching problem. The first algorithm stores a small subset of potential matching edges in a stack using dual variables to select edges. It has an approximation guarantee of $\frac{1}{d(1+\varepsilon)}$ and requires $O((n/\varepsilon) \log^2{n})$ bits of memory. The second algorithm computes, stores, and updates a single matching as the edges stream, with an approximation ratio dependent on a parameter $\alpha$. Its best approximation ratio is $\frac{1}{(2d-1) + 2 \sqrt{d(d-1)}}$, and it requires only $O(n)$ memory. We have implemented both algorithms and have engineered variants for optimizing matching weights, memory consumption, and running times. These include relaxations of the rule for admitting edges into the stack and using a second pass to improve the weight. The evaluation is done on large-sized hypergraphs from circuit design and sparse matrix computations. Our results show that the streaming algorithms achieve much better approximation factors in practice than the worst-case bounds, reducing memory required by up to 50 times and outperforming the offline Greedy algorithm.
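
As a rough illustration of the second algorithm's flavor (a simplified sketch for ordinary graphs, i.e. $d = 2$, with a hypothetical admission rule; not the authors' code): keep a single matching in $O(n)$ memory, and let a new edge evict its conflicting edges only if it outweighs them by a factor $\alpha$.

    def stream_matching(edge_stream, alpha=2.0):
        # One-pass matching sketch; edge_stream yields (weight, u, v).
        matched = {}                       # vertex -> matched edge
        for w, u, v in edge_stream:
            conflicts = {matched[x] for x in (u, v) if x in matched}
            if w >= alpha * sum(cw for cw, _, _ in conflicts):
                for _, cu, cv in conflicts:            # evict lighter conflicts
                    matched.pop(cu, None); matched.pop(cv, None)
                matched[u] = matched[v] = (w, u, v)
        return set(matched.values())

    stream = [(1.0, 'a', 'b'), (3.0, 'b', 'c'), (1.5, 'c', 'd'), (10.0, 'a', 'b')]
    print(stream_matching(stream))         # {(10.0, 'a', 'b')}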

Faster Minimization of Total Weighted Completion Time on Parallel Machines

from arXiv: Data Structures and Algorithms

Authors: Danny Hermelin, Tomohiro Koana, Dvir Shabtay

We study the classical problem of minimizing the total weighted completion time on a fixed set of $m$ identical machines working in parallel, the $Pm||\sum w_jC_j$ problem in the standard three field notation for scheduling problems. This problem is well known to be NP-hard, but only in the ordinary sense, and appears as one of the fundamental problems in any scheduling textbook. In particular, the problem served as a proof of concept for applying pseudo-polynomial time algorithms and approximation schemes to scheduling problems. The fastest known pseudo-polynomial time algorithm for $Pm||\sum w_jC_j$ is the famous Lawler and Moore algorithm from the late 1960s which runs in $\tilde{O}(P^{m-1}n)$ time, where $P$ is the total processing time of all jobs in the input. After more than 50 years, we are the first to present an algorithm, alternative to that of Lawler and Moore, which is faster for a certain range of the problem parameters (e.g., when their values are all $O(1)$).
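
For readers who have not seen the Lawler-Moore style DP: for two machines it sorts jobs by Smith's ratio $p_j/w_j$ and runs dynamic programming over the load of one machine, giving $O(nP)$ time. A sketch of that classical baseline (mine; this is the algorithm being improved upon, not the new one):

    def p2_weighted_completion(jobs):
        # Exact DP for P2||sum w_j C_j; jobs is a list of (p_j, w_j).
        jobs = sorted(jobs, key=lambda j: j[0] / j[1])  # WSPT order
        INF = float("inf")
        dp = {0: 0}          # load of machine 1 -> min weighted completion cost
        done = 0             # total processing time scheduled so far
        for p, w in jobs:
            ndp = {}
            for t1, cost in dp.items():
                t2 = done - t1                    # load of machine 2
                c1 = cost + w * (t1 + p)          # job goes on machine 1
                if c1 < ndp.get(t1 + p, INF):
                    ndp[t1 + p] = c1
                c2 = cost + w * (t2 + p)          # job goes on machine 2
                if c2 < ndp.get(t1, INF):
                    ndp[t1] = c2
            dp = ndp
            done += p
        return min(dp.values())

    print(p2_weighted_completion([(3, 3), (2, 2), (2, 1), (1, 4)]))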

Graph-Based Algorithms for Diverse Similarity Search

from arXiv: Data Structures and Algorithms

Authors: Piyush Anand, Piotr Indyk, Ravishankar Krishnaswamy, Sepideh Mahabadi, Vikas C. Raykar, Kirankumar Shiragur, Haike Xu

Nearest neighbor search is a fundamental data structure problem with many applications in machine learning, computer vision, recommendation systems and other fields. Although the main objective of the data structure is to quickly report data points that are closest to a given query, it has long been noted (Carbonell and Goldstein, 1998) that without additional constraints the reported answers can be redundant and/or duplicative. This issue is typically addressed in two stages: in the first stage, the algorithm retrieves a (large) number $r$ of points closest to the query, while in the second stage, the $r$ points are post-processed and a small subset is selected to maximize the desired diversity objective. Although popular, this method suffers from a fundamental efficiency bottleneck, as the set of points retrieved in the first stage often needs to be much larger than the final output. In this paper we present provably efficient algorithms for approximate nearest neighbor search with diversity constraints that bypass this two stage process. Our algorithms are based on popular graph-based methods, which allows us to "piggy-back" on the existing efficient implementations. These are the first graph-based algorithms for nearest neighbor search with diversity constraints. For data sets with low intrinsic dimension, our data structures report a diverse set of $k$ points approximately closest to the query, in time that only depends on $k$ and $\log \Delta$, where $\Delta$ is the ratio of the diameter to the closest pair distance in the data set. This bound is qualitatively similar to the best known bounds for standard (non-diverse) graph-based algorithms. Our experiments show that the search time of our algorithms is substantially lower than that using the standard two-stage approach.
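
The two-stage baseline being bypassed is easy to state in code (my sketch; "diversity" is modeled here as a hypothetical minimum pairwise-separation rule): retrieve $r \gg k$ nearest candidates, then greedily keep the far-apart ones. The bottleneck is that stage one must fetch far more points than stage two returns.

    import math

    def two_stage_diverse_knn(query, data, k, r, min_sep):
        # Stage 1: retrieve the r closest points (the expensive part).
        candidates = sorted(data, key=lambda p: math.dist(query, p))[:r]
        # Stage 2: greedily keep candidates pairwise >= min_sep apart.
        picked = []
        for p in candidates:
            if all(math.dist(p, q) >= min_sep for q in picked):
                picked.append(p)
            if len(picked) == k:
                break
        return picked

    pts = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (9, 1)]
    print(two_stage_diverse_knn((0, 0), pts, k=2, r=5, min_sep=1.0))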

Sum-Of-Squares To Approximate Knapsack

from arXiv: Data Structures and Algorithms

Authors: Pravesh K. Kothari, Sherry Sarkar

These notes give a self-contained exposition of Karlin, Mathieu and Nguyen's tight estimate of the integrality gap of the sum-of-squares semidefinite program for solving the knapsack problem. They are based on a sequence of three lectures in a CMU course on Advanced Approximation Algorithms in Fall '21 that used the KMN result to introduce the Sum-of-Squares method for algorithm design. The treatment in these notes uses the pseudo-distribution view of solutions to the sum-of-squares SDPs and relies only on a few basic, reusable results about pseudo-distributions.

Wednesday, February 19

FORC: Deadline for Highlight Nominations Extended

from TOC for Fairness

We are seeing wonderful growth in this year’s FORC submissions. Well done! As the highlight nomination is a new thing for FORC 2025, the committee extended the deadline for nominations to Tuesday, February 25, 2025. See the CFP for more information.

By Omer Reingold

Workshop on Algebraic Complexity, Geometry, and Representations

from CS Theory Events

March 17-21, 2025, University of Warwick
https://warwick.ac.uk/fac/sci/maths/research/events/2024-2025/algebraiccomplexity/

Algebraic Complexity Theory is a vibrant field that has been seeing a tremendous amount of activity in recent years. Its classical questions have been interwoven with deep questions from algebraic geometry, invariant theory, and representation theory, with recent exciting connections to metacomplexity and proof complexity.

By shacharlovett

Tomorrow and Yesterday

from Computational Complexity

I recently completed Tomorrow, and Tomorrow, and Tomorrow by Gabrielle Zevin, a book recommended by many, including the City of Chicago. The novel covers the decades-long journey of two game developers, Sadie and Sam, and how their lives interact with the games they create.

A paragraph towards the end made me rethink the whole book (not a spoiler):

Well, if we’d been born a little bit earlier, we wouldn’t have been able to make our games so easily. Access to computers would have been harder. We would have been part of the generation who was putting floppy disks in Ziploc bags and driving the games to stores. And if we’d been born a little bit later, there would have been even greater access to the internet and certain tools, but honestly, the games got so much more complicated; the industry got so professional. We couldn’t have done as much as we did on our own.

This paragraph hearkens back to my post last week, about how the era you grew up in can affect your trajectory. But also I'm a generation older than the book's main characters, and indeed Ribbit was distributed on a floppy disk in a Ziploc bag.

The novel at its heart is about two friends making games. I was lucky to have that experience myself for a couple of years in the early 80s, with high school friend Chris Eisnaugle, working on Ribbit, Excalibur and some other games that never saw the light of day. We coded for days on end while listening to music like REO Speedwagon, and taking time off for bowling or watching early Schwarzenegger movies. Coding in assembly language on slow processors with limited graphics, taking advantage of our complementary strengths and making it work. I don't regret leaving that life behind for the theoretical wonders of computational complexity, but that doesn't mean I don't miss it.

By Lance Fortnow

FOCS 2025 call for papers (Guest post by Clément Canonne)

from Windows on Theory

The 66th Annual Symposium on Foundations of Computer Science (FOCS 2025), sponsored by the IEEE Computer Society Technical Committee on Mathematical Foundations of Computing, will be held in Sydney, Australia, December 14-17.

Papers presenting new and original research on theory of computation are sought. Typical but not exclusive topics of interest include: algorithmic coding theory, algebraic computation, algorithmic graph theory, algorithmic game theory, algorithms and data structures, analysis of Boolean functions, approximation algorithms, average-case complexity, computational applications of logic, combinatorics, computational complexity, communication complexity, circuit complexity, combinatorial optimization, computational game theory, computational geometry, computational learning theory, continuous optimization, cryptography, foundations of machine learning, online algorithms, optimization, parallel and distributed algorithms, parameterized algorithms, randomized algorithms, sublinear algorithms, streaming algorithms, quantum computing, pseudorandomness and derandomization, foundations of fairness and privacy, and theoretical aspects of areas such as networks, information retrieval, computational biology, and databases. Papers that broaden the reach of the theory of computing, or raise important problems that can benefit from theoretical investigation and analysis, are encouraged.

Submission deadline: April 3, 2025 (8PM ET)
Paper notification: July 8, 2025

More details on the scope and submission format can be found at https://focs.computer.org/2025/call-for-papers/

By Boaz Barak

Logic and Computation Through the Lens of Semirings

from arXiv: Computational Complexity

Authors: Timon Barlag, Nicolas Fröhlich, Teemu Hankala, Miika Hannula, Minna Hirvonen, Vivian Holzapfel, Juha Kontinen, Arne Meier, Laura Strieker

We study computational aspects of first-order logic and its extensions in the semiring semantics developed by Grädel and Tannen. We characterize the complexity of model checking and data complexity of first-order logic both in terms of a generalization of BSS-machines and arithmetic circuits defined over $K$. In particular, we give a logical characterization of $\mathrm{FAC}^0_{K}$ by an extension of first-order logic that holds for any $K$ that is both commutative and positive.

Faster search for tensor decomposition over finite fields

from arXiv: Computational Complexity

Authors: Jason Yang

We present an $O^*(|\mathbb{F}|^{\min\left\{R,\ \sum_{d\ge 2} n_d\right\} + (R-n_0)(\sum_{d\ne 0} n_d)})$-time algorithm for determining whether the rank of a concise tensor $T\in\mathbb{F}^{n_0\times\dots\times n_{D-1}}$ is $\le R$, assuming $n_0\ge\dots\ge n_{D-1}$ and $R\ge n_0$. For 3-dimensional tensors, we have a second algorithm running in $O^*(|\mathbb{F}|^{n_0+n_2 + (R-n_0+1-r_*)(n_1+n_2)+r_*^2})$ time, where $r_*:=\left\lfloor\frac{R}{n_0}\right\rfloor+1$. Both algorithms use polynomial space and improve on our previous work, which achieved running time $O^*(|\mathbb{F}|^{n_0+(R-n_0)(\sum_d n_d)})$.

On the Computational Tractability of the (Many) Shapley Values

from arXiv: Computational Complexity

Authors: Reda Marzouk, Shahaf Bassan, Guy Katz, Colin de la Higuera

Recent studies have examined the computational complexity of computing Shapley additive explanations (also known as SHAP) across various models and distributions, revealing their tractability or intractability in different settings. However, these studies primarily focused on a specific variant called Conditional SHAP, though many other variants exist and address different limitations. In this work, we analyze the complexity of computing a much broader range of such variants, including Conditional, Interventional, and Baseline SHAP, while exploring both local and global computations. We show that both local and global Interventional and Baseline SHAP can be computed in polynomial time for various ML models under Hidden Markov Model distributions, extending popular algorithms such as TreeSHAP beyond empirical distributions. On the downside, we prove intractability results for these variants over a wide range of neural networks and tree ensembles. We believe that our results emphasize the intricate diversity of computing Shapley values, demonstrating how their complexity is substantially shaped by both the specific SHAP variant, the model type, and the distribution.
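
To fix ideas on what these variants compute: Baseline SHAP gives feature $i$ its Shapley value under the set function $v(S)$ that evaluates the model with features in $S$ taken from the instance and the rest from a fixed baseline point. An exact exponential-time sketch (mine; feasible only for a handful of features):

    from itertools import combinations
    from math import factorial

    def baseline_shap(f, x, baseline):
        # Exact Shapley values for v(S) = f(x on S, baseline elsewhere).
        n = len(x)
        def v(S):
            return f([x[i] if i in S else baseline[i] for i in range(n)])
        phi = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            total = 0.0
            for size in range(n):
                for S in combinations(others, size):
                    weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                    total += weight * (v(set(S) | {i}) - v(set(S)))
            phi.append(total)
        return phi

    model = lambda z: 2 * z[0] + z[1] * z[2]          # toy model
    print(baseline_shap(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
    # ~[2.0, 3.0, 3.0] up to float rounding; sums to f(x) - f(baseline) = 8.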

Improving Algorithmic Efficiency using Cryptography

from arXiv: Data Structures and Algorithms

Authors: Vinod Vaikuntanathan, Or Zamir

Cryptographic primitives have been used for various non-cryptographic objectives, such as eliminating or reducing randomness and interaction. We show how to use cryptography to improve the time complexity of solving computational problems. Specifically, we show that under standard cryptographic assumptions, we can design algorithms that are asymptotically faster than existing ones while maintaining correctness. As a concrete demonstration, we construct a distribution of trapdoored matrices with the following properties: (a) computationally bounded adversaries cannot distinguish a random matrix from one drawn from this distribution, and (b) given a secret key, we can multiply such an n-by-n matrix with any vector in near-linear (in n) time. We provide constructions both over finite fields and the reals. This enables a broad speedup technique: any algorithm relying on a random matrix - such as those using various notions of dimensionality reduction - can replace it with a matrix from our distribution, achieving computational speedups while preserving correctness.
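
To illustrate the interface (emphatically not the paper's construction, and with no pseudorandomness claim): represent the matrix implicitly as a product of a few sparse factors, so that a matrix-vector product costs near-linear time instead of the $O(n^2)$ an explicit dense matrix would need.

    import random

    def make_factors(n, layers, rng):
        # Toy "trapdoor": the matrix is implicitly a product of sparse
        # factors with two entries per row. Illustration of the fast-multiply
        # interface only; it does NOT look random to an adversary.
        return [[{i: 1.0, rng.randrange(n): rng.uniform(-1, 1)}
                 for i in range(n)]
                for _ in range(layers)]

    def multiply(factors, vec):
        # Apply F_1 * F_2 * ... * F_L to vec in O(layers * n) operations.
        for rows in reversed(factors):
            vec = [sum(c * vec[j] for j, c in row.items()) for row in rows]
        return vec

    rng = random.Random(1)
    A = make_factors(n=8, layers=3, rng=rng)
    print(multiply(A, [1.0] * 8))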

Smoothed Analysis of Dynamic Graph Algorithms

from arXiv: Data Structures and Algorithms

Authors: Uri Meir, Ami Paz

Recent years have seen significant progress in the study of dynamic graph algorithms, and most notably, the introduction of strong lower bound techniques for them (e.g., Henzinger, Krinninger, Nanongkai and Saranurak, STOC 2015; Larsen and Yu, FOCS 2023). As worst-case analysis (adversarial inputs) may lead to the necessity of high running times, a natural question arises: in which cases are high running times really necessary, and in which cases these inputs merely manifest unique pathological cases? Early attempts to tackle this question were made by Nikoletseas, Reif, Spirakis and Yung (ICALP 1995) and by Alberts and Henzinger (Algorithmica 1998), who considered models with very little adversarial control over the inputs, and showed fast algorithms exist for them. The question was then overlooked for decades, until Henzinger, Lincoln and Saha (SODA 2022) recently addressed uniformly random inputs, and presented algorithms and impossibility results for several subgraph counting problems. To tackle the above question more thoroughly, we employ smoothed analysis, a celebrated framework introduced by Spielman and Teng (J. ACM, 2004). An input is proposed by an adversary but then a noisy version of it is processed by the algorithm instead. Parameterized by the amount of adversarial control, this input model fully interpolates between worst-case inputs and a uniformly random input. Doing so, we extend impossibility results for some problems to the smoothed model with only a minor quantitative loss. That is, we show that partially-adversarial inputs suffice to impose high running times for certain problems. In contrast, we show that other problems become easy even with the slightest amount of noise. In addition, we study the interplay between the adversary and the noise, leading to three natural models of smoothed inputs, for which we show a hierarchy of increasing complexity.

Edge-Colored Clustering in Hypergraphs: Beyond Minimizing Unsatisfied Edges

from arXiv: Data Structures and Algorithms

Authors: Alex Crane, Thomas Stanley, Blair D. Sullivan, Nate Veldt

We consider a framework for clustering edge-colored hypergraphs, where the goal is to cluster (equivalently, to color) objects based on the primary type of multiway interactions they participate in. One well-studied objective is to color nodes to minimize the number of unsatisfied hyperedges -- those containing one or more nodes whose color does not match the hyperedge color. We motivate and present advances for several directions that extend beyond this minimization problem. We first provide new algorithms for maximizing satisfied edges, which is the same at optimality but is much more challenging to approximate, with all prior work restricted to graphs. We develop the first approximation algorithm for hypergraphs, and then refine it to improve the best-known approximation factor for graphs. We then introduce new objective functions that incorporate notions of balance and fairness, and provide new hardness results, approximations, and fixed-parameter tractability results.
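To make the basic objective concrete, here is a small sketch (ours, with a hypothetical instance and encoding): each hyperedge carries a color, and it is unsatisfied under a node coloring exactly when some node in it received a different color.

```python
def unsatisfied(hyperedges, node_color):
    """Count hyperedges containing at least one node whose assigned
    color does not match the hyperedge's color."""
    return sum(
        any(node_color[v] != color for v in edge)
        for edge, color in hyperedges
    )

# Hypothetical instance: hyperedges given as (node set, color) pairs.
edges = [({1, 2, 3}, "red"), ({2, 4}, "blue"), ({3, 4}, "red")]
coloring = {1: "red", 2: "red", 3: "red", 4: "blue"}
print(unsatisfied(edges, coloring))  # 2: the "blue" edge and ({3, 4}, "red")
```

Maximizing satisfied edges has the same optima but, as the abstract notes, is much harder to approximate.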

Approximate Tree Completion and Learning-Augmented Algorithms for Metric Minimum Spanning Trees

from arXiv: Data Structures and Algorithms

Authors: Nate Veldt, Thomas Stanley, Benjamin W. Priest, Trevor Steil, Keita Iwabuchi, T. S. Jayram, Geoffrey Sanders

Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but this takes $\Omega(n^2)$ time to even approximate. We introduce a framework for metric MSTs that first (1) finds a forest of disconnected components using practical heuristics, and then (2) finds a small weight set of edges to connect disjoint components of the forest into a spanning tree. We prove that optimally solving the second step still takes $\Omega(n^2)$ time, but we provide a subquadratic 2.62-approximation algorithm. In the spirit of learning-augmented algorithms, we then show that if the forest found in step (1) overlaps with an optimal MST, we can approximate the original MST problem in subquadratic time, where the approximation factor depends on a measure of overlap. In practice, we find nearly optimal spanning trees for a wide range of metrics, while being orders of magnitude faster than exact algorithms.

On the Complexity of Minimising the Moving Distance for Dispersing Objects

from arXiv: Data Structures and Algorithms

Authors: Nicolás Honorato-Droguett, Kazuhiro Kurita, Tesshu Hanaka, Hirotaka Ono

We study Geometric Graph Edit Distance (GGED), a graph-editing model to compute the minimum edit distance of intersection graphs that uses moving objects as an edit operation. We first show an $O(n\log n)$-time algorithm that minimises the total moving distance to disperse unit intervals. This algorithm is applied to render a given unit interval graph (i) edgeless, (ii) acyclic and (iii) $k$-clique-free. We next show that GGED becomes strongly NP-hard when rendering a weighted interval graph (i) edgeless, (ii) acyclic and (iii) $k$-clique-free. Lastly, we prove that minimising the maximum moving distance for rendering a unit disk graph edgeless is strongly NP-hard over the $L_1$ and $L_2$ distances.

Finding Maximum Weight 2-Packing Sets on Arbitrary Graphs

from arXiv: Data Structures and Algorithms

Authors: Jannick Borowitz, Ernestine Großmann, Christian Schulz

A 2-packing set for an undirected, weighted graph G=(V,E,w) is a subset S of the vertices V such that no two vertices of S are adjacent or share a common neighbor. The Maximum Weight 2-Packing Set problem, which asks for a 2-packing set of maximum weight, is NP-hard. Alongside 13 novel data reduction rules for this problem, we develop two new approaches to solve this problem on arbitrary graphs. First, we introduce a preprocessing routine that exploits the close relation of 2-packing sets to independent sets. This makes well-studied independent set solvers usable for the Maximum Weight 2-Packing Set problem. Second, we propose an iterative reduce-and-peel approach that utilizes the new data reductions. Our experiments show that our preprocessing routine gives speedups of multiple orders of magnitude, while also improving solution quality and memory consumption compared to a naive transformation to independent set instances. Furthermore, it solves 44% of the instances tested to optimality. Our heuristic can keep up with the best-performing maximum weight independent set solvers combined with our preprocessing routine. Additionally, our heuristic finds the best solution quality on the biggest instances in our data set, outperforming all other approaches.
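The definition is easy to operationalize; here is a small checker (our sketch, on a hypothetical path graph). A 2-packing set may contain no two vertices that are adjacent or share a neighbor, i.e., all pairwise distances must be at least 3.

```python
import itertools

def is_2_packing(adj, S):
    """Check that no two vertices of S are adjacent or have a common
    neighbor (equivalently, pairwise distances are at least 3)."""
    for u, v in itertools.combinations(S, 2):
        if v in adj[u] or adj[u] & adj[v]:
            return False
    return True

# Hypothetical instance: the path 1-2-3-4-5 as an adjacency map.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(is_2_packing(adj, {1, 4}))  # True: the two vertices are at distance 3
print(is_2_packing(adj, {1, 3}))  # False: they share the neighbor 2
```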

Generalized De Bruijn Words, Invertible Necklaces, and the Burrows-Wheeler Transform

from arXiv: Data Structures and Algorithms

Authors: Gabriele Fici, Estéban Gabory

We define generalized de Bruijn words, as those words having a Burrows--Wheeler transform that is a concatenation of permutations of the alphabet. We show how to interpret generalized de Bruijn words in terms of Hamiltonian cycles in the generalized de Bruijn graphs introduced in the early '80s in the context of network design. When the size of the alphabet is a prime, we give relations between generalized de Bruijn words, normal bases of finite fields, invertible circulant matrices, and Reutenauer groups. In particular, we highlight a correspondence between binary de Bruijn words of order $d+1$, binary necklaces of length $2^{d}$ having an odd number of $1$s, invertible BWT matrices of size $2^{d}\times 2^{d}$, and normal bases of the finite field $\mathbb{F}_{2^{2^{d}}}$.
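The defining property is simple to test directly. Below is a short sketch (ours, not the authors' code): compute the Burrows-Wheeler transform by sorting rotations and check that it splits into consecutive blocks that are each a permutation of the alphabet.

```python
def bwt(w):
    """Burrows-Wheeler transform: last column of the sorted rotations."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rotations)

def is_generalized_de_bruijn(w, alphabet):
    """Check that BWT(w) is a concatenation of permutations of the alphabet."""
    t, k = bwt(w), len(alphabet)
    if len(t) % k:
        return False
    return all(set(t[i:i + k]) == set(alphabet) for i in range(0, len(t), k))

# The binary de Bruijn word of order 2: its BWT is "1010" = "10" + "10".
print(bwt("0110"), is_generalized_de_bruijn("0110", "01"))  # 1010 True
```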

Revisiting Token Sliding on Chordal Graphs

from arXiv: Data Structures and Algorithms

Authors: Rajat Adak, Saraswati Girish Nanoti, Prafullkumar Tale

In this article, we revisit the complexity of the reconfiguration of independent sets under the token sliding rule on chordal graphs. In the Token Sliding-Connectivity problem, the input is a graph $G$ and an integer $k$, and the objective is to determine whether the reconfiguration graph $TS_k(G)$ of $G$ is connected. The vertices of $TS_k(G)$ are $k$-independent sets of $G$, and two vertices are adjacent if and only if one can transform one of the two corresponding independent sets into the other by sliding a vertex (also called a token) along an edge. Bonamy and Bousquet [WG'17] proved that the Token Sliding-Connectivity problem is polynomial-time solvable on interval graphs but NP-hard on split graphs. In light of these two results, the authors asked: can we decide the connectivity of $TS_k(G)$ in polynomial time for chordal graphs with maximum clique-tree degree $d$? We answer this question in the negative and prove that the problem is para-NP-hard when parameterized by $d$. More precisely, the problem is NP-hard even when $d = 4$. We then study the parameterized complexity of the problem for a larger parameter called leafage and prove that the problem is co-W[1]-hard. We prove similar results for a closely related problem called Token Sliding-Reachability. In this problem, the input is a graph $G$ with two of its $k$-independent sets $I$ and $J$, and the objective is to determine whether there is a sequence of valid token sliding moves that transform $I$ into $J$.
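For small graphs, the object above can simply be built by brute force, which makes the definitions concrete. The sketch below (ours; the path instance is hypothetical) enumerates the $k$-independent sets and joins two of them when one token slides along an edge of $G$.

```python
from itertools import combinations

def ts_graph(n, edges, k):
    """Construct TS_k(G): vertices are k-independent sets of G, and two
    sets are adjacent iff they differ by sliding one token along an edge."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def independent(S):
        return all(v not in adj[u] for u, v in combinations(S, 2))

    nodes = [frozenset(S) for S in combinations(range(n), k) if independent(S)]
    nbrs = {S: [] for S in nodes}
    for S, T in combinations(nodes, 2):
        if len(S - T) == 1:  # the sets differ in exactly one token
            (u,), (v,) = tuple(S - T), tuple(T - S)
            if v in adj[u]:  # and that token slid along an edge of G
                nbrs[S].append(T)
                nbrs[T].append(S)
    return nbrs

def connected(nbrs):
    """Depth-first search to decide whether TS_k(G) is connected."""
    if not nbrs:
        return True
    stack, seen = [next(iter(nbrs))], set()
    while stack:
        S = stack.pop()
        if S not in seen:
            seen.add(S)
            stack.extend(nbrs[S])
    return len(seen) == len(nbrs)

# Hypothetical example: TS_2 of the path 0-1-2-3-4 is connected.
print(connected(ts_graph(5, [(0, 1), (1, 2), (2, 3), (3, 4)], 2)))  # True
```

Token Sliding-Connectivity asks whether this graph is connected; Token Sliding-Reachability asks whether two given vertices of it are joined by a path.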

Maximizing Value in Challenge the Champ Tournaments

from arXiv: Data Structures and Algorithms

Authors: Umang Bhaskar, Juhi Chaudhary, Palash Dey

A tournament is a method to decide the winner in a competition, and describes the overall sequence in which matches between the players are held. While deciding a worthy winner is the primary goal of a tournament, a close second is to maximize the value generated for the matches played, with value for a match measured either in terms of tickets sold, television viewership, advertising revenue, or other means. Tournament organizers often seed the players -- i.e., decide which matches are played -- to increase this value. We study the value maximization objective in a particular tournament format called Challenge the Champ. This is a simple tournament format where an ordering of the players is decided. The first player in this order is the initial champion. The remaining players in order challenge the current champion; if a challenger wins, she replaces the current champion. We model the outcome of a match between two players using a complete directed graph, called a strength graph, with each player represented as a vertex, and the direction of an edge indicating the winner in a match. The value-maximization objective has been recently explored for knockout tournaments when the strength graph is a directed acyclic graph (DAG). We extend the investigation to Challenge the Champ tournaments and general strength graphs. We study different representations of the value of each match, and completely characterize the computational complexity of the problem.
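The format itself is a few lines of code, which the following toy simulation (ours, with an invented strength graph) makes explicit. The seeding decides which matches get played, and that is the lever an organizer pulls to maximize value.

```python
def challenge_the_champ(order, beats):
    """Play a Challenge the Champ tournament. The first player in `order`
    is the initial champion; each later player challenges the current
    champion and replaces them on a win. `beats[u][v]` says whether u
    beats v, encoding the complete strength graph."""
    champ, matches = order[0], []
    for challenger in order[1:]:
        winner = challenger if beats[challenger][champ] else champ
        matches.append((challenger, champ, winner))
        champ = winner
    return champ, matches

# Hypothetical cyclic strength graph: 0 beats 1, 1 beats 2, 2 beats 0.
beats = {0: {1: True, 2: False}, 1: {0: False, 2: True}, 2: {0: True, 1: False}}
print(challenge_the_champ([1, 0, 2], beats))
# (2, [(0, 1, 0), (2, 0, 2)]); a different seeding yields different matches.
```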

Min-Max Correlation Clustering via Neighborhood Similarity

from arXiv: Data Structures and Algorithms

Authors: Nairen Cao, Steven Roche, Hsin-Hao Su

We present an efficient algorithm for the min-max correlation clustering problem. The input is a complete graph where edges are labeled as either positive $(+)$ or negative $(-)$, and the objective is to find a clustering that minimizes the $\ell_{\infty}$-norm of the disagreement vector over all vertices. We resolve this problem with an efficient $(3 + \epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^+|)$, where $|E^+|$ denotes the number of positive edges. This improves upon the previous best-known approximation guarantee of 4 by Heidrich, Irmai, and Andres, whose algorithm runs in $O(|V|^2 + |V| D^2)$ time, where $|V|$ is the number of nodes and $D$ is the maximum degree in the graph. Furthermore, we extend our algorithm to the massively parallel computation (MPC) model and the semi-streaming model. In the MPC model, our algorithm runs on machines with memory sublinear in the number of nodes and takes $O(1)$ rounds. In the streaming model, our algorithm requires only $\tilde{O}(|V|)$ space, where $|V|$ is the number of vertices in the graph. Our algorithms are purely combinatorial. They are based on a novel structural observation about the optimal min-max instance, which enables the construction of a $(3 + \epsilon)$-approximation algorithm using $O(|E^+|)$ neighborhood similarity queries. By leveraging random projection, we further show these queries can be computed in nearly linear time.
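The objective is easy to evaluate for a given clustering, which the following sketch (ours, on a hypothetical instance) spells out: a vertex's disagreements are its positive edges leaving its cluster plus its negative edges inside it, and the objective is the maximum over vertices.

```python
def minmax_disagreements(n, positive, cluster):
    """l_infinity norm of the disagreement vector. `positive` holds the
    + edges as sorted pairs; every other pair is a - edge."""
    dis = [0] * n
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            same = cluster[u] == cluster[v]
            pos = (min(u, v), max(u, v)) in positive
            if pos != same:  # a + edge that is cut, or a - edge that is kept
                dis[u] += 1
    return max(dis)

# Hypothetical instance: a path of + edges on 4 vertices, split in half.
positive = {(0, 1), (1, 2), (2, 3)}
print(minmax_disagreements(4, positive, cluster=[0, 0, 1, 1]))  # 1
```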

GPU Memory Usage Optimization for Backward Propagation in Deep Network Training

from arXiv: Data Structures and Algorithms

Authors: Ding-Yong Hong, Tzu-Hsien Tsai, Ning Wang, Pangfeng Liu, Jan-Jan Wu

In modern Deep Learning, it has been a trend to design larger Deep Neural Networks (DNNs) for the execution of more complex tasks and better accuracy. On the other hand, Convolutional Neural Networks (CNNs) have become the standard method for most computer vision tasks. However, the memory allocation for the intermediate data in convolution layers can cause severe memory pressure during model training. Many solutions have been proposed to resolve the problem. Besides hardware-dependent solutions, a general methodology, rematerialization, can reduce GPU memory usage by trading computation for memory efficiently. The idea is to select a set of intermediate results during the forward phase as checkpoints, and only save them in memory to reduce memory usage. The backward phase recomputes the intermediate data from the closest checkpoints in memory as needed. This recomputation increases execution time but saves memory by not storing all intermediate results in memory during the forward phase. In this paper, we focus on efficiently finding the optimal checkpoint subset to achieve the least peak memory usage during model training. We first describe the theoretical background of the training of a neural network using mathematical equations. We use these equations to identify all essential data required during both forward and backward phases to compute the gradient of weights of the model. We then formalize the checkpoint selection problem and propose a dynamic programming algorithm with time complexity O(n^3) to solve the problem of finding the optimal checkpoint subset. With extensive experiments, we formulate a more accurate description of the problem using our theoretical analysis, revise the objective function based on the tracing, and propose an O(n)-time algorithm for finding the optimal checkpoint subset.
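To illustrate the flavor of the tradeoff, here is a toy linear-chain cost model (ours; the paper's objective, refined by tracing, is more detailed). Storing fewer checkpoints lowers what is held through the forward pass, but the backward pass must then recompute each segment between consecutive checkpoints.

```python
def peak_memory_and_recompute(n, checkpoints):
    """Toy rematerialization model for a chain of n layers: the backward
    pass rebuilds each segment between consecutive checkpoints from its
    left endpoint and holds it while differentiating through it. Peak
    memory ~ stored checkpoints + largest segment; recompute cost is the
    number of activations rebuilt."""
    cps = sorted(set(checkpoints) | {0, n})
    segments = [b - a for a, b in zip(cps, cps[1:])]
    return len(cps) + max(segments), sum(s - 1 for s in segments)

# A 16-layer chain: sqrt(n)-spaced checkpoints vs. storing everything.
print(peak_memory_and_recompute(16, [4, 8, 12]))    # (9, 12)
print(peak_memory_and_recompute(16, range(1, 16)))  # (18, 0)
```

Choosing the checkpoint subset that minimizes peak memory over a model's real dependency structure is the optimization problem the paper solves, first in O(n^3) and then in O(n) time.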

Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification

from arXiv: Data Structures and Algorithms

Authors: Dhruv Rohatgi, Adam Block, Audrey Huang, Akshay Krishnamurthy, Dylan J. Foster

Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling, but, in practice, suffers from error amplification, where errors in the model compound and generation quality degrades as sequence length $H$ increases. From a theoretical perspective, this phenomenon should not appear in well-specified settings, and, indeed, a growing body of empirical work hypothesizes that misspecification, where the learner is not sufficiently expressive to represent the target distribution, may be the root cause. Under misspecification -- where the goal is to learn as well as the best-in-class model up to a multiplicative approximation factor $C\geq 1$ -- we confirm that $C$ indeed grows with $H$ for next-token prediction, lending theoretical support to this empirical hypothesis. We then ask whether this mode of error amplification is avoidable algorithmically, computationally, or information-theoretically, and uncover inherent computational-statistical tradeoffs. We show: (1) Information-theoretically, one can avoid error amplification and achieve $C=O(1)$. (2) Next-token prediction can be made robust so as to achieve $C=\tilde O(H)$, representing moderate error amplification, but this is an inherent barrier: any next-token prediction-style objective must suffer $C=\Omega(H)$. (3) For the natural testbed of autoregressive linear models, no computationally efficient algorithm can achieve sub-polynomial approximation factor $C=e^{(\log H)^{1-\Omega(1)}}$; however, at least for binary token spaces, one can smoothly trade compute for statistical power and improve on $C=\Omega(H)$ in sub-exponential time. Our results have consequences in the more general setting of imitation learning, where the widely-used behavior cloning algorithm generalizes next-token prediction.

Tuesday, February 18

The Adaptivity Paradox

from Ben Recht

Test-set reuse: the problem that wasn't.

It’s easy to cook up theory to convince yourself that all machine learning practice is overfitting. Statistical dogma states that every time you look at a holdout set, you leak information, defiling its purity. Let’s unpack how this could be a problem.

For any prediction function f, we’d like to understand how it will fare on data in the wild. Define the external error, errext[f], to be the error rate of the function on data we’ll evaluate in the future. Define the internal error, errint[f], to be the error we see on the data we have collected so far.1 Finally, let’s say we take some of our data, put it in a special box, and call it a “testing set.” The test error, errtest[f], is the error observed on this test set.

We can estimate the external error using the following decomposition.2

errext[f] = errtest[f] + (errint[f] − errtest[f]) + (errext[f] − errint[f])

We hope that the way we collected our data makes the difference between the internal and external errors small. We hope that good data hygiene means that the test error is a reasonable estimate of the error on the data we’ve collected. And hence, we hope that the error on the test set is a reasonable signifier of the external error.

There is some reasonably attractive theory that gives us intuition about how to select and use a test set. If you sample a test set of size n uniformly from your data set, a single prediction function will satisfy, with probability at least 1 − δ,

|errtest[f] − errint[f]| ≤ sqrt(log(2/δ) / (2n))

You can derive this bound using Hoeffding’s inequality. It’s a gross overestimate of what you see in practice, but it suggests that larger test sets will provide better estimates of the internal error. This analysis also suggests that the test set size need not grow as a function of the dataset size.
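A quick simulation (not from the post) shows both the guarantee and how conservative it is: a fixed classifier with internal error 0.3 is scored on freshly drawn test sets of size n = 2,000, and essentially every draw lands within the Hoeffding bound.

```python
import math
import random

random.seed(0)

n, delta, err_int = 2000, 0.05, 0.3
bound = math.sqrt(math.log(2 / delta) / (2 * n))  # holds w.p. >= 1 - delta

# Each test example is misclassified independently with probability 0.3.
deviations = [
    abs(sum(random.random() < err_int for _ in range(n)) / n - err_int)
    for _ in range(1000)
]

print(f"bound = {bound:.4f}")  # about 0.03
print("fraction within bound:", sum(d <= bound for d in deviations) / 1000)
```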

If you line up a family of K candidate prediction functions in advance and test them on the test set, you can apply a union bound argument to get that

|errtest[f] − errint[f]| ≤ sqrt(log(2K/δ) / (2n))

for all K of the prediction functions. This analysis, which is again woefully loose, says that if you want to test a lot of functions, it’s OK because your test sample only needs to grow logarithmically with the number of possible functions. The logarithm of 10,000 is about 9, and the logarithm of a trillion is about 27. For a factor of 3 in test set size, you go from supporting ten thousand queries to supporting a trillion. Importantly, you don’t need a trillion test samples. For machine learning, something in the thousands seems reasonable.

Standard theory suggests we can reuse the test set a lot! But there’s a caveat. For this analysis to work, I must list my candidate prediction functions in advance. If I am allowed to look at the test set and then pick an f based on the errors I see, I can do something devious.

Let’s suppose that we are working on a simple binary classification problem where there are exactly two possible labels for each data point. For convenience, let’s let the choices be +1 and -1.

Let me define a class of n+1 prediction functions as follows. For the first n of them, I’ll choose random functions, predicting a random label for each data point. The test error on these predictions will be approximately ½. The internal errors will also be about ½. But they won’t be exactly ½ because random fluctuations will induce variance.

I now record the test errors for the n random predictions. Without looking at any of the data, I can then find a linear combination of these functions where the test error is zero, but the internal error remains at ½. Why? It’s because the test error is a linear function of the test labels:

errtest[f] = (1/n) Σ_i 1{f(x_i) ≠ y_i} = ½ − (1/(2n)) Σ_i f(x_i) y_i

From the set of n test errors, I can solve this system of equations to find the labels on the test set.
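Here is the attack in a few lines of NumPy (a sketch of the scheme above, with invented sizes): query n random ±1 predictors, turn the returned test errors into a linear system, and solve for the hidden labels.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

y = rng.choice([-1, 1], size=n)       # hidden test labels
F = rng.choice([-1, 1], size=(n, n))  # n random predictors, one per row

# Each query only reveals a test error. For +/-1 labels,
# err_j = 1/2 - (1/(2n)) * sum_i F[j, i] * y[i]: linear in the labels.
errs = (F != y).mean(axis=1)

# Invert the linear system F y = n (1 - 2 errs) to recover the labels.
recovered = np.linalg.solve(F, n * (1 - 2 * errs))
print(np.array_equal(np.sign(recovered), y))  # True whenever F is invertible
```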

OK, so what broke here? How did I get such a large gap between the internal and test errors? This is a problem of adaptivity. I used the answers to previous queries to shape the next query. In doing so, I broke the promises I made in my earlier statistical reasoning.

With adaptivity, you can get strikingly small test error with far fewer than n queries. Each query reveals something about the test labels. What errors can you get with K queries? A decade ago, Moritz Hardt proposed a simple boosting strategy. If you take only the T queries that get test error at most ½ − s, majority voting gets you a classifier with test error no worse than exp(- 2 s^2 T).3 Straightforward analysis shows that approaches like this can get you an error rate of roughly

½ − sqrt(K/n)

The log K has become a K. Yikes. This seems bad!
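Here is a sketch of the boosting-style attack (ours, with invented sizes and the simplest threshold of ½): issue K random queries, keep the ones that happened to beat chance on the test set, and majority-vote them. The vote’s test error collapses even though the classifier carries no signal at all, as its error on fresh labels shows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 500, 10_000

y = rng.choice([-1, 1], size=n)       # hidden test labels
Q = rng.choice([-1, 1], size=(K, n))  # K random queries (predictions)
errs = (Q != y).mean(axis=1)          # the only thing each query reveals

# Keep the queries that got lucky on the test set; majority-vote them.
vote = np.sign(Q[errs < 0.5].sum(axis=0))
print("test error of vote:", (vote != y).mean())  # ~0.01
print("error on fresh labels:", (vote != rng.choice([-1, 1], size=n)).mean())  # ~0.5
```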

We’re not as malicious as this in practice, but surely our leaderboard climbing and SOTA chasing are leaking something about the test sets. What happens when you tune hyperparameters by looking at the test error? The popular tuning methods use quote-unquote “reinforcement learning,” which, for the layperson, is trying a bunch of random things and then taking mixtures of what worked well. I’m sorry, but modern reinforcement learning is Hardt’s Boosting attack. Shouldn’t your excessive tuning cause so much test set leakage that you get overly optimistic about your external performance?

The answer seems to be no. And I have no idea why the answer is no. If you use the test error to select among functions that interpolate the training set, you seem to get very good prediction functions. Models continue to improve on external data after a decade or more of test set abuse. When we looked at Kaggle competitions, the public test leaderboard errors were perfect predictors of the private test errors.

So why? This is a great theory question! I mean, we shouldn’t let rigor get in the way of a good time, but shouldn’t we care about why this method of competitive testing and frictionless reproducibility seems to work despite its apparent flaws? Is the intuition from Hoeffding’s inequality just wrong? Is statistical deduction simply invalid?

I have no good answers, but I’d love to hear yours.

1. In machine learning theory, these are more commonly called the population risk and the empirical risk, respectively.

2. Yes, this sort of thing is the start of many a machine learning theory paper. Add and subtract the same quantities and declare derived insight.

3. This follows from Hoeffding’s inequality again! It’s as if learning theorists only have one tool.

By Ben Recht

TR25-014 | Information Dissemination via Broadcasts in the Presence of Adversarial Noise | Raghuvansh Saxena, Klim Efremenko, Gillat Kol, Dmitry Paramonov, Ran Raz

from ECCC Papers

We initiate the study of error correcting codes over the multi-party adversarial broadcast channel. Specifically, we consider the classic information dissemination problem where $n$ parties, each holding an input bit, wish to know each other's input. For this, they communicate in rounds, where, in each round, one designated party sends a bit to all other parties over a channel governed by an adversary that may corrupt a constant fraction of the received communication. We mention that the dissemination problem has been studied in the stochastic noise model since the '80s. While stochastic noise in multi-party channels has received quite a bit of attention, the case of adversarial noise has largely been avoided, as such channels cannot handle more than a $\frac{1}{n}$-fraction of errors. Indeed, this many errors allow an adversary to completely corrupt the incoming or outgoing communication for one of the parties and fail the protocol. Curiously, we show that by eliminating these "trivial" attacks, one can get a simple protocol resilient to a constant fraction of errors. Thus, a model that rules out such attacks is both necessary and sufficient to get a resilient protocol. The main shortcoming of our dissemination protocol is its length: it requires $\Theta(n^2)$ communication rounds whereas $n$ rounds suffice in the absence of noise. Our main result is a matching lower bound of $\Omega(n^2)$ on the length of any dissemination protocol in our model. Our proof first "gets rid" of the channel noise by converting it to a form of "input noise", showing that a noisy dissemination protocol implies a (noiseless) protocol for a version of the direct sum gap-majority problem. We conclude the proof with a tight lower bound for the latter problem, which may be of independent interest.

TR25-013 | Polynomial Size, Short-Circuit Resilient Circuits for NC | Raghuvansh Saxena, Yael Tauman Kalai

from ECCC Papers

We show how to convert any circuit of poly-logarithmic depth and polynomial size into a functionally equivalent circuit of polynomial size (and polynomial depth) that is resilient to adversarial short-circuit errors. Specifically, the resulting circuit computes the same function even if up to $\epsilon d$ gates on every root-to-leaf path are short-circuited, i.e., their output is replaced with the value of one of its inputs, where $d$ is the depth of the circuit and $\epsilon > 0$ is a fixed constant. Previously, such a result was known for formulas (Kalai-Lewko-Rao, FOCS 2012). It was also known how to convert general circuits to error resilient ones whose size is quasi-polynomial in the size of the original circuit (Efremenko et al., STOC 2022). The reason both these works do not extend to our setting is that there may be many paths from the root to a given gate, and the resilient circuit needs to "remember" a lot of information about these paths, which causes it to be large. Our main idea is to reduce the amount of this information at the cost of increasing the depth of the resilient circuit.

TR25-012 | Bit-Fixing Extractors for Almost-Logarithmic Entropy | Dean Doron, Ori Fridman

from ECCC Papers

An oblivious bit-fixing source is a distribution over $\{0,1\}^n$, where $k$ bits are uniform and independent and the rest $n-k$ are fixed a priori to some constant value. Extracting (close to) true randomness from an oblivious bit-fixing source has been studied since the 1980s, with applications in cryptography and complexity theory. We construct explicit extractors for oblivious bit-fixing sources that support $k = \widetilde{O}(\log n)$, outputting almost all the entropy with low error. The previous state-of-the-art construction that outputs many bits is due to Rao [Rao, CCC '09], and requires entropy $k \ge \log^{c}n$ for some large constant $c$. The two key components in our constructions are new low-error affine condensers for poly-logarithmic entropies (that we achieve using techniques from the nonmalleable extractors literature), and a dual use of linear condensers for OBF sources.

Explaining Necessary Truths

from arXiv: Computational Complexity

Authors: Gülce Kardeş, Simon DeDeo

Knowing the truth is rarely enough -- we also seek out reasons why the fact is true. While much is known about how we explain contingent truths, we understand less about how we explain facts, such as those in mathematics, that are true as a matter of logical necessity. We present a framework, based in computational complexity, where explanations for deductive truths co-emerge with discoveries of simplifying steps during the search process. When such structures are missing, we revert, in turn, to error-based reasons, where a (corrected) mistake can serve as fictitious, but explanatory, contingency-cause: not making the mistake serves as a reason why the truth takes the form it does. We simulate human subjects, using GPT-4o, presented with SAT puzzles of varying complexity and reasonableness, validating our theory and showing how its predictions can be tested in future human studies.

Tusqh: Topological Control of Volume-Fraction Meshes Near Small Features and Dirty Geometry

from arXiv: Computational Geometry

Authors: Brian Shawcroft, Kendrick M. Shepherd, Scott Mitchell

This work develops a framework to create meshes with user-specified homology from potentially dirty geometry by coupling background grids, persistent homology, and a generalization of volume fractions. For a mesh with fixed grid size, the topology of the output mesh changes predictably and monotonically as its volume-fraction threshold decreases. Topological anti-aliasing methods are introduced to resolve pinch points and disconnected regions that are artifacts of user choice of grid size and orientation, making the output meshes suitable for downstream processes including analysis. The methodology is demonstrated on geographical, mechanical, and graphics models in 2D and 3D using a custom-made software called Tusqh. The work demonstrates that the proposed framework is viable for generating meshes on topologically invalid geometries and for automatic defeaturing of small geometric artifacts. Finally, the work shows that although subdividing the background grid frequently improves the topological and geometrical fidelity of the output mesh, there are simple 2D examples for which the topology does not converge under refinement for volume-fraction codes.

On a tree-based variant of bandwidth and forbidding simple topological minors

from arXiv: Data Structures and Algorithms

Authors: Hugo Jacob, William Lochet, Christophe Paul

We obtain structure theorems for graphs excluding a fan (a path with a universal vertex) or a dipole ($K_{2,k}$) as a topological minor. The corresponding decompositions can be computed in FPT linear time. This is motivated by the study of a graph parameter we call treebandwidth which extends the graph parameter bandwidth by replacing the linear layout by a rooted tree such that neighbours in the graph are in ancestor-descendant relation in the tree. We deduce an approximation algorithm for treebandwidth running in FPT linear time from our structure theorems. We complement this result with a precise characterisation of the parameterised complexity of computing the parameter exactly.

Algorithms and Hardness for Estimating Statistical Similarity

from arXiv: Data Structures and Algorithms

Authors: Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. V. Vinodchandran

We study the problem of computing statistical similarity between probability distributions. For distributions $P$ and $Q$ over a finite sample space, their statistical similarity is defined as $S_{\mathrm{stat}}(P, Q) := \sum_{x} \min(P(x), Q(x))$. Statistical similarity is a basic measure of similarity between distributions, with several natural interpretations, and captures the Bayes error in prediction and hypothesis testing problems. Recent work has established that, somewhat surprisingly, even for the simple class of product distributions, exactly computing statistical similarity is $\#\mathsf{P}$-hard. This motivates the question of designing approximation algorithms for statistical similarity. Our primary contribution is a Fully Polynomial-Time deterministic Approximation Scheme (FPTAS) for estimating statistical similarity between two product distributions. To obtain this result, we introduce a new variant of the Knapsack problem, which we call the Masked Knapsack problem, and design an FPTAS to estimate the number of solutions of a multidimensional version of this problem. This new technical contribution could be of independent interest. Furthermore, we also establish a complementary hardness result. We show that it is $\mathsf{NP}$-hard to estimate statistical similarity when $P$ and $Q$ are Bayes net distributions of in-degree $2$.
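On tiny instances the quantity can be computed by brute force, which makes the hardness statement concrete. The sketch below (ours) enumerates $\{0,1\}^n$ for two product distributions; this exponential enumeration is what the paper's FPTAS avoids.

```python
from itertools import product

def stat_similarity_product(p, q):
    """S_stat(P, Q) = sum_x min(P(x), Q(x)) over {0,1}^n, where p[i] and
    q[i] are the probabilities that bit i equals 1. Exponential in n."""
    total = 0.0
    for x in product([0, 1], repeat=len(p)):
        px = qx = 1.0
        for pi, qi, b in zip(p, q, x):
            px *= pi if b else 1 - pi
            qx *= qi if b else 1 - qi
        total += min(px, qx)
    return total

# Hypothetical product distributions over three bits.
print(stat_similarity_product([0.5, 0.5, 0.5], [0.9, 0.1, 0.5]))  # ~0.44
```

Note that statistical similarity is one minus the total variation distance, so this also computes TV exactly on small instances.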

Fully Dynamic LZ77 in Sublinear Time

from arXiv: Data Structures and Algorithms

Authors: Itai Boneh, Matan Kraus

The Lempel-Ziv 77 (LZ77) factorization is a fundamental compression scheme widely used in text processing and data compression. While efficient static algorithms exist for computing LZ77, maintaining it dynamically remains a challenging problem. Recently, Bannai, Charalampopoulos, and Radoszewski introduced an algorithm that maintains the size of the LZ77 factorization of a dynamic text in $\tilde{O}(\sqrt{n})$ per update. Their data structure works in the semi-dynamic model, where the only allowed updates are insertions at the end of the string or deletions from the start. In contrast, we present an algorithm that operates in a significantly more general setting of arbitrary edit operations. Our algorithm maintains the size of the LZ77 factorization of a string undergoing symbol substitutions, deletions, and insertions in $\tilde{O}(n^{2/3})$ time per update. Additionally, our data structure supports random access to the LZ77 factorization in polylogarithmic time, providing enhanced functionality for dynamic text processing.

Algorithm Engineering of SSSP With Negative Edge Weights

from arXiv: Data Structures and Algorithms

Authors: Alejandro Cassis, Andreas Karrenbauer, André Nusser, Paolo Luigi Rinaldi

Computing shortest paths is one of the most fundamental algorithmic graph problems. It has been known for decades that this problem can be solved in near-linear time if all weights are nonnegative. For the case of negative edge weights, a recent breakthrough by [Bernstein, Nanongkai, Wulff-Nilsen '22] presented a randomized near-linear time algorithm. A subsequent improvement in [Bringmann, Cassis, Fischer '23] significantly reduced the number of logarithmic factors and thereby also simplified the algorithm. It is surprising and exciting that both of these algorithms are combinatorial and contain no fundamental obstacles to being practical. We launch the, to the best of our knowledge, first extensive investigation towards a practical implementation of [Bringmann, Cassis, Fischer '23]. To this end, we give an accessible overview of the algorithm, discussing which adaptations are necessary to obtain a fast algorithm in practice. We realize these adaptations in an efficient implementation. We test our implementation on a benchmark data set that is adapted to be more difficult for our implementation, in order to allow for a fair comparison. Since both [Bringmann, Cassis, Fischer '23] and our implementation have multiple parameters to tune, we empirically evaluate their effect and thereby determine the best choices. We then compare our implementation extensively to one of the state-of-the-art algorithms for this problem [Goldberg, Radzik '93]. On the hardest instance type, we are faster by up to almost two orders of magnitude.
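
For context, the classical baseline that these near-linear algorithms improve upon is Bellman-Ford, which handles negative edge weights in $O(nm)$ time. A textbook sketch (emphatically not the [Bringmann, Cassis, Fischer '23] algorithm):

```python
def bellman_ford(n, edges, s):
    """Single-source shortest paths with possibly negative weights in
    O(nm) time. Returns None if a negative cycle is reachable from s."""
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # distances stable: done early
            break
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None  # negative cycle detected
    return dist

edges = [(0, 1, 4), (0, 2, 2), (1, 2, -3), (2, 3, 1)]
print(bellman_ford(4, edges, 0))  # [0, 4, 1, 2]
```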

Logarithmic Approximation for Road Pricing on Grids

from arXiv: Data Structures and Algorithms

Authors: Andrei Constantinescu, Andrzej Turko, Roger Wattenhofer

Consider a graph $G = (V, E)$ and some commuters, each specified by a tuple $(u, v, b)$ consisting of two nodes $u, v \in V$ and a non-negative real number $b$ specifying their budget. The goal is to find a pricing function $p$ of the edges of $G$ that maximizes the revenue generated by the commuters. Here, each commuter $(u, v, b)$ either pays the cost of a cheapest $u$-$v$ path under the pricing $p$, or $0$ if this exceeds their budget $b$. We study this problem for the case where $G$ is a bounded-width grid graph and give a polynomial-time approximation algorithm with approximation ratio $O(\log |E|)$. Our approach combines existing ideas with new insights. Most notably, we employ a rather seldom-encountered technique that we coin 'assume-implement dynamic programming'. This technique involves dynamic programming where some information about the future decisions of the dynamic program is guessed in advance and 'assumed' to hold, and subsequent decisions are then forced to 'implement' the guess. This enables computing the cost of the current transition using information that would normally only be available in the future.
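
To pin down the objective, here is a small sketch that evaluates the revenue of a *fixed* pricing: each commuter pays the cost of a cheapest path under the pricing if it is within budget, and $0$ otherwise. The graph encoding and undirected roads are assumptions for illustration; searching for the optimal pricing is the hard part the paper approximates.

```python
import heapq

def revenue(n, edges, prices, commuters):
    """Revenue of a fixed pricing: commuter (u, v, b) pays the cheapest
    u-v path cost if it is at most b, else 0. Dijkstra per commuter."""
    adj = [[] for _ in range(n)]
    for (u, v), p in zip(edges, prices):
        adj[u].append((v, p))
        adj[v].append((u, p))  # roads assumed undirected

    def dijkstra(s, t):
        dist = {s: 0.0}
        pq = [(0.0, s)]
        while pq:
            d, x = heapq.heappop(pq)
            if x == t:
                return d
            if d > dist.get(x, float("inf")):
                continue
            for y, w in adj[x]:
                if d + w < dist.get(y, float("inf")):
                    dist[y] = d + w
                    heapq.heappush(pq, (d + w, y))
        return float("inf")

    total = 0.0
    for u, v, b in commuters:
        cost = dijkstra(u, v)
        if cost <= b:
            total += cost
    return total

# Path 0-1-2 with prices 3 and 2; the second commuter is priced out.
print(revenue(3, [(0, 1), (1, 2)], [3.0, 2.0],
              [(0, 2, 6.0), (0, 1, 2.0)]))  # 5.0
```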

Parameterised algorithms for temporal reconfiguration problems

from arXiv: Data Structures and Algorithms

Authors: Tom Davot, Jessica Enright, Laura Larios-Jones

Given a static vertex-selection problem (e.g. independent set, dominating set) on a graph, we can define a corresponding temporal reconfiguration problem on a temporal graph which asks for a sequence of solutions to the vertex-selection problem at each time such that we can reconfigure from one solution to the next. We can think of each solution in the sequence as a set of vertices with tokens placed on them; our reconfiguration model allows us to slide tokens along active edges of a temporal graph. We show that it is possible to efficiently check whether one solution can be reconfigured to another, and show that approximation results on the static vertex-selection problem can be adapted with a lifetime factor to the reconfiguration version. Our main contributions are fixed-parameter tractable algorithms with respect to: enumeration time of the related static problem; the combination of temporal neighbourhood diversity and lifetime of the input graph; and the combination of lifetime and treewidth of the footprint graph.
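
As an illustration of the single-timestep feasibility check, the sketch below tests whether one token set can be turned into another when each token either stays put or slides along one currently active edge; that tokens may stay put, and that all tokens move in parallel, are assumed semantics, not necessarily the paper's exact model. Feasibility then reduces to a perfect bipartite matching between old and new token positions.

```python
def one_step_reconfigurable(src, dst, active_edges):
    """Can token set `src` become `dst` in one step, each token staying
    put or sliding along one active edge? Reduces to perfect bipartite
    matching, found via simple augmenting paths (Kuhn's algorithm)."""
    src, dst = list(src), list(dst)
    if len(src) != len(dst):
        return False
    active = {frozenset(e) for e in active_edges}
    ok = [[u == v or frozenset((u, v)) in active for v in dst]
          for u in src]

    match = [-1] * len(dst)  # dst index -> matched src index

    def augment(i, seen):
        for j in range(len(dst)):
            if ok[i][j] and j not in seen:
                seen.add(j)
                if match[j] == -1 or augment(match[j], seen):
                    match[j] = i
                    return True
        return False

    return all(augment(i, set()) for i in range(len(src)))

# Tokens on {1, 3}; edge (3, 4) is active now, so {1, 4} is reachable.
print(one_step_reconfigurable({1, 3}, {1, 4}, [(3, 4)]))  # True
print(one_step_reconfigurable({1, 3}, {2, 4}, [(3, 4)]))  # False
```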

Private Synthetic Graph Generation and Fused Gromov-Wasserstein Distance

from arXiv: Data Structures and Algorithms

Authors: Leoni Carla Wirth, Gholamali Aminian, Gesine Reinert

Networks are popular for representing complex data. In particular, differentially private synthetic networks are much in demand for method and algorithm development. The network generator should be easy to implement and should come with theoretical guarantees. Here we start with complex data as input and jointly provide a network representation as well as a synthetic network generator. Using a random connection model, we devise an effective algorithmic approach for generating attributed synthetic graphs which is $\epsilon$-differentially private at the vertex level, while preserving utility under an appropriate notion of distance which we develop. We provide theoretical guarantees for the accuracy of the private synthetic graphs using the fused Gromov-Wasserstein distance, which extends the Wasserstein metric to structured data. Our method draws inspiration from the PSMM method of He et al. (2023).

On the Locality of the Lovász Local Lemma

from arXiv: Data Structures and Algorithms

Authors: Peter Davies-Peck

The Lov\'asz Local Lemma is a versatile result in probability theory, characterizing circumstances in which a collection of $n$ 'bad events', each occurring with probability at most $p$ and dependent on a set of underlying random variables, can be avoided. It is a central tool of the probabilistic method, since it can be used to show that combinatorial objects satisfying some desirable properties must exist. While the original proof was existential, subsequent work has shown algorithms for the Lov\'asz Local Lemma: that is, in circumstances in which the lemma proves the existence of some object, these algorithms can constructively find such an object. One main strand of these algorithms, which began with Moser and Tardos's well-known result (JACM 2010), involves iteratively resampling the dependent variables of satisfied bad events until none remain satisfied. In this paper, we present a novel analysis that can be applied to resampling-style Lov\'asz Local Lemma algorithms. This analysis shows that an output assignment for the dependent variables of most events can be determined only from $O(\log \log_{1/p} n)$-radius local neighborhoods, and that the events whose variables may still require resampling can be identified from these neighborhoods. This allows us to improve randomized complexities for the constructive Lov\'asz Local Lemma (with polynomial criterion) in several parallel and distributed models. In particular, we obtain: 1) A LOCAL algorithm with $O(\log\log_{1/p} n)$ node-averaged complexity (while matching the $O(\log_{1/p} n)$ worst-case complexity of Chung, Pettie, and Su). 2) An algorithm for the LCA and VOLUME models requiring $d^{O(\log\log_{1/p} n)}$ probes per query. 3) An $O(\log\log\log_{1/p} n)$-round algorithm for CONGESTED CLIQUE, linear space MPC, and Heterogeneous MPC.
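
To illustrate the resampling paradigm this analysis applies to, here is a toy Moser-Tardos loop for $k$-SAT, where the bad events are violated clauses: while some clause is violated, resample its variables uniformly at random. This sketches the algorithmic template only, not the paper's locality analysis.

```python
import random

def moser_tardos_ksat(n_vars, clauses, rng=random.Random(0)):
    """Moser-Tardos resampling for k-SAT: pick any violated clause (a
    'bad event') and resample its variables until none is violated.
    Terminates quickly in expectation under the LLL criterion, i.e. when
    each clause shares variables with few others. A toy sketch."""
    assign = [rng.random() < 0.5 for _ in range(n_vars)]

    def violated(clause):
        # A clause is a list of signed literals: +3 means x3, -3 means not x3.
        return not any(assign[abs(l) - 1] == (l > 0) for l in clause)

    while True:
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign
        for l in bad[0]:  # resample the variables of one bad event
            assign[abs(l) - 1] = rng.random() < 0.5

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(moser_tardos_ksat(3, [[1, 2], [-1, 3], [-2, -3]]))
```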

A linear-time algorithm computing the resident fitness in interacting trajectories

from arXiv: Data Structures and Algorithms

Authors: Katalin Friedl, Viktória Nemkin, András Tóbiás

The notion of a system of interacting trajectories was recently introduced by Hermann, Gonz\'alez Casanova, Soares dos Santos, T\'obi\'as and Wakolbinger. Such a system of $[0,1]$-valued piecewise linear trajectories arises as a scaling limit of the system of logarithmic subpopulation sizes in a certain population-genetic model (more precisely, a Moran model) with mutation and selection. By definition, the resident fitness is initially 0 and afterwards it increases by the ultimate slope of each trajectory that reaches height 1. We show that although the interaction of $n$ trajectories may yield $\Omega(n^2)$ slope changes in total, the resident fitness (at all times) can be computed algorithmically in $O(n)$ time. Our algorithm is given in terms of the so-called continued lines representation of the system of interacting trajectories. In the special case of Poissonian interacting trajectories where the birth times of the trajectories form a Poisson process and the initial slopes are random and i.i.d., we show that even the expected number of slope changes grows only linearly in time.
