Last Update

OPML feed of all feeds.

Subscribe to the Atom feed, RSS feed to stay up to date.

Thank you to arXiv for use of its open access interoperability.

Note: the date of arXiv entries announced right after publication holidays might incorrectly show up as the date of the publication holiday itself. This is due to our ad hoc method of inferring announcement dates, which are not returned by the arXiv API.

Powered by Pluto.

Source on GitHub.

Maintained by Nima Anari, Arnab Bhattacharyya, Gautam Kamath.

Theory of Computing Report

Friday, March 28

baby, it's cold inside

from Ben Recht

celebrating the reissue of an ambient deep cut

In 2006, I got my PhD, got married, and moved to California to start a post-doc. I started exercising a lot more. I started eating better. I didn’t miss the eastern seaboard autumn or Massachusetts weather. I had a network of friends who had settled out there before me. I loved the LA lifestyle.

But then, why was I so depressed? It gets dark at 4 in Los Angeles December, and the temperature plummets after sunset. The uninsulated ranch houses in the sprawl don’t protect you from the chill.

A few days before heading out for Christmas with in-laws in Virginia, I recorded a new song in my makeshift home studio. I called it baby, it’s cold inside, and emailed my bandmate Isaac, “I've got to work on making things creepy and not just melancholy.” That wouldn’t happen. But that session’s name would inspire us to record one of our more successful albums. Melancholy would remain our sound, and to our surprise, it resonated with a lot of people, who were probably also brooding in their late twenties.

And this album still resonates. We’re beyond thrilled to announce that Berlin’s Keplar label is re-releasing baby, it’s cold inside remastered on vinyl. It comes out today. Go get your copy!

Originally released in 2008, this would be our second album with the curatorial wizards at Barge Recordings. Our first album with Barge, life-sized psychoses, came out in April 2007, and the Barge guys convinced us to do a little tour to promote the record, booking shows in Boston, Providence, and Brooklyn. I flew out to Cambridge for a few weeks to prepare.

It was the first album we tried to record in Isaac’s apartment, a run-down, second-story walk-up in Cambridgeport. Isaac and his partner Ari had separate bedrooms, but Isaac’s room was more of a multipurpose oversized closet fondly dubbed “the bedroom tomb.” It had a loft bed and was packed to the gills with records, music gear, a screen-printing setup, film equipment, and stuff gathered off the street. When I moved to California, I left one of my baritone guitars, a direct box, and an overdrive pedal with Isaac. They were buried in the bedroom tomb too.

The bedroom tomb was decidedly more lo-fi than my home studios, but this was part of its charm. Bad cabling or unpredictable electronics or just awkward sitting brought a lot of weird serendipity. We hadn’t made music together in months, but we sat down and things started flowing. Over three sessions in one week, we captured our next record.

On the first Friday, we recorded fucking milwaukee’s been hesher forever, a dreamy accident of one of the built-in delays of Ableton Live. On Sunday, autoshow day of the dead emerged from an exercise of tension around a piano loop. And then the rest of the record came together the following Friday the 13th. We spent an evening leaning into the noisier end of the spectrum, experimenting with more distortion and more grit. The result was a little more warm shoegazer distortion than cold post-rock arpeggios.

We’d go on to play some of these songs during the tour, trying to recreate the vibes. This was the first time we tried to capture the initial improvisations in a live experience. Over the next few months, we threw the many different iterations into a single session, moving pieces around and sending each other clips via YouSendIt and email. Eventually, we patched together baby, it’s cold inside.

Sixteen years later, Keplar emailed us out of the blue. Since it seemed like most music had moved to short-form social media, Isaac and I had been pretty delinquent at checking that email account. Though they reached out in June, it wasn’t until August, when I logged in to get a recovery code for a music distribution website, that we saw the email. We thought we had missed out on a great opportunity. Fortunately, the folks at Keplar were enthusiastic to push this forward, and you can grab a physical copy today.

We hope you like it! The fun years still has an infinite back catalog that we’d like to release eventually. Our motto has always been “from quantity comes quality.” We’ve never convinced ourselves to embrace the social media mindset of constantly posting, but maybe it’s time we do. In that spirit, we’re releasing the original, wistful solo piece on our bandcamp. I’m not sure if anyone other than me and Isaac has even heard this before. We hope you dig it. And who knows, 2025 could be the year we dust off our archives to see what’s still cold inside.

Subscribe now

By Ben Recht

PhD at Inria Lille (apply by April 23, 2025)

from CCI: jobs

The LINKS team at Inria Lille is seeking PhD candidates to work on the topic of “Efficient enumeration via edits”. The PhD will be carried out in the LINKS team in Lille, France: it offers a dynamic environment and a friendly atmosphere. The PhD will be co-supervised by Antoine Amarilli (https://a3nm.net/) and Mikaël Monet (https://mikael-monet.net/), and starts in fall 2025.

Website: https://a3nm.net/work/research/offers/phd_enumeration_changes.pdf
Email: a3nm@a3nm.net

By shacharlovett

TCS+ talk: Wednesday, April 9 — Or Zamir, Tel Aviv University

from TCS+ Seminar Series

The third TCS+ talk of the season will take place on Wednesday, April 9th at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 17:00 UTC: check your time zone here). Or Zamir from Tel Aviv University will speak about “Optimality of Frequency Moment Estimation” (abstract below).

(Note that the talk is on April 9th, not April 2nd, to accommodate the FOCS deadline.)

You can reserve a spot as an individual or a group to join us live by signing up on the online form. Registration is not required to attend the interactive talk, and the link will be posted on the website the day prior to the talk; however, by registering in the form, you will receive a reminder, along with the link. (The recorded talk will also be posted on our website afterwards.) As usual, for more information about the TCS+ online seminar series and the upcoming talks, or to suggest a possible topic or speaker, please see the website.

Abstract: Estimating the second frequency moment of a stream up to (1±ε) multiplicative error requires at most O(log n / ε²) bits of space, due to a seminal result of Alon, Matias, and Szegedy. It is also known that at least Ω(log n + 1/ε²) space is needed. We prove a tight lower bound of Ω(log(nε²) / ε²) for all ε = Ω(1/√n). Notably, when ε > n^(-1/2 + c), where c > 0, our lower bound matches the classic upper bound of AMS. For smaller values of ε, we also introduce a revised algorithm that improves the classic AMS bound and matches our lower bound. Our lower bound also applies to the more general problem of p-th frequency moment estimation for the range of p in (1, 2], providing a tight bound in the only remaining range to settle the optimal space complexity of estimating frequency moments.

Based on a joint work with Mark Braverman.
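
For readers who want the upper-bound side in concrete form, below is a minimal sketch of the classic Alon–Matias–Szegedy estimator that the O(log n / ε²) bound refers to. It is illustrative only: it uses fully random signs in place of the 4-wise independent hash families of the actual algorithm, and the parameter names are made up.

```python
import random
from statistics import median

def ams_f2_estimate(stream, num_means=16, num_medians=5, seed=0):
    """Estimate the second frequency moment F2 = sum_i f_i^2 of a stream.

    AMS sketch: each counter maintains sum_i s(i) * f_i for a random sign
    function s; its square is an unbiased estimator of F2. Averaging reduces
    the variance, and a median of averages boosts the success probability.
    """
    rng = random.Random(seed)
    # One sign table per counter (a lazily filled dict standing in for a
    # 4-wise independent hash family, which the real algorithm uses to
    # keep the space down to O(log n) bits per counter).
    signs = [[dict() for _ in range(num_means)] for _ in range(num_medians)]
    counters = [[0.0] * num_means for _ in range(num_medians)]

    def sign(table, x):
        if x not in table:
            table[x] = rng.choice((-1, 1))
        return table[x]

    for item in stream:
        for j in range(num_medians):
            for k in range(num_means):
                counters[j][k] += sign(signs[j][k], item)

    averages = [sum(c * c for c in row) / num_means for row in counters]
    return median(averages)

# Tiny sanity check: frequencies {a: 3, b: 2, c: 1} give F2 = 9 + 4 + 1 = 14.
print(ams_f2_estimate(["a"] * 3 + ["b"] * 2 + ["c"], num_means=64))
```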

By plustcs

Time hierarchies for sublogarithmic-space quantum computation

from arXiv: Computational Complexity

Authors: A. C. Cem Say

We present new results on the landscape of problems that can be solved by quantum Turing machines (QTMs) employing severely limited amounts of memory. In this context, we demonstrate two infinite time hierarchies of complexity classes within the ``small space'' regime: For all $i\geq 0$, there is a language that can be recognized by a constant-space machine in $2^{O(n^{1/2^i})}$ time, but not by any sublogarithmic-space QTM in $2^{O(n^{1/2^{i+1}})}$ time. For quantum machines operating within $o(\log \log n)$ space, there exists another hierarchy, each level of which corresponds to an expected runtime of $2^{O((\log n)^i)}$ for a different positive integer $i$. We also improve a quantum advantage result, demonstrating a language that can be recognized by a polynomial-time constant-space QTM, but not by any classical machine using $o(\log \log n)$ space, regardless of the time budget. The implications of our findings for quantum space-time tradeoffs are discussed.

Lattice Based Crypto breaks in a Superposition of Spacetimes

from arXiv: Computational Complexity

Authors: Divesh Aggarwal, Shashwat Agrawal, Rajendra Kumar

We explore the computational implications of a superposition of spacetimes, a phenomenon hypothesized in quantum gravity theories. This was initiated by Shmueli (2024), where the author introduced the complexity class $\mathbf{BQP^{OI}}$, consisting of promise problems decidable by quantum polynomial-time algorithms with access to an oracle for computing order interference. In that work, it was shown that the Graph Isomorphism problem and the Gap Closest Vector Problem (with approximation factor $\mathcal{O}(n^{3/2})$) are in $\mathbf{BQP^{OI}}$. We extend this result by showing that the entire complexity class $\mathbf{SZK}$ (Statistical Zero Knowledge) is contained within $\mathbf{BQP^{OI}}$. This immediately implies that the security of numerous lattice-based cryptography schemes will be compromised in a computational model based on a superposition of spacetimes, since these often rely on the hardness of the Learning with Errors problem, which is in $\mathbf{SZK}$.

Matchgate signatures under variable permutations

from arXiv: Computational Complexity

Authors: Boning Meng, Yicheng Pan

In this article, we give a necessary and sufficient condition for determining whether a matchgate signature retains its property under a certain variable permutation, which can be checked in polynomial time. We also define the concept of permutable matchgate signatures, and use it to close the gap between Pl-\#CSP and \#CSP on planar graphs left in previous work. We provide a detailed characterization of permutable matchgate signatures as well, by presenting their relation to symmetric matchgate signatures. In addition, we prove a dichotomy for Pl-$\#R_D$-CSP where $D\ge 3$ is an integer.

Efficient Computation of the Directional Extremal Boundary of a Union of Equal-Radius Circles

from arXiv: Computational Geometry

Authors: Alexander Gribov

This paper focuses on computing the directional extremal boundary of a union of equal-radius circles. We introduce an efficient algorithm that accurately determines this boundary by analyzing the intersections and dominant relationships among the circles. The algorithm has time complexity of O(n log n).

Surface guided analysis of breast changes during post-operative radiotherapy by using a functional map framework

from arXiv: Computational Geometry

Authors: Pierre Galmiche, Hyewon Seo, Yvan Pin, Philippe Meyer, Georges Noël, Michel de Mathelin

The treatment of breast cancer using radiotherapy involves uncertainties regarding breast positioning. As studies progress, more is known about the expected breast positioning errors, which are taken into account in the Planning Target Volume (PTV) in the form of a margin around the clinical target volume. However, little is known about the non-rigid deformations of the breast in the course of radiotherapy, which are a non-negligible factor in the treatment. Purpose: Taking into account such inter-fractional breast deformations would help develop a promising future direction, such as patient-specific adjustable irradiation planning. Methods: In this study, we develop a geometric approach to analyze inter-fractional breast deformation throughout the radiotherapy treatment. Our data consists of 3D surface scans of patients acquired during radiotherapy sessions using a handheld scanner. We adapt the functional map framework to compute inter- and intra-patient non-rigid correspondences, which are then used to analyze intra-patient changes and inter-patient variability. Results: The qualitative shape collection analysis highlights deformations in the contralateral breast and armpit areas, along with positioning shifts in the head or abdominal regions. We also perform extrinsic analysis, where we align surface acquisitions of the treated breast with the CT-derived skin surface to assess displacements and volume changes in the treated area. On average, displacements within the treated breast exhibit amplitudes of 1-2 mm across sessions, with higher values observed at the time of the 25th irradiation session. Volume changes, inferred from surface variations, reached up to 10%, with values ranging between 2% and 5% over the course of treatment. Conclusions: We propose a comprehensive workflow for analyzing and modeling breast deformations during radiotherapy using surface acquisitions, incorporating a novel inter-collection shape matching approach to model shape variability within a shared space across multiple patient shape collections. We validate our method using 3D surface data acquired from patients during External Beam Radiotherapy (EBRT) sessions, demonstrating its effectiveness. The clinical trial data used in this paper is registered under the ClinicalTrials.gov ID NCT03801850.

Fully dynamic biconnectivity in $\tilde{\mathcal{O}}(\log^2 n)$ time

from arXiv: Data Structures and Algorithms

Authors: Jacob Holm, Wojciech Nadara, Eva Rotenberg, Marek Sokołowski

We present a deterministic fully-dynamic data structure for maintaining information about the cut-vertices in a graph, i.e., the vertices whose removal would disconnect the graph. Our data structure supports insertion and deletion of edges, as well as queries asking whether a pair of connected vertices is biconnected or can be separated by a cut-vertex, and in the latter case we support access to separating cut-vertices. All update operations are supported in amortized $O(\log^2 n \log^2 \log n)$ time, and queries take worst-case $O(\log n \log^2 \log n)$ time. Note that these time bounds match the current best for deterministic dynamic connectivity up to $\log \log n$ factors. We obtain our improved running time by a series of reductions from the original problem into well-defined data structure problems. While we do apply the well-known techniques for improving the running time of two-edge connectivity [STOC'00, SODA'18], these techniques alone do not lead to an update time of $\tilde{O}(\log^3 n)$, let alone the $\tilde{O}(\log^2 n)$ we give as a final result. Our contributions include a formally defined transient expose operation, which can be thought of as a cheaper read-only expose operation on a top tree. For each vertex in the graph, we maintain a data structure over its neighbors, and in this data structure we apply biasing (twice) to save two $\tilde{O}(\log n)$ factors. One of these biasing techniques is a new biased disjoint sets data structure, which may be of independent interest. Moreover, in this neighborhood data structure, we allow each vertex to select two VIP neighbors that get special treatment, corresponding to its up to two neighbors on an exposed path, improving a $\log n$-time operation down to constant time. It is this combination of VIP neighbors with the transient expose that saves an $\tilde{O}(\log n)$-factor from another bottleneck.
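
As background for what the data structure maintains, here is the textbook static computation of cut vertices (articulation points) by a low-link DFS. This is not the paper's method; rerunning it after every edge update costs $O(n+m)$ per operation, which is the kind of baseline the dynamic structure above replaces with polylogarithmic update and query times.

```python
def articulation_points(n, adj):
    """Return the set of cut vertices of an undirected graph on vertices
    0..n-1 given by an adjacency list adj (standard low-link DFS)."""
    disc = [-1] * n          # DFS discovery times
    low = [0] * n            # lowest discovery time reachable via back edges
    cuts = set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if disc[v] == -1:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root vertex is a cut vertex if some child's subtree
                # cannot reach above u via a back edge.
                if parent != -1 and low[v] >= disc[u]:
                    cuts.add(u)
            else:
                low[u] = min(low[u], disc[v])
        # The root is a cut vertex iff it has at least two DFS children.
        if parent == -1 and children >= 2:
            cuts.add(u)

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return cuts

# Path 0-1-2: removing vertex 1 disconnects the graph.
print(articulation_points(3, {0: [1], 1: [0, 2], 2: [1]}))  # -> {1}
```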

Output-sensitive approximate counting via a measure-bounded hyperedge oracle, or: How asymmetry helps estimate $k$-clique counts faster

from arXiv: Data Structures and Algorithms

Authors: Keren Censor-Hillel, Tomer Even, Virginia Vassilevska Williams

Dell, Lapinskas and Meeks [DLM SICOMP 2022] presented a general reduction from approximate counting to decision for a class of fine-grained problems that can be viewed as hyperedge counting or detection problems in an implicit hypergraph, thus obtaining tight equivalences between approximate counting and decision for many key problems such as $k$-clique, $k$-sum and more. Their result is a reduction from approximately counting the number of hyperedges in an implicit $k$-partite hypergraph to a polylogarithmic number of calls to a hyperedge oracle that returns whether a given subhypergraph contains an edge. The main result of this paper is a generalization of the DLM result for {\em output-sensitive} approximate counting, where the running time of the desired counting algorithm is inversely proportional to the number of witnesses. Our theorem is a reduction from approximately counting the (unknown) number of hyperedges in an implicit $k$-partite hypergraph to a polylogarithmic number of calls to a hyperedge oracle called only on subhypergraphs with a small ``measure''. If a subhypergraph has $u_i$ nodes in the $i$th node partition of the $k$-partite hypergraph, then its measure is $\prod_i u_i$. Using the new general reduction and by efficiently implementing measure-bounded colorful independence oracles, we obtain new improved output-sensitive approximate counting algorithms for $k$-clique, $k$-dominating set and $k$-sum. In graphs with $n^t$ $k$-cliques, for instance, our algorithm $(1\pm \epsilon)$-approximates the $k$-clique count in time $$\tilde{O}_\epsilon(n^{\omega(\frac{k-t-1}{3},\frac{k-t}{3},\frac{k-t+2}{3}) }+n^2),$$ where $\omega(a,b,c)$ is the exponent of $n^a\times n^b$ by $n^b\times n^c$ matrix multiplication. For large $k$ and $t>2$, this is a substantial improvement over prior work, even if $\omega=2$.

A Tolerant Independent Set Tester

from arXiv: Data Structures and Algorithms

Authors: Cameron Seth

We give nearly optimal bounds on the sample complexity of $(\widetilde{\Omega}(\epsilon),\epsilon)$-tolerant testing the $\rho$-independent set property in the dense graph setting. In particular, we give an algorithm that inspects a random subgraph on $\widetilde{O}(\rho^3/\epsilon^2)$ vertices and, for some constant $c,$ distinguishes between graphs that have an induced subgraph of size $\rho n$ with fewer than $\frac{\epsilon}{c \log^4(1/\epsilon)} n^2$ edges from graphs for which every induced subgraph of size $\rho n$ has at least $\epsilon n^2$ edges. Our sample complexity bound matches, up to logarithmic factors, the recent upper bound by Blais and Seth (2023) for the non-tolerant testing problem, which is known to be optimal for the non-tolerant testing problem based on a lower bound by Feige, Langberg and Schechtman (2004). Our main technique is a new graph container lemma for sparse subgraphs instead of independent sets. We also show that our new lemma can be used to generalize one of the classic applications of the container method, that of counting independent sets in regular graphs, to counting sparse subgraphs in regular graphs.
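
To make the testing task concrete, the sketch below computes, by brute force on a small random sample, the quantity the tester thresholds on: the minimum number of edges induced by a $\rho$-fraction subset of the sampled vertices. This is only an illustration of the problem statement, not the paper's algorithm or analysis; the sample size would be set to $\widetilde{O}(\rho^3/\epsilon^2)$, and all names below are hypothetical.

```python
import itertools
import random

def sparsest_rho_subset_edges(vertices, edges, rho):
    """Minimum number of edges induced by any subset of roughly rho*|vertices|
    vertices (brute force; only sensible for very small vertex sets)."""
    k = max(1, int(rho * len(vertices)))
    edge_set = {frozenset(e) for e in edges}
    best = float("inf")
    for subset in itertools.combinations(vertices, k):
        s = set(subset)
        best = min(best, sum(1 for e in edge_set if e <= s))
    return best

def sample_and_test(n, edges, rho, sample_size, seed=0):
    """Sample a random induced subgraph and report the sparsest rho-fraction
    subset it contains -- the statistic a tolerant tester would threshold."""
    rng = random.Random(seed)
    sample = rng.sample(range(n), sample_size)
    in_sample = set(sample)
    induced = [e for e in edges if e[0] in in_sample and e[1] in in_sample]
    return sparsest_rho_subset_edges(sample, induced, rho)

# Two disjoint 4-cliques: any 3 sampled vertices span at least one edge.
edges = [(i, j) for i in range(8) for j in range(i + 1, 8) if (i < 4) == (j < 4)]
print(sample_and_test(8, edges, rho=0.5, sample_size=6))  # -> 1
```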

A Theoretical Framework for Distribution-Aware Dataset Search

from arXiv: Data Structures and Algorithms

Authors: Aryan Esmailpour, Sainyam Galhotra, Rahul Raychaudhury, Stavros Sintos

Effective data discovery is a cornerstone of modern data-driven decision-making. Yet, identifying datasets with specific distributional characteristics, such as percentiles or preferences, remains challenging. While recent proposals have enabled users to search based on percentile predicates, much of the research in data discovery relies on heuristics. This paper presents the first theoretically backed framework that unifies data discovery under centralized and decentralized settings. Let $\mathcal{P}=\{P_1,...,P_N\}$ be a repository of $N$ datasets, where $P_i\subset \mathbb{R}^d$, for $d=O(1)$ . We study the percentile indexing (Ptile) problem and the preference indexing (Pref) problem under the centralized and the federated setting. In the centralized setting we assume direct access to the datasets. In the federated setting we assume access to a synopsis of each dataset. The goal of Ptile is to construct a data structure such that given a predicate (rectangle $R$ and interval $\theta$) report all indexes $J$ such that $j\in J$ iff $|P_j\cap R|/|P_j|\in\theta$. The goal of Pref is to construct a data structure such that given a predicate (vector $v$ and interval $\theta$) report all indexes $J$ such that $j\in J$ iff $\omega(P_j,v)\in \theta$, where $\omega(P_j,v)$ is the inner-product of the $k$-th largest projection of $P_j$ on $v$. We first show that we cannot hope for near-linear data structures with polylogarithmic query time in the centralized setting. Next we show $\tilde{O}(N)$ space data structures that answer Ptile and Pref queries in $\tilde{O}(1+OUT)$ time, where $OUT$ is the output size. Each data structure returns a set of indexes $J$ such that i) for every $P_i$ that satisfies the predicate, $i\in J$ and ii) if $j\in J$ then $P_j$ satisfies the predicate up to an additive error $\varepsilon+2\delta$, where $\varepsilon\in(0,1)$ and $\delta$ is the error of synopses.
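
For concreteness, here is a naive linear scan that answers a Ptile query exactly; the point of the paper is precisely to avoid this scan with $\tilde{O}(N)$-space indexes and $\tilde{O}(1+OUT)$ query time. All names and the example data below are hypothetical.

```python
def ptile_indexes(datasets, rect, theta):
    """Brute-force Ptile query: return the indexes j such that the fraction of
    points of datasets[j] inside the axis-aligned rectangle `rect` lies in the
    interval `theta`.

    datasets : list of lists of d-dimensional points (tuples of floats)
    rect     : (lo, hi) pair of d-dimensional corner points
    theta    : (a, b) interval for the admissible fraction
    """
    lo, hi = rect
    a, b = theta
    result = []
    for j, pts in enumerate(datasets):
        inside = sum(
            all(lo[i] <= p[i] <= hi[i] for i in range(len(p))) for p in pts
        )
        if a <= inside / len(pts) <= b:
            result.append(j)
    return result

# Example: two tiny 2-D datasets; ask for datasets with 50%-100% of their
# points inside the unit square.
data = [[(0.2, 0.3), (0.8, 0.9), (2.0, 2.0)], [(5.0, 5.0), (6.0, 6.0)]]
print(ptile_indexes(data, ((0.0, 0.0), (1.0, 1.0)), (0.5, 1.0)))  # -> [0]
```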

A Quantum Constraint Generation Framework for Binary Linear Programs

from arXiv: Data Structures and Algorithms

Authors: András Czégel, Boglárka G. -Tóth

We propose a new approach to utilize quantum computers for binary linear programming (BLP), which can be extended to general integer linear programs (ILP). Quantum optimization algorithms, hybrid or quantum-only, are currently general-purpose, standalone solvers for ILP. However, to consider them practically useful, we expect them to outperform the current state-of-the-art classical solvers. That expectation is unfair to quantum algorithms: in classical ILP solvers, after many decades of evolution, many different algorithms work together as a robust machine to get the best result. This is the approach we would like to follow now with our quantum 'solver' solutions. In this study, we wrap any suitable quantum optimization algorithm into a quantum-informed classical constraint generation framework. First, we relax our problem by dropping all constraints and encode it into an Ising Hamiltonian for the quantum optimization subroutine. Then, by sampling from the solution state of the subroutine, we obtain information about constraint violations in the initial problem, from which we decide which coupling terms we need to introduce to the Hamiltonian. The coupling terms correspond to the constraints of the initial binary linear program. Then we optimize over the new Hamiltonian again, until we reach a feasible solution, or other stopping conditions hold. Since one can decide how many constraints to add to the Hamiltonian in a single step, our algorithm is at least as efficient as the (hybrid) quantum optimization algorithm it wraps. We support our claim with results on small-scale minimum-cost exact cover problem instances.
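
The constraint-generation loop described above can be sketched classically: repeatedly minimize a quadratic objective, detect constraints violated by the returned solution, and add the corresponding penalty (coupling) terms. In the sketch below, a brute-force QUBO minimizer stands in for the quantum subroutine, the constraints are equalities $Ax = b$ (as in exact cover), and the penalty weight is an arbitrary illustrative choice, not the paper's scheme.

```python
import itertools

def brute_force_qubo(Q, n):
    """Stand-in for the quantum optimization subroutine: exhaustively minimize
    x^T Q x over binary vectors x (only viable for tiny n)."""
    best_x, best_val = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        val = sum(Q.get((i, j), 0.0) * bits[i] * bits[j]
                  for i in range(n) for j in range(n))
        if val < best_val:
            best_x, best_val = list(bits), val
    return best_x

def constraint_generation_blp(c, A, b, penalty=10.0, max_rounds=20):
    """Minimize c.x over binary x subject to A x = b, adding a constraint's
    penalty terms to the quadratic objective only once it is violated."""
    n = len(c)
    Q = {(i, i): c[i] for i in range(n)}        # start from the relaxation
    active = set()
    for _ in range(max_rounds):
        x = brute_force_qubo(Q, n)
        violated = [r for r, row in enumerate(A)
                    if sum(row[i] * x[i] for i in range(n)) != b[r]]
        if not violated:
            return x                            # feasible solution found
        for r in violated:
            if r in active:
                continue       # already penalized; one could raise `penalty`
            active.add(r)
            # Add penalty * (sum_i A[r][i] x_i - b[r])^2, dropping the constant.
            for i in range(n):
                for j in range(n):
                    Q[(i, j)] = Q.get((i, j), 0.0) + penalty * A[r][i] * A[r][j]
                Q[(i, i)] = Q.get((i, i), 0.0) - 2 * penalty * b[r] * A[r][i]
    return None                                 # no feasible point found

# Minimum-cost exact cover: elements {0,1,2}, candidate sets S1..S4.
A = [[1, 0, 0, 1],   # element 0 lies in S1 and S4
     [1, 1, 0, 0],   # element 1 lies in S1 and S2
     [0, 1, 1, 0]]   # element 2 lies in S2 and S3
b = [1, 1, 1]
c = [2, 3, 1, 1]
print(constraint_generation_blp(c, A, b))  # -> [1, 0, 1, 0]  (pick S1 and S3)
```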

On the Hardness Hierarchy for the $O(n \sqrt{\log n})$ Complexity in the Word RAM

from arXiv: Data Structures and Algorithms

Authors: Dominik Kempa, Tomasz Kociumaka

In this work, we study the relative hardness of fundamental problems with state-of-the-art word RAM algorithms that take $O(n\sqrt{\log n})$ time for instances described in $\Theta(n)$ machine words ($\Theta(n\log n)$ bits). This complexity class, one of six hardness levels identified by Chan and P\u{a}tra\c{s}cu [SODA 2010], includes diverse problems from several domains: Counting Inversions, string processing problems (BWT Construction, LZ77 Factorization, Longest Common Substring, Batched Longest Previous Factor Queries, Batched Inverse Suffix Array Queries), and computational geometry tasks (Orthogonal Range Counting, Orthogonal Segment Intersection). We offer two main contributions: We establish new links between the above string problems and Dictionary Matching, a classic task solvable using the Aho-Corasick automaton. We restrict Dictionary Matching to instances with $O(n)$ binary patterns of length $m = O(\log n)$ each, and we prove that, unless these instances can be solved in $o(n\sqrt{\log n})$ time, the aforementioned string problems cannot be solved faster either. Via further reductions, we extend this hardness to Counting Inversions (a fundamental component in geometric algorithms) and thus to Orthogonal Range Counting and Orthogonal Segment Intersection. This hinges on String Nesting, a new problem which is equivalent to Dictionary Matching and can be reduced to Counting Inversions in three steps. Together, our results unveil a single problem, with two equivalent formulations, that underlies the hardness of nearly all major problems currently occupying the $O(n\sqrt{\log n})$ level of hardness. These results drastically funnel further efforts to improve the complexity of near-linear problems. As an auxiliary outcome of our framework, we also prove that the alphabet in several central string problems can be efficiently reduced to binary.
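
As a reminder of one of the problems sitting at this hardness level, here is the textbook $O(n \log n)$ merge-sort algorithm for Counting Inversions; the $O(n\sqrt{\log n})$ word-RAM algorithms discussed in the paper improve on this with bit-packing techniques not shown here.

```python
def count_inversions(a):
    """Count pairs i < j with a[i] > a[j] via merge sort (textbook O(n log n))."""
    def sort_count(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_l = sort_count(xs[:mid])
        right, inv_r = sort_count(xs[mid:])
        merged, inv = [], inv_l + inv_r
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                # Everything remaining in `left` exceeds right[j]:
                # each such pair is an inversion.
                inv += len(left) - i
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv
    return sort_count(a)[1]

print(count_inversions([3, 1, 2]))  # -> 2
```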

History-Independent Concurrent Hash Tables

from arXiv: Data Structures and Algorithms

Authors: Hagit Attiya, Michael A. Bender, Martín Farach-Colton, Rotem Oshman, Noa Schiller

A history-independent data structure does not reveal the history of operations applied to it, only its current logical state, even if its internal state is examined. This paper studies history-independent concurrent dictionaries, in particular, hash tables, and establishes inherent bounds on their space requirements. This paper shows that there is a lock-free history-independent concurrent hash table, in which each memory cell stores two elements and two bits, based on Robin Hood hashing. Our implementation is linearizable, and uses the shared memory primitive LL/SC. The expected amortized step complexity of the hash table is $O(c)$, where $c$ is an upper bound on the number of concurrent operations that access the same element, assuming the hash table is not overpopulated. We complement this positive result by showing that even if we have only two concurrent processes, no history-independent concurrent dictionary that supports sets of any size, with wait-free membership queries and obstruction-free insertions and deletions, can store only two elements of the set and a constant number of bits in each memory cell. This holds even if the step complexity of operations on the dictionary is unbounded.
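
For intuition about why Robin Hood hashing is a natural basis for history independence, here is a minimal sequential sketch of Robin Hood insertion with a fixed tie-breaking rule, under which the table layout depends only on the set of stored keys and not on the insertion order. It is only an illustration; the paper's lock-free concurrent construction, its two-elements-plus-two-bits cells, and its use of LL/SC are not reflected here.

```python
def robin_hood_insert(table, key, hash_fn):
    """Insert `key` into an open-addressing table using Robin Hood probing.

    An incoming key steals a slot whenever the current occupant is "richer"
    (closer to its home slot), with ties broken by key value. With this
    deterministic rule the final layout is canonical for a given key set,
    which is the intuition behind building history-independent tables on
    Robin Hood hashing. Assumes the table never fills up.
    """
    m = len(table)
    dist = 0                               # current probe distance of `key`
    while True:
        idx = (hash_fn(key) + dist) % m
        occupant = table[idx]
        if occupant is None:
            table[idx] = key
            return
        occ_dist = (idx - hash_fn(occupant)) % m
        if (occ_dist, occupant) < (dist, key):
            # Occupant is richer (or wins the tie): it yields the slot and
            # continues probing from its own distance.
            table[idx], key, dist = key, occupant, occ_dist
        dist += 1

table = [None] * 8
for k in [3, 11, 19, 5]:                   # 3, 11, 19 all collide at slot 3
    robin_hood_insert(table, k, lambda x: x % 8)
print(table)  # same layout regardless of the insertion order of these keys
```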

Deterministic Vertex Connectivity via Common-Neighborhood Clustering and Pseudorandomness

from arXiv: Data Structures and Algorithms

Authors: Yonggang Jiang, Chaitanya Nalam, Thatchaphol Saranurak, Sorrachai Yingchareonthawornchai

We give a deterministic algorithm for computing a global minimum vertex cut in a vertex-weighted graph with $n$ vertices and $m$ edges in $\widehat O(mn)$ time. This breaks the long-standing $\widehat \Omega(n^{4})$-time barrier in dense graphs, achievable by trivially computing all-pairs maximum flows. Up to subpolynomial factors, we match the fastest randomized $\tilde O(mn)$-time algorithm by [Henzinger, Rao, and Gabow'00], and affirmatively answer the question by [Gabow'06] of whether deterministic $O(mn)$-time algorithms exist even for unweighted graphs. Our algorithm works in directed graphs, too. In unweighted undirected graphs, we present a faster deterministic $\widehat O(m\kappa)$-time algorithm where $\kappa\le n$ is the size of the global minimum vertex cut. For a moderate value of $\kappa$, this strictly improves upon all previous deterministic algorithms in unweighted graphs with running time $\widehat O(m(n+\kappa^{2}))$ [Even'75], $\widehat O(m(n+\kappa\sqrt{n}))$ [Gabow'06], and $\widehat O(m2^{O(\kappa^{2})})$ [Saranurak and Yingchareonthawornchai'22]. Recently, a linear-time algorithm has been shown by [Korhonen'24] for very small $\kappa$. Our approach applies the common-neighborhood clustering, recently introduced by [Blikstad, Jiang, Mukhopadhyay, Yingchareonthawornchai'25], in novel ways, e.g., on top of weighted graphs and on top of vertex-expander decomposition. We also exploit pseudorandom objects often used in computational complexity communities, including crossing families based on dispersers from [Wigderson and Zuckerman'99; Ta-Shma, Umans and Zuckerman'01] and selectors based on linear lossless condensers [Guruswami, Umans and Vadhan'09; Cheraghchi'11]. To our knowledge, this is the first application of selectors in graph algorithms.

Solving the Correlation Cluster LP in Sublinear Time

from arXiv: Data Structures and Algorithms

Authors: Nairen Cao, Vincent Cohen-Addad, Shi Li, Euiwoong Lee, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, Hanwen Zhang

Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges. CCL+24 introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, CCL+24 showed how to find a feasible solution for the cluster LP in time $O(n^{\text{poly}(1/\epsilon)})$ with objective value at most $(1+\epsilon)$ times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the cluster LP, yielding a $(1.437+\epsilon)$-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most $(1+\epsilon)$ of the optimum in time $\widetilde{O}(2^{\text{poly}(1/\epsilon)} n)$, where $n$ is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast $(1.437+\epsilon)$-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.
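
To fix the objective being approximated, here is a direct computation of the Correlation Clustering cost of a given clustering (inter-cluster edges plus missing intra-cluster edges); the names and the toy example are illustrative.

```python
from itertools import combinations

def correlation_cost(n, edges, clusters):
    """Correlation Clustering objective: edges between different clusters
    plus non-edges inside the same cluster.

    n        : number of vertices 0..n-1
    edges    : set of frozenset({u, v}) pairs (the graph's edges)
    clusters : list of sets partitioning {0, ..., n-1}
    """
    label = {}
    for c, cluster in enumerate(clusters):
        for v in cluster:
            label[v] = c
    cost = 0
    for u, v in combinations(range(n), 2):
        same = label[u] == label[v]
        edge = frozenset((u, v)) in edges
        if edge != same:      # disagreement: cut edge, or uncut non-edge
            cost += 1
    return cost

# Triangle plus a pendant vertex: clustering the triangle together and the
# pendant alone cuts exactly one edge, so the cost is 1.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
print(correlation_cost(4, edges, [{0, 1, 2}, {3}]))  # -> 1
```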

Thursday, March 27

On the order of the shortest solution sequences for the pebble motion problems

from arXiv: Computational Complexity

Authors: Tomoki Nakamigawa, Tadashi Sakuma

Let $G$ be a connected graph with $N$ vertices. Let $k$ be the number of vertices in a longest path of $G$ such that every vertex on the path is a cut vertex of $G$, and every intermediate vertex of the path is a degree-two vertex of $G$. Let $P=\{1,\ldots,n\}$ be a set of pebbles with $n+k < N$. A \textit{configuration} of $P$ on $G$ is defined as a function $f$ from $V(G)$ to $\{0, 1, \ldots, n \}$ with $|f^{-1}(i)| = 1$ for $1 \le i \le n$, where $f^{-1}(i)$ is the vertex occupied by the $i$th pebble for $1 \le i \le n$ and $f^{-1}(0)$ is the set of unoccupied vertices. A \textit{move} is defined as shifting a pebble from a vertex to some unoccupied neighbor. The {\it pebble motion problem on the pair $(G,P)$} is to decide whether a given configuration of pebbles is reachable from another by executing a sequence of moves. In this paper, we show that the length of the shortest solution sequence of the pebble motion problem on the pair $(G,P)$ is in $O(Nn + n^2 \log(\min\{n,k\}))$ if $G$ is an $N$-vertex tree, and it is in $O(N^2 + \frac{n^3}{N-n} + n^2 \log(\min\{n,N-n\}))$ if $G$ is a general connected $N$-vertex graph. We provide an algorithm that obtains a solution sequence whose length satisfies these bounds, with running time of the same order as the length of the sequence. Keywords: pebble motion, motion planning, multi-agent path finding, $15$-puzzle, tree
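
As a baseline for the definitions above, the sketch below decides reachability by brute-force BFS over configurations. Its state space is exponential, in contrast to the polynomial-length solution sequences bounded in the paper, and the graph encoding is an arbitrary illustrative choice.

```python
from collections import deque

def pebble_motion_reachable(adj, start, goal):
    """Decide reachability for the pebble motion problem by BFS over
    configurations (exponential state space; only viable for tiny graphs).

    adj   : dict vertex -> list of neighboring vertices
    start : tuple, start[i] = vertex occupied by pebble i+1
    goal  : tuple, goal[i]  = vertex that pebble i+1 must reach
    """
    seen = {start}
    queue = deque([start])
    while queue:
        conf = queue.popleft()
        if conf == goal:
            return True
        occupied = set(conf)
        for i, v in enumerate(conf):
            for w in adj[v]:
                if w not in occupied:          # a move: shift pebble i+1 to w
                    nxt = conf[:i] + (w,) + conf[i + 1:]
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return False

# Path 0-1-2-3 with two pebbles: pebbles cannot swap past each other.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(pebble_motion_reachable(path, (0, 1), (1, 0)))  # -> False (swap impossible)
print(pebble_motion_reachable(path, (0, 1), (2, 3)))  # -> True
```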

Primes via Zeros: Interactive Proofs for Testing Primality of Natural Classes of Ideals

from arXiv: Computational Complexity

Authors: Abhibhav Garg, Rafael Oliveira, Nitin Saxena

A central question in mathematics and computer science is that of determining whether a given ideal $I$ is prime, which geometrically corresponds to the zero set of $I$, denoted $Z(I)$, being irreducible. The case of principal ideals (i.e., ideals generated by a single polynomial, $m=1$) corresponds to the more familiar absolute irreducibility testing of polynomials, where the seminal work of Kaltofen (1995) yields a randomized polynomial-time algorithm for this problem. However, when $m > 1$, the primality testing problem seems much harder. The current best algorithms for this problem are only known to be in EXPSPACE. In this work, we significantly reduce the complexity-theoretic gap for the ideal primality testing problem for important families of ideals $I$ (namely, radical ideals and equidimensional Cohen-Macaulay ideals). For these classes of ideals, assuming the Generalized Riemann Hypothesis, we show that primality testing lies in $\Sigma_3^p \cap \Pi_3^p$. This significantly improves the upper bound for these classes, approaching their lower bound, as the primality testing problem is coNP-hard for these classes of ideals. Another consequence of our results is that for equidimensional Cohen-Macaulay ideals, we get the first PSPACE algorithm for primality testing, exponentially improving the space and time complexity of prior known algorithms.

An Algorithm for Illuminating $n$ Nonoverlapping Circular Discs' Boundaries on the Plane with Application to Tree Stem Illumination Problem

from arXiv: Computational Geometry

Authors: Phapaengmuang Sukkasem, Supanut Chaidee, Watit Khokthong

Given a set of $n$ nonoverlapping circular discs on a plane, we aim to determine possible positions of points (referred to as cameras) that could fully illuminate all the circular discs' boundaries. This work presents a geometric approach for determining feasible camera positions that would provide total illumination of all circular discs. The Laguerre Delaunay triangulation, coupled with the intersection of slabs formed by the boundaries of circular discs, is employed to form the region that satisfies the given conditions. The experiment is conducted using a set of randomly positioned circular discs on a plane. This study has the potential to address the issue of illumination in forests by utilizing a LiDAR camera to determine the possible number and placement of cameras that can effectively illuminate trees within a forest.
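The basic single-disc primitive behind "illuminating a disc boundary" can be sketched as follows (our illustration, assuming numpy; the paper's actual construction uses the Laguerre Delaunay triangulation and slab intersections to handle all discs simultaneously, including occlusion between discs).

```python
import numpy as np

def directly_lit(x, center, camera, eps=1e-12):
    """True if a boundary point x of the disc with the given center is directly
    illuminated by the camera point, ignoring occlusion by other discs: the
    segment camera->x must not pass through this disc's interior, which for a
    point on the circle is equivalent to (x - c) . (camera - x) >= 0."""
    x, c, p = map(np.asarray, (x, center, camera))
    return float(np.dot(x - c, p - x)) >= -eps

# Unit disc at the origin, camera at (3, 0): the visible arc is the near side.
print(directly_lit((1.0, 0.0), (0.0, 0.0), (3.0, 0.0)))        # True  (nearest point)
print(directly_lit((-1.0, 0.0), (0.0, 0.0), (3.0, 0.0)))       # False (far side)
print(directly_lit((0.5, 0.8660254), (0.0, 0.0), (3.0, 0.0)))  # True  (inside the visible arc)
```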

Beyond Worst-Case Subset Sum: An Adaptive, Structure-Aware Solver with Sub-$2^{n/2}$ Enumeration

from arXiv: Data Structures and Algorithms

Authors: Jesus Salas

The Subset Sum problem, which asks whether a set of $n$ integers has a subset summing to a target $t$, is a fundamental NP-complete problem in cryptography and combinatorial optimization. The classical meet-in-the-middle (MIM) algorithm of Horowitz--Sahni runs in $\widetilde{\mathcal{O}}\bigl(2^{n/2}\bigr)$, still the best-known deterministic bound. Yet many instances exhibit abundant collisions in partial sums, so actual hardness often depends on the number of unique sums ($U$). We present a structure-aware, adaptive solver that enumerates only distinct sums, pruning duplicates on the fly, thus running in $\widetilde{\mathcal{O}}(U)$ when $U \ll 2^n$. Its core is a unique-subset-sums enumerator combined with a double meet-in-the-middle strategy and lightweight dynamic programming, avoiding the classical MIM's expensive merge. We also introduce combinatorial tree compression to guarantee strictly sub-$2^{n/2}$ enumeration even on unstructured inputs, shaving a nontrivial constant from the exponent. Our solver supports anytime and online modes, producing partial solutions early and adapting to newly added elements. Theoretical analysis and experiments show that for structured instances -- e.g. with small doubling constants, high additive energy, or significant redundancy -- our method can far outperform classical approaches, often nearing dynamic-programming efficiency. Even in the worst case, it remains within $\widetilde{\mathcal{O}}\bigl(2^{n/2}\bigr)$, and its compression-based pruning yields a real constant-factor speedup over naive MIM. We conclude by discussing how this instance-specific adaptivity refines the Subset Sum complexity landscape and suggesting future adaptive-exponential directions.
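A minimal sketch of the distinct-sums idea at the solver's core (ours; it omits the double meet-in-the-middle, combinatorial tree compression, and anytime/online features described above): enumerate only unique partial sums, so the work scales with $U$ rather than $2^n$.

```python
def subset_sum_unique(nums, target):
    """Decide Subset Sum by enumerating only *distinct* achievable sums.
    Runs in roughly O(n * U) time, where U is the number of unique subset sums,
    instead of the 2^n of naive enumeration."""
    sums = {0}
    for x in nums:
        sums |= {s + x for s in sums}   # duplicate partial sums collapse automatically
        if target in sums:
            return True
    return target in sums

# Highly structured instance: massive collisions, so U grows only linearly.
print(subset_sum_unique([5] * 40, 85))   # True
print(subset_sum_unique([5] * 40, 87))   # False
```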

Adaptive Local Clustering over Attributed Graphs

from arXiv: Data Structures and Algorithms

Authors: Haoran Zheng, Renchi Yang, Jianliang Xu

Given a graph $G$ and a seed node $v_s$, the objective of local graph clustering (LGC) is to identify a subgraph $C_s \subseteq G$ (a.k.a. local cluster) surrounding $v_s$ in time roughly linear in the size of $C_s$. This approach yields personalized clusters without needing to access the entire graph, which makes it highly suitable for numerous applications involving large graphs. However, most existing solutions merely rely on the topological connectivity between nodes in $G$, rendering them vulnerable to missing or noisy links that are commonly present in real-world graphs. To address this issue, this paper resorts to leveraging the complementary nature of graph topology and node attributes to enhance local clustering quality. To effectively exploit the attribute information, we first formulate the LGC as an estimation of the bidirectional diffusion distribution (BDD), which is specialized for capturing the multi-hop affinity between nodes in the presence of attributes. Furthermore, we propose LACA, an efficient and effective approach for LGC that achieves superb empirical performance on multiple real datasets while maintaining strong locality. The core components of LACA include (i) a fast and theoretically-grounded preprocessing technique for node attributes, (ii) an adaptive algorithm for diffusing any vectors over $G$ with rigorous theoretical guarantees and expedited convergence, and (iii) an effective three-step scheme for BDD approximation. Extensive experiments, comparing 17 competitors on 8 real datasets, show that LACA outperforms all competitors in terms of result quality measured against ground truth local clusters, while also being up to orders of magnitude faster. The code is available at https://github.com/HaoranZ99/alac.
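As background for the diffusion component, here is a generic local-push sketch approximating a personalized PageRank vector around a seed (our illustration; LACA's bidirectional diffusion distribution additionally incorporates node attributes and comes with its own adaptivity and convergence guarantees).

```python
from collections import defaultdict, deque

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Local push diffusion: approximate a personalized PageRank vector around
    `seed`, touching only nodes near the seed. A generic local-diffusion
    primitive, not LACA's attribute-aware BDD."""
    p = defaultdict(float)            # approximate diffusion mass
    r = defaultdict(float)            # residual mass still to be pushed
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        if r[u] < eps * deg:          # residual too small to push further
            continue
        push = r[u]
        p[u] += alpha * push
        r[u] = 0.0
        for v in adj[u]:
            before = r[v]
            r[v] += (1 - alpha) * push / deg
            if before < eps * len(adj[v]) <= r[v]:   # v just crossed its push threshold
                queue.append(v)
    return dict(p)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(approximate_ppr(adj, seed=0))
```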

Factorised Representations of Join Queries: Tight Bounds and a New Dichotomy

from arXiv: Data Structures and Algorithms

Authors: Christoph Berkholz, Harry Vinall-Smeeth

A common theme in factorised databases and knowledge compilation is the representation of solution sets in a useful yet succinct data structure. In this paper, we study the representation of the result of join queries (or, equivalently, the set of homomorphisms between two relational structures). We focus on the very general format of $\{\cup, \times\}$-circuits -- also known as d-representations or DNNF circuits -- and aim to find the limits of this approach. In prior work, it has been shown that there always exists a $\{\cup, \times\}$-circuit of size $N^{O(subw)}$ representing the query result, where $N$ is the size of the database and $subw$ the submodular width of the query. If the arity of all relations is bounded by a constant, then $subw$ is linear in the treewidth $tw$ of the query. In this setting, the authors of this paper proved a lower bound of $N^{\Omega(tw^{\varepsilon})}$ on the circuit size (ICALP 2023), where $\varepsilon>0$ depends on the excluded grid theorem. Our first main contribution is to improve this lower bound to $N^{\Omega(tw)}$, which is tight up to a constant factor in the exponent. Our second contribution is an $N^{\Omega(subw^{1/4})}$ lower bound on the circuit size for join queries over relations of unbounded arity. Both lower bounds are unconditional lower bounds on the circuit size for well-chosen database instances. Their proofs use a combination of structural (hyper)graph theory with communication complexity in a simple yet novel way. While the second lower bound is asymptotically equivalent to Marx's conditional bound on the decision complexity (JACM 2013), our $N^{\Theta(tw)}$ bound in the bounded-arity setting is tight, while the best conditional bound on the decision complexity is $N^{\Omega(tw/\log tw)}$. Note that removing this logarithmic factor in the decision setting is a major open problem.
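A toy instance (ours, not from the paper) shows why a $\{\cup, \times\}$-representation can be much smaller than the flat join result: grouping by the shared attribute lets us store each group as a Cartesian product instead of listing its tuples.

```python
from collections import defaultdict

# Toy {∪, ×}-factorisation of Q(A, B, C) = R(A, B) JOIN S(B, C), grouped by B.
R = [(a, 0) for a in range(3)]           # 3 A-values paired with b = 0
S = [(0, c) for c in range(4)]           # 4 C-values paired with b = 0

A_by_b = defaultdict(set)
C_by_b = defaultdict(set)
for a, b in R:
    A_by_b[b].add(a)
for b, c in S:
    C_by_b[b].add(c)

# Factorised form: union over b of  A_by_b[b] × {b} × C_by_b[b].
# Its size is sum_b (|A_b| + |C_b|); the flat join has sum_b |A_b| * |C_b| tuples.
shared = [b for b in A_by_b if b in C_by_b]
factorised_size = sum(len(A_by_b[b]) + len(C_by_b[b]) for b in shared)
flat_size = sum(len(A_by_b[b]) * len(C_by_b[b]) for b in shared)
print(factorised_size, flat_size)        # 7 vs 12 already on this toy instance
```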

Global vs. s-t Vertex Connectivity Beyond Sequential: Almost-Perfect Reductions & Near-Optimal Separations

from arXiv: Data Structures and Algorithms

Authors: Joakim Blikstad, Yonggang Jiang, Sagnik Mukhopadhyay, Sorrachai Yingchareonthawornchai

A recent breakthrough by [LNPSY STOC'21] showed that solving s-t vertex connectivity is sufficient (up to polylogarithmic factors) to solve (global) vertex connectivity in the sequential model. This raises a natural question: What is the relationship between s-t and global vertex connectivity in other computational models? In this paper, we demonstrate that the connection between global and s-t variants behaves very differently across computational models: 1. In parallel and distributed models, we obtain almost tight reductions from global to s-t vertex connectivity. In PRAM, this leads to an $n^{\omega+o(1)}$-work and $n^{o(1)}$-depth algorithm for vertex connectivity, improving over the 35-year-old $\tilde O(n^{\omega+1})$-work $O(\log^2n)$-depth algorithm by [LLW FOCS'86], where $\omega$ is the matrix multiplication exponent and $n$ is the number of vertices. In CONGEST, the reduction implies the first sublinear-round (when the diameter is moderately small) vertex connectivity algorithm. This answers an open question in [JM STOC'23]. 2. In contrast, we show that global vertex connectivity is strictly harder than s-t vertex connectivity in the two-party communication setting, requiring $\tilde \Theta (n^{1.5})$ bits of communication. The s-t variant was known to be solvable in $\tilde O(n)$ communication [BvdBEMN FOCS'22]. Our results resolve open problems raised by [MN STOC'20, BvdBEMN FOCS'22, AS SOSA'23]. At the heart of our results is a new graph decomposition framework we call \emph{common-neighborhood clustering}, which can be applied in multiple models. Finally, we observe that global vertex connectivity cannot be solved without using s-t vertex connectivity, by proving an s-t to global reduction in dense graphs, in the PRAM and communication models.
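To make the distinction between the two quantities concrete on a small example (our sketch, assuming networkx; this only illustrates the sequential definitions, not the paper's parallel, distributed, or communication results):

```python
import networkx as nx

# Two 4-cycles sharing the single vertex 3.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 6), (6, 3)])

print(nx.node_connectivity(G))        # 1: global connectivity (vertex 3 is a cut vertex)
print(nx.node_connectivity(G, 0, 2))  # 2: s-t connectivity (two internally disjoint 0-2 paths)
```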

Finding Near-Optimal Maximum Set of Disjoint $k$-Cliques in Real-World Social Networks

from arXiv: Data Structures and Algorithms

Authors: Wenqing Lin, Xin Chen, Haoxuan Xie, Sibo Wang, Siqiang Luo

A $k$-clique is a dense subgraph consisting of $k$ fully connected nodes, and it finds numerous applications, such as community detection and network analysis. In this paper, we study a new problem: finding a maximum set of disjoint $k$-cliques in a given large real-world graph for a user-defined fixed number $k$, which can contribute to good performance when forming teams for collaborative events in online games. However, this problem is NP-hard when $k \geq 3$, making it difficult to solve. To address that, we propose an efficient, lightweight method that avoids significant overheads and achieves a $k$-approximation to the optimum, equipped with several optimization techniques, including an ordering method, degree estimation in the clique graph, and a lightweight implementation. Besides, to handle dynamic graphs, which are widely seen in real-world social networks, we devise an efficient indexing method with careful swapping operations, leading to the efficient maintenance of a near-optimal result under frequent updates to the graph. In various experiments on several large graphs, our proposed approaches significantly outperform the competitors by up to 2 orders of magnitude in running time and 13.3\% in the number of computed disjoint $k$-cliques, which demonstrates their superiority in terms of efficiency and effectiveness.
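A simple greedy baseline conveys what is being computed (our sketch, assuming networkx; the paper's method adds the ordering technique, degree estimation in the clique graph, and dynamic maintenance, and comes with an approximation guarantee).

```python
import networkx as nx

def greedy_disjoint_k_cliques(G, k):
    """Greedily extract vertex-disjoint k-cliques: repeatedly take any k-clique
    among the remaining vertices and delete its vertices. A naive baseline for
    the problem in the abstract, not the paper's optimized algorithm."""
    H = G.copy()
    result = []
    found = True
    while found:
        found = False
        for clique in nx.find_cliques(H):      # maximal cliques of the remaining graph
            if len(clique) >= k:
                chosen = clique[:k]            # any k vertices of a clique form a k-clique
                result.append(chosen)
                H.remove_nodes_from(chosen)
                found = True
                break
    return result

# Two disjoint triangles joined by a bridge edge; with k = 3 both triangles are found.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])
print(greedy_disjoint_k_cliques(G, 3))   # two vertex-disjoint triangles, e.g. [[0, 1, 2], [3, 4, 5]]
```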

Online Disjoint Spanning Trees and Polymatroid Bases

from arXiv: Data Structures and Algorithms

Authors: Karthekeyan Chandrasekaran, Chandra Chekuri, Weihao Zhu

Finding the maximum number of disjoint spanning trees in a given graph is a well-studied problem with several applications and connections. The Tutte-Nash-Williams theorem provides a min-max relation for this problem which also extends to disjoint bases in a matroid and leads to efficient algorithms. Several other packing problems such as element disjoint Steiner trees, disjoint set covers, and disjoint dominating sets are NP-Hard but admit an $O(\log n)$-approximation. C\u{a}linescu, Chekuri, and Vondr\'ak viewed all these packing problems as packing bases of a polymatroid and provided a unified perspective. Motivated by applications in wireless networks, recent works have studied the problem of packing set covers in the online model. The online model poses new challenges for packing problems. In particular, it is not clear how to pack a maximum number of disjoint spanning trees in a graph when edges arrive online. Motivated by these applications and theoretical considerations, we formulate an online model for packing bases of a polymatroid, and describe a randomized algorithm with a polylogarithmic competitive ratio. Our algorithm is based on interesting connections to the notion of quotients of a polymatroid that has recently seen applications in polymatroid sparsification. We generalize the previously known result for the online disjoint set cover problem and also address several other packing problems in a unified fashion. For the special case of packing disjoint spanning trees in a graph (or a hypergraph) whose edges arrive online, we provide an alternative to our general algorithm that is simpler and faster while achieving the same poly-logarithmic competitive ratio.
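A concrete illustration (ours) of the objects being packed, assuming networkx: $K_4$ splits into two edge-disjoint spanning trees, matching the upper bound $\lfloor |E|/(n-1) \rfloor = \lfloor 6/3 \rfloor = 2$ obtained from the Tutte-Nash-Williams theorem by taking the partition into singletons. This only illustrates the offline packing problem, not the paper's online algorithm.

```python
import networkx as nx

G = nx.complete_graph(4)
T1 = nx.Graph([(0, 1), (1, 2), (2, 3)])   # the path 0-1-2-3
T2 = nx.Graph([(0, 2), (0, 3), (1, 3)])   # the remaining three edges of K4

edge_set = lambda H: {frozenset(e) for e in H.edges()}
assert edge_set(T1) | edge_set(T2) == edge_set(G)   # together they use every edge
assert edge_set(T1).isdisjoint(edge_set(T2))        # and they are edge-disjoint
print(nx.is_tree(T1), nx.is_tree(T2))               # True True
```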

Wednesday, March 26

Call for papers Information-Theoretic Cryptography

from Windows on Theory

The sixth Information-Theoretic Cryptography (ITC) conference will be held at UC Santa Barbara, California, on August 16-17, 2025. The conference is affiliated with CRYPTO 2025, and will take place in the same location just before CRYPTO.

Information-theoretic cryptography deals with the design and implementation of cryptographic protocols and primitives with unconditional security guarantees and the usage of information-theoretic tools and techniques in achieving other forms of security. The conference takes a broad interpretation of this theme and encourages submissions from different communities (cryptography, information theory, coding theory, theory of computation) that are at the intersection of security and information theory.

The conference will have two tracks: a conference track and a spotlight track. The conference track will operate like a traditional conference with the usual review process and published proceedings. The spotlight track will include invited talks of two types: surveys on recent advances on Information-Theoretic Cryptography, and presentations by early career researchers of recent ITC results that were published on other venues.

The final date for submission to the publication track is March 28th, 2025. Nominations for the spotlight track (including self nominations) should be sent by mail to itc2025chair@gmail.com.

See our website at https://itcrypto.github.io/2025/index.html for further details. The call for papers is available here: https://itcrypto.github.io/2025/2025cfp.html.

By Boaz Barak

On the JPMC/Quantinuum certified quantum randomness demo

from Scott Aaronson

These days, any quantum computing post I write ought to begin with the disclaimer that the armies of Sauron are triumphing around the globe, this is the darkest time for humanity most of us have ever known, and nothing else matters by comparison. Certainly not quantum computing. Nevertheless stuff happens in quantum computing and it often brings me happiness to blog about it—certainly more happiness than doomscrolling or political arguments.


So then: today JP Morgan Chase announced that, together with Quantinuum and DoE labs, they’ve experimentally demonstrated the protocol I proposed in 2018, and further developed in a STOC’2023 paper with Shih-Han Hung, for using current quantum supremacy experiments to generate certifiable random bits for use in cryptographic applications. See here for our paper in Nature—the JPMC team was gracious enough to include me and Shih-Han as coauthors.

Mirroring a conceptual split in the protocol itself, Quantinuum handled the quantum hardware part of my protocol, while JPMC handled the rest: modification of the protocol to make it suitable for trapped ions, as well as software to generate pseudorandom challenge circuits to send to the quantum computer over the Internet, then to verify the correctness of the quantum computer’s outputs (thereby ensuring, under reasonable complexity assumptions, that the outputs contained at least a certain amount of entropy), and finally to extract nearly uniform random bits from the outputs. The experiment used Quantinuum’s 56-qubit trapped-ion quantum computer, which was given, and took, a couple seconds to respond to each challenge. Verification of the outputs was done using the Frontier and Summit supercomputers. The team estimates that about 70,000 certified random bits were generated over 18 hours, in such a way that, using the best currently-known attack, you’d need at least about four Frontier supercomputers working continuously to spoof the quantum computer’s outputs, and get the verifier to accept non-random bits.
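To fix ideas, here is a purely classical toy mock-up of the protocol's shape as described above: challenges go out, samples come back, a verification score gates acceptance, and accepted outputs feed an extraction step. This is our illustrative sketch, not JPMC's or Quantinuum's code; every function below is a named placeholder for a much heavier real component.

```python
import hashlib, secrets

# Toy mock of the pipeline: challenge -> samples -> verification -> extraction.
# Real challenges are pseudorandom quantum circuits, the responder is a quantum
# computer, verification is an expensive classical simulation (the Frontier /
# Summit step), and extraction uses a seeded randomness extractor, not a bare hash.

def make_challenge(round_index, seed):
    return hashlib.sha256(seed + round_index.to_bytes(4, "big")).hexdigest()

def untrusted_responder(challenge):
    return secrets.token_bytes(32)        # placeholder for the device's samples

def verification_score(challenge, samples):
    return 1.0                            # placeholder for a cross-entropy-style check

def run_protocol(num_rounds=8, threshold=0.9):
    seed = secrets.token_bytes(16)
    transcript = b""
    for i in range(num_rounds):
        challenge = make_challenge(i, seed)
        samples = untrusted_responder(challenge)
        if verification_score(challenge, samples) < threshold:
            raise RuntimeError("verification failed: outputs not accepted")
        transcript += samples
    return hashlib.sha256(transcript).hexdigest()   # condensed output bits

print(run_protocol())
```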

We should be clear that this gap, though impressive from the standpoint of demonstrating quantum supremacy with trapped ions, is not yet good enough for high-stakes cryptographic applications (more about that later). Another important caveat is that the parameters of the experiment aren’t yet good enough for my and Shih-Han’s formal security reduction to give assurances: instead, for the moment one only has “practical security,” or security against a class of simplified yet realistic attackers. I hope that future experiments will build on the JPMC/Quantinuum achievement and remedy these issues.


The story of this certified randomness protocol starts seven years ago, when I had lunch with Or Sattath at a Japanese restaurant in Tel Aviv. Or told me that I needed to pay more attention to the then-recent Quantum Lightning paper by Mark Zhandry. I already know that paper is great, I said. You don’t know the half of it, Or replied. As one byproduct of what he’s doing, for example, Mark gives a way to measure quantum money states in order to get certified random bits—bits whose genuine randomness (not pseudorandomness) is certified by computational intractability, something that wouldn’t have been possible in a classical world.

Well, why do you even need quantum money states for that? I asked. Why not just use, say, a quantum supremacy experiment based on Random Circuit Sampling, like the one Google is now planning to do (i.e., the experiment Google would do, a year later after this conversation)? Then, the more I thought about that question, the more I liked the idea that these “useless” Random Circuit Sampling experiments would do something potentially useful despite themselves, generating certified entropy as just an inevitable byproduct of passing our benchmarks for sampling from certain classically-hard probability distributions. Over the next couple weeks, I worked out some of the technical details of the security analysis (though not all! it was a big job, and one that only got finished years later, when I brought Shih-Han to UT Austin as a postdoc and worked with him on it for a year).

I emailed the Google team about the idea; they responded enthusiastically. I also got in touch with UT Austin’s intellectual property office to file a provisional patent, the only time I’ve done that in my career. UT and I successfully licensed the patent to Google, though the license lapsed when Google’s priorities changed. Meantime, a couple years ago, when I visited Quantinuum’s lab in Broomfield, Colorado, I learned that a JPMC-led collaboration toward an experimental demonstration of the protocol was then underway. The protocol was well-suited to Quantinuum’s devices, particularly given their ability to apply two-qubit gates with all-to-all connectivity and fidelity approaching 99.9%.

I should mention that, in the intervening years, others had also studied the use of quantum computers to generate cryptographically certified randomness; indeed it became a whole subarea of quantum computing. See especially the seminal work of Brakerski, Christiano, Mahadev, Vazirani, and Vidick, which gave a certified randomness protocol that (unlike mine) relies only on standard cryptographic assumptions and allows verification in classical polynomial time. The “only” downside is that implementing their protocol securely seems to require a full fault-tolerant quantum computer (capable of things like Shor’s algorithm), rather than current noisy devices with 50-100 qubits.


For the rest of this post, I’ll share a little FAQ, adapted from my answers to a journalist’s questions. Happy to answer additional questions in the comments.

  • To what extent is this a world-first?

Well, it’s the first experimental demonstration of a protocol to generate cryptographically certified random bits with the use of a quantum computer.

To remove any misunderstanding: if you’re just talking about the use of quantum phenomena to generate random bits, without certifying the randomness of those bits to a faraway skeptic, then that’s been easy to do for generations (just stick a Geiger counter next to some radioactive material!). The new part, the part that requires a quantum computer, is all about the certification.

Also: if you’re talking about the use of separated, entangled parties to generate certified random bits by violating the Bell inequality (see eg here) — that approach does give certification, but the downside is that you need to believe that the two parties really are unable to communicate with each other, something that you couldn’t certify in practice over the Internet.  A quantum-computer-based protocol like mine, by contrast, requires just a single quantum device.

  • Why is the certification element important?

In any cryptographic application where you need to distribute random bits over the Internet, the fundamental question is, why should everyone trust that these bits are truly random, rather than being backdoored by an adversary?

This isn’t so easy to solve.  If you consider any classical method for generating random bits, an adversary could substitute a cryptographic pseudorandom generator without anyone being the wiser.

The key insight behind the quantum protocol is that a quantum computer can solve certain problems efficiently, but only (it’s conjectured, and proven under plausible assumptions) by sampling an answer randomly — thereby giving you certified randomness, once you verify that the quantum computer really has solved the problem in question.  Unlike with a classical computer, there’s no way to substitute a pseudorandom generator, since randomness is just an inherent part of a quantum computer’s operation — specifically, when the entangled superposition state randomly collapses on measurement.

  • What are the applications and possible uses?

One potential application is to proof-of-stake cryptocurrencies, like Ethereum.  These cryptocurrencies are vastly more energy-efficient than “proof-of-work” cryptocurrencies (like Bitcoin), but they require lotteries to be run constantly to decide which currency holder gets to add the next block to the blockchain (and get paid for it).  Billions of dollars are riding on these lotteries being fair.

Other potential applications are to zero-knowledge protocols, lotteries and online gambling, and deciding which precincts to audit in elections. See here for a nice perspective article that JPMC put together discussing these and other potential applications.

Having said all this, a major problem right now is that verifying the results using a classical computer is extremely expensive — indeed, basically as expensive as spoofing the results would be.  This problem, and other problems related to verification (eg “why should everyone else trust the verifier?”), are the reasons why most people will probably pass on this solution in the near future, and generate random bits in simpler, non-quantum-computational ways.

We do know, from e.g. Brakerski et al.’s work, that the problem of making the verification fast is solvable with sufficient advancements in quantum computing hardware.  Even without hardware advancements, it might also be solvable with new theoretical ideas — one of my favorite research directions.

  • Is this an early win for quantum computing?

It’s not directly an advancement in quantum computing hardware, but yes, it’s a very nice demonstration of such advancements — of something that’s possible today but wouldn’t have been possible just a few short years ago.  It’s a step toward using current, non-error-corrected quantum computers for a practical application that’s not itself about quantum mechanics but that really does inherently require quantum computers.

Of course it’s personally gratifying to see something I developed get experimentally realized after seven years.  Huge congratulations to the teams at JP Morgan Chase and Quantinuum, and thanks to them for the hard work they put into this.


Unrelated Announcement: See here for a podcast about quantum computing that I recorded with, of all organizations, the FBI. As I told the gentlemen who interviewed me, I’m glad the FBI still exists, let alone its podcast!

By Scott

Lecturer / Senior Lecturer / Reader in Foundational AI (Equivalent to Assistant / Associate Professor in the US) at University of Glasgow, UK (apply by April 30, 2025)

from CCI: jobs

We are seeking applications from individuals whose research focuses on the formal analysis of AI systems, with the aim of proving that these systems are responsible, unbiased, trustworthy, secure and robust against adversarial behaviour.

Website: https://www.jobs.gla.ac.uk/job/lecturer-slash-senior-lecturer-slash-reader-in-foundational-ai
Email: david.manlove@glasgow.ac.uk

By shacharlovett

What Happened to MOOCS?

from Computational Complexity

In 2012 I wrote a blog post about the growing influence of Massively Open Online Courses, or MOOCs.

John Hennessy, president of Stanford, gave the CRA keynote address arguing that MOOCs will save universities. He attributed the untenable costs of universities to personnel costs (faculty salaries) that are making colleges unaffordable (not sure I fully agree). He argued that MOOCs will help teach courses more effectively. The hidden subtext: fewer professors and probably fewer universities, or, as someone joked, we'll all be branch campuses of Stanford.

I ended the post "MOOCs may completely change higher education in America and around the world. Or they won't." A reader asked "Wondering what are you takes about MOOCS now?". Good question.

If you want a detailed answer I had chatty put together a deep research report. Here's my take, mostly from the US computing perspective. The term MOOC is rarely used anymore, but we have seen tremendous growth in online courses and degrees, particularly in Masters programs.

We've seen some major successes, most notably the Georgia Tech Online Masters of Science in Computer Science program that we started in 2014. By we, I mostly mean then-dean Zvi Galil's tenacity to make it happen. Zvi made the right moves (after some pushing): getting faculty buy-in, creating strong incentives for faculty participation, putting significant resources into course development, keeping the degree very low-cost, and, most importantly, insisting that we have the same if not better quality than our on-campus offerings. The program grew tremendously, reaching about 10,000 students by 2020. Georgia Tech had to add a new graduation ceremony for students who finished the degree remotely but traveled to campus for graduation.

We've seen a plethora of new programs. Most domestic students can get a good computing masters degree at a fraction of the cost of an in-person program. On-campus masters programs in computing are now almost entirely international, because an on-campus program can deliver something an online course cannot: a visa, and a chance to build a life in the United States.

These new programs vary quite a bit in quality, some truly strong, others less so. Some are outright misleading, making a deal with a university to use their name but otherwise having no connection to the school's faculty or academic departments. These programs often feature 'professional certificates' marketed under university branding but are actually developed and administered by third-party education companies.

While we learned to teach everything online during the pandemic, online degrees don't work as well for bachelor's degrees, where the on-campus experience almost matters more than the courses, or for research-intensive PhD programs.

We are not all branch campuses of Stanford but the story isn't done. Colleges continue to have financial challenges, artificial intelligence will continue to play new roles in education, not to mention the recent actions of the Trump administration. Hopefully MOOCs won't be the only thing surviving.

By Lance Fortnow

TR25-035 | Primes via Zeros: Interactive Proofs for Testing Primality of Natural Classes of Ideals | Abhibhav Garg, Rafael Mendes de Oliveira, Nitin Saxena

from ECCC Papers

A central question in mathematics and computer science is that of determining whether a given ideal $I$, generated by $m$ polynomials, is prime, which geometrically corresponds to the zero set of $I$, denoted $Z(I)$, being irreducible. The case of principal ideals (i.e., $m=1$) corresponds to the more familiar absolute irreducibility testing of polynomials, where the seminal work of (Kaltofen 1995) yields a randomized, polynomial time algorithm for this problem. However, when $m > 1$, the complexity of the primality testing problem seems much harder. The current best algorithms for this problem are only known to be in EXPSPACE. Such a drastic state of affairs has prompted research on the primality testing problem (and its more general variants, the primary decomposition problem, and the problem of counting the number of irreducible components) for natural classes of ideals. Notable classes of ideals are the class of radical ideals, complete intersections (and more generally Cohen-Macaulay ideals). For radical ideals, the current best upper bounds are given by (Bürgisser & Scheiblechner, 2007), putting the problem in PSPACE. For complete intersections, the primary decomposition algorithm of (Eisenbud, Huneke, Vasconcelos 1992), coupled with the degree bounds of (DFGS 1991), puts the ideal primality testing problem in EXP. In these situations, the only known complexity-theoretic lower bound for the ideal primality testing problem is that it is coNP-hard for the classes of radical ideals, and equidimensional Cohen-Macaulay ideals. In this work, we significantly reduce the complexity-theoretic gap for the ideal primality testing problem for the important families of ideals $I$ (namely, radical ideals and equidimensional Cohen-Macaulay ideals). For these classes of ideals, assuming the Generalized Riemann Hypothesis, we show that primality testing lies in $\Sigma_3^p \cap \Pi_3^p$. This significantly improves the upper bound for these classes, approaching their lower bound, as the primality testing problem is coNP-hard for these classes of ideals. Another consequence of our results is that for equidimensional Cohen-Macaulay ideals, we get the first PSPACE algorithm for primality testing, exponentially improving the space and time complexity of prior known algorithms.

Upper and Lower Bounds for the Linear Ordering Principle

from arXiv: Computational Complexity

Authors: Edward A. Hirsch, Ilya Volkovich

Korten and Pitassi (FOCS, 2024) defined a new complexity class $L_2P$ as the polynomial-time Turing closure of the Linear Ordering Principle. They asked whether a Karp--Lipton--style collapse can be proven for $L_2P$. We answer this question affirmatively by showing that $P^{prMA}\subseteq L_2P$. As a byproduct, we also answer an open question of Chakaravarthy and Roy (Computational Complexity, 2011) whether $P^{prMA}\subseteq S_2P$. We complement this result by providing a new upper bound for $L_2P$, namely $L_2P\subseteq P^{prSBP}$. Thus we are placing $L_2P$ between $P^{prMA}$ and $P^{prSBP}$. One technical ingredient of this result is an algorithm that approximates the number of satisfying assignments of a Boolean circuit using a $prSBP$ oracle (i.e. in $FP^{prSBP}$), which could be of independent interest. Finally, we prove that $P^{prO_2P}\subseteq O_2P$, which implies that the Karp--Lipton--style collapse to $P^{prOMA}$ is actually better than both known collapses to $P^{prMA}$ due to Chakaravarthy and Roy (Computational Complexity, 2011) and to $O_2P$ also due to Chakaravarthy and Roy (STACS, 2006).
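For quick reference, the inclusions established in the abstract can be collected into one chain: $P^{prMA} \subseteq L_2P \subseteq P^{prSBP}$, alongside the byproduct $P^{prMA} \subseteq S_2P$ and the further collapse $P^{prO_2P} \subseteq O_2P$.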

High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise

from arXiv: Computational Complexity

Authors: Yuchen Fang, Javad Lavaei, Katya Scheinberg, Sen Na

In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $\epsilon$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods that rely on a zeroth-order oracle with sub-exponential tail noise (i.e., light-tailed) and focus mostly on first-order stationarity, our analysis accommodates irreducible and heavy-tailed noise in the zeroth-order oracle and significantly extends the analysis to second-order stationarity. We show that under weaker noise conditions, our method achieves the same high-probability first-order iteration complexity bounds, while also exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-2})$ iterations and a second-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-3})$ iterations with high probability, provided that $\epsilon$ is lower bounded by a constant determined by the irreducible noise level in estimation. We validate our theoretical findings and evaluate the practical performance of our method on CUTEst benchmark test set.

When Distances Lie: Euclidean Embeddings in the Presence of Outliers and Distance Violations

from arXiv: Data Structures and Algorithms

Authors: Matthias Bentert, Fedor V. Fomin, Petr A. Golovach, M. S. Ramanujan, Saket Saurabh

Distance geometry explores the properties of distance spaces that can be exactly represented as the pairwise Euclidean distances between points in $\mathbb{R}^d$ ($d \geq 1$), or equivalently, distance spaces that can be isometrically embedded in $\mathbb{R}^d$. In this work, we investigate whether a distance space can be isometrically embedded in $\mathbb{R}^d$ after applying a limited number of modifications. Specifically, we focus on two types of modifications: outlier deletion (removing points) and distance modification (adjusting distances between points). The central problem, Euclidean Embedding Editing (EEE), asks whether an input distance space on $n$ points can be transformed, using at most $k$ modifications, into a space that is isometrically embeddable in $\mathbb{R}^d$. We present several fixed-parameter tractable (FPT) and approximation algorithms for this problem. Our first result is an algorithm that solves EEE in time $(dk)^{\mathcal{O}(d+k)} + n^{\mathcal{O}(1)}$. The core subroutine of this algorithm, which is of independent interest, is a polynomial-time method for compressing the input distance space into an equivalent instance of EEE with $\mathcal{O}((dk)^2)$ points. For the special but important case of EEE where only outlier deletions are allowed, we improve the parameter dependence of the FPT algorithm and obtain a running time of $\min\{(d+3)^k, 2^{d+k}\} \cdot n^{\mathcal{O}(1)}$. Additionally, we provide an FPT-approximation algorithm for this problem, which outputs a set of at most $2 \cdot {\rm OPT}$ outliers in time $2^d \cdot n^{\mathcal{O}(1)}$. This 2-approximation algorithm improves upon the previous $(3+\varepsilon)$-approximation algorithm by Sidiropoulos, Wang, and Wang [SODA '17]. Furthermore, we complement our algorithms with hardness results motivating our choice of parameterizations.
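To make "isometrically embeddable in $\mathbb{R}^d$" concrete, here is a standard classical-MDS check (our sketch, assuming numpy), the kind of exact embeddability test that the editing problem wraps with outlier deletions and distance modifications.

```python
import numpy as np

def embeddable_in_Rd(D, d, tol=1e-9):
    """Check whether a symmetric matrix D of pairwise distances can be realized
    exactly by points in R^d (classical multidimensional scaling criterion):
    the doubly centered Gram matrix must be PSD with rank at most d."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of centered points
    eigvals = np.linalg.eigvalsh(G)
    psd = eigvals.min() >= -tol
    rank = int(np.sum(eigvals > tol))
    return psd and rank <= d

# Example: the four corners of a unit square embed in R^2 but not in R^1.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(embeddable_in_Rd(D, 2), embeddable_in_Rd(D, 1))  # True False
```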

A Tight Meta-theorem for LOCAL Certification of MSO$_2$ Properties within Bounded Treewidth Graphs

from arXiv: Data Structures and Algorithms

Authors: Linda Cook, Eun Jung Kim, Tomáš Masařík

Distributed networks are prone to errors, so verifying their output is critical. Hence, we develop LOCAL certification protocols for graph properties in which nodes are given certificates that allow them to check whether their network as a whole satisfies some fixed property while only communicating with their local network. Most known LOCAL certification protocols are specifically tailored to the problem they work on and cannot be translated more generally. Thus we target general protocols that can certify any property expressible within a certain logical framework. We consider Monadic Second Order Logic (MSO$_2$), a powerful framework that can express properties such as non-$k$-colorability, Hamiltonicity, and $H$-minor-freeness. Unfortunately, in general, there are MSO$_2$-expressible properties that cannot be certified without huge certificates. For instance, non-3-colorability requires certificates of size $\Omega(n^2/\log n)$ on general $n$-vertex graphs (G\"o\"os, Suomela 2016). Hence, we impose additional structural restrictions on the graph. We provide a LOCAL certification protocol for certifying any MSO$_2$-expressible property on graphs of bounded treewidth and, consequently, a LOCAL certification protocol for certifying bounded treewidth. That is, for each integer $k$ and each MSO$_2$-expressible property $\Pi$, we give a LOCAL certification protocol to certify that a graph satisfies $\Pi$ and has treewidth at most $k$ using certificates of size $\mathcal{O}(\log n)$ (which is asymptotically optimal). Our LOCAL certification protocol requires only one round of distributed communication, hence it is also a proof-labeling scheme. Our result improves upon work by Fraigniaud, Montealegre, Rapaport, and Todinca (Algorithmica 2024), Bousquet, Feuilloley, Pierron (PODC 2022), and the very recent work of Baterisna and Chang.

Multiplication of 0-1 matrices via clustering

from arXiv: Data Structures and Algorithms

Authors: Jesper Jansson, Miroslaw Kowaluk, Andrzej Lingas, Mia Persson

We study applications of clustering (in particular the $k$-center clustering problem) in the design of efficient and practical deterministic algorithms for computing an approximate and the exact arithmetic matrix product of two 0-1 rectangular matrices $A$ and $B$ with clustered rows or columns, respectively. Let $\lambda_A$ and $\lambda_B$ denote the minimum maximum radius of a cluster in an $\ell$-center clustering of the rows of $A$ and in a $k$-center clustering of the columns of $B,$ respectively. In particular, assuming that the matrices have size $n\times n$, we obtain the following results. A simple deterministic algorithm that approximates each entry of the arithmetic matrix product of $A$ and $B$ within the additive error of at most $2\lambda_A$ in $O(n^2\ell)$ time or at most $2\lambda_B$ in $O(n^2k)$ time. A simple deterministic preprocessing of the matrices $A$ and $B$ in $O(n^2\ell)$ time or $O(n^2k)$ time such that a query asking for the exact value of an arbitrary entry of the arithmetic matrix product of $A$ and $B$ can be answered in $O(\lambda_A)$ time or $O(\lambda_B)$ time, respectively. A simple deterministic algorithm for the exact arithmetic matrix product of $A$ and $B$ running in time $O(n^2(\ell+k+\min\{\lambda_A,\lambda_B\}))$.
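
A minimal NumPy sketch of the idea behind the first result (a simplified illustration, not the authors' exact algorithm): pick $\ell$ row centers of $A$ greedily (the Gonzalez heuristic, whose radius is within a factor 2 of the optimal $\lambda_A$), multiply only the center rows with $B$, and answer every entry from the assigned center's row. Since $B$ is 0-1, each entry is then off by at most the Hamming distance from a row to its center, i.e., by at most $2\lambda_A$.

    import numpy as np

    def approx_product_by_row_clustering(A, B, ell):
        """Approximate the product of 0-1 matrices A (n x m) and B (m x r) by
        clustering the rows of A into ell groups under Hamming distance and
        multiplying only the ell center rows with B."""
        n = A.shape[0]
        centers = [0]                                      # arbitrary first center
        dist = np.count_nonzero(A != A[0], axis=1)         # Hamming distance to nearest center
        for _ in range(1, ell):
            c = int(np.argmax(dist))                       # farthest row becomes a new center
            centers.append(c)
            dist = np.minimum(dist, np.count_nonzero(A != A[c], axis=1))
        center_products = A[centers] @ B                   # ell exact rows of the product
        assign = np.array([
            int(np.argmin([np.count_nonzero(A[i] != A[c]) for c in centers]))
            for i in range(n)
        ])
        return center_products[assign]                     # row i answered by its center's row

    # Hypothetical usage with random sparse 0-1 matrices:
    rng = np.random.default_rng(0)
    A = (rng.random((200, 300)) < 0.1).astype(int)
    B = (rng.random((300, 200)) < 0.1).astype(int)
    approx = approx_product_by_row_clustering(A, B, ell=20)
    print(np.abs(approx - A @ B).max())   # at most the largest row-to-center Hamming distance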

Lifting Linear Sketches: Optimal Bounds and Adversarial Robustness

from arXiv: Data Structures and Algorithms

Authors: Elena Gribelyuk, Honghao Lin, David P. Woodruff, Huacheng Yu, Samson Zhou

We introduce a novel technique for ``lifting'' dimension lower bounds for linear sketches in the real-valued setting to dimension lower bounds for linear sketches with polynomially-bounded integer entries when the input is a polynomially-bounded integer vector. Using this technique, we obtain the first optimal sketching lower bounds for discrete inputs in a data stream, for classical problems such as approximating the frequency moments, estimating the operator norm, and compressed sensing. Additionally, we lift the adaptive attack of Hardt and Woodruff (STOC, 2013) for breaking any real-valued linear sketch via a sequence of real-valued queries, and show how to obtain an attack on any integer-valued linear sketch using integer-valued queries. This shows that there is no linear sketch in a data stream with insertions and deletions that is adversarially robust for approximating any $L_p$ norm of the input, resolving a central open question for adversarially robust streaming algorithms. To do so, we introduce a new pre-processing technique of independent interest which, given an integer-valued linear sketch, increases the dimension of the sketch by only a constant factor in order to make the orthogonal lattice to its row span smooth. This pre-processing then enables us to leverage results in lattice theory on discrete Gaussian distributions and reason that efficient discrete sketches imply efficient continuous sketches. Our work resolves open questions from the Banff '14 and '17 workshops on Communication Complexity and Applications, as well as the STOC '21 and FOCS '23 workshops on adaptivity and robustness.

Approximating $q \rightarrow p$ Norms of Non-Negative Matrices in Nearly-Linear Time

from arXiv: Data Structures and Algorithms

Authors: Étienne Objois, Adrian Vladu

We provide the first nearly-linear time algorithm for approximating $\ell_{q \rightarrow p}$-norms of non-negative matrices, for $q \geq p \geq 1$. Our algorithm returns a $(1-\varepsilon)$-approximation to the matrix norm in time $\widetilde{O}\left(\frac{1}{q \varepsilon} \cdot \text{nnz}(\boldsymbol{\mathit{A}})\right)$, where $\boldsymbol{\mathit{A}}$ is the input matrix, and improves upon the previous state of the art, which either proved convergence only in the limit [Boyd '74], or had very high polynomial running times [Bhaskara-Vijayraghavan, SODA '11]. Our algorithm is extremely simple, and is largely inspired by the coordinate-scaling approach used for positive linear program solvers. We note that our algorithm can readily be used in the framework of [Englert-R\"{a}cke, FOCS '09] to improve the running time of constructing $O(\log n)$-competitive $\ell_p$-oblivious routings. We thus complement this result with a simple cutting-plane based scheme for computing $\textit{optimal}$ oblivious routings in graphs with respect to any monotone norm. Combined with state-of-the-art cutting-plane solvers, this scheme runs in time $\widetilde{O}(n^6 m^3)$, which is significantly faster than the one based on Englert-R\"{a}cke, and generalizes the $\ell_\infty$ routing algorithm of [Azar-Cohen-Fiat-Kaplan-R\"acke, STOC '03].
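
For intuition only, here is a generic nonlinear power-iteration sketch in the spirit of the Boyd-style method cited above, not the paper's nearly-linear-time algorithm. Every iterate is a feasible point, so the reported value is always a valid lower bound on $\|A\|_{q\to p}$; convergence behaviour is exactly what the cited works analyze.

    import numpy as np

    def qp_norm_lower_bound(A, q, p, iters=100, seed=0):
        """Nonlinear power iteration for a nonnegative matrix A, q > 1, p >= 1.
        Maintains x >= 0 with ||x||_q = 1 and reports the best ||A x||_p seen,
        which is always a lower bound on the q -> p operator norm."""
        rng = np.random.default_rng(seed)
        x = rng.random(A.shape[1]) + 1e-12
        x /= np.linalg.norm(x, q)
        best = 0.0
        for _ in range(iters):
            y = A @ x
            g = A.T @ (y ** (p - 1))       # ascent direction for ||A x||_p^p (up to scaling)
            x = g ** (1.0 / (q - 1))       # Hoelder-optimal point on the l_q sphere for <g, .>
            x /= np.linalg.norm(x, q)
            best = max(best, np.linalg.norm(A @ x, p))
        return best

    # Hypothetical usage:
    A = np.random.default_rng(1).random((50, 40))
    print(qp_norm_lower_bound(A, q=3, p=2))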

Online Stochastic Matching with Unknown Arrival Order: Beating $0.5$ against the Online Optimum

from arXiv: Data Structures and Algorithms

Authors: Enze Sun, Zhihao Gavin Tang, Yifan Wang

We study the online stochastic matching problem. Against the offline benchmark, Feldman, Gravin, and Lucier (SODA 2015) designed an optimal $0.5$-competitive algorithm. A recent line of work, initiated by Papadimitriou, Pollner, Saberi, and Wajc (MOR 2024), focuses on designing approximation algorithms against the online optimum. The online benchmark allows positive results surpassing the $0.5$ ratio. In this work, adapting the order-competitive analysis by Ezra, Feldman, Gravin, and Tang (SODA 2023), we design a $0.5+\Omega(1)$ order-competitive algorithm against the online benchmark with unknown arrival order. Our algorithm is significantly different from existing ones, as the known arrival order is crucial to the previous approximation algorithms.

Improved Approximation Algorithms for Three-Dimensional Knapsack

from arXiv: Data Structures and Algorithms

Authors: Klaus Jansen, Debajyoti Kar, Arindam Khan, K. V. N. Sreenivas, Malte Tutas

We study the three-dimensional Knapsack (3DK) problem, in which we are given a set of axis-aligned cuboids with associated profits and an axis-aligned cube knapsack. The objective is to find a non-overlapping axis-aligned packing (by translation) of the maximum profit subset of cuboids into the cube. The previous best approximation algorithm is due to Diedrich, Harren, Jansen, Th\"{o}le, and Thomas (2008), who gave a $(7+\varepsilon)$-approximation algorithm for 3DK and a $(5+\varepsilon)$-approximation algorithm for the variant when the items can be rotated by 90 degrees around any axis, for any constant $\varepsilon>0$. Chleb\'{\i}k and Chleb\'{\i}kov\'{a} (2009) showed that the problem does not admit an asymptotic polynomial-time approximation scheme. We provide an improved polynomial-time $(139/29+\varepsilon) \approx 4.794$-approximation algorithm for 3DK and $(30/7+\varepsilon) \approx 4.286$-approximation when rotations by 90 degrees are allowed. We also provide improved approximation algorithms for several variants such as the cardinality case (when all items have the same profit) and uniform profit-density case (when the profit of an item is equal to its volume). Our key technical contribution is container packing -- a structured packing in 3D such that all items are assigned into a constant number of containers, and each container is packed using a specific strategy based on its type. We first show the existence of highly profitable container packings. Thereafter, we show that one can find near-optimal container packing efficiently using a variant of the Generalized Assignment Problem (GAP).

Privately Evaluating Untrusted Black-Box Functions

from arXiv: Data Structures and Algorithms

Authors: Ephraim Linder, Sofya Raskhodnikova, Adam Smith, Thomas Steinke

We provide tools for sharing sensitive data when the data curator doesn't know in advance what questions an (untrusted) analyst might ask about the data. The analyst can specify a program that they want the curator to run on the dataset. We model the program as a black-box function $f$. We study differentially private algorithms, called privacy wrappers, that, given black-box access to a real-valued function $f$ and a sensitive dataset $x$, output an accurate approximation to $f(x)$. The dataset $x$ is modeled as a finite subset of a possibly infinite set $U$, in which each entry represents data of one individual. A privacy wrapper calls $f$ on the dataset $x$ and on some subsets of $x$ and returns either an approximation to $f(x)$ or a nonresponse symbol $\perp$. The wrapper may also use additional information (that is, parameters) provided by the analyst, but differential privacy is required for all values of these parameters. Correct setting of these parameters will ensure better accuracy of the wrapper. The bottleneck in the running time of our wrappers is the number of calls to $f$, which we refer to as queries. Our goal is to design wrappers with high accuracy and low query complexity. We introduce a novel setting, the automated sensitivity detection setting, where the analyst supplies the black-box function $f$ and the intended (finite) range of $f$. In the previously considered setting, the claimed sensitivity bound setting, the analyst supplies additional parameters that describe the sensitivity of $f$. We design privacy wrappers for both settings and show that our wrappers are nearly optimal in terms of accuracy, locality (i.e., the depth of the local neighborhood of the dataset $x$ they explore), and query complexity. In the claimed sensitivity bound setting, we provide the first accuracy guarantees that have no dependence on the size of the universe $U$.
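
For contrast with the claimed-sensitivity-bound setting, the textbook baseline is the Laplace mechanism: add noise scaled to (claimed sensitivity)/$\epsilon$. The sketch below (a hypothetical illustration) is $\epsilon$-differentially private only if the analyst's claim really bounds the sensitivity of $f$; the point of the paper's wrappers is to remain private for all parameter values even when that claim is wrong, which this naive baseline does not achieve.

    import numpy as np

    def naive_laplace_wrapper(f, x, claimed_sensitivity, epsilon, rng=None):
        """Textbook Laplace mechanism: f(x) plus Laplace noise with scale
        claimed_sensitivity / epsilon. Private only if the claim is correct."""
        rng = rng or np.random.default_rng()
        return f(x) + rng.laplace(scale=claimed_sensitivity / epsilon)

    # Hypothetical usage: count the records at or above a threshold (true sensitivity 1).
    dataset = {3, 7, 12, 25, 31}
    f = lambda s: sum(1 for v in s if v >= 10)
    print(naive_laplace_wrapper(f, dataset, claimed_sensitivity=1.0, epsilon=0.5))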

Graph neural networks extrapolate out-of-distribution for shortest paths

from arXiv: Data Structures and Algorithms

Authors: Robert R. Nerem, Samantha Chen, Sanjoy Dasgupta, Yusu Wang

Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We rigorously analyze the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that GNNs, trained to minimize a sparsity-regularized loss over a small set of shortest path instances, exactly implement the Bellman-Ford (BF) algorithm for shortest paths. In fact, if a GNN minimizes this loss within an error of $\epsilon$, it implements the BF algorithm with an error of $O(\epsilon)$. Consequently, despite limited training data, these GNNs are guaranteed to extrapolate to arbitrary shortest-path problems, including instances of any size. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.
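
To make the algorithmic-alignment claim concrete, the sketch below (plain Python, no learned weights) shows the min-aggregation message-passing step whose iteration is exactly one Bellman-Ford relaxation round; the paper's result is that suitably trained GNNs recover precisely this computation.

    import numpy as np

    def bellman_ford_layer(h, edges, weights):
        """One min-aggregation message-passing step on node features h (distance
        estimates): each node keeps the smaller of its current value and the best
        relaxed value over incoming edges, i.e., one Bellman-Ford round."""
        new_h = h.copy()
        for (u, v), w in zip(edges, weights):
            new_h[v] = min(new_h[v], h[u] + w)
        return new_h

    # Shortest paths from node 0 on a hypothetical 4-node graph.
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    weights = [1.0, 2.0, 5.0, 1.0]
    h = np.array([0.0, np.inf, np.inf, np.inf])
    for _ in range(3):                    # n - 1 rounds suffice for n nodes
        h = bellman_ford_layer(h, edges, weights)
    print(h)                              # [0. 1. 3. 4.]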

Tuesday, March 25

All bets are off

from Ben Recht

All decisions are made under uncertainty. Almost no decisions are gambling.

In the comments of last Thursday’s post, Matt Hoffman replied at length, starting with:

“Every decision-making-under-uncertainty problem, like it or not, is a question of how to wager.”

Matt is not alone in thinking this, but the word “every” makes the statement untrue for me. I’m happy to embrace pluralism again and let us all have our own truths. But outside of the casino, my truth is that no decision-making problems are about gambling. In fact, one of the more pernicious aspects of our running national nightmare is the oligarchy that parasitically leeches money from people they convince to gamble.1

Since the future is unknown, every decision-making problem is made in the face of uncertainty. How we think about that uncertainty varies a lot. And the number of times we can cleanly make a decision-making problem into a gambling problem is… almost never. Indeed, I can’t think of any outside of gambling. Unless you are shackled to a blackjack table, decision making just isn’t about wagering! There is no reason that anyone need conceptualize their life as an endless string of cost-benefit analyses. But lots of people do. I’m not denying that they do. But it’s a problem.

Even investing doesn’t follow the cleanly derived utility maximizing rules of game-theory optimal gambling. Your portfolio manager is not making Kelly bets. As David Rothman pointed out in a comment, Kelly bets can be derived from beautifully simple theory, but no one uses them in practice. Instead, people at best run “fractional Kelly” rules to be even more risk-averse. While you can derive a lot of math explaining why fractional Kelly is “more optimal,” I haven’t seen any math that doesn’t tie itself into knots to justify deviating from Kelly’s criterion.
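
For concreteness, the Kelly rule in question is the bet size that maximizes expected log wealth; here is a minimal sketch (an illustration, not an endorsement) for a bet paying b-to-1 with win probability p, together with the fractional-Kelly shrinkage mentioned above:

    def kelly_fraction(p, b):
        """Bankroll fraction maximizing expected log wealth for a bet paying b-to-1
        with win probability p, clipped at 0 (never bet without an edge)."""
        return max(0.0, (p * (b + 1) - 1) / b)

    def fractional_kelly(p, b, alpha=0.5):
        """Common practice: scale the Kelly bet down by alpha (e.g., 'half Kelly')."""
        return alpha * kelly_fraction(p, b)

    # A 55%-probability even-money bet:
    print(kelly_fraction(0.55, 1.0))      # 0.10 -> bet 10% of bankroll
    print(fractional_kelly(0.55, 1.0))    # 0.05 -> half Kelly bets 5%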

I got excited about forecast coherence because it motivated probabilistic thinking solely in terms of prediction. There need not be any financial stakes involved. You just have to commit to being scored and tabulated. If your beliefs will be scored—whether they be remunerated or not—convex geometry forces you into making probabilistic forecasts.

But even this derivation comes out of a contrived mathematical game where the assumptions have to line up just right (continuous proper scoring) for you to get a clean story. That’s fine! It’s an elegant way to motivate logical probability. Pedagogically, I should be able to teach Bayesian logical probability without leading with gambling. Forecast coherence is one of many ways to argue for the subadditivity axiom. I personally like forecast coherence more than the derivation through Dutch Books, and also prefer it to the arguments that appeal to ranking plausibility of statements and applying Cox’s theorem. But everyone has their own tastes.

If you think you will be judged on performance, and you need to forecast the plausibility of outcomes on a scale of 0 to 1, then you need to talk in the language of probabilities. But this pattern of mirroring normativity in the idiosyncrasies of computers is an epistemological trap recurrent in the information age:2

  • We collectively decide that we need to score predictions quantitatively.

  • Such quantification necessitates predictions being expressed in terms of probability.

  • We forget why we decided to score things in the first place and tell ourselves that probability is the only way to make predictions.

The ever-presence of machines and their numbers convinces us that these numbers are inescapable. I’m fine with motivating probabilistic thinking in terms of scoring predictions. This was how Shannon and Wiener motivated probability models as they formulated our modern conceptions of information. But people are not computers.

All of our decisions are made in the face of uncertainty. Almost none are plugged into a Brier score or rewarded with a lottery payout. It’s not gambling when we decide who to date. It’s not gambling to choose to do something with our kids instead of answering emails. It’s not gambling when we care for a sick loved one. These statements are so obvious they sound ridiculous when you say them out loud. And yet there’s a particular mindset shared amongst a very powerful group of people who want us to believe that we can make all our decisions by deferring to game-theoretic machine thinking. It would be funny if it weren’t so terrifying.

1. If you want to read a more fleshed out version of this argument, check out this review of Nate Silver’s book with Leif Weatherby.

By Ben Recht

A Galois-Theoretic Complexity Measure for Solving Systems of Algebraic Equations

from arXiv: Computational Complexity

Authors: Timothy Duff

Motivated by applications of algebraic geometry, we introduce the Galois width, a quantity characterizing the complexity of solving algebraic equations in a restricted model of computation allowing only field arithmetic and adjoining polynomial roots. We explain why practical heuristics such as monodromy give (at least) lower bounds on this quantity, and discuss problems in geometry, optimization, statistics, and computer vision for which knowledge of the Galois width either leads to improvements over standard solution techniques or rules out this possibility entirely.

Privacy-Preserving Hamming Distance Computation with Property-Preserving Hashing

from arXiv: Computational Complexity

Authors: Dongfang Zhao

We study the problem of approximating Hamming distance in sublinear time under property-preserving hashing (PPH), where only hashed representations of inputs are available. Building on the threshold evaluation framework of Fleischhacker, Larsen, and Simkin (EUROCRYPT 2022), we present a sequence of constructions with progressively improved complexity: a baseline binary search algorithm, a refined variant with constant repetition per query, and a novel hash design that enables constant-time approximation without oracle access. Our results demonstrate that approximate distance recovery is possible under strong cryptographic guarantees, bridging efficiency and security in similarity estimation.
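
A minimal sketch of the baseline described above, under the simplifying assumption that the hashed representations expose an exact threshold oracle answering "is the Hamming distance at most t?": binary search over t recovers the distance with O(log n) oracle calls. The paper's refined constructions reduce and ultimately remove this logarithmic overhead, and work with approximate, cryptographically secure oracles rather than the idealized one used here.

    def hamming_via_threshold_oracle(oracle, n):
        """Recover the Hamming distance between two length-n inputs using only a
        threshold oracle: oracle(t) is True iff the distance is <= t.
        Plain binary search, O(log n) oracle calls."""
        lo, hi = 0, n                  # the distance lies in [0, n]
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(mid):
                hi = mid               # distance <= mid
            else:
                lo = mid + 1           # distance > mid
        return lo

    # Hypothetical usage with a stand-in oracle built from the raw strings
    # (a real PPH-based oracle would only see the hashes):
    x, y = "1011010011", "1001011010"
    true_dist = sum(a != b for a, b in zip(x, y))
    print(hamming_via_threshold_oracle(lambda t: true_dist <= t, len(x)), true_dist)   # 3 3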

The Power of Recursive Embeddings for $\ell_p$ Metrics

from arXiv: Data Structures and Algorithms

Authors: Robert Krauthgamer, Nir Petruschka, Shay Sapir

Metric embedding is a powerful mathematical tool that is extensively used in mathematics and computer science. We devise a new method of using metric embeddings recursively that turns out to be particularly effective for $\ell_p$ spaces, $p>2$. Our method yields state-of-the-art results for Lipschitz decomposition, nearest neighbor search and embedding into $\ell_2$. In a nutshell, we compose metric embeddings by way of reductions, leading to new reductions that are substantially more effective than the straightforward reduction that employs a single embedding. In fact, we compose reductions recursively, oftentimes using double recursion, which exemplifies this gap.

$k$-Universality of Regular Languages Revisited

from arXiv: Data Structures and Algorithms

Authors: Duncan Adamson, Pamela Fleischmann, Annika Huch, Tore Koß, Florin Manea

A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \cdots w[i_k]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \vert w \vert$. A word $w$ is \emph{$k$-subsequence universal} over an alphabet $\Sigma$ if every word over $\Sigma$ up to length $k$ appears in $w$ as a subsequence. In this paper, we revisit the problem $k$-ESU of deciding, for a given integer $k$, whether a regular language, given either as a nondeterministic finite automaton or as a regular expression, contains a $k$-universal word. [Adamson et al., ISAAC 2023] showed that this problem is NP-hard, even in the case when $k=1$, and an FPT algorithm w.r.t. the size of the input alphabet was given. In this paper, we improve the aforementioned algorithmic result and complete the analysis of this problem w.r.t. other parameters. That is, we propose a more efficient FPT algorithm for $k$-ESU, with respect to the size of the input alphabet, and propose new FPT algorithms for this problem w.r.t. the number of states of the input automaton and the length of the input regular expression. We also discuss corresponding lower bounds. Our results significantly improve the understanding of this problem.
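
For a single word (as opposed to a regular language), $k$-subsequence universality is easy to test via the classical arch factorization: scan left to right, close an "arch" each time every letter of $\Sigma$ has been seen, and repeat; the word is $k$-universal iff at least $k$ arches are completed. A minimal Python sketch:

    def universality_index(w, alphabet):
        """Number of complete arches in w: greedily scan, closing an arch whenever
        all letters of the alphabet have appeared. w is k-subsequence universal
        over the alphabet iff this index is at least k."""
        sigma = set(alphabet)
        arches, seen = 0, set()
        for ch in w:
            seen.add(ch)
            if seen >= sigma:          # all letters seen: close the arch
                arches += 1
                seen = set()
        return arches

    # Over {a, b}: "abba" contains every word of length <= 2 as a subsequence.
    print(universality_index("abba", "ab"))   # 2, so "abba" is 2-universal
    print(universality_index("aab", "ab"))    # 1, so 1-universal but not 2-universal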

Õptimal Fault-Tolerant Labeling for Reachability and Approximate Distances in Directed Planar Graphs

from arXiv: Data Structures and Algorithms

Authors: Itai Boneh, Shiri Chechik, Shay Golan, Shay Mozes, Oren Weimann

We present a labeling scheme that assigns labels of size $\tilde O(1)$ to the vertices of a directed weighted planar graph $G$, such that for any fixed $\varepsilon>0$, from the labels of any three vertices $s$, $t$, and $f$, one can determine in $\tilde O(1)$ time a $(1+\varepsilon)$-approximation of the $s$-to-$t$ distance in the graph $G\setminus\{f\}$. For approximate distance queries, prior to our work, no efficient solution existed, not even in the centralized oracle setting. Even for the easier case of reachability, $\tilde O(1)$ queries were known only with a centralized oracle of size $\tilde O(n)$ [SODA 21].

A Graph-based Approach to Variant Extraction

from arXiv: Data Structures and Algorithms

Authors: Mark A. Santcroos, Walter A. Kosters, Mihai Lefter, Jeroen F. J. Laros, Jonathan K. Vis

Accurate variant descriptions are of paramount importance in the field of genetics. The domain is confronted with increasingly complex variants, making it more challenging to generate proper variant descriptions. We present a graph based on all minimal alignments that is a complete representation of a variant, and we provide three complementary extraction methods to derive variant descriptions from this graph. Our experiments show that, in comparison with dbSNP, our method yields identical HGVS descriptions for simple variants and more meaningful descriptions for complex variants.

Faster Construction of a Planar Distance Oracle with Õ(1) Query Time

from arXiv: Data Structures and Algorithms

Authors: Itai Boneh, Shay Golan, Shay Mozes, Daniel Prigan, Oren Weimann

We show how to preprocess a weighted undirected $n$-vertex planar graph in $\tilde O(n^{4/3})$ time, such that the distance between any pair of vertices can then be reported in $\tilde O(1)$ time. This improves the previous $\tilde O(n^{3/2})$ preprocessing time [JACM'23]. Our main technical contribution is a near optimal construction of \emph{additively weighted Voronoi diagrams} in undirected planar graphs. Namely, given a planar graph $G$ and a face $f$, we show that one can preprocess $G$ in $\tilde O(n)$ time such that given any weight assignment to the vertices of $f$ one can construct the additively weighted Voronoi diagram of $f$ in near optimal $\tilde O(|f|)$ time. This improves the $\tilde O(\sqrt{n |f|})$ construction time of [JACM'23].
