View publication

As statistical analyses become more central to science, industry and society, there is a growing need to ensure correctness of their results. Approximate correctness can be verified by replicating the entire analysis, but can we verify without replication? Building on a recent line of work, we study proof-systems that allow a probabilistic verifier to ascertain that the results of an analysis are approximately correct, while drawing fewer samples and using less computational resources than would be needed to replicate the analysis. We focus on distribution testing problems: verifying that an unknown distribution is close to having a claimed property.

Our main contribution is a interactive protocol between a verifier and an untrusted prover, which can be used to verify any distribution property that can be decided in polynomial time given a full and explicit description of the distribution. If the distribution is at statistical distance ε from having the property, then the verifier rejects with high probability. This soundness property holds against any polynomial-time strategy that a cheating prover might follow, assuming the existence of collision-resistant hash functions (a standard assumption in cryptography). For distributions over a domain of size N, the protocol consists of 4 messages and the communication complexity and verifier runtime are roughly O~(N/ε2)Õ(\sqrt N / ε^2). The verifier's sample complexity is O~(N/ε2)Õ(\sqrt N / ε^2), and this is optimal up to polylog(N)polylog(N) factors (for any protocol, regardless of its communication complexity). Even for simple properties, approximately deciding whether an unknown distribution has the property can require quasi-linear sample complexity and running time. For any such property, our protocol provides a quadratic speedup over replicating the analysis.

† Weizmann Institute

Related readings and updates.

Despite increasing awareness of the need to support accessibility in mobile apps, many still lack support for key accessibility features. Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecyle. However, manual testing can be tedious, often has an overwhelming scope, and test passes can be difficult to time amongst other development milestones. Recently, Large Language…
Read more
Stochastic convex optimization over an ℓ1ℓ_1ℓ1​-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors the optimal excess population loss of any (ε,δ)(\varepsilon, \delta)(ε,δ)-differentially private optimizer is log⁡(d)/n  +\sqrt{\log(d)/n}\; +log(d)/n​+ d/εn.\sqrt{d}/\varepsilon n.d​/εn. The upper bound is based on…
Read more