## Mathematics, Department of

## Date of this Version

7-22-2021

## Citation

(2023) SIAM Journal on Computing, 52 (3), pp. 761-793. DOI: 10.1137/22m1489678

## Abstract

We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution *P* on Σ^{n} and a tree *T* on *n* nodes, we say *T* is an *ε*-approximate tree for *P* if there is a *T*-structured distribution *Q* such that *D*(*P* || *Q*) is at most *ε* more than the divergence achieved by the best possible tree-structured distribution for *P*. We show that if *P* itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information with *Õ*(|Σ|^{3}*nε*^{−1}) i.i.d. samples outputs an *ε*-approximate tree for *P* with constant probability. In contrast, for a general *P* (which may not be tree-structured), Ω(*n*^{2}*ε*^{−2}) samples are necessary to find an *ε*-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables *X*, *Y*, *Z*, each over Σ, testing if *I*(*X*; *Y* | *Z*) is 0 or ≥ *ε* is possible with *Õ*(|Σ|^{3}/*ε*) samples. Finally, we show that for a specific tree *T*, with *Õ*(|Σ|^{2}*nε*^{−1}) samples from a distribution *P* over Σ^{n}, one can efficiently learn the closest *T*-structured distribution in KL divergence by applying the add-1 estimator at each node.
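As an illustration of the procedure the abstract analyzes, the following is a minimal Python sketch of the Chow-Liu algorithm with the plug-in estimator: it computes the empirical mutual information for every pair of variables and returns a maximum-weight spanning tree over those weights. It is not taken from the paper. The function names `plugin_mutual_information` and `chow_liu_tree`, the representation of samples as tuples, and the use of Kruskal's algorithm for the spanning tree are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def plugin_mutual_information(xs, ys):
    """Plug-in (empirical) estimate of I(X; Y) in nats from paired samples."""
    m = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # Empirical p(x, y) * log(p(x, y) / (p(x) * p(y))), written with counts.
        mi += (c / m) * math.log(c * m / (px[x] * py[y]))
    return mi


def chow_liu_tree(samples):
    """Return the edges of a maximum-weight spanning tree under plug-in MI weights.

    samples: list of equal-length tuples, one tuple per i.i.d. sample
    (hypothetical input format chosen for this sketch).
    """
    n = len(samples[0])
    columns = [[s[i] for s in samples] for i in range(n)]
    weighted = [
        (plugin_mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(n), 2)
    ]
    # Kruskal's algorithm: scan edges from the largest weight down.
    weighted.sort(reverse=True)

    parent = list(range(n))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

For example, calling `chow_liu_tree` on samples in which the first two coordinates always agree and the third is independent of them yields a tree containing the edge `(0, 1)`. The sketch only recovers the tree structure; fitting the distribution on a fixed tree (for instance with the add-1 estimator mentioned in the abstract) is a separate step not shown here.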

## Comments

Used by permission.