Description 
Abstract: Statistical models with hidden or latent variables are of great importance in natural language processing, speech, and many other fields. The EM algorithm is a remarkably successful method for parameter estimation within these models: it is simple, it is often relatively efficient, and it has well-understood formal properties. It does, however, have a major limitation: it has no guarantee of finding the global optimum of the likelihood function. From a theoretical perspective, this means that the EM algorithm is not guaranteed to give consistent parameter estimates. From a practical perspective, problems with local optima can be difficult to deal with.
In this talk I'll describe a spectral algorithm for learning latent-variable probabilistic context-free grammars (L-PCFGs). L-PCFGs have been shown to be highly effective models for natural language parsing (Petrov et al., 2006; Matsuzaki et al., 2005). Under a separation (singular value) condition, our algorithm provides consistent parameter estimates; this is in contrast with previous work, which has used the EM algorithm for parameter estimation, with the usual problems of local optima.
In experiments on parsing using L-PCFGs, we show that the method is as accurate as models estimated using EM, while being an order of magnitude or two more efficient (it requires a singular value decomposition followed by a single pass over the data, in contrast to the multiple iterations over the data required by EM).
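To make the efficiency contrast concrete, here is a minimal sketch, not the talk's actual L-PCFG algorithm, of the computational pattern behind spectral methods: estimate an empirical co-occurrence (moment) matrix in one pass, take a single truncated SVD, then make one more pass projecting the data into the low-dimensional space. The toy data, dimensions, and variable names below are all illustrative assumptions.

```python
# Illustrative sketch of the spectral-estimation pattern: one SVD of an
# empirical moment matrix, then a single pass over the data -- versus EM,
# which requires many full passes. Not the talk's actual algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy "corpus": pairs of observed symbols drawn from a low-rank joint
# distribution, standing in for the feature co-occurrences a spectral
# L-PCFG method would estimate. All sizes here are arbitrary choices.
k, n_symbols, n_samples = 2, 6, 5000
A = rng.dirichlet(np.ones(n_symbols), size=k)   # P(x | hidden state)
B = rng.dirichlet(np.ones(n_symbols), size=k)   # P(y | hidden state)
pi = np.array([0.6, 0.4])                       # P(hidden state)
states = rng.choice(k, size=n_samples, p=pi)
xs = np.array([rng.choice(n_symbols, p=A[s]) for s in states])
ys = np.array([rng.choice(n_symbols, p=B[s]) for s in states])

# Step 1: empirical second-moment matrix, built in a single pass.
M = np.zeros((n_symbols, n_symbols))
for x, y in zip(xs, ys):
    M[x, y] += 1.0
M /= n_samples

# Step 2: one SVD gives rank-k projections (the "spectral" step).
U, S, Vt = np.linalg.svd(M)
U_k, V_k = U[:, :k], Vt[:k, :].T

# Step 3: one more pass projects the data into the k-dimensional space;
# downstream parameter estimates are built from such projected counts.
proj = np.mean([np.outer(U_k[x], V_k[y]) for x, y in zip(xs, ys)], axis=0)

# When the true rank is k, the top-k singular values dominate -- this is
# the "separation" (singular value) condition mentioned in the abstract.
print(S[:k].sum() / S.sum())
```

The point of the sketch is the cost profile: steps 1 and 3 each touch the data exactly once, and the SVD in step 2 operates on a small fixed-size matrix, whereas EM would repeat inference over the entire dataset at every iteration.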
While the focus of this talk is on models for natural language parsing, the techniques I will describe should be relevant to many other latent-variable models used in machine learning and natural language processing. The method requires a significant generalization of prior work on spectral learning algorithms, in particular the seminal work of Hsu et al. (2009) on provably consistent learning of HMMs.
This is joint work with Shay Cohen, Karl Stratos, Dean Foster, and Lyle Ungar.
Bio: Michael Collins is the Vikram S. Pandit Professor of Computer Science at Columbia University. He completed a PhD in computer science from the University of Pennsylvania in December 1998. From January 1999 to November 2002 he was a researcher at AT&T Labs-Research, and from January 2003 until December 2010 he was an assistant/associate professor at MIT. His research is focused on topics including statistical parsing, structured prediction problems in machine learning, and applications including machine translation, dialog systems, and speech recognition.
