Master Class: How much computation is required in order to achieve statistical efficiency?

Speaker: Sham Kakade
Affiliation: Microsoft Research, New England
Date: Wednesday, 05 Nov 2014
Time: 11:30 - 12:30
Location: 1.03, Malet Place Engineering Building
Event series: Master Class: Sham Kakade (3-5 November 2014)

There has been much recent interest in, and need for, scalable optimization algorithms (where it is often costly even to pass through our dataset once). Recent progress has provided optimization algorithms with extremely fast convergence rates (e.g. linear convergence, where the error drops exponentially quickly in the number of iterations). In contrast, in practice, stochastic gradient descent is often used (and preferred) due to its relative simplicity of implementation, even though its convergence rate is O(1/t) or O(1/sqrt(t)) after t iterations (as opposed to dropping exponentially quickly in t).
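As a toy illustration (not from the lecture itself; the problem and step sizes are illustrative choices), the behaviour of stochastic gradient descent with a 1/t step size can be seen on a one-dimensional least-squares objective, where the iterate reduces to a running average of the samples:

```python
import random

# Minimal SGD sketch on f(w) = E[(w - x)^2], whose minimizer is the
# mean of the data. With step size 1/(2t), the update
#   w <- w - (w - x_t)/t
# is exactly the running sample mean, so the optimization error decays
# at the O(1/t)-type rate discussed above; a full-gradient method on
# the same problem would instead converge linearly per pass.
random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(10_000)]
mean = sum(data) / len(data)  # the exact empirical minimizer

w = 0.0
for t, x in enumerate(data, start=1):
    grad = 2.0 * (w - x)   # stochastic gradient from a single sample
    w -= grad / (2.0 * t)  # decaying step size 1/(2t)

print(abs(w - mean))  # small after a single pass over the data
```

One pass over the data suffices here precisely because the averaged SGD iterate matches the empirical mean, which is the flavour of statistical-versus-computational trade-off the lecture addresses.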

Furthermore, in practice, we are often optimizing over a noisy dataset, where what we care about is our future predictions, i.e. generalization. Here, the need for such high-precision (i.e. numerically precise) estimates on our training dataset is less clear when our goal is generalization to new data. A natural question is: how much computation do we need in order to achieve a statistically efficient estimate (for generalization purposes)? And can we provide an algorithm which achieves the statistical limit in a scalable manner (say, similar to how stochastic gradient descent is implemented)? This final lecture provides positive results toward addressing this question.
