Master Class: How much computation is required in order to achieve statistical efficiency?
Affiliation: Microsoft Research, New England
Date: Wednesday, 5 November 2014
Time: 11:30 - 12:30
Location: 1.03, Malet Place Engineering Building
Event series: Master Class: Sham Kakade (3-5 November 2014)
There has been much recent interest in, and need for, scalable optimization algorithms for settings where even a single pass through the dataset is costly. Recent progress has produced algorithms with extremely fast convergence rates (e.g. linear convergence, where the error drops exponentially quickly in the number of iterations). In practice, however, stochastic gradient descent is often used (and preferred) because it is relatively simple to implement, even though its convergence rate is only O(1/t) or O(1/sqrt(t)) after t iterations, rather than dropping exponentially quickly in t.
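The contrast between the two convergence regimes can be illustrated with a minimal sketch (not from the talk itself): on a noiseless least-squares problem, full-batch gradient descent converges linearly, while stochastic gradient descent with decaying step sizes converges far more slowly. The problem setup, step-size choices, and iteration counts below are all illustrative assumptions.

```python
import numpy as np

# Illustrative least-squares problem: f(w) = (1/2n) * ||Aw - b||^2,
# constructed noiselessly so the optimal loss is exactly zero.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
b = A @ w_star

# Smoothness constant of the full objective (largest eigenvalue of A^T A / n).
L = np.linalg.eigvalsh(A.T @ A / n).max()

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

# Full gradient descent with step 1/L: the error contracts by a constant
# factor per iteration (linear convergence), so it drops exponentially in t.
w_gd = np.zeros(d)
for _ in range(500):
    grad = A.T @ (A @ w_gd - b) / n
    w_gd -= grad / L

# Stochastic gradient descent: one random sample per step, 1/t step sizes,
# giving only a polynomial (O(1/t)-type) decay of the error.
w_sgd = np.zeros(d)
for t in range(1, 501):
    i = rng.integers(n)
    grad_i = (A[i] @ w_sgd - b[i]) * A[i]
    w_sgd -= grad_i / (L * t)

print(f"GD loss after 500 steps:  {loss(w_gd):.3e}")
print(f"SGD loss after 500 steps: {loss(w_sgd):.3e}")
```

After the same number of iterations, the gradient-descent iterate is accurate to near machine precision while the SGD iterate still carries substantial error; SGD compensates by making each iteration touch only one data point instead of all n.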