Seminar: Two generic principles in modern bandits: the optimistic principle and Thompson sampling
Speaker | Remi Munos
---|---
Affiliation | INRIA Lille
Date | Friday, 12 Sep 2014
Time | 13:00 - 14:00
Location | Roberts G08 (Sir David Davies LT)
Event series | DeepMind CSML Seminar Series

Description:
Abstract: I will describe two principles considered in multi-armed bandits, namely the optimistic principle and Thompson sampling, and illustrate how they extend to structured bandit settings, such as linear bandits and bandits on graphs.

Bio: Remi Munos received his PhD in Cognitive Science from EHESS, France, in 1997 and held a postdoctoral position at CMU from 1998 to 2000. He then became Assistant Professor in the Department of Applied Mathematics at Ecole Polytechnique. In 2006 he joined the French public research institute INRIA as a Senior Researcher, where he co-led the project-team SequeL (Sequential Learning), which now gathers approximately 25 people. His research interests span several fields of statistical learning, including reinforcement learning, optimization, and bandit theory.

Slides for the talk: PDF
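For readers unfamiliar with the two principles named in the abstract, here is a minimal illustrative sketch (not from the talk itself) of each in the classic Bernoulli multi-armed bandit setting: the optimistic principle via the UCB1 index, and Thompson sampling via posterior sampling from a Beta distribution. All function and variable names are illustrative.

```python
import math
import random

def ucb1(counts, rewards, t):
    """Optimistic principle: play the arm with the highest upper
    confidence bound on its mean reward (the UCB1 index)."""
    for i, n in enumerate(counts):
        if n == 0:  # play each arm once before the bound is defined
            return i
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + math.sqrt(2 * math.log(t) / counts[i]))

def thompson(successes, failures):
    """Thompson sampling: draw a plausible mean for each arm from its
    Beta(successes+1, failures+1) posterior and play the best draw."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])
```

Both strategies balance exploration and exploitation: UCB1 explores deterministically through an optimistic confidence bonus that shrinks as an arm is pulled, while Thompson sampling explores randomly in proportion to the posterior probability that an arm is optimal.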
iCalendar | csml_id_189.ics