Seminar: Two generic principles in modern bandits: the optimistic principle and Thompson sampling

Speaker: Remi Munos
Affiliation: INRIA Lille
Date: Friday, 12 Sep 2014
Time: 13:00 - 14:00
Location: Roberts G08 (Sir David Davies LT)
Event series: DeepMind CSML Seminar Series

Abstract: I will describe two principles considered in multi-armed bandits, namely the optimistic principle and Thompson sampling, and illustrate how they extend to structured bandit settings, such as linear bandits and bandits in graphs.
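For readers unfamiliar with the two principles named in the abstract, the following is a minimal sketch on the simplest unstructured case, a Bernoulli multi-armed bandit. It is not taken from the talk: the choice of UCB1 as the optimistic rule, the uniform Beta(1, 1) prior for Thompson sampling, and all function names (`ucb1_choose`, `thompson_choose`, `run`) are assumptions made for illustration only.

```python
import math
import random


def ucb1_choose(counts, sums, t):
    """Optimistic principle (UCB1 variant, assumed here for illustration):
    pick the arm whose upper confidence bound on the mean reward is largest."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the bound
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def thompson_choose(counts, sums, rng):
    """Thompson sampling with an assumed Beta(1, 1) prior on each arm:
    draw one sample per arm from its posterior and play the largest draw."""
    draws = [
        rng.betavariate(1.0 + sums[a], 1.0 + counts[a] - sums[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda a: draws[a])


def run(policy, means, horizon, seed=0):
    """Simulate `horizon` rounds on Bernoulli arms with the given (hypothetical)
    means and return how many times each arm was played."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, horizon + 1):
        if policy == "ucb1":
            arm = ucb1_choose(counts, sums, t)
        else:
            arm = thompson_choose(counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `run("ucb1", [0.2, 0.8], 2000)` or `run("thompson", [0.2, 0.8], 2000)` returns the per-arm play counts; under either rule the plays should concentrate on the arm with the higher mean as the horizon grows. The structured settings mentioned in the abstract (linear bandits, bandits in graphs) replace the per-arm statistics with a shared model and are not covered by this sketch.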

Bio: Remi Munos received his PhD in Cognitive Science from EHESS, France, in 1997 and was a postdoc at CMU from 1998 to 2000. He was then Assistant Professor in the Department of Applied Mathematics at Ecole Polytechnique. In 2006 he joined the French public research institute INRIA as a Senior Researcher and co-led the project-team SequeL (Sequential Learning), which now gathers approximately 25 people. His research interests cover several fields of Statistical Learning, including Reinforcement Learning, Optimization, and Bandit Theory.

Slides for the talk: PDF
