
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models


Speaker

Aleksandar Botev

Affiliation

Google DeepMind

Date

Friday, 22 March 2024

Time

12:00-13:00

Location

Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH

Link

https://ucl.zoom.us/j/97245943682

Event series

DeepMind/ELLIS CSML Seminar Series

Abstract

In the last few years, transformers have become the default architecture for sequence modelling tasks such as language modelling. However, a new family of models, state space models, is challenging the status quo. In this talk we will survey recent progress on this class of models and provide context and perspectives on them from both a theoretical and a practical point of view. We will argue that it is not only the choice of recurrent layer that matters: the design of the whole block and the overall architecture play a large role in these models' success. We will then present Griffin, a hybrid of the Real-Gated Linear Recurrent Unit (RG-LRU) and local attention, which matches the performance of state-of-the-art Transformers while being significantly faster at inference, in both latency and throughput. We will also show that these models can exploit much longer contexts than they were trained on, and we will discuss the interesting implications of this.
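To make the "gated linear recurrence" idea in the abstract concrete, here is a minimal JAX sketch of a recurrence of the form h_t = a_t * h_{t-1} + b_t * x_t, where the gate a_t lies in (0, 1). This is an illustration in the spirit of the RG-LRU rather than the paper's exact layer: the function names are invented for this sketch, and the input scaling b_t = sqrt(1 - a_t^2) is one plausible choice for keeping the state well conditioned. Because the recurrence is linear in h, it can be evaluated with an associative scan rather than a step-by-step loop.

import jax
import jax.numpy as jnp

def gated_linear_recurrence(x, a, b):
    # Computes h_t = a_t * h_{t-1} + b_t * x_t over axis 0 using an
    # associative scan, so the sequence is processed in parallel depth
    # O(log T) instead of a sequential loop of length T.
    def combine(left, right):
        a_l, u_l = left
        a_r, u_r = right
        # Composing h -> a_l*h + u_l with h -> a_r*h + u_r is again affine:
        # h -> (a_l*a_r)*h + (a_r*u_l + u_r).
        return a_l * a_r, a_r * u_l + u_r

    _, h = jax.lax.associative_scan(combine, (a, b * x))
    return h

# Toy usage with input-dependent gates squashed into (0, 1), as in gated RNNs.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 8))                  # (sequence length, width)
a = jax.nn.sigmoid(jax.random.normal(key, (16, 8)))  # per-step forget gate
b = jnp.sqrt(1.0 - a**2)                             # assumed input scaling
h = gated_linear_recurrence(x, a, b)
print(h.shape)  # (16, 8)

At inference time the same recurrence can instead be unrolled one token at a time with a fixed-size state h, which is why such layers avoid the growing key-value cache of attention and can be faster in both latency and throughput.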

Biography

Alex is a Research Scientist in Machine Learning at Google DeepMind. He has worked on generative models, second-order optimization, Bayesian methods, and machine learning applied to physics, and is now finally dabbling in LLMs. He completed his PhD at UCL under the supervision of David Barber.