Seminar: A hybrid Cholesky decomposition algorithm for multicore CPUs with GPU accelerators

Speaker: Gary Macindoe
Date: Friday, 08 Feb 2013
Time: 12:30 - 14:00
Location: Cruciform B404 - LT2
Event series: DeepMind CSML Seminar Series

The Cholesky decomposition is used throughout computational statistics and is often the performance bottleneck of the algorithms that rely on it. As the number of cores in a processor increases, algorithms need to be redesigned to extract performance by running operations in parallel rather than relying on higher clock speeds. In addition, graphics processing units are capable of executing tens of thousands of operations in parallel and are no longer restricted to graphical calculations.

We have developed a Cholesky decomposition algorithm for multi-core CPUs and GPUs. We introduce a new method of copying submatrices and use it to have the GPU and CPU calculate the matrix in parallel. We add a new level of dynamic blocking that matches the workload to the compute device at each iteration. We also exploit the differences between SIMD and SIMT programming to have multiple functions execute simultaneously on older classes of GPU that do not have this capability built into the hardware.

Our methods are generally applicable to blocked algorithms for linear algebra such as those in the LAPACK library.
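
For readers unfamiliar with blocked algorithms, the following is a minimal sketch of a standard right-looking blocked Cholesky factorization in pure Python. It is not the algorithm presented in the talk: the function names `cholesky_unblocked` and `cholesky_blocked` and the fixed `block_size` parameter are assumptions made for illustration, and everything runs sequentially on the CPU. It only shows the three steps per iteration (diagonal block factorization, panel solve, trailing update) that a hybrid scheme could divide between devices.

```python
import math


def cholesky_unblocked(a):
    """Return lower-triangular L with L * L^T == a (a is a list of rows).

    Only the lower triangle of a is read.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def cholesky_blocked(a, block_size=2):
    """Right-looking blocked Cholesky (illustrative sketch).

    block_size is fixed here; a dynamic scheme would choose it per iteration.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy; lower triangle becomes L
    for k in range(0, n, block_size):
        e = min(k + block_size, n)

        # Step 1: factor the small diagonal block A11 = L11 * L11^T.
        l11 = cholesky_unblocked([row[k:e] for row in w[k:e]])
        for i in range(k, e):
            for j in range(k, e):
                w[i][j] = l11[i - k][j - k]

        # Step 2: panel solve, L21 = A21 * L11^{-T}, one row at a time.
        for i in range(e, n):
            for j in range(k, e):
                s = w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))
                w[i][j] = s / w[j][j]

        # Step 3: trailing update, A22 -= L21 * L21^T (lower triangle only).
        for i in range(e, n):
            for j in range(e, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k, e))

    # Clear the strict upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = 0.0
    return w


if __name__ == "__main__":
    a = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in cholesky_blocked(a, block_size=2):
        print(row)
```

In this sketch step 3 dominates the arithmetic for large matrices while step 1 is small and sequential, which is why blocked formulations of this kind lend themselves to splitting work across devices with different strengths.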

Slides for the talk: PDF
