Raymond Namyst
Heterogeneous accelerator-based parallel machines, featuring manycore CPUs and with GPU accelerators provide an unprecedented amount of processing power per node. Dealing with such a large number of heterogeneous processing units -- providing a highly unbalanced computing power -- is one of the biggest challenge that developpers of HPC applications have to face. To Fully tap into the potential of these heterogeneous machines, pure offloading approaches, that consist in running an application on regular processors while offloading part of the code on accelerators, are not sufficient.
In this talk, I will go through the major programming environments that were specifically designed to harness heterogeneous architectures, focusing on runtime systems. I will discuss some of the most critical issues programmers have to consider to achieve portability of performance, and I will show how advanced runtime techniques can speed up applications in the domain of dense linear algebra.
Eventually, I will give some insights about the main challenges designers of programming environments will have to face in upcoming years.