Richard W. Vuduc
Given an algorithm, what processor and memory architecture will deliver the best performance and power/energy efficiency? Conversely, given an architecture, what class of algorithms will (or will not) run efficiently? We are developing a novel analytical framework to answer these kinds of co-design questions, based on the concept of balance principles. A balance principle is a theoretical constraint equation that explicitly relates algorithm parameters to hardware parameters according to some figure of merit, such as speed, power, or cost. This notion originates in early work by Kung (1986) and others; however, we reinterpret the classical notions of balance in a modern context of parallel and I/O-efficient algorithm design as well as trends in emerging architectures. From such a principle, we argue that one can gain insight into how to design and (auto)tune algorithms and hardware. We use this principle to make quantitative predictions about future supercomputer systems as we march toward exascale machines. These predictions include when matrix multiply might become memory-bound; for what algorithms stacking processors and memory will be beneficial, and for which it will not; and whether supercomputers based on embedded CPU-like processors or those based on GPU-like processors will be better and why. Our overall aim is to suggest how to one might co-design rigorously and quantitatively while still yielding intuition and insight.