Fred G. Gustavson, IBM T.J. Watson Research Center, Emeritus, and Umeå University
Over the past six years, almost all computer manufacturers have dramatically changed their computer architectures to Multicore (MC) processors. We briefly describe Cache Blocking as it relates to computer architectures since about 1985, covering the where, when, how, and why of Cache Blocking for dense linear algebra. We also briefly present new algorithms for Blocked In-Place Rectangular Transposition of an M by N matrix A. We emphasize the importance of Rectangular Block (RB) format and describe how and why efficient algorithms are possible for converting between RB format and the standard column-major and row-major formats of 2-D arrays in the Fortran and C languages. From a practical point of view, this work is important because it will allow existing codes that use LAPACK and ScaLAPACK to remain usable with the new versions of LAPACK and ScaLAPACK currently being developed.
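To make the relationship between RB format and standard column-major storage concrete, the following is a minimal sketch, not the in-place algorithm of this paper. It copies an M by N column-major matrix out of place into a layout in which each mb by nb block is stored contiguously. The function name `colmajor_to_rb`, the block sizes `mb` and `nb`, the column-major ordering both inside each block and among the blocks, and the requirement that `mb` divides M and `nb` divides N are all assumptions made for illustration.

```python
def colmajor_to_rb(a, m, n, mb, nb):
    """Copy an m-by-n column-major matrix into a rectangular block layout.

    a      : flat list of length m*n, element A(i, j) stored at a[i + j*m]
    mb, nb : block dimensions (hypothetical parameters for this sketch)

    Returns a new flat list in which every mb-by-nb block is contiguous.
    Blocks are ordered column by column, and each block is itself
    column-major. This is an out-of-place copy, chosen for clarity.
    """
    if len(a) != m * n:
        raise ValueError("a must hold exactly m*n elements")
    if m % mb or n % nb:
        raise ValueError("this sketch assumes mb divides m and nb divides n")

    out = []
    for jb in range(0, n, nb):            # block columns, left to right
        for ib in range(0, m, mb):        # block rows, top to bottom
            for j in range(jb, jb + nb):  # columns inside the current block
                start = ib + j * m        # column-major index of A(ib, j)
                out.extend(a[start:start + mb])
    return out


# A 4-by-4 matrix with A(i, j) = i + 4*j, split into four 2-by-2 blocks.
rb = colmajor_to_rb(list(range(16)), 4, 4, 2, 2)
print(rb)
```

In the result, each run of `mb * nb` consecutive entries is one block, so a kernel operating on a block touches one contiguous region of memory instead of `nb` separate column segments that are M elements apart.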