
Malleability, Migration and Replication for Adaptive Distributed Computing over Dynamic Environments

Boleslaw Szymanski with T. Desell, K. El Maghraoui, and C. Varela.

Rensselaer Polytechnic Institute, Troy, NY, USA

szymansk@cs.rpi.edu

Abstract:

Modern parallel execution environments, from grids to volunteer computing resources (e.g., using BOINC), change dynamically over time. Hence, long-running parallel computations should adapt autonomously to the available resources. The key to such adaptability is the system's ability to support malleability, migration, and fault tolerance of application processes. This talk defines these concepts and discusses how they are implemented in the Internet Operating System (IOS), a middleware system that supports dynamic parallel execution. We will also discuss the ramifications of different features of the programming languages used to implement the middleware and the applications.

Malleability enables a parallel execution system to split or merge the processes of a parallel application in order to modify its granularity. Current support for process migration is limited by the granularity of the application's processes, and malleability removes this limitation. We have implemented malleability as an extension to the PCM (Process Checkpointing and Migration) library, a user-level library for iterative MPI applications. PCM is integrated with IOS, a framework for middleware-driven dynamic application reconfiguration written in the SALSA actor programming language. Our approach requires minimal code modifications and enables transparent, middleware-triggered reconfiguration. The talk will present experimental results that demonstrate the usefulness of malleability and will also outline our future work in this area.
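
As a rough illustration of the split and merge operations described above, the following Python sketch changes the granularity of a partitioned data set while preserving its contents and order. It is only a toy model under stated assumptions: the function names `split`, `merge`, and `repartition` are hypothetical and are not part of PCM, IOS, SALSA, or MPI, and each list stands in for the data owned by one process.

```python
from typing import List

Partition = List[float]


def split(parts: List[Partition], index: int) -> List[Partition]:
    """Split the partition at `index` into two halves (finer granularity)."""
    block = parts[index]
    mid = len(block) // 2
    return parts[:index] + [block[:mid], block[mid:]] + parts[index + 1:]


def merge(parts: List[Partition], index: int) -> List[Partition]:
    """Merge partition `index` with its right neighbour (coarser granularity)."""
    return parts[:index] + [parts[index] + parts[index + 1]] + parts[index + 2:]


def repartition(parts: List[Partition], target: int) -> List[Partition]:
    """Split or merge partitions until `target` partitions remain.

    Hypothetical policy: always split the largest partition and always
    merge the smallest adjacent pair. A real middleware would decide
    based on measured resource availability instead.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    parts = [list(p) for p in parts]  # work on a copy
    while len(parts) < target:
        largest = max(range(len(parts)), key=lambda i: len(parts[i]))
        if len(parts[largest]) < 2:
            break  # nothing left that can be split further
        parts = split(parts, largest)
    while len(parts) > target:
        pair = min(range(len(parts) - 1),
                   key=lambda i: len(parts[i]) + len(parts[i + 1]))
        parts = merge(parts, pair)
    return parts


if __name__ == "__main__":
    data = [float(x) for x in range(8)]
    two = [data[:4], data[4:]]       # two coarse "processes"
    four = repartition(two, 4)       # more resources: split
    one = repartition(four, 1)       # fewer resources: merge
    print(len(four), len(one))
```

In this sketch the number of partitions can follow the number of available processors, whereas migration alone could only move the original two partitions around without changing their size.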
