ABSTRACT
Managing Data-Intensive Scientific Workflows in Distributed Environments
Ewa Deelman
Information Sciences Institute
University of Southern California, USA
deelman@isi.edu
Abstract:
In this talk we examine the issues of optimizing disk usage and of
scheduling large-scale scientific workflows onto distributed resources
when the workflows are data-intensive, requiring large amounts of data
storage, and the resources have limited storage capacity. Our
approach is two-fold: we minimize the amount of space a workflow
requires during execution by removing data files at runtime as soon as
they are no longer needed, and we demonstrate that some workflows may
need to be restructured to significantly reduce their data
footprint. We describe the results of our data management and
workflow restructuring solutions using a Laser Interferometer
Gravitational-Wave Observatory (LIGO) application (the binary inspiral
analysis) and an astronomy application, Montage, running on the Open
Science Grid. We also examine the cost of the restructuring in terms of
the application's runtime.
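
To illustrate the first part of the approach, the following minimal Python sketch compares the peak disk footprint of a small workflow with and without runtime cleanup. It is not the implementation discussed in the talk: the workflow, the file sizes, and the function name `peak_footprint` are hypothetical, and the sketch assumes tasks run one at a time in a fixed order, that all external inputs are staged in before execution, and that a file can be deleted as soon as its last consumer finishes.

```python
def peak_footprint(tasks, order, sizes, cleanup):
    """Return the peak disk usage of a workflow run.

    tasks   -- hypothetical workflow: task name -> (input files, output files)
    order   -- task names in a valid (topological) execution order
    sizes   -- file name -> size in arbitrary units
    cleanup -- if True, delete each file once its last consumer has finished
    """
    # Index of the last task in `order` that reads each file.
    last_use = {}
    for i, name in enumerate(order):
        for f in tasks[name][0]:
            last_use[f] = i

    # Files not produced by any task are external inputs, staged in up front.
    produced = {f for _, outs in tasks.values() for f in outs}
    on_disk = {f for ins, _ in tasks.values() for f in ins if f not in produced}

    peak = sum(sizes[f] for f in on_disk)
    for i, name in enumerate(order):
        ins, outs = tasks[name]
        # While the task runs, its inputs and outputs coexist on disk.
        on_disk.update(outs)
        peak = max(peak, sum(sizes[f] for f in on_disk))
        if cleanup:
            # Remove inputs that no later task will read.
            for f in ins:
                if last_use[f] == i:
                    on_disk.discard(f)
    return peak


# Hypothetical four-task workflow (assumed for illustration only).
tasks = {
    "extract": (["raw"], ["a", "b"]),
    "filter_a": (["a"], ["a2"]),
    "filter_b": (["b"], ["b2"]),
    "merge": (["a2", "b2"], ["result"]),
}
sizes = {"raw": 10, "a": 4, "b": 4, "a2": 2, "b2": 2, "result": 1}
order = ["extract", "filter_a", "filter_b", "merge"]

without_cleanup = peak_footprint(tasks, order, sizes, cleanup=False)
with_cleanup = peak_footprint(tasks, order, sizes, cleanup=True)
print(without_cleanup, with_cleanup)  # prints "23 18"
```

In this toy case cleanup lowers the peak from 23 to 18 units. The peak is still set by the first task, whose large input and outputs must coexist on disk, which is the kind of situation where cleanup alone is not enough and restructuring the workflow may be needed.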