Computing with Highly Heterogeneous, Volatile and Potentially Malicious Hosts: An Asynchronous Phylogenic Gibbs Sampler for DNA@Home

Boleslaw Szymanski, Claire and Roland Schmitt Distinguished Professor of Computer Science, Rensselaer Polytechnic Institute, Troy, NY

This talk describes a phylogenic Gibbs sampler designed for the BOINC volunteer computing platform, whose computing hosts are highly heterogeneous, volatile and potentially malicious. Gibbs sampling is a Markov chain Monte Carlo algorithm: it requires a randomized burn-in walk of uncertain length before samples can be taken from the subsequent randomized sampling walk. The approach uses a centralized database to store verified positions of the random walks at various checkpoints as volunteered BOINC clients compute them. The goal of this work is to maximize the number of walks that have completed their burn-in and can therefore be used for sampling. The approach also scales dynamically to the large numbers of hosts provided by a volunteer computing environment (DNA@Home currently has over 2,000 volunteered hosts, and some projects have millions). It compares the results of the asynchronous phylogenic Gibbs sampler with those of a traditional sequential Gibbs sampler for varying burn-in lengths and numbers of hosts.
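
The checkpointing idea can be illustrated with a small sketch. It is not the DNA@Home sampler. The target is a toy bivariate normal with assumed correlation RHO rather than a phylogenic motif model. Every name here is hypothetical: `gibbs_segment`, `host_compute`, `run`, the dictionary standing in for the central database, the duplicate-computation check used as "verification", and the "advance the least-advanced walk" scheduling policy. Each work unit advances one walk by a fixed number of sweeps from its last verified checkpoint. Two hosts compute the unit, and only matching results are stored. States produced after the burn-in segments are kept as samples.

```python
import random

RHO = 0.8  # correlation of the toy bivariate-normal target (assumption)

def gibbs_segment(state, n_steps, seed):
    """Advance one Gibbs walk by n_steps sweeps from state=(x, y); return the trace."""
    rng = random.Random(seed)  # seeded per work unit, so honest replicas agree exactly
    x, y = state
    sd = (1.0 - RHO ** 2) ** 0.5
    trace = []
    for _ in range(n_steps):
        x = rng.gauss(RHO * y, sd)  # draw x from p(x | y)
        y = rng.gauss(RHO * x, sd)  # draw y from p(y | x)
        trace.append((x, y))
    return trace

def host_compute(state, n_steps, seed, offset=0.0):
    """Hypothetical volunteer host; a nonzero offset models a corrupted result."""
    trace = gibbs_segment(state, n_steps, seed)
    if offset:
        x, y = trace[-1]
        trace[-1] = (x + offset, y)
    return trace

def run(n_walks=20, seg_len=50, n_segments=10, burn_in_segments=4,
        p_malicious=0.2, seed=0):
    sched = random.Random(seed)
    # Stand-in for the central database: verified checkpoint states per walk.
    db = {w: [(sched.uniform(-10, 10), sched.uniform(-10, 10))]
          for w in range(n_walks)}
    samples, rejected = [], 0

    def offset():
        # A malicious host adds a random error; honest hosts add none.
        return sched.uniform(0.5, 1.5) if sched.random() < p_malicious else 0.0

    while True:
        pending = [w for w in db if len(db[w]) <= n_segments]
        if not pending:
            break
        # Assumed policy: advance the least-advanced walk so walks finish burn-in together.
        w = min(pending, key=lambda k: len(db[k]))
        seg = len(db[w]) - 1
        unit_seed = w * 1_000_003 + seg
        a = host_compute(db[w][-1], seg_len, unit_seed, offset())
        b = host_compute(db[w][-1], seg_len, unit_seed, offset())
        if a != b:
            rejected += 1
            continue  # replicas disagree: discard both and reissue the work unit
        db[w].append(a[-1])  # store the verified checkpoint
        if seg >= burn_in_segments:
            samples.extend(a)  # keep only post-burn-in states as samples
    return db, samples, rejected

db, samples, rejected = run()
mean_x = sum(x for x, _ in samples) / len(samples)
print(len(samples), rejected, round(mean_x, 3))
```

Because each work unit's random seed depends only on the walk and segment index, the results do not depend on the order in which hosts pick up work. This is what lets the walks advance asynchronously while still being checked by comparison.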

Work done with Travis Desell, Lee Newberg and Malik Magdon-Ismail, RPI