Towards scalable and cost-aware bioinformatics workflow execution in the cloud - Recent advances to the Tavaxy workflow system
Cloud-based scientific workflow systems can play an important role in the development of cost-effective bioinformatics analysis applications. So far, most efforts to support cloud computing in such workflow systems have focused on simply porting them to the cloud environment. The next step is to optimize these systems to exploit the advantages of the cloud computing model, chiefly resource elasticity and the associated business model. In this paper, we introduce new advancements in designing scalable and cost-effective workflows in the cloud using the Tavaxy workflow system, focusing on genome analysis applications. We provide an overview of the system and describe its key cloud features, including the configuration and execution of complete workflows or specific sub-workflows in the cloud. Using real-world examples, we demonstrate the system's key elasticity management features, which are designed to support two common scenarios: (1) minimizing workflow execution time under a budget constraint and (2) minimizing cost under a workflow deadline constraint. We evaluate the effectiveness of our approach through experiments on the Amazon EC2 cloud with dynamic pricing and variable, heterogeneous resource allocation.