Cloud-based parallel suffix array construction based on MPI
Massive amount of genomics data are being produced nowadays by Next Generation Sequencing machines. The suffix array is currently the best choice for indexing genomics data, because of its efficiency and large number of applications. In this paper, we address the problem of constructing the suffix array on computer cluster in the cloud. We present a solution that automates the establishment of a computer cluster in a cloud and automatically constructs the suffix array in a distributed fashion over the cluster nodes. This has the advantage of encapsulating all set-up details and execution of the algorithm. The distributed nature of the algorithm we use overcomes the problem that arises when the user wishes, due to cost issues, to use low memory machines in the cloud. Our experiments show that our implementation scales well with the increasing number of processors. The cloud cost is affordable and it provides a cost effective solution. © 2014 IEEE.