NVIDIA Clara Parabricks on the AWS Cloud

Quick Start deployment guide

May 2022
Gary Burnett, NVIDIA Corporation
Olivia Choudhury, PhD, AWS
Vinod Shukla and Troy Ameigh, AWS Integration and Automation team

See the GitHub repository to view source files, report bugs, submit feature ideas, and post feedback about this Quick Start. To comment on the documentation, refer to Feedback.

This Quick Start was created by NVIDIA Corporation in collaboration with Amazon Web Services (AWS). Quick Starts are automated reference deployments that help people deploy popular technologies on AWS according to AWS best practices.

Overview

This Quick Start deploys Parabricks, an accelerated genomics analysis framework, on the AWS Cloud. It helps researchers, clinical teams, and medical centers adopt Parabricks and run multiple standalone tools and pipelines for secondary and tertiary analysis of large-scale genomic data. You can analyze data for cancer sequencing projects, population studies, ribonucleic acid sequencing (RNA-seq), and more.

The Quick Start builds an AWS environment that spans two Availability Zones for high availability and provisions an AWS Batch compute environment that uses On-Demand Instances. The environment runs Parabricks on Amazon EC2 G4dn instances, which provide graphics processing units (GPUs) for hardware acceleration.

If you’re unfamiliar with AWS Quick Starts, refer to the AWS Quick Start General Information Guide.

Costs and licenses

There are no additional licenses required to use this Quick Start.

There is no cost to use this Quick Start, but you are billed for any AWS services or resources that it deploys. For more information, refer to the AWS Quick Start General Information Guide.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following Parabricks environment in the AWS Cloud.

Figure 1. Quick Start architecture for Parabricks on AWS

As shown in Figure 1, the Quick Start sets up the following:

  • A highly available architecture that spans two Availability Zones.*

  • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*

  • In the public subnets:

    • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*

    • A Linux bastion host in an Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon Elastic Compute Cloud (Amazon EC2) instances in public and private subnets.*

  • In the private subnets:

    • Parabricks deployed to Amazon EC2 instances.

    • An AWS Batch compute environment, job queue, and job definition for Parabricks instances.

  • An Amazon Elastic Container Registry (Amazon ECR) wrapper container for Parabricks.

* The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
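
After deployment, work is submitted to the AWS Batch job queue by using the job definition listed above. The following is a minimal sketch of doing that with the AWS CLI; it is not part of the Quick Start itself. The function names container_overrides and submit_parabricks_job and the variables JOB_QUEUE and JOB_DEFINITION are hypothetical. Substitute the queue and job definition names from your own stack. The sketch assumes that the AWS CLI is installed and configured.

```shell
# Sketch only: helper names and the JOB_QUEUE / JOB_DEFINITION variables
# are assumptions, not names created by the Quick Start.

# Build the JSON value for --container-overrides from a command line.
container_overrides() {
  co_json=""
  for co_arg in "$@"; do
    # Escape backslashes and double quotes so each argument is valid JSON.
    co_escaped=$(printf '%s' "$co_arg" | sed 's/\\/\\\\/g; s/"/\\"/g')
    co_json="${co_json}${co_json:+,}\"${co_escaped}\""
  done
  printf '{"command":[%s]}' "$co_json"
}

# Submit a job and print the job ID that AWS Batch returns.
# Arguments: job name, job queue, job definition, then the container command.
submit_parabricks_job() {
  sj_name=$1
  sj_queue=$2
  sj_definition=$3
  shift 3
  aws batch submit-job \
    --job-name "$sj_name" \
    --job-queue "$sj_queue" \
    --job-definition "$sj_definition" \
    --container-overrides "$(container_overrides "$@")" \
    --query 'jobId' --output text
}

# Usage (not run here):
#   submit_parabricks_job pbrun-help "$JOB_QUEUE" "$JOB_DEFINITION" pbrun --help
```

Overriding the container command at submission time lets one job definition serve different pbrun invocations. Whether the Quick Start's job definition expects the command in this form is an assumption to verify against your stack.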

Deployment options

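This Quick Start provides two deployment options:

  • Deploy Parabricks into a new VPC. This option builds a new AWS environment that consists of the VPC, subnets, NAT gateways, bastion host, and other infrastructure components, and then deploys Parabricks into it.

  • Deploy Parabricks into an existing VPC. This option provisions Parabricks in your existing AWS infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and Parabricks settings.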

Predeployment steps

Perform the following steps to create a version of the Parabricks container that works with AWS Batch and then host it on Amazon ECR.

  1. Create an Amazon ECR repository to hold the Docker image. Make a note of the repository’s Uniform Resource Identifier (URI).

  2. Install Parabricks on a machine where you have sudo access.

  3. Request a Parabricks trial license.

  4. Using the installer tarball, run:

sudo parabricks/installer.py

  5. To verify the installation, run:

pbrun --help

  6. Modify the Docker image so that it’s compatible with AWS Batch. Specifically, create a Dockerfile that builds on top of the Docker image that the installer just created. Don’t forget to add your Parabricks version number in the ARG instruction at the top of the file.

  7. To view the current Dockerfile, run:

cat Dockerfile

  8. To edit the Dockerfile, run:

vi Dockerfile

     Verify that the file contains the following:

# Add your Parabricks version number
ARG version="<INSERT VERSION NUMBER HERE Ex. 3.6.1-1>"
FROM parabricks/release:$version

# Redeclare the build argument; an ARG set before FROM is not visible after it
ARG version

# Untar this folder to access the pbrun executable
RUN cd /parabricks && tar xzvf release-$version.tar.gz

# Add the pbrun executable to the path
ENV PATH="/parabricks/release-$version:${PATH}"

# Remove the entry point from the container to work with AWS Batch
ENTRYPOINT [""]

  9. To build the Docker image and tag it with the URI for the ECR repository, run:

docker build -t <URI for your ECR repository> .

  10. To authenticate Docker with Amazon ECR and upload the image, run the following commands. Replace us-west-2 with the AWS Region of your repository.

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <URI for your ECR repository>
docker push <URI for your ECR repository>
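
The Amazon ECR parts of the preceding steps (creating the repository, signing in, and uploading the image) can also be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the repository name parabricks in the usage comment is only an example, and the AWS CLI and Docker are assumed to be installed and configured. It signs in to the registry host, which is the part of the repository URI before the first slash.

```shell
# Sketch only: function names are assumptions, not part of the Quick Start.

# Print the registry host portion of a repository or image URI, for example
# 123456789012.dkr.ecr.us-west-2.amazonaws.com/parabricks
#   -> 123456789012.dkr.ecr.us-west-2.amazonaws.com
registry_of() {
  printf '%s' "${1%%/*}"
}

# Create an ECR repository and print its URI.
# Arguments: repository name, AWS Region.
create_repository() {
  aws ecr create-repository \
    --repository-name "$1" \
    --region "$2" \
    --query 'repository.repositoryUri' --output text
}

# Sign in to the registry that hosts the image, then push the image.
# Arguments: image URI (as used with "docker build -t"), AWS Region.
push_image() {
  pi_uri=$1
  pi_region=$2
  aws ecr get-login-password --region "$pi_region" |
    docker login --username AWS --password-stdin "$(registry_of "$pi_uri")" &&
    docker push "$pi_uri"
}

# Usage (not run here):
#   uri=$(create_repository parabricks us-west-2)
#   docker build -t "$uri" .
#   push_image "$uri" us-west-2
```

Capturing the URI that create-repository returns avoids copying it by hand between the build, sign-in, and push commands.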

Deployment steps

  1. Sign in to your AWS account, and launch this Quick Start, as described under Deployment options. The AWS CloudFormation console opens with a prepopulated template. Deployment takes about 15 minutes to complete.

  2. Ensure that you set the correct AWS Region, and choose Next.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.

    Unless you are customizing the Quick Start templates for your own projects, don’t change the default settings for the following Amazon Simple Storage Service (Amazon S3) parameters: Quick Start S3 bucket name, Quick Start S3 bucket Region, and Quick Start S3 key prefix. Changing these settings automatically updates code references to point to a new Quick Start location. For more information, refer to the AWS Quick Start Contributor’s Guide.
  5. On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates AWS Identity and Access Management (IAM) resources and might require the ability to automatically expand macros.

  7. Choose Create stack to deploy the stack.

  8. Monitor the stack’s status. When the status is CREATE_COMPLETE, the NVIDIA Clara Parabricks deployment is ready.

  9. To view the created resources, choose the Outputs tab.
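
The status check and the Outputs tab in the last two steps have AWS CLI equivalents. The following is a minimal sketch, assuming that the AWS CLI is installed and configured for the Region where you launched the stack. The function names and the stack name in the usage comment are hypothetical.

```shell
# Sketch only: function names and the example stack name are assumptions.

# Print the current status of a CloudFormation stack, such as CREATE_COMPLETE.
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
}

# Succeed only when the stack has finished creating.
is_stack_ready() {
  [ "$(stack_status "$1")" = "CREATE_COMPLETE" ]
}

# Print each stack output as a key and value pair, one per line.
stack_outputs() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' --output text
}

# Block until stack creation completes, then list the outputs.
wait_and_show_outputs() {
  aws cloudformation wait stack-create-complete --stack-name "$1" &&
    stack_outputs "$1"
}

# Usage (not run here):
#   wait_and_show_outputs my-parabricks-stack
```

The wait subcommand polls on your behalf and exits with a nonzero status if creation fails, so the outputs are listed only for a successful deployment.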

Troubleshooting

For troubleshooting common Quick Start issues, refer to the AWS Quick Start General Information Guide or the Troubleshooting CloudFormation page in the AWS documentation.

After you successfully deploy a Quick Start, confirm that your resources and services are updated and configured—including any required patches—to meet your security and other needs. For more information, refer to the Shared Responsibility Model.

Feedback

To submit feature ideas and report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, refer to the Quick Start Contributor’s Guide. For all other feedback, use the following GitHub links:

Notices

This document is provided for informational purposes only. It represents current AWS product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "as is" basis, without warranties or conditions of any kind, either expressed or implied. See the License for specific language governing permissions and limitations.