NVIDIA Clara Parabricks on the AWS Cloud
Quick Start deployment guide


May 2022
Gary Burnett, NVIDIA Corporation
Olivia Choudhury, PhD, AWS
Vinod Shukla and Troy Ameigh, AWS Integration and Automation team
See the GitHub repository to view source files, report bugs, submit feature ideas, and post feedback about this Quick Start. To comment on the documentation, refer to Feedback.
This Quick Start was created by NVIDIA Corporation in collaboration with Amazon Web Services (AWS). Quick Starts are automated reference deployments that help people deploy popular technologies on AWS according to AWS best practices.
Overview
This Quick Start deploys Parabricks, an accelerated genomics analysis framework, on the AWS Cloud. It helps researchers, clinical teams, and medical centers adopt Parabricks and run multiple standalone tools and pipelines for secondary and tertiary analysis of large-scale genomic data. You can analyze data for cancer sequencing projects, population studies, ribonucleic acid sequencing (RNA-seq), and more.
The Quick Start builds an AWS environment that spans two Availability Zones for high availability and provisions an AWS Batch compute environment that uses On-Demand Instances. The environment includes Amazon EC2 G4dn instances, which provide graphics processing units (GPUs) for hardware-accelerated Parabricks workloads.
If you’re unfamiliar with AWS Quick Starts, refer to the AWS Quick Start General Information Guide.
Costs and licenses
There are no additional licenses required to use this Quick Start.
There is no cost to use this Quick Start, but you will be billed for any AWS services or resources that this Quick Start deploys. For more information, refer to the AWS Quick Start General Information Guide.
Architecture
Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following Parabricks environment in the AWS Cloud.

As shown in Figure 1, the Quick Start sets up the following:
-
A highly available architecture that spans two Availability Zones.*
-
A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*
-
In the public subnets:
-
Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*
-
A Linux bastion host in an Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon Elastic Compute Cloud (Amazon EC2) instances in public and private subnets.*
-
In the private subnets:
-
Parabricks deployed to Amazon EC2 instances.
-
An AWS Batch compute environment, job queue, and job definition for Parabricks instances.
-
An Amazon Elastic Container Registry (Amazon ECR) wrapper container for Parabricks.
* The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
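After the stack is deployed, a job could be submitted to the AWS Batch job queue from the command line. The following is a minimal sketch, not part of the Quick Start itself. The job queue name, job definition name, and job name are hypothetical placeholders (the real names come from your deployed stack), and the sketch assumes that the AWS CLI is installed and configured.

```shell
#!/usr/bin/env bash
# Sketch only. The names below are hypothetical placeholders; replace them
# with the job queue and job definition that your stack created.
JOB_QUEUE="parabricks-job-queue"
JOB_DEFINITION="parabricks-job-definition"

# Build the JSON for --container-overrides so that the container runs
# "pbrun <argument>" (for example, "pbrun --help").
container_overrides() {
  printf '{"command":["pbrun","%s"]}' "$1"
}

# Submit a job to AWS Batch. Requires the AWS CLI and valid credentials.
# $1 = job name, $2 = argument passed to pbrun.
submit_parabricks_job() {
  aws batch submit-job \
    --job-name "$1" \
    --job-queue "$JOB_QUEUE" \
    --job-definition "$JOB_DEFINITION" \
    --container-overrides "$(container_overrides "$2")"
}
```

For example, `submit_parabricks_job parabricks-smoke-test --help` would submit a job that only prints the pbrun help text, which can serve as a quick check that the container starts on a GPU instance.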
Deployment options
This Quick Start provides two deployment options:
-
Deploy Parabricks into a new VPC. This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components. It then deploys Parabricks into this new VPC.
-
Deploy Parabricks into an existing VPC. This option provisions Parabricks in your existing AWS infrastructure.
The Quick Start provides separate templates for these options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and Parabricks settings.
Predeployment steps
Perform the following steps to create a version of the Parabricks container that works with AWS Batch and then host it on Amazon ECR.
-
Create an Amazon ECR repository to hold the Docker image. Make note of the Uniform Resource Identifier (URI) of the repository.
-
Install Parabricks on a machine where you have sudo access.
-
Request a Parabricks trial license.
-
Using the installer tarball, run:
sudo parabricks/installer.py
-
To verify the installation, run:
pbrun --help
-
Modify the Docker image so that it’s compatible with AWS Batch. Specifically, create a Dockerfile that builds on the Docker image that the installer just created. Set your Parabricks version number in the ARG instruction at the top of the file.
-
To view the current Dockerfile, run:
$ cat Dockerfile
-
To edit and verify your Dockerfile, run:
$ vi Dockerfile
# Add your Parabricks version number
ARG version="<INSERT VERSION NUMBER HERE Ex. 3.6.1-1>"
FROM parabricks/release:$version
# Redeclare the build argument so that it is in scope after FROM
ARG version
# Untar this folder to access the pbrun executable
RUN cd /parabricks && tar xzvf release-$version.tar.gz
# Add the pbrun executable to the path
ENV PATH="/parabricks/release-$version:${PATH}"
# Remove the entry point from the container to work with AWS Batch
ENTRYPOINT [""]
-
To build the Docker image and tag it with the URI for the ECR repository, run:
$ docker build -t <URI for your ECR repository> .
-
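To authenticate Docker to Amazon ECR and upload the image, run the following commands (replace us-west-2 with the Region of your repository):
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <URI for your ECR repository>
docker push <URI for your ECR repository>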
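The predeployment steps that involve Amazon ECR can be collected into one script. This is a sketch under stated assumptions rather than a required procedure: the account ID and repository name are hypothetical placeholders, the Region is only an example, and the function expects the Dockerfile described above to be in the current directory. The repository URI follows the standard Amazon ECR format of account ID, Region, and repository name.

```shell
#!/usr/bin/env bash
# Sketch only. Replace the placeholder values with your own.
ACCOUNT_ID="123456789012"   # hypothetical AWS account ID
REGION="us-west-2"          # example Region
REPO_NAME="parabricks"      # hypothetical repository name

# Registry host and full repository URI in the standard ECR format.
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
ECR_URI="${REGISTRY}/${REPO_NAME}"

# Create the repository, build and tag the image, log in, and push.
# Requires the AWS CLI, Docker, and valid AWS credentials.
push_parabricks_image() {
  # Create the repository (this call fails if it already exists).
  aws ecr create-repository --repository-name "$REPO_NAME" \
    --region "$REGION" || return 1
  # Build from the Dockerfile in the current directory and tag with the URI.
  docker build -t "$ECR_URI" . || return 1
  # Authenticate Docker to the registry.
  aws ecr get-login-password --region "$REGION" |
    docker login --username AWS --password-stdin "$REGISTRY" || return 1
  # Upload the image.
  docker push "$ECR_URI"
}

echo "Image will be pushed to: ${ECR_URI}"
```

Calling `push_parabricks_image` runs the four steps in order and stops at the first failure. If the repository already exists, skip the create step and run the remaining commands directly.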
Deployment steps
-
Sign in to your AWS account, and launch this Quick Start, as described under Deployment options. The AWS CloudFormation console opens with a prepopulated template. Deployment takes about 15 minutes to complete.
-
Ensure that you set the correct AWS Region, and choose Next.
-
On the Create stack page, keep the default setting for the template URL, and then choose Next.
-
On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.
Unless you are customizing the Quick Start templates for your own projects, don’t change the default settings for the following Amazon Simple Storage Service (Amazon S3) parameters: Quick Start S3 bucket name, Quick Start S3 bucket Region, and Quick Start S3 key prefix. Changing these settings automatically updates code references to point to a new Quick Start location. For more information, refer to the AWS Quick Start Contributor’s Guide.
-
On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.
-
On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates AWS Identity and Access Management (IAM) resources and might require the ability to automatically expand macros.
-
Choose Create stack to deploy the stack.
-
Monitor the stack’s status. When the status is CREATE_COMPLETE, the NVIDIA Clara Parabricks deployment is ready.
-
To view the created resources, choose the Outputs tab.
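The last two steps can also be performed with the AWS CLI instead of the console. The following is a minimal sketch that assumes the AWS CLI is installed and configured. The stack name is a hypothetical placeholder and the Region is only an example, so substitute the values that you chose during deployment.

```shell
#!/usr/bin/env bash
# Sketch only. Replace the placeholder values with your own.
STACK_NAME="parabricks-quickstart"   # hypothetical stack name
REGION="us-west-2"                   # example Region

# Succeed only when the given status string means the deployment is ready.
is_deployed() {
  [ "$1" = "CREATE_COMPLETE" ]
}

# Print the current status of the stack, for example CREATE_IN_PROGRESS.
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" --region "$REGION" \
    --query "Stacks[0].StackStatus" --output text
}

# Block until stack creation finishes, then list the outputs as a table.
show_stack_outputs() {
  aws cloudformation wait stack-create-complete \
    --stack-name "$STACK_NAME" --region "$REGION" || return 1
  aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" --region "$REGION" \
    --query "Stacks[0].Outputs[].[OutputKey,OutputValue]" --output table
}
```

For a one-time check, `is_deployed "$(stack_status)" && echo "ready"` prints a message only when the stack has reached CREATE_COMPLETE. `show_stack_outputs` waits for completion and returns a nonzero status if stack creation fails.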
Troubleshooting
For troubleshooting common Quick Start issues, refer to the AWS Quick Start General Information Guide or the Troubleshooting CloudFormation page in the AWS documentation.
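When a deployment fails, the stack events usually state which resource failed and why. As a hedged illustration, the following sketch lists only the failed events for a stack by using the AWS CLI. The stack name is a hypothetical placeholder, and the sketch assumes that the AWS CLI is installed and configured.

```shell
#!/usr/bin/env bash
# Sketch only. Replace the placeholder stack name with your own.
STACK_NAME="parabricks-quickstart"   # hypothetical stack name

# JMESPath filter that keeps events whose status contains FAILED
# (for example, CREATE_FAILED) and selects the most useful fields.
FAILED_EVENTS_QUERY="StackEvents[?contains(ResourceStatus, 'FAILED')].[Timestamp,LogicalResourceId,ResourceStatus,ResourceStatusReason]"

# List failed events for the given stack (defaults to STACK_NAME).
list_failed_events() {
  aws cloudformation describe-stack-events \
    --stack-name "${1:-$STACK_NAME}" \
    --query "$FAILED_EVENTS_QUERY" --output table
}
```

If the failed resource is itself a nested stack, running `list_failed_events <nested stack name or ID>` against that stack may reveal the underlying error.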
After you successfully deploy a Quick Start, confirm that your resources and services are updated and configured—including any required patches—to meet your security and other needs. For more information, refer to the Shared Responsibility Model.
Feedback
To submit feature ideas and report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, refer to the Quick Start Contributor’s Guide. For all other feedback, use the GitHub repository for this Quick Start.
Notices
This document is provided for informational purposes only. It represents current AWS product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.
The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "as is" basis, without warranties or conditions of any kind, either expressed or implied. See the License for specific language governing permissions and limitations.