NVIDIA Cheminformatics on the AWS Cloud

Quick Start Deployment Guide


April 2022
John Smith, NVIDIA Corporation
Doruk Ozturk, AWS World Wide Specialist Organization (WWSO)
Troy Ameigh, AWS Integration and Automation team

Refer to the GitHub repository to view source files, report bugs, submit feature ideas, and post feedback about this Quick Start. To comment on the documentation, refer to Feedback.

This Quick Start was created by NVIDIA Corporation in collaboration with Amazon Web Services (AWS). Quick Starts are automated reference deployments that help people deploy popular technologies on AWS according to AWS best practices.

Overview

This Quick Start deploys NVIDIA Cheminformatics on the AWS Cloud. It’s designed for chemists and healthcare scientists who want to explore MegaMolBART, a transformer model that learns a latent space of small molecules through self-supervised training on compound SMILES strings. After deployment, you can use a user interface to analyze analogues around and between compounds.

This Quick Start can serve as a proof of concept for large pharmaceutical companies and start-ups that want to evaluate representation learning for compounds or generative chemistry before implementing Cheminformatics in their own environments.

If you’re unfamiliar with AWS Quick Starts, refer to the AWS Quick Start General Information Guide.

Costs and licenses

There are no additional licenses required to use this Quick Start.

There is no cost to use this Quick Start, but you will be billed for any AWS services or resources that this Quick Start deploys. For more information, refer to the AWS Quick Start General Information Guide.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following Cheminformatics environment in the AWS Cloud.

Figure 1. Quick Start architecture for Cheminformatics on AWS

As shown in Figure 1, the Quick Start sets up the following:

  • A highly available architecture that spans multiple Availability Zones (defaults to two).*

  • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*

  • In the public subnets:

    • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.

  • In the private subnets:

    • An Auto Scaling group with GPU instances (defaults to p3.2xlarge).

    • Amazon Elastic Container Service (Amazon ECS) tasks that run the Cheminformatics and MegaMolBART services.

  • An Application Load Balancer to distribute incoming traffic.

  • An Amazon ECS cluster to run the tasks.

  • An AWS Cloud Map namespace for service discovery.

  • Amazon CloudWatch Container Insights for Amazon ECS metrics and logs.

  • An Amazon Elastic File System (Amazon EFS) file system to share data between tasks.

* The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
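The VPC’s CIDR range has to hold one public and one private subnet in every Availability Zone. The following Python sketch shows that arithmetic, assuming the range is split into equal-sized blocks. plan_subnets is a hypothetical helper, not part of the Quick Start, and the subnet layout the Quick Start actually creates may differ.

```python
import ipaddress


def plan_subnets(cidr_block: str, number_of_azs: int) -> list[tuple[str, str, str]]:
    """Split cidr_block into equal blocks: one public and one private subnet per AZ.

    Hypothetical helper; assumes an even split, which may differ from the
    layout the Quick Start actually creates.
    """
    if number_of_azs < 1:
        raise ValueError("number_of_azs must be at least 1")
    vpc = ipaddress.ip_network(cidr_block)
    subnet_count = 2 * number_of_azs  # public + private in every AZ
    # Fewest extra prefix bits that yield at least subnet_count equal blocks.
    extra_bits = (subnet_count - 1).bit_length()
    blocks = list(vpc.subnets(prefixlen_diff=extra_bits))
    if blocks[0].prefixlen > 28:
        raise ValueError("subnets would be smaller than /28, the VPC minimum")
    plan = []
    for az in range(number_of_azs):
        # Public subnets take the first number_of_azs blocks,
        # private subnets take the next number_of_azs blocks.
        plan.append((f"az{az + 1}", "public", str(blocks[az])))
        plan.append((f"az{az + 1}", "private", str(blocks[number_of_azs + az])))
    return plan


if __name__ == "__main__":
    for az, tier, cidr in plan_subnets("10.0.0.0/24", 2):
        print(az, tier, cidr)
```

With two Availability Zones and the sample range 10.0.0.0/24, this yields four /26 subnets: 10.0.0.0/26 and 10.0.0.64/26 (public) plus 10.0.0.128/26 and 10.0.0.192/26 (private). A third Availability Zone would shrink each subnet to /27.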

Predeployment steps

Use the AWS Cloud Development Kit (AWS CDK) to deploy this Quick Start. The AWS CDK lets you define infrastructure as code with familiar programming tools and syntax, and it provisions that infrastructure through AWS CloudFormation.

For more information, refer to Working with the AWS CDK.

Prepare for the AWS CDK deployment

To deploy this stack, install Node.js and Python. Then set the AWS_PROFILE environment variable to the named profile whose credentials you want to use (alternatively, pass --profile to each cdk command):

export AWS_PROFILE=myProfile
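The AWS CLI and the AWS CDK Toolkit pick a named profile in roughly this order: an explicit --profile option, then the AWS_PROFILE environment variable, then the profile named default. The following Python sketch models that simplified precedence. resolve_profile is a hypothetical helper, not part of the Quick Start or any AWS tool, and it ignores other credential sources such as access-key environment variables or instance roles.

```python
import os
from typing import Mapping, Optional


def resolve_profile(cli_profile: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the named profile a command would use (simplified model)."""
    env = os.environ if env is None else env
    if cli_profile:  # an explicit --profile always wins
        return cli_profile
    # Otherwise fall back to AWS_PROFILE, then to the "default" profile.
    return env.get("AWS_PROFILE") or "default"
```

With export AWS_PROFILE=myProfile in effect and no --profile option, resolve_profile() returns "myProfile".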

Install the AWS CDK Toolkit and configure the Quick Start

  1. Install the AWS CDK Toolkit:

    npm install -g aws-cdk
  2. Verify the installation and check the current version:

    cdk --version
  3. Clone the Quick Start repository to your machine:

    git clone {git_repo_url}
    cd {quickstart-project-name}
  4. Install the application’s dependencies:

    pip install -r requirements.txt
  5. In your cloned copy of the repo, edit the parameters in the cheminformatics/cdk.json configuration file. A sample of the file is shown here:

    {
        "create_new_vpc": "True",
        "existing_vpc_name": "SomeVpcName",
        "cidr_block": "10.0.0.0/24",
        "number_of_azs": 2,
        "ec2_volume_size": 100,
        "instance_type": "p3.2xlarge",
        "cheminformatics_container": "public.ecr.aws/b9g4r0v3/cheminformatics_demo:0.1.2",
        "megamolbart_container": "nvcr.io/nvidia/clara/megamolbart:0.1.2",
        "megamolbart_model_url": "https://api.ngc.nvidia.com/v2/models/nvidia/clara/megamolbart/versions/0.1/zip"
    }
  • To deploy Cheminformatics to an existing virtual private cloud (VPC):

    • Set create_new_vpc to "False".

    • Set the existing_vpc_name parameter to the name of your VPC.

  • To create a new VPC during deployment:

    • Set create_new_vpc to "True". The existing_vpc_name parameter is ignored.

    • Specify the cidr_block for your VPC.

    • Set number_of_azs to the number of Availability Zones for your VPC.

  • Configure the following parameters:

    • Set ec2_volume_size to the EC2 volume size, in GB.

    • Set instance_type to a GPU instance type. Currently, only p3 instance types are supported.

  • You can also customize the following parameters:

    • cheminformatics_container (default value is public.ecr.aws/b9g4r0v3/cheminformatics_demo:0.1.2)

    • megamolbart_container (default value is nvcr.io/nvidia/clara/megamolbart:0.1.2)

    • megamolbart_model_url (default value is https://api.ngc.nvidia.com/v2/models/nvidia/clara/megamolbart/versions/0.1/zip)
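Before you deploy, it can help to check cdk.json against the rules listed above. The following Python sketch does that, assuming the flat key layout shown in the sample file. validate_config is a hypothetical helper, not the Quick Start’s own validation; it only encodes the rules stated in this guide (for example, p3 instance types only, and existing_vpc_name required when create_new_vpc is "False").

```python
import ipaddress
import json
import sys


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems found in a cdk.json-style config dict.

    The checks mirror the rules described in this guide; they are a sketch,
    not the Quick Start's own validation.
    """
    errors = []
    flag = config.get("create_new_vpc")
    if flag == "False":
        if not config.get("existing_vpc_name"):
            errors.append('existing_vpc_name is required when create_new_vpc is "False"')
    elif flag == "True":
        # existing_vpc_name is ignored for a new VPC, so it is not checked.
        try:
            # strict parsing rejects ranges with host bits set, e.g. 10.0.0.1/24
            ipaddress.ip_network(config.get("cidr_block", ""))
        except ValueError:
            errors.append(f"cidr_block is not a valid network: {config.get('cidr_block')!r}")
        azs = config.get("number_of_azs")
        if type(azs) is not int or azs < 1:
            errors.append("number_of_azs must be a positive integer")
    else:
        errors.append('create_new_vpc must be the string "True" or "False"')

    size = config.get("ec2_volume_size")
    if type(size) is not int or size <= 0:
        errors.append("ec2_volume_size must be a positive integer (GB)")
    if not str(config.get("instance_type", "")).startswith("p3."):
        errors.append("instance_type must be a p3 instance type, such as p3.2xlarge")
    return errors


def main(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        problems = validate_config(json.load(f))
    for problem in problems:
        print(f"{path}: {problem}", file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Saved as, for example, validate_cdk_config.py (a hypothetical file name), you could run python validate_cdk_config.py cheminformatics/cdk.json; a nonzero exit status means at least one rule failed.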

Launch the Quick Start

  1. Bootstrap your AWS account and Region by deploying the AWS CDK Toolkit stack (for more information, refer to Bootstrapping):

    cdk bootstrap
  2. Deploy the AWS CDK stack:

    cdk deploy
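If you script the launch (for example, in a CI pipeline), the two steps must run in order, and deployment should not start if bootstrapping fails. The following Python sketch shows one way to do that. cdk_commands and launch are hypothetical helpers, cwd is assumed to be the directory that contains cdk.json, and dry_run defaults to True so that running the script only prints the commands.

```python
import shlex
import subprocess
from typing import Optional


def cdk_commands(profile: Optional[str] = None) -> list[list[str]]:
    """Return the Launch commands in order: bootstrap first, then deploy."""
    # --profile is optional when AWS_PROFILE is already exported.
    extra = ["--profile", profile] if profile else []
    return [["cdk", "bootstrap", *extra], ["cdk", "deploy", *extra]]


def launch(profile: Optional[str] = None, cwd: str = ".", dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it in cwd."""
    for cmd in cdk_commands(profile):
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so deploy never runs
            # after a failed bootstrap.
            subprocess.run(cmd, check=True, cwd=cwd)


if __name__ == "__main__":
    launch()  # dry run: only prints the commands
```

Calling launch(profile="myProfile", dry_run=False) would run cdk bootstrap --profile myProfile followed by cdk deploy --profile myProfile.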

Postdeployment steps

After deployment, open the AWS CloudFormation console, select the deployed stack, and choose the Outputs tab to find the URL of the Cheminformatics dashboard. Use this URL to follow NVIDIA’s tutorial.

Figure 2. Cheminformatics dashboard
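You can also read the stack outputs programmatically instead of through the console. The following Python sketch calls the CloudFormation DescribeStacks API through boto3 and prints outputs whose values look like URLs. The stack name is whatever cdk deploy created in your account (not stated in this guide), url_outputs is a hypothetical helper, and the filter assumes the dashboard URL output includes its http:// or https:// scheme; adjust it if the stack outputs a bare load balancer DNS name.

```python
import sys


def url_outputs(describe_stacks_response: dict) -> dict[str, str]:
    """Map OutputKey to OutputValue for stack outputs that look like URLs."""
    urls = {}
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            value = output.get("OutputValue", "")
            # Assumption: the dashboard URL output includes its scheme.
            if value.startswith(("http://", "https://")):
                urls[output["OutputKey"]] = value
    return urls


def main(stack_name: str) -> None:
    # Imported lazily so url_outputs() works without boto3 installed.
    import boto3

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.describe_stacks(StackName=stack_name)
    for key, url in url_outputs(response).items():
        print(f"{key}: {url}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Keeping the parsing in url_outputs, separate from the API call, lets you check the filter against a sample response without AWS credentials.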

Troubleshooting

For troubleshooting common Quick Start issues, refer to the AWS Quick Start General Information Guide or the Troubleshooting CloudFormation page in the AWS documentation.

Feedback

To submit feature ideas and report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, refer to the Quick Start Contributor’s Guide. To submit feedback on this deployment guide, use the following GitHub links:

Notices

This document is provided for informational purposes only. It represents current AWS product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS products or services, each of which is provided "as is" without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "as is" basis, without warranties or conditions of any kind, either expressed or implied. Refer to the License for specific language governing permissions and limitations.