Biotech Blueprint on the AWS Cloud

Quick Start Reference Deployment


April 2021
Paul Underwood, Eric Zimmerman, Aaron Friedman, Michael Miller, and Sean Murphy, AWS WWCO Startup team
Shivansh Singh, AWS Quick Start team

Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.

This Quick Start was created by Amazon Web Services (AWS). Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start reference deployment guide provides instructions for deploying Biotech Blueprint on the AWS Cloud.

This Quick Start is for teams and individuals responsible for managing informatics infrastructure for a biotech company. It builds an architecture for you that includes identity management, access control, encryption, VPN, network isolation, logging, alarms, DNS, and compliance auditing.

Biotech Blueprint on AWS

Biotech Blueprint is an informatics architecture built for biotech companies on the Amazon Web Services (AWS) Cloud. This Quick Start deploys a virtual data center with partitioned virtual private clouds (VPCs) to separate biotech development, production, and management processes. It deploys this infrastructure and configures it for identity management, access control, logging, alarms, and compliance auditing according to AWS best practices. Into the core architecture the Quick Start deploys, you can launch the industry’s leading scientific research applications from the AWS Service Catalog.

For more information on what this Quick Start deploys, see Architecture, later in this guide.

AWS costs

You are responsible for the cost of the AWS services and any third-party licenses used while running this Quick Start. There is no additional cost for using the Quick Start.

The AWS CloudFormation templates for Quick Starts include configuration parameters that you can customize. Some of the settings, such as the instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

After you deploy the Quick Start, create AWS Cost and Usage Reports to deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. These reports provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, see What are AWS Cost and Usage Reports?

Software licenses

The core of the Biotech Blueprint is released under the Apache License 2.0. Software products available from the Biotech Blueprint AWS Service Catalog have different licensing terms.

Hail 0.2 is open-source and does not require a license. Dotmatics and Titian require licenses. Before deploying Dotmatics and Titian from the AWS Service Catalog, sign up for a license by contacting the support contact provided in the AWS Service Catalog listing.

The ChemAxon Quick Start requires a subscription to an Amazon Machine Image (AMI), which is available from https://aws.amazon.com/marketplace/pp/B077F6VV3B. Additional pricing, terms, and conditions may apply.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following Biotech Blueprint environment in the AWS Cloud.

Figure 1. Quick Start architecture for Biotech Blueprint on AWS

As shown in Figure 1, the Quick Start sets up the following:

  • A highly available architecture with three virtual private clouds (VPCs), each with two Availability Zones. VPCs contain public and private subnets according to AWS best practices, to provide you with your own virtual network on AWS.

    • A production VPC into which you can deploy optional research and informatics software from the AWS Service Catalog.

    • A management VPC with AWS Client VPN endpoints in the public subnets.

    • A development VPC to build and test research workloads.

  • Peering connections to allow Secure Shell (SSH) and remote desktop access from the management VPC to private subnets in the production and development VPCs.

  • AWS Config to assess, audit, and evaluate security compliance of your AWS resources and remediate deviations.

  • Amazon Route 53 for a private Domain Name System (DNS).

  • (Optional) An AWS Service Catalog portfolio with informatics software and computational biology tooling you can deploy into the production and development VPCs. For more information, see AWS Service Catalog, later in this guide.

Planning the deployment

Specialized knowledge

This deployment requires a moderate level of familiarity with AWS services. If you’re new to AWS, see Getting Started Resource Center and AWS Training and Certification. These sites provide materials for learning how to design, deploy, and operate your infrastructure and applications on the AWS Cloud.

This Quick Start assumes familiarity with basic networking concepts and with VPN client software such as AWS Client VPN or OpenVPN Connect.

AWS account

If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

Technical requirements

Before you launch the Quick Start, review the following information and ensure that your account is properly configured. Otherwise, deployment might fail.

Resource quotas

If necessary, request service quota increases for the following resources. You might request quota increases to avoid exceeding the default limits for any resources that are shared across multiple deployments. The Service Quotas console displays your usage and quotas for some aspects of some services. For more information, see What is Service Quotas? and AWS service quotas.

Resource                                             This deployment uses
Configuration recorders                              1
AWS Config conformance packs                         4
AWS Config delivery channels                         1
Client VPN authorization rules                       3
Client VPN endpoints                                 1
Client VPN routes                                    4
Client VPN target network associations               2
Elastic IP addresses                                 3
Internet gateways                                    3
NAT gateways                                         3
Routes                                               30
Route tables                                         16
Security groups                                      1
Subnets                                              16
Subnet route table associations                      32
VPCs                                                 3
VPC endpoints                                        2
VPC gateway attachments                              2
VPC peering connections                              2
IAM policies                                         1
IAM roles                                            2
AWS Lambda functions                                 1
Log groups                                           1
Log streams                                          1
Hosted zones                                         1
Amazon S3 buckets                                    2
Amazon S3 bucket policies                            1
AWS Service Catalog CloudFormation products          6
AWS Service Catalog portfolio product associations   6
AWS Service Catalog portfolios                       1
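Before requesting increases, it can help to compare what the deployment needs against the headroom left in your account. The following is a minimal sketch, not part of the Quick Start: the required counts are a subset of the table above, while the quota and usage figures, the QuotaStatus type, and the findShortfalls function are hypothetical and would be replaced with values you read from the Service Quotas console.

```typescript
// Resource counts copied from the table above (a subset, for illustration).
const blueprintNeeds: Record<string, number> = {
  "VPCs": 3,
  "Elastic IP addresses": 3,
  "NAT gateways": 3,
  "Internet gateways": 3,
};

// Hypothetical shape for what you read from the Service Quotas console.
interface QuotaStatus {
  quota: number; // current limit in the target Region
  inUse: number; // resources that already exist in the target Region
}

// Returns each resource whose remaining headroom is smaller than what the
// deployment needs, mapped to the size of the shortfall.
function findShortfalls(
  needs: Record<string, number>,
  current: Record<string, QuotaStatus>
): Record<string, number> {
  const shortfalls: Record<string, number> = {};
  for (const [resource, needed] of Object.entries(needs)) {
    const status = current[resource];
    if (status === undefined) continue; // no data supplied: check manually
    const headroom = status.quota - status.inUse;
    if (headroom < needed) shortfalls[resource] = needed - headroom;
  }
  return shortfalls;
}

// Made-up account state, used only to exercise the function.
const exampleAccount: Record<string, QuotaStatus> = {
  "VPCs": { quota: 5, inUse: 3 },
  "Elastic IP addresses": { quota: 5, inUse: 1 },
  "NAT gateways": { quota: 5, inUse: 0 },
  "Internet gateways": { quota: 5, inUse: 3 },
};

console.log(findShortfalls(blueprintNeeds, exampleAccount));
```

With the made-up figures above, the function reports a shortfall of one VPC and one internet gateway, which are the quotas you would ask to raise before launching.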

Supported Regions

This Quick Start does not support AWS China Regions. All other AWS Regions are supported.

Certain Regions are available on an opt-in basis. For more information, see Managing AWS Regions.

IAM permissions

Before launching the Quick Start, you must sign in to the AWS Management Console with IAM permissions for the resources that the templates deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions. For more information, see AWS managed policies for job functions.

Deployment options

This Quick Start provides two deployment options:

  • AWS CloudFormation-only deployment. Deploy Biotech Blueprint into your AWS account using the AWS CloudFormation console. Use this option if you are unfamiliar with the AWS Cloud Development Kit (AWS CDK). This option takes about seven minutes.

  • AWS CDK deployment. Use AWS CDK to deploy Biotech Blueprint. This option takes about five minutes longer than the AWS CloudFormation-only option.

Deployment steps

Sign in to your AWS account

  1. Sign in to your AWS account at https://aws.amazon.com with an IAM user or role that has the necessary permissions. For details, see Planning the deployment earlier in this guide.

  2. Make sure that your AWS account is configured correctly, as discussed in the Technical requirements section.

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using this Quick Start. For full details, see the pricing pages for each AWS service used by this Quick Start. Prices are subject to change.

Deployment Options

The Biotech Blueprint is built with the AWS CDK, which allows for two deployment options.

If you are unfamiliar with the AWS CDK, or don’t want to set up the CDK dependencies on your current computer, follow the CloudFormation deployment option (1).

If you are familiar with the AWS CDK, or plan on extending or customizing the Blueprint, follow the CDK deployment option (2).

Both deployment options take about 7 minutes to complete and create the same architecture. The primary difference is that the AWS CDK deployment option needs a little more time for initial setup (less than 5 minutes), but the CDK code is easier to maintain over time than the CloudFormation template in option 1.

The option to restrict your account to specific Regions (for example, US or EU only) is available only with the CDK deployment option.

Deployment Option 1: AWS CloudFormation Deployment (quick and easy)

  1. Sign in to your AWS account, and click the deploy link below to launch the AWS CloudFormation template.

Deploy Biotech Blueprint into your AWS Account

View template

  2. Check the AWS Region that’s displayed in the upper-right corner of the navigation bar, and change it if necessary. This is where the network infrastructure for Biotech Blueprint will be built. The template is launched in the us-east-1 Region by default.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. There are no parameters you need to supply.

  5. On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you’re finished, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.

  7. Choose Create stack to deploy the stack.

  8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the Biotech Blueprint deployment is ready.

  9. Use the values displayed in the Outputs tab for the stack, as shown in Figure 2, to view the created resources.

Figure 2. Biotech Blueprint outputs after successful deployment

Deployment Option 2: AWS CDK deployment

To deploy Biotech Blueprint using AWS CDK, do the following:

  1. Verify that you have the prerequisites to install the AWS CDK.

  2. Install the AWS CDK.

  3. Clone the Biotech Blueprint Quick Start repository.

    git clone https://github.com/aws-quickstart/quickstart-aws-biotech-blueprint-cdk.git
    cd quickstart-aws-biotech-blueprint-cdk

  4. Build the project.

    npm install
    npm run build

  5. Bootstrap your AWS environment.

    cdk bootstrap

    You can review and change the code before you deploy. For example, you can use different VPC CIDR ranges (aws-vpcs.ts) or a different internal DNS apex (aws-dns.ts defaults to corp).

  6. Deploy.

    npm run build && cdk deploy

To update the architecture after making changes later, run the command in step 6.
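If you change the VPC CIDR ranges mentioned in step 5, keep in mind that VPC peering requires the peered VPCs to have non-overlapping CIDR blocks. The following sketch shows one way to check a set of candidate ranges before deploying. It is an illustration only: the function names and the example ranges are hypothetical and are not taken from aws-vpcs.ts.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToNumber(ip: string): number {
  const octets = ip.split(".").map((part) => Number(part));
  const valid =
    octets.length === 4 &&
    octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255);
  if (!valid) throw new Error(`Invalid IPv4 address: ${ip}`);
  return octets.reduce((acc, o) => acc * 256 + o, 0);
}

// Returns the first and last address covered by a CIDR block.
function cidrBounds(cidr: string): [number, number] {
  const slash = cidr.indexOf("/");
  if (slash < 0) throw new Error(`Missing prefix length: ${cidr}`);
  const prefix = Number(cidr.slice(slash + 1));
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid prefix length: ${cidr}`);
  }
  const size = Math.pow(2, 32 - prefix);
  const start = Math.floor(ipv4ToNumber(cidr.slice(0, slash)) / size) * size;
  return [start, start + size - 1];
}

// Two blocks overlap when their address ranges intersect.
function cidrsOverlap(a: string, b: string): boolean {
  const [aStart, aEnd] = cidrBounds(a);
  const [bStart, bEnd] = cidrBounds(b);
  return aStart <= bEnd && bStart <= aEnd;
}

// Lists every pair of named VPC ranges that overlap.
function findOverlaps(vpcs: Record<string, string>): string[] {
  const entries = Object.entries(vpcs);
  const conflicts: string[] = [];
  entries.forEach(([aName, aCidr], i) => {
    entries.slice(i + 1).forEach(([bName, bCidr]) => {
      if (cidrsOverlap(aCidr, bCidr)) conflicts.push(`${aName} <-> ${bName}`);
    });
  });
  return conflicts;
}

// Hypothetical ranges: the development range sits inside the production range.
const candidateRanges: Record<string, string> = {
  production: "10.0.0.0/16",
  management: "10.1.0.0/16",
  development: "10.0.128.0/17",
};

console.log(findOverlaps(candidateRanges));
```

An empty result means the ranges can be peered; any listed pair needs a different range before you run the deploy command.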

Post deployment steps

Connect to the VPN

The Quick Start deploys a client VPN endpoint in the management VPC that will route traffic over peering connections to the production and development VPCs. The management VPC is the hub for networking into the other VPCs. The production and development VPCs are designed not to be able to communicate with each other directly. After deploying the Quick Start, follow these instructions to connect to the VPN.

  1. Navigate to the Client VPN Endpoints section of the Amazon VPC console.

  2. Select the Client VPN endpoint listed.

  3. Select Download Client Configuration. Your browser downloads a downloaded-client-config.ovpn file.

    Figure 3. Download Client Configuration

  4. Navigate to the Amazon S3 console.

  5. Open the bucket with the prefix awsstartupblueprintstack-clientvpnvpnconfigbucket*.

  6. Download the client1.domain.tld.key and client1.domain.tld.crt files.

    The other three files are the CA chain and server key/cert. You will need those to create additional client certificates.

  7. Open downloaded-client-config.ovpn in a text editor.

  8. Add the following lines to the bottom of the file, replacing the placeholder text inside the <cert> and <key> sections with the contents of the corresponding files.

    <cert>
    Contents of client certificate file (client1.domain.tld.crt)
    </cert>

    <key>
    Contents of private key file (client1.domain.tld.key)
    </key>

  9. Save and close the file. You can establish a VPN connection with this configuration and an OpenVPN client or the AWS-provided client.
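If you need to prepare configuration files for several users, the manual edit described in the steps above can be scripted. The sketch below is one possible approach and is not part of the Quick Start: the embedClientCredentials function is a hypothetical helper that takes the text of the downloaded configuration, the certificate, and the private key, and returns the combined configuration with the <cert> and <key> sections appended.

```typescript
// Appends <cert> and <key> sections to the text of a downloaded .ovpn file.
// All three arguments are file contents, not file paths.
function embedClientCredentials(
  baseConfig: string,
  certPem: string,
  keyPem: string
): string {
  // Wraps a PEM body in an inline section such as <cert>...</cert>.
  const section = (tag: string, body: string): string =>
    `<${tag}>\n${body.trim()}\n</${tag}>`;

  return [
    baseConfig.replace(/\s+$/, ""), // drop trailing blank lines
    "",
    section("cert", certPem),
    "",
    section("key", keyPem),
    "",
  ].join("\n");
}

// Placeholder inputs: real values come from downloaded-client-config.ovpn,
// client1.domain.tld.crt, and client1.domain.tld.key.
const combined = embedClientCredentials(
  "client\ndev tun\n",
  "-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n",
  "-----BEGIN PRIVATE KEY-----\n(placeholder)\n-----END PRIVATE KEY-----\n"
);

console.log(combined);
```

In practice you would read the three files from disk (for example, with the Node.js fs module) and write the result to a new .ovpn file. Treat the output like the private key itself, because it contains the key.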

With a VPN connection, you can connect to resources you launch into your VPCs using private IP addresses. For more information about deploying resources, see Deploying resources into VPCs, later in this guide.

Deploying resources into VPCs

This Quick Start builds an architecture with three VPCs: production, development, and management. Use the management VPC for operational resources such as DevOps tools, Active Directory, and security appliances. For example, the Biotech Blueprint Quick Start deploys Client VPN endpoints into the public subnets of the management VPC. Production and development VPCs are provided so you can manage live and test environments with different levels of controls.

Reserve public subnets for internet-facing resources such as load balancers. Use private subnets for resources that should not be internet-facing but require outbound internet access. Deploy sensitive resources such as databases addressable only by internal networks to isolated subnets which do not route traffic to the internet. For more information about public and private subnets, see VPC with public and private subnets (NAT).

The following table provides some examples.

Resource                                                                                VPC          Subnet
Test server                                                                             Development  Private
Amazon Relational Database Service (Amazon RDS) snapshot restored from development VPC  Production   Isolated
Application Load Balancer to test a custom TLS certificate                              Development  Public
DevOps tool to automate deployments to production and development VPCs                  Management   Private
Okta Cloud Connect appliance                                                            Management   Private
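The subnet guidance in this section reduces to two questions about each workload: is it internet-facing, and does it need outbound internet access? The following sketch encodes that decision. The Workload type and chooseSubnetType function are hypothetical names used only for illustration.

```typescript
type SubnetType = "Public" | "Private" | "Isolated";

// Hypothetical description of a workload, used only for this example.
interface Workload {
  internetFacing: boolean; // must accept connections from the internet
  needsOutboundInternet: boolean; // must reach the internet, for example for updates
}

// Applies the guidance from this section: public for internet-facing
// resources, private for outbound-only access, isolated otherwise.
function chooseSubnetType(workload: Workload): SubnetType {
  if (workload.internetFacing) return "Public";
  if (workload.needsOutboundInternet) return "Private";
  return "Isolated";
}

// A load balancer, a test server, and a database, as in the table above.
console.log(chooseSubnetType({ internetFacing: true, needsOutboundInternet: true }));
console.log(chooseSubnetType({ internetFacing: false, needsOutboundInternet: true }));
console.log(chooseSubnetType({ internetFacing: false, needsOutboundInternet: false }));
```

The three calls correspond to the Application Load Balancer, test server, and Amazon RDS rows of the table.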

Optional DNS setup

The Quick Start sets up a private DNS with .corp as the apex domain using Amazon Route 53 in your account. Using the Amazon Route 53 console, you can create A or CNAME records for the private applications you deploy.
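If you prefer to script record creation instead of using the console, the Route 53 ChangeResourceRecordSets API accepts a change batch. The sketch below only builds such a change batch as a plain object. The privateCnameChange function, the eln host label, and the target host name are hypothetical, and the default TTL of 300 seconds is an arbitrary choice for the example.

```typescript
// Shape of a Route 53 change batch, limited to the fields this example uses.
interface RecordChangeBatch {
  Changes: {
    Action: "CREATE" | "UPSERT" | "DELETE";
    ResourceRecordSet: {
      Name: string;
      Type: "A" | "CNAME";
      TTL: number;
      ResourceRecords: { Value: string }[];
    };
  }[];
}

// Builds a change batch that points <host>.<apex> at the given target.
// The apex defaults to "corp", the private domain the Quick Start sets up.
function privateCnameChange(
  host: string,
  target: string,
  apex: string = "corp",
  ttl: number = 300
): RecordChangeBatch {
  return {
    Changes: [
      {
        Action: "UPSERT", // create the record, or update it if it exists
        ResourceRecordSet: {
          Name: `${host}.${apex}.`,
          Type: "CNAME",
          TTL: ttl,
          ResourceRecords: [{ Value: target }],
        },
      },
    ],
  };
}

// Hypothetical application name and load balancer host name.
const elnRecord = privateCnameChange("eln", "internal-eln.example.internal");
console.log(JSON.stringify(elnRecord, null, 2));
```

The resulting object could be passed, together with the ID of the private hosted zone, to the AWS CLI or an AWS SDK. Those calls are omitted here.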

Delete the default VPC

Every new AWS account comes with a default VPC with public subnets in each Availability Zone. It is recommended that you delete this default VPC and only deploy resources into the production, management, and development VPCs that the Biotech Blueprint Quick Start provisions. If you have already deployed resources into the default VPC before launching the Quick Start, it is recommended that you migrate these resources to the Biotech Blueprint VPCs and then delete the default VPC. Removing the default VPC will ensure that a user does not launch resources into one of its exposed public subnets.

Figure 4 shows the default VPC listed in the Amazon VPC console with the VPCs created by the Biotech Blueprint Quick Start.

Figure 4. Default VPC

Security and Compliance

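The Quick Start deploys four AWS Config conformance packs, including the Operational Best Practices for NIST Cyber Security Framework (CSF) and Operational Best Practices for HIPAA Security packs described in this section.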

These packs create AWS Config rules that regularly evaluate resources in your AWS account against security best practices. When AWS Config finds an offending resource, it will flag it for your review in the AWS Config console. AWS Config also scans resources created in your account before deploying the Quick Start.

For example, the Operational Best Practices for NIST Cyber Security Framework (CSF) conformance pack comes with 93 rules. One of them is encrypted-volumes-conformance-pack, which checks whether attached Amazon Elastic Block Store (Amazon EBS) volumes are encrypted.

Figure 5. Operational Best Practices for NIST-CSF

Select encrypted-volumes-conformance-pack to display a list of relevant resources and their compliance status.

Figure 6. Encrypted volumes conformance pack

You can update the AWS Config delivery channel to include an Amazon Simple Notification Service (Amazon SNS) topic to send email or text notifications when resources are flagged. More sophisticated approaches might include regularly reviewing AWS Config reports, using AWS Config’s automatic remediation capabilities, or integrating AWS Config with security ticketing or SIEM solutions.
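If you forward these notifications to a ticketing or SIEM system, you will probably want to act only on resources that became non-compliant. The sketch below filters a notification body accordingly. It assumes the message is JSON with messageType, configRuleName, resourceType, resourceId, and newEvaluationResult.complianceType fields, which matches the compliance change notifications as we understand them; verify the field names against a real message from your topic. The describeIfNonCompliant function and the sample message are hypothetical.

```typescript
// Fields this example reads from a compliance change notification.
// Assumption: verify these names against a real message from your SNS topic.
interface ComplianceChange {
  messageType?: string;
  configRuleName?: string;
  resourceType?: string;
  resourceId?: string;
  newEvaluationResult?: { complianceType?: string };
}

// Returns a one-line summary for non-compliant resources, or null for
// every other kind of notification.
function describeIfNonCompliant(rawMessage: string): string | null {
  const msg = JSON.parse(rawMessage) as ComplianceChange;
  if (msg.messageType !== "ComplianceChangeNotification") return null;
  if (msg.newEvaluationResult?.complianceType !== "NON_COMPLIANT") return null;
  const type = msg.resourceType ?? "unknown type";
  const id = msg.resourceId ?? "unknown resource";
  const rule = msg.configRuleName ?? "unknown rule";
  return `${type} ${id} is non-compliant with ${rule}`;
}

// Made-up notification for an unencrypted EBS volume.
const sampleNotification = JSON.stringify({
  messageType: "ComplianceChangeNotification",
  configRuleName: "encrypted-volumes-conformance-pack",
  resourceType: "AWS::EC2::Volume",
  resourceId: "vol-EXAMPLE",
  newEvaluationResult: { complianceType: "NON_COMPLIANT" },
});

console.log(describeIfNonCompliant(sampleNotification));
```

A function like this could run in an AWS Lambda function subscribed to the topic; the subscription and the downstream integration are outside the scope of this sketch.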

Operational Best Practices for HIPAA Security conformance pack

While the Health Insurance Portability and Accountability Act (HIPAA) might not be a concern for every user of this Quick Start, many users store, transmit, or process protected health information (PHI). Whether you handle PHI or not, the HIPAA security conformance pack has over 80 rules that capture best practices that any user should consider implementing.

If you do have HIPAA/PHI needs, we strongly encourage you to read Operational Best Practices for HIPAA Security.

AWS Config conformance packs provide a general-purpose compliance framework designed to enable you to create security, operational or cost-optimization governance checks using managed or custom AWS Config rules and AWS Config remediation actions. Conformance packs, as sample templates, are not designed to fully ensure compliance with a specific governance or compliance standard. You are responsible for making your own assessment of whether your use of the Services meets applicable legal and regulatory requirements.

Restricting IAM actions to specific AWS Regions

You can restrict IAM actions to EU or US AWS Regions. For example, you may want to restrict the creation of Amazon Elastic Compute Cloud (Amazon EC2) instances or Amazon Simple Storage Service (Amazon S3) buckets to only European Regions. This could be for compliance reasons or simply because it's a good practice to keep resources out of Regions you never intend to use. If you have a single AWS account, the best way to enforce AWS Region restrictions is with an IAM permission boundary. The RegionRestriction class configured in lib/aws-startup-blueprint-stack.ts creates an IAM permission boundary. It restricts IAM actions to the AWS Regions you specify. For example:

      new RegionRestriction(this, 'RegionRestriction', {
        AllowedRegions: ["eu-central-1","eu-west-1","eu-west-3", "eu-south-1", "eu-north-1"]
      });

We have added some helper context variables (apply_EU_RegionRestriction and apply_US_RegionRestriction) inside the cdk.json file. Setting one of those to "true" and running cdk deploy again will apply the Region restriction.
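To make the idea of a Region-restricting permission boundary more concrete, the sketch below builds a policy document that allows actions only when the aws:RequestedRegion condition key matches an approved Region. This is an illustration under stated assumptions, not the policy that the RegionRestriction class generates: the regionBoundaryPolicy function, the statement IDs, and the list of global service actions are hypothetical. Global services resolve to a single Region, so they usually need separate handling, which the second statement shows in simplified form.

```typescript
// Minimal typing for the parts of an IAM policy this example uses.
interface BoundaryStatement {
  Sid: string;
  Effect: "Allow";
  Action: string[];
  Resource: string;
  Condition?: Record<string, Record<string, string[]>>;
}

interface BoundaryPolicy {
  Version: "2012-10-17";
  Statement: BoundaryStatement[];
}

// Builds a boundary that permits actions only in the approved Regions,
// plus an explicit list of global service actions (hypothetical list).
function regionBoundaryPolicy(
  allowedRegions: string[],
  globalServiceActions: string[]
): BoundaryPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowInApprovedRegions",
        Effect: "Allow",
        Action: ["*"],
        Resource: "*",
        Condition: {
          StringEquals: { "aws:RequestedRegion": allowedRegions },
        },
      },
      {
        Sid: "AllowGlobalServices",
        Effect: "Allow",
        Action: globalServiceActions,
        Resource: "*",
      },
    ],
  };
}

// Example values only; choose the Regions and global actions you need.
const euBoundary = regionBoundaryPolicy(
  ["eu-central-1", "eu-west-1"],
  ["iam:*", "route53:*", "cloudfront:*", "support:*"]
);

console.log(JSON.stringify(euBoundary, null, 2));
```

A boundary never grants permissions by itself. It only caps what the identity policies attached to a user or role can allow.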

For the permission boundary to have any effect, it must be attached to all existing and future IAM users and roles. As a best practice, attach this permission boundary whenever you create an IAM user or role. Because good intentions are sometimes forgotten, the RegionRestriction class also creates an AWS Config rule and remediation action that detect and remediate a missing permission boundary on any existing, updated, or future IAM principal.

To review the results, select the AwsBiotechBlueprint-RegionRestriction rule in the AWS Config Rules console.

Figure 7. AwsBiotechBlueprint-RegionRestriction rule

The AWS Config rule evaluates your IAM users and roles and lists their compliance status. To remediate a non-compliant resource, select the resource and select Remediate. The permission boundary is applied, and that user or role will no longer be able to perform any action outside of the specified Regions.


After the remediation is complete, AWS CloudTrail triggers the AWS Config rule. CloudTrail tells AWS Config that the IAM principal has been updated and that it's time to reevaluate the offending resource, which takes about 15 minutes. Because the boundary has been applied, the reevaluation will report the role or user as compliant.

Enabling automatic remediation

The Biotech Blueprint Quick Start intentionally leaves the remediation configuration set to Manual instead of Automatic, in case you have existing IAM users or roles. Automatically applying the remediation and attaching the permission boundary will affect the permissions of those existing IAM principals. Before applying the boundary, verify whether any of the flagged IAM principals depend on non-approved Regions. If you are working in a brand new account or are unconcerned about the impact on existing IAM principals, you can turn on automatic remediation by following these steps:

Enabling automatic remediation will impact existing IAM users and roles not created by the Biotech Blueprint.
  1. In the AWS Config console, select Edit in the Remediation Action section of the AwsBiotechBlueprint-RegionRestriction AWS Config rule.

Figure 8. AwsBiotechBlueprint-RegionRestriction rule
  2. Select Automatic Remediation.

  3. Select Save changes.

Figure 9. Edit Remediation action
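Before you switch to automatic remediation, it is worth checking which existing principals rely on Regions outside the approved list. The sketch below shows the comparison itself. It assumes you have already collected, for example from AWS CloudTrail, the Regions each principal has recently used. The PrincipalActivity type, the principalsImpactedByBoundary function, and the sample data are hypothetical.

```typescript
// Hypothetical summary of recent activity per IAM principal.
interface PrincipalActivity {
  arn: string;
  regionsUsed: string[]; // Regions where the principal recently made calls
}

interface ImpactedPrincipal {
  arn: string;
  blockedRegions: string[]; // Regions the boundary would cut off
}

// Lists the principals that would lose access to at least one Region
// they currently use if the boundary were attached.
function principalsImpactedByBoundary(
  activity: PrincipalActivity[],
  allowedRegions: string[]
): ImpactedPrincipal[] {
  const allowed = new Set(allowedRegions);
  return activity
    .map((p) => ({
      arn: p.arn,
      blockedRegions: p.regionsUsed.filter((r) => !allowed.has(r)),
    }))
    .filter((p) => p.blockedRegions.length > 0);
}

// Made-up activity data for two roles.
const recentActivity: PrincipalActivity[] = [
  { arn: "arn:aws:iam::111122223333:role/etl", regionsUsed: ["eu-west-1", "us-east-1"] },
  { arn: "arn:aws:iam::111122223333:role/web", regionsUsed: ["eu-central-1"] },
];

console.log(principalsImpactedByBoundary(recentActivity, ["eu-central-1", "eu-west-1"]));
```

With the sample data, only the etl role is reported, because it uses us-east-1, which is outside the approved list.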

AWS Region restriction in multi-account configurations

In a multi-account setup, service control policies (SCPs) are a better fit than permission boundaries. SCPs apply across an entire account and do not need to be attached to individual IAM principals. However, SCPs apply only to member accounts in an organization, so if you have only one account, use the permission boundaries discussed previously to restrict Regions. The Region-restricting SCP created by the Biotech Blueprint is applied automatically to any new account you create.

For more information about the service control policy, see the IAM console.

Figure 10. RegionRestriction
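Unlike a permission boundary, an SCP typically restricts Regions with an explicit Deny that applies when the requested Region is not on the approved list. The following sketch builds such a document so you can see the general pattern. It is not the SCP that the Biotech Blueprint creates: the regionDenyScp function, the statement ID, and the list of exempted global services are hypothetical.

```typescript
// Minimal typing for the parts of an SCP this example uses.
interface ScpStatement {
  Sid: string;
  Effect: "Deny";
  NotAction: string[]; // actions exempt from the Deny
  Resource: string;
  Condition: Record<string, Record<string, string[]>>;
}

interface ScpDocument {
  Version: "2012-10-17";
  Statement: ScpStatement[];
}

// Denies every action outside the approved Regions, except the listed
// global service actions, which are not tied to a Region you can choose.
function regionDenyScp(
  allowedRegions: string[],
  exemptGlobalActions: string[]
): ScpDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "DenyOutsideApprovedRegions",
        Effect: "Deny",
        NotAction: exemptGlobalActions,
        Resource: "*",
        Condition: {
          StringNotEquals: { "aws:RequestedRegion": allowedRegions },
        },
      },
    ],
  };
}

// Example values only; adjust both lists to your organization.
const usOnlyScp = regionDenyScp(
  ["us-east-1", "us-west-2"],
  ["iam:*", "organizations:*", "route53:*", "cloudfront:*", "support:*"]
);

console.log(JSON.stringify(usOnlyScp, null, 2));
```

Because an SCP only filters permissions, principals in the member accounts still need identity policies that grant the actions they use inside the approved Regions.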

AWS Service Catalog

After deploying the Biotech Blueprint Quick Start, you can launch a selection of informatics and scientific applications from the AWS Service Catalog console. You can also deploy these from the launch links provided on their Quick Start pages and in their deployment guides. The following table provides information to learn more about the available applications.

Category              Partner    Product                   To install
Compound registry     ChemAxon   Compound Registration     Launch Quick Start template
Genomics analysis     Hail       Hail 0.2                  Quick Start page | View guide
Knowledge management  Dotmatics  Dotmatics suite           Quick Start page | View guide
Sample management     Titian     Mosaic FreezerManagement  Quick Start page | View guide

AWS Service Catalog permissions

Access to AWS Service Catalog requires credentials. Those credentials must have permission to access AWS resources, such as an AWS Service Catalog portfolio or product. AWS Service Catalog integrates with AWS Identity and Access Management (IAM) to enable you to grant AWS Service Catalog end users permissions to launch products and manage provisioned products. To control access, you attach IAM policies to the IAM users, groups, and roles that you use with AWS Service Catalog. To grant access to the Biotech Blueprint portfolio, follow these steps:

  1. Navigate to the AWS Service Catalog console.

  2. Select the Biotech Blueprint Informatics Catalog portfolio.

  3. Select the Groups, roles, and users tab.

Figure 11. Quick Start architecture for Biotech Blueprint on AWS
  4. Select Add groups, users, and roles.

  5. Select the IAM identities requiring AWS Service Catalog permissions. Do not forget to include yourself if you need permissions.

FAQ

Q. Why are there two subnets of the same class in each VPC?

A. Each subnet of the same class is in a different Availability Zone, providing high availability. Subnets of the same class are identical from a networking perspective, so it does not matter which you specify when deploying a resource.

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the template with Rollback on failure set to Disabled. (This setting is under Advanced in the AWS CloudFormation console, Options page.) With this setting, the stack’s state is retained and the instance is left running, so you can troubleshoot the issue. (For Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and C:\cfn\log.)

When you set Rollback on failure to Disabled, you continue to incur AWS charges for this stack. Ensure that you delete the stack after troubleshooting.

For additional information, see Troubleshooting AWS CloudFormation.

Q. I encountered a size limitation error when I deployed the AWS CloudFormation templates.

A. We recommend that you launch the Quick Start templates from the links in this guide or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a location other than an S3 bucket, you might encounter template size limitations. For more information about AWS CloudFormation quotas, see AWS CloudFormation quotas.

Send us feedback

To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, see the Quick Start Contributor’s Guide.

Quick Start reference deployments

GitHub repository

Visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.


Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for specific language governing permissions and limitations.