Etleap ETL on the AWS Cloud

Quick Start Reference Deployment

August 2020
Caius Brindescu and Archie Menzies, Etleap
Dave May, Quick Start team

Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.

This Quick Start was created by Etleap in collaboration with Amazon Web Services (AWS). Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start is for users who want to run Etleap ETL within an existing or new virtual private cloud (VPC) in their own AWS account. When deploying to a new VPC, the Quick Start builds a new AWS environment with the VPC and all required infrastructure components.

Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on the Quick Start.

Etleap ETL on AWS

Etleap ETL is an extract, transform, and load (ETL) service for building data warehouses using Amazon Redshift and data lakes using Amazon Simple Storage Service (Amazon S3) and AWS Glue. Using Etleap on the AWS Cloud, you can:

  • Extract from any source, including databases, applications, files, and event streams. Even legacy on-premises sources can be integrated with no extra effort.

  • Transform and preview your data using the Etleap interactive data wrangler, without writing any code. Transformations run with automatic scaling on an Amazon EMR cluster, which is included with this Quick Start.

  • Load to an Amazon Redshift data warehouse or Amazon S3 with AWS Glue data lake for immediate analysis, on-demand access, and long-term archiving.

  • Model your data with SQL queries for unification and performance, and let Etleap maintain dependencies and orchestration.

  • Operate your pipelines without headaches. Automatic detection and guided resolution of schema changes and performance issues keep your data repository available, reliable, and fast.

AWS costs

You are responsible for the cost of the AWS services and any third-party licenses used while running this Quick Start. There is no additional cost for using the Quick Start.

The AWS CloudFormation templates for Quick Starts include configuration parameters that you can customize. Some of the settings, such as the instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

After you deploy the Quick Start, create AWS Cost and Usage Reports to deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. These reports provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, see What are AWS Cost and Usage Reports?

Software licenses

This Quick Start requires either a subscription to the Amazon Machine Image (AMI) for Etleap or a license provided by Etleap. An AMI subscription is available from AWS Marketplace. Additional pricing, terms, and conditions may apply. For instructions, see the Deployment steps section.

Architecture

Deploying this Quick Start for a new VPC with default parameters builds the following Etleap ETL environment in the AWS Cloud.

Figure 1. Quick Start architecture for Etleap ETL on AWS

As shown in figure 1, the Quick Start sets up the following:

  • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.[1]

  • In the public subnets:

    • A managed network address translation (NAT) gateway to allow outbound internet access for resources in the private subnets.[1]

    • An Amazon EC2 instance running Etleap.

  • In the private subnets:

    • An Amazon Relational Database Service (Amazon RDS) MySQL database used by Etleap to store metadata.

    • An Amazon EMR cluster used by Etleap to run extractions and transformations.

  • An Amazon S3 bucket used by Etleap to store extracted and transformed data.

  • One AWS Key Management Service (AWS KMS) key used to encrypt secrets within Etleap.

  • Four AWS Identity and Access Management (IAM) roles:

    • One attached to the Etleap Amazon EC2 instance.

    • One attached to the Amazon EMR cluster nodes.

    • One used by Amazon EMR for provisioning and service-level actions (see EMR Role).

    • One used by Amazon EMR for auto scaling (see Auto Scaling Role).

If the "Availability" parameter is set to "High Availability", the following Etleap ETL architecture is deployed.

Figure 2. Highly available Quick Start architecture for Etleap ETL on AWS

As shown in figure 2, the Quick Start sets up the following:

  • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.[1]

  • In the public subnets:

    • A managed network address translation (NAT) gateway to allow outbound internet access for resources in the private subnets.[1]

    • Two Amazon EC2 instances running Etleap ETL.

  • In the private subnets:

    • An Amazon Relational Database Service (Amazon RDS) MySQL database replicated across two Availability Zones used by Etleap to store metadata.

    • An Amazon EMR cluster used by Etleap to run extractions and transformations.

  • An Amazon S3 bucket used by Etleap to store extracted and transformed data.

  • One AWS Key Management Service (AWS KMS) key used to encrypt secrets within Etleap.

  • Four AWS Identity and Access Management (IAM) roles:

    • One attached to the Etleap Amazon EC2 instances.

    • One attached to the Amazon EMR cluster nodes.

    • One used by Amazon EMR for provisioning and service-level actions (see EMR Role).

    • One used by Amazon EMR for auto scaling (see Auto Scaling Role).

Planning the deployment

Specialized knowledge

This deployment requires a moderate level of familiarity with AWS services. If you’re new to AWS, see Getting Started Resource Center and AWS Training and Certification. These sites provide materials for learning how to design, deploy, and operate your infrastructure and applications on the AWS Cloud.

This Quick Start assumes that you are familiar with your data sources, Amazon Redshift, and Amazon S3 with AWS Glue data lakes.

AWS account

If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

Technical requirements

Before you launch the Quick Start, review the following information and ensure that your account is properly configured. Otherwise, deployment might fail.

Resource quotas

If necessary, request service quota increases for the following resources. You might need to request increases if your existing deployment currently uses these resources and if this Quick Start deployment could result in exceeding the default quotas. The Service Quotas console displays your usage and quotas for some aspects of some services. For more information, see What is Service Quotas? and AWS service quotas.

| Resource | This deployment uses |
|---|---|
| VPCs | 1 |
| Security groups | 1 |
| IAM roles | 4 |
| Application Load Balancers | 1 |
| Etleap instances | 1 |
| Databases | 1 |
| EMR clusters | 1 |
| S3 buckets | 1 |
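
One way to sanity-check the resource counts above before launching is to compare them with your own numbers. The following Python sketch is illustrative only: the usage and quota figures passed in are placeholders that you would read from the Service Quotas console, and the function name is hypothetical.

```python
# Resource counts this deployment creates, copied from the table above.
DEPLOYMENT_USES = {
    "VPCs": 1,
    "Security groups": 1,
    "IAM roles": 4,
    "Application Load Balancers": 1,
    "Etleap instances": 1,
    "Databases": 1,
    "EMR clusters": 1,
    "S3 buckets": 1,
}


def quotas_to_raise(current_usage, quotas):
    """Return the resources whose quota would be exceeded by this deployment.

    Both arguments map resource names to integers. Resources missing from
    `quotas` are skipped because no limit is known for them.
    """
    exceeded = []
    for resource, needed in DEPLOYMENT_USES.items():
        limit = quotas.get(resource)
        if limit is None:
            continue
        if current_usage.get(resource, 0) + needed > limit:
            exceeded.append(resource)
    return exceeded


# Placeholder figures, not real AWS defaults.
print(quotas_to_raise({"VPCs": 5, "IAM roles": 10}, {"VPCs": 5, "IAM roles": 1000}))
```

Any resource the function returns is a candidate for a quota increase request before you launch the stack.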

Supported AWS Regions

For any Quick Start to work in a Region other than its default Region, all the services it deploys must be supported in that Region. You can launch a Quick Start in any Region and see if it works. If you get an error such as “Unrecognized resource type,” the Quick Start is not supported in that Region.

For an up-to-date list of AWS Regions and the AWS services they support, see AWS Regional Services.

Certain Regions are available on an opt-in basis. For more information, see Managing AWS Regions.

IAM permissions

Before launching the Quick Start, you must sign in to the AWS Management Console with IAM permissions for the resources that the templates deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions. For more information, see AWS managed policies for job functions.

This Quick Start requires either a subscription to the AMI for Etleap in AWS Marketplace or a license provided by Etleap.

Deployment options

This Quick Start provides two deployment options:

  • Deploy Etleap ETL to a new VPC (end-to-end deployment). Builds a new AWS environment that consists of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components. It then deploys Etleap ETL to this new VPC.

  • Deploy Etleap ETL to an existing VPC. Provisions Etleap ETL in your existing AWS infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and Etleap ETL settings, as discussed later in this guide.

Deployment steps

Sign in to your AWS account

  1. Sign in to your AWS account at https://aws.amazon.com with an IAM user role that has the necessary permissions. For details, see Planning the deployment earlier in this guide.

  2. Make sure that your AWS account is configured correctly, as discussed in the Technical requirements section.

Subscribe to the Etleap ETL AMI

This Quick Start requires a subscription to the AMI for Etleap ETL in AWS Marketplace or a license provided by Etleap. If Etleap has provided you with a deployment ID, you can skip ahead and launch the Quick Start in the next section.

  1. Sign in to your AWS account.

  2. Open the page for the Etleap ETL AMI in AWS Marketplace, and then choose Continue to Subscribe.

  3. Review the terms and conditions for software usage, and then choose Accept Terms.
    A confirmation page loads, and an email notification is sent to the account owner. For detailed subscription instructions, see the AWS Marketplace documentation.

  4. When the subscription process is complete, close AWS Marketplace. Do not provision the software from AWS Marketplace. The Quick Start deploys the AMI for you.

Launch the Quick Start

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using this Quick Start. For full details, see the pricing pages for each AWS service used by this Quick Start. Prices are subject to change.

  1. Sign in to your AWS account, and choose one of the following options to launch the AWS CloudFormation template. For help with choosing an option, see the Deployment options section.

Deploy Etleap ETL to a new VPC on AWS

Deploy Etleap ETL to an existing VPC on AWS

If you’re deploying Etleap ETL into an existing VPC, make sure that your VPC has two private subnets in different Availability Zones for the workload instances, and that the subnets aren’t shared. This Quick Start doesn’t support shared subnets. These subnets require NAT gateways in their route tables to allow the instances to download packages and software without exposing them to the internet.

Also, make sure that the domain name option in the DHCP options is configured, as explained in the Amazon VPC documentation. You must provide your VPC settings when you launch the Quick Start.
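
The subnet requirements above can be checked before launch. The sketch below assumes you have already fetched subnet descriptions as dictionaries with the `SubnetId`, `AvailabilityZone`, and `OwnerId` keys that the Amazon EC2 DescribeSubnets API returns; the function name and sample IDs are hypothetical. It does not verify NAT gateway routes or DHCP options.

```python
def check_private_subnets(subnets, account_id):
    """Return a list of problems with the two private subnets chosen for Etleap."""
    problems = []
    if len(subnets) != 2:
        problems.append("expected exactly two private subnets")
    zones = {s["AvailabilityZone"] for s in subnets}
    if len(zones) < len(subnets):
        problems.append("subnets must be in different Availability Zones")
    for s in subnets:
        # A subnet owned by another account has been shared with you.
        if s["OwnerId"] != account_id:
            problems.append(f"{s['SubnetId']} is a shared subnet")
    return problems


# Hypothetical subnet descriptions for illustration.
subnet_a = {"SubnetId": "subnet-aaa", "AvailabilityZone": "us-east-1a", "OwnerId": "111111111111"}
subnet_b = {"SubnetId": "subnet-bbb", "AvailabilityZone": "us-east-1b", "OwnerId": "111111111111"}
print(check_private_subnets([subnet_a, subnet_b], "111111111111"))
```

An empty list means the two subnets satisfy the Availability Zone and ownership conditions described above.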

Each deployment takes about 20 minutes to complete.

  2. Check the AWS Region that’s displayed in the upper-right corner of the navigation bar, and change it if necessary. This is where the network infrastructure for Etleap ETL is built. The template is launched in the us-east-1 Region by default.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary.

    In the following tables, parameters are listed by category and described separately for the deployment options. When you finish reviewing and customizing the parameters, choose Next.

    NOTE: Unless you are customizing the Quick Start templates for your own deployment projects, keep the default settings for the parameters Quick Start S3 bucket name, Quick Start S3 bucket Region, and Quick Start S3 key prefix. Changing these settings automatically updates code references to point to a new Quick Start location. For more information, see the AWS Quick Start Contributor’s Guide.

Launch into a new VPC

Table 1. Network configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| VPC CIDR block 1 (VpcCidrBlock1) | 10 | The first octet of the CIDR block of the desired VPC’s address space. |
| VPC CIDR block 2 (VpcCidrBlock2) | 10 | The second octet of the CIDR block of the desired VPC’s address space. |
| Allowed IP CIDR block (AllowedIPCidr) | 0.0.0.0/0 | The CIDR block that is allowed remote access to the environment. |
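
To make the two octet parameters concrete, the sketch below combines them into a network address using Python's standard `ipaddress` module. The `/16` prefix length is an assumption made for illustration; this guide does not state the prefix length that the template uses, and the function name is hypothetical.

```python
import ipaddress


def vpc_cidr(first_octet, second_octet, prefix_length=16):
    """Build a VPC CIDR block from its first two octets (prefix length assumed)."""
    for octet in (first_octet, second_octet):
        if not 0 <= int(octet) <= 255:
            raise ValueError(f"octet out of range: {octet}")
    network = ipaddress.ip_network(
        f"{int(first_octet)}.{int(second_octet)}.0.0/{prefix_length}"
    )
    return str(network)


# Default values of VpcCidrBlock1 and VpcCidrBlock2.
print(vpc_cidr(10, 10))
```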

Table 2. Etleap configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| Etleap Deployment ID (DeploymentId) | Optional | The Deployment ID provided by Etleap. If purchased through AWS Marketplace, leave this blank. |
| Deployment secret (optional) (UserSpecifiedDeploymentSecret) | Blank string | The ARN of the secret used to communicate with Etleap. If left blank, one is generated for you. The ARN of the secret used in the deployment appears in the stack outputs. If you have previously launched Etleap with the same deployment ID, use the same secret key. |
| First name (FirstName) | Requires input | Your first name. |
| Last name (LastName) | Requires input | Your last name. |
| Email address (Email) | Requires input | Your email address. You may opt in to register your deployment with Etleap. If you do, Etleap requires that your email domain is globally unique and that it isn’t a personal email domain (such as gmail.com or yahoo.com), so use your company email address. |
| Etleap initial login password (SetupPassword) | Requires input | Your initial login password. You will be asked to change this password when you log in for the first time. |
| SSL certificate ARN (SSLCertificateArn) | Blank string | The ARN of the SSL certificate used with the load balancer. Can be left blank if deploying with Single Availability. |

Table 3. Instance configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| EC2 instance type (AppInstanceType) | t3.large | The EC2 instance type. |
| Key pair name (KeyPairName) | Requires input | The EC2 key pair name. |
| Availability Mode (Availability) | Single Availability | The availability mode: Single Availability or High Availability. |

Table 4. AWS Quick Start configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | S3 bucket name for the Quick Start assets. Quick Start bucket name can include numbers, lowercase letters, uppercase letters, and hyphens (-). It cannot start or end with a hyphen (-). |
| Quick Start S3 bucket region (QSS3BucketRegion) | us-east-1 | The AWS Region where the Quick Start S3 bucket (QSS3BucketName) is hosted. When using your own bucket, you must specify this value. |
| Quick Start S3 key prefix (QSS3KeyPrefix) | quickstart-etleap-etl/ | S3 key prefix for the Quick Start assets. Quick Start key prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). |
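
The naming rules for the bucket name and key prefix can be expressed as regular expressions. The following sketch encodes only the rules stated above (allowed characters, and no leading or trailing hyphen for the bucket name); the patterns and function names are illustrative and are not taken from the Quick Start templates.

```python
import re

# Letters, digits, and hyphens; must not start or end with a hyphen.
BUCKET_NAME = re.compile(r"[0-9a-zA-Z](?:[0-9a-zA-Z-]*[0-9a-zA-Z])?")
# Letters, digits, hyphens, and forward slashes.
KEY_PREFIX = re.compile(r"[0-9a-zA-Z/-]*")


def valid_bucket_name(name):
    """Check a candidate QSS3BucketName value against the stated rules."""
    return BUCKET_NAME.fullmatch(name) is not None


def valid_key_prefix(prefix):
    """Check a candidate QSS3KeyPrefix value against the stated rules."""
    return KEY_PREFIX.fullmatch(prefix) is not None


print(valid_bucket_name("aws-quickstart"), valid_key_prefix("quickstart-etleap-etl/"))
```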

Launch into an existing VPC

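Table 5. VPC configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| The VPC where to deploy Etleap in. (UserSpecifiedVPCId) | Requires input | The ID of the existing VPC in which to deploy Etleap. |
| The first public subnet (PublicSubnetA) | Requires input | The ID of the first public subnet in the existing VPC. |
| The second public subnet (PublicSubnetB) | Requires input | The ID of the second public subnet in the existing VPC. |
| The first private subnet (PrivateSubnetA) | Requires input | The ID of the first private subnet in the existing VPC. |
| A second private subnet (PrivateSubnetB) | Requires input | The ID of the second private subnet in the existing VPC. |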

Table 6. Network configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| Allowed IP CIDR block (AllowedIPCidr) | 0.0.0.0/0 | The CIDR block that is allowed remote access to the environment. |

Table 7. Etleap configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| Etleap Deployment ID (DeploymentId) | Optional | The Deployment ID provided by Etleap. If purchased through AWS Marketplace, leave this blank. |
| Deployment secret (optional) (UserSpecifiedDeploymentSecret) | Blank string | The ARN of the secret used to communicate with Etleap. If left blank, one is generated for you. The ARN of the secret used in the deployment appears in the stack outputs. If you have previously launched Etleap with the same deployment ID, use the same secret key. |
| First name (FirstName) | Requires input | Your first name. |
| Last name (LastName) | Requires input | Your last name. |
| Email address (Email) | Requires input | Your email address. You may opt in to register your deployment with Etleap. If you do, Etleap requires that your email domain is globally unique and that it isn’t a personal email domain (such as gmail.com or yahoo.com), so use your company email address. |
| Etleap initial login password (SetupPassword) | Requires input | Your initial login password. You will be asked to change this password when you log in for the first time. |
| SSL certificate ARN (SSLCertificateArn) | Blank string | The ARN of the SSL certificate used with the load balancer. Can be left blank if deploying with Single Availability. |

Table 8. Instance configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| EC2 instance type (AppInstanceType) | Requires input | The EC2 instance type. |
| Key pair name (KeyPairName) | Requires input | The EC2 key pair name. |
| Availability Mode (Availability) | Single Availability | The availability mode: Single Availability or High Availability. |

Table 9. AWS Quick Start configuration
| Parameter label (name) | Default value | Description |
|---|---|---|
| Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | S3 bucket name for the Quick Start assets. Quick Start bucket name can include numbers, lowercase letters, uppercase letters, and hyphens (-). It cannot start or end with a hyphen (-). |
| Quick Start S3 bucket region (QSS3BucketRegion) | us-east-1 | The AWS Region where the Quick Start S3 bucket (QSS3BucketName) is hosted. When using your own bucket, you must specify this value. |
| Quick Start S3 key prefix (QSS3KeyPrefix) | Requires input | S3 key prefix for the Quick Start assets. Quick Start key prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). |

  5. On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.

  7. Choose Create stack to deploy the stack.

  8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the Etleap ETL deployment is ready.

  9. To view the created resources, see the values displayed in the Outputs tab for the stack.
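
If you script the launch instead of using the console, the parameter values from the preceding tables are passed to AWS CloudFormation as a list of key-value entries. The sketch below only builds that list and interprets a stack status string; it makes no AWS calls. The helper names and sample values are hypothetical.

```python
def to_cfn_parameters(values):
    """Convert {parameter name: value} into CloudFormation's parameter list shape."""
    return [
        {"ParameterKey": name, "ParameterValue": str(value)}
        for name, value in values.items()
    ]


def deployment_state(stack_status):
    """Summarize a stack status string while monitoring the deployment."""
    if stack_status == "CREATE_COMPLETE":
        return "ready"
    if stack_status == "CREATE_IN_PROGRESS":
        return "in progress"
    return "needs attention"


params = to_cfn_parameters({
    "FirstName": "Ada",             # sample value
    "LastName": "Lovelace",         # sample value
    "Email": "ada@example.com",     # sample value
    "KeyPairName": "my-key-pair",   # hypothetical key pair name
    "Availability": "Single Availability",
})
print(params[0], deployment_state("CREATE_COMPLETE"))
```

A list like `params` could be supplied as the `Parameters` argument of the CloudFormation CreateStack API, together with the capabilities that correspond to the two check boxes on the Review page (likely `CAPABILITY_IAM` or `CAPABILITY_NAMED_IAM`, and `CAPABILITY_AUTO_EXPAND`; this guide does not state which IAM capability the templates need).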

Test the deployment

After the stack has been successfully created, the Outputs tab shows the IP address of the Etleap instance as well as the email and password that you can use to log in.

Best practices for using Etleap on AWS

After you log in, the onboarding guide helps you get started with Etleap. For additional guidance and best practices, use the in-application link to the product documentation.

Security

Etleap is accessed through HTTPS only. You should restrict the IP range in the Etleap EC2 security group for port 443 to only the IP addresses that you use to access Etleap. Alternatively, you can launch the Etleap EC2 instance in a private subnet.
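
As a small illustration of the recommendation above, the following sketch uses Python's standard `ipaddress` module to flag an allowed CIDR block that is open to every address and to test whether a given client address falls inside a restricted block. The function names are hypothetical, and the sample addresses come from ranges reserved for documentation.

```python
import ipaddress


def is_open_to_world(cidr):
    """Return True if the CIDR block admits every address (prefix length 0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def ip_allowed(client_ip, allowed_cidr):
    """Return True if client_ip falls inside allowed_cidr."""
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


print(is_open_to_world("0.0.0.0/0"))                # the AllowedIPCidr default
print(ip_allowed("203.0.113.7", "203.0.113.0/24"))  # sample office range
```

If the first check is true for your AllowedIPCidr value, consider narrowing it to the ranges you use to reach Etleap.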

Support

The Etleap support team is standing by to assist you in configuring connections and operating your pipelines. Contact them through the in-application instant message option or send an email to support@etleap.com.

Troubleshooting

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, relaunch the template with Rollback on failure set to Disabled. (This setting is under Advanced in the AWS CloudFormation console on the Options page.) With this setting, the stack’s state is retained, and the instance is left running so you can troubleshoot the issue. (For Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and C:\cfn\log.)

When you set Rollback on failure to Disabled, you continue to incur AWS charges for this stack. Delete the stack when you finish troubleshooting.

For additional information, see Troubleshooting AWS CloudFormation on the AWS website.

Q. I encountered a size limitation error when I deployed the AWS CloudFormation templates.

A. Launch the Quick Start templates from the links in this guide or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a location other than an S3 bucket, you might encounter template size limitations. For more information about AWS CloudFormation quotas, see the AWS documentation.
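
To see whether a local template copy is likely to hit the size limitation, you can measure it before deploying. The sketch below assumes a limit of 51,200 bytes for a template body submitted directly in a request; treat that figure as an assumption and confirm it against the current AWS CloudFormation quotas documentation. The function name is hypothetical.

```python
# Assumed limit for a template body passed directly in a request; verify it
# against the current AWS CloudFormation quotas documentation.
MAX_INLINE_TEMPLATE_BYTES = 51_200


def must_launch_from_s3(template_text):
    """Return True if the template is too large to submit as an inline body."""
    return len(template_text.encode("utf-8")) > MAX_INLINE_TEMPLATE_BYTES


print(must_launch_from_s3("Resources: {}"))
```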

Customer responsibility

After you successfully deploy this Quick Start, confirm that your resources and services are updated and configured — including any required patches — to meet your security and other needs. For more information, see the AWS Shared Responsibility Model.

Send us feedback

To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, see the Quick Start Contributor’s Guide.

Quick Start reference deployments

GitHub repository

Visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.


Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for specific language governing permissions and limitations.


1. The template that deploys the Quick Start into an existing VPC skips this component and prompts you for your existing VPC configuration.