Tableau for Amazon SageMaker on the AWS Cloud

Quick Start Reference Deployment


February 2021
Holt Calder, InterWorks, and Madeleine Corneli, Tableau Software
Dylan Tong, AWS AI Augmented Analytics, and Shivansh Singh and Tony Bulding, AWS Quick Start team

Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.

This Quick Start was created by InterWorks Inc. in collaboration with Amazon Web Services (AWS). Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start reference deployment guide provides step-by-step instructions for deploying Tableau for Amazon SageMaker. This Quick Start extends your Tableau dashboard functionality so you can integrate Amazon SageMaker machine learning (ML) models in Tableau’s calculated fields. The serverless application it deploys is based on Tableau’s analytics extension framework. With it, you can connect SageMaker ML models to Tableau workbooks in both Tableau Desktop and Tableau Server.

Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on the Quick Start.

Tableau for Amazon SageMaker on AWS

This Quick Start deploys a REST API managed by Amazon API Gateway, AWS Lambda functions that connect Tableau to SageMaker, and Amazon Cognito for user authentication.

The deployment works without customization with ML models trained by Amazon SageMaker Autopilot. It can also integrate any other ML model hosted by SageMaker, but for those models you're responsible for customizing the deployment to match the Tableau Analytics Extension API and your custom model's input and output formats. For more information, see Customization, later in this guide.

AWS costs

You are responsible for the cost of the AWS services and any third-party licenses used while running this Quick Start. There is no additional cost for using the Quick Start.

The AWS CloudFormation templates for Quick Starts include configuration parameters that you can customize. Some of the settings, such as the instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

After you deploy the Quick Start, create AWS Cost and Usage Reports to deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. These reports provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, see What are AWS Cost and Usage Reports?

Software licenses

There is no license required to launch this Quick Start. However, to use the connector from your Tableau environment, a license is required for Tableau Server or Tableau Desktop.

Architecture

Deploying this Quick Start into a new virtual private cloud (VPC) using the default parameters builds the following serverless environment in the AWS Cloud:

Figure 1. Quick Start architecture for Tableau for Amazon SageMaker on AWS

As shown in Figure 1, this Quick Start sets up the following:

  • In the authentication group:

    • Amazon Cognito to provide a managed portal for sign-up and sign-in of connector users and a user pool for authentication.

    • An Amazon API Gateway Lambda authorizer to connect API Gateway to the Amazon Cognito user pool.

  • Amazon API Gateway with a REST API that exposes two endpoints: GET /info and POST /evaluate.

  • A VPC, configured according to AWS best practices, to provide you with your own virtual network on AWS.*

  • In the VPC:

    • Two AWS Lambda functions, one for each REST API endpoint.

    • A VPC endpoint connected to Amazon SageMaker.

*The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
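When you choose username-and-password sign-in, Tableau typically sends the credentials as an HTTP Basic Authorization header. As a rough sketch of what the Lambda authorizer has to do with that header, the Python below decodes the credentials and builds the IAM policy document that an API Gateway Lambda authorizer returns. It assumes a REQUEST-type authorizer; parse_basic_auth and build_policy are hypothetical names, and the actual credential check against the Amazon Cognito user pool is deliberately left out.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_basic_auth(header_value: Optional[str]) -> Optional[Tuple[str, str]]:
    """Decode 'Basic <base64(user:password)>' into (user, password).

    Returns None when the header is missing, uses another scheme, or is malformed.
    """
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The user name cannot contain ':', so split on the first colon only.
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


def build_policy(principal_id: str, allow: bool, method_arn: str) -> dict:
    """Build the response shape an API Gateway Lambda authorizer returns."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allow else "Deny",
                    "Resource": method_arn,
                }
            ],
        },
    }
```

In a real authorizer, the decoded credentials would be validated against the user pool before build_policy is called with allow=True.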

Planning the deployment

Specialized knowledge

This deployment requires a moderate level of familiarity with AWS services. If you’re new to AWS, see Getting Started Resource Center and AWS Training and Certification. These sites provide materials for learning how to design, deploy, and operate your infrastructure and applications on the AWS Cloud.

This Quick Start assumes familiarity with the AWS services listed in the References section, later in this guide.

AWS account

If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

Technical requirements

Before you launch the Quick Start, review the following information and ensure that your account is properly configured. Otherwise, deployment might fail.

Resource quotas

If necessary, request service quota increases for the following resources. You might need to request increases if your existing deployment currently uses these resources and if this Quick Start deployment could result in exceeding the default quotas. The Service Quotas console displays your usage and quotas for some aspects of some services. For more information, see What is Service Quotas? and AWS service quotas.

Resource | This deployment uses
AWS Identity and Access Management (IAM) roles | 3
AWS Lambda functions | 3
AWS Lambda permissions | 3
REST APIs | 1
API Gateway stages | 2
Amazon Cognito user pool domains | 1
Amazon Cognito user pools | 1
Amazon Cognito user pool token clients | 1
Amazon Route 53 record sets | 1
API Gateway domain names | 1
VPC endpoints | 0 or 1
Security groups | 0 or 1
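The quota check described above is simple arithmetic: for each resource, your current usage plus what this deployment adds must stay within the quota. The sketch below takes the counts from the resource table (counting the "0 or 1" rows as 1 to be conservative). The resource keys and the function name are hypothetical, and the usage and quota figures you pass in would come from the Service Quotas console.

```python
# Resources this deployment creates, from the table above.
# "0 or 1" rows are counted as 1 to be conservative.
DEPLOYMENT_USES = {
    "IAM roles": 3,
    "Lambda functions": 3,
    "Lambda permissions": 3,
    "REST APIs": 1,
    "API Gateway stages": 2,
    "Cognito user pool domains": 1,
    "Cognito user pools": 1,
    "Cognito user pool clients": 1,
    "Route 53 record sets": 1,
    "API Gateway domain names": 1,
    "VPC endpoints": 1,
    "Security groups": 1,
}


def quotas_needing_increase(current_usage: dict, quotas: dict, needed: dict = DEPLOYMENT_USES) -> dict:
    """Return {resource: shortfall} wherever usage + needed exceeds the quota.

    Resources without an entry in `quotas` are skipped (no known limit).
    """
    shortfalls = {}
    for resource, count in needed.items():
        if resource not in quotas:
            continue
        total = current_usage.get(resource, 0) + count
        if total > quotas[resource]:
            shortfalls[resource] = total - quotas[resource]
    return shortfalls
```

For example, an account already at its VPC endpoint quota would report a shortfall of 1 for "VPC endpoints" when deploying into a VPC.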

Supported AWS Regions

For any Quick Start to work in a Region other than its default Region, all the services it deploys must be supported in that Region. You can launch a Quick Start in any Region and see if it works. If you get an error such as “Unrecognized resource type,” the Quick Start is not supported in that Region.

For an up-to-date list of AWS Regions and the AWS services they support, see AWS Regional Services.

Certain Regions are available on an opt-in basis. For more information, see Managing AWS Regions.

IAM permissions

Before launching the Quick Start, you must sign in to the AWS Management Console with IAM permissions for the resources that the templates deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions. For more information, see AWS managed policies for job functions.

Prerequisites

Before deploying Tableau for Amazon SageMaker on AWS, you must have the following:

  • An AWS account.

  • A domain managed by Amazon Route 53.

  • An SSL certificate for the domain, managed by AWS Certificate Manager (ACM).

Deployment options

This Quick Start provides the following deployment options:

  • Deploy Tableau for Amazon SageMaker into a new VPC (end-to-end deployment). Builds a new AWS environment consisting of a VPC, API, AWS Lambda functions, identity provider, and other network components.

  • Deploy Tableau for Amazon SageMaker into an existing VPC. Provisions resources into your existing AWS VPC.

To deploy the Quick Start into the AWS Cloud without a VPC, use the Deploy Tableau for Amazon SageMaker into an existing VPC template, and set the Launch into VPC parameter to No.

The Quick Start provides separate templates for the new-VPC and existing-VPC options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, domain settings, and other parameters, as discussed later in this guide.

Deployment steps

Prepare an AWS account

  1. If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions.

  2. Use the Region selector in the navigation bar to choose the AWS Region where you want to deploy the Quick Start. Deploy the Quick Start to the same Region where your SageMaker Autopilot models are deployed. If you have Autopilot models deployed to multiple Regions, the recommended architecture is to deploy an instance of the connector to each Region.

  3. Create an SSL certificate. The certificate must be provisioned in the US East (N. Virginia) Region (us-east-1), because the template configures an edge-optimized API Gateway custom domain, and edge-optimized custom domains require a certificate in that Region. To do this, in the navigation pane of the AWS Certificate Manager console, choose Provision certificates or Request a certificate. Then, enter the domain name you plan to use. Optionally, use an asterisk (*) to create a wildcard certificate for subdomains.
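If you prefer to script step 3, the certificate request can be sketched with boto3. This is a minimal sketch, assuming boto3 and AWS credentials are available and that DNS validation suits your domain. The helper names are hypothetical. Note that the ACM client is pinned to us-east-1 regardless of the Region where you deploy the stack.

```python
def certificate_request_params(domain_name: str, include_wildcard: bool = False) -> dict:
    """Build keyword arguments for ACM RequestCertificate using DNS validation."""
    params = {"DomainName": domain_name, "ValidationMethod": "DNS"}
    if include_wildcard:
        # Also cover one level of subdomains, e.g. tableauapi.example.com.
        params["SubjectAlternativeNames"] = [f"*.{domain_name}"]
    return params


def request_certificate(domain_name: str, include_wildcard: bool = False) -> str:
    """Request the certificate in us-east-1 and return its ARN."""
    import boto3  # imported here so the helper above works without boto3

    acm = boto3.client("acm", region_name="us-east-1")
    response = acm.request_certificate(**certificate_request_params(domain_name, include_wildcard))
    return response["CertificateArn"]
```

The returned ARN is the value for the Certificate ARN (CertificateARN) parameter. The certificate stays in a pending state until its DNS validation records exist in your Route 53 hosted zone.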

Launch the Quick Start

  1. Sign in to your AWS account, and choose one of the following options to launch the AWS CloudFormation template. For help with choosing an option, see Deployment options, earlier in this guide. To deploy without a VPC, choose Deploy Tableau for Amazon SageMaker into an existing VPC on AWS, and set the Launch into VPC parameter to No.

Deploy Tableau for Amazon SageMaker into a new VPC on AWS

View template

Deploy Tableau for Amazon SageMaker into an existing VPC on AWS

View template

  2. Check the Region that’s displayed in the upper-right corner of the navigation bar, and change it if necessary. This Region is where the Quick Start infrastructure is built. The template for this Quick Start is launched in the US East (N. Virginia) Region by default. You can also download the templates to use as a starting point for your own implementation.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. For details on each parameter, see the Parameter reference section of this guide. After reviewing and customizing the parameters, choose Next.

  5. On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.

  7. Choose Create stack to deploy the stack.

  8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the Tableau for Amazon SageMaker deployment is ready.

  9. To view the created resources, see the values displayed on the Outputs tab for the stack.
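The console steps above can also be scripted. The following is a minimal boto3 sketch, assuming you supply the template URL and parameter values yourself (none are hard-coded here) and that the helper names are hypothetical. The capability list is an assumption meant to cover the two acknowledgement check boxes: both IAM variants plus macro expansion.

```python
def create_stack_params(stack_name, template_url, parameters, disable_rollback=False):
    """Build keyword arguments for CloudFormation CreateStack.

    `parameters` maps template parameter names to string values.
    """
    params = {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # Equivalent of the acknowledgement check boxes on the Review page.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
    }
    if disable_rollback:
        # Same effect as setting Rollback on failure to Disabled in the console.
        params["OnFailure"] = "DO_NOTHING"
    return params


def launch(stack_name, template_url, parameters, region, disable_rollback=False):
    """Create the stack and block until CREATE_COMPLETE."""
    import boto3

    cfn = boto3.client("cloudformation", region_name=region)
    cfn.create_stack(**create_stack_params(stack_name, template_url, parameters, disable_rollback))
    # Raises botocore.exceptions.WaiterError if stack creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```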

After you deploy the Quick Start, the Outputs tab displays the following resources:

  • SageMakerTableauApi: The URL for users to connect to the deployment from Tableau.

  • UserPoolDomain: The Amazon Cognito URL to sign up and sign in users of the deployment.
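To read these outputs from a script rather than the console, you can flatten the DescribeStacks response into a dictionary. A minimal sketch, assuming boto3; the helper names are hypothetical.

```python
def stack_outputs(describe_stacks_response: dict) -> dict:
    """Flatten DescribeStacks output into {OutputKey: OutputValue} for the first stack."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def get_outputs(stack_name: str, region: str) -> dict:
    import boto3

    cfn = boto3.client("cloudformation", region_name=region)
    return stack_outputs(cfn.describe_stacks(StackName=stack_name))
```

With this, get_outputs(...)["SageMakerTableauApi"] returns the URL that Tableau users connect to.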

Test the deployment

To test the deployment, navigate to the UserPoolDomain URL displayed in the Outputs tab, and sign up as a user. Then, sign in with the new credentials.

Optionally, you can test from Tableau (version 2020.1 or later) by doing the following:

  1. In Tableau Desktop, choose Help, Settings & Performance, Manage Analytics Extension Connection.

  2. For Select an Analytics Extension, choose TabPy/External API.

  3. For Server, enter (or choose from the list) the domain name of your deployment, as shown in the SageMakerTableauApi output.

  4. For Port, enter 443.

  5. Select Sign in with a username and password, and then enter the user name and password that you created in Amazon Cognito.

  6. Select Require SSL.

  7. Choose Test Connection.

  8. Review the result. If the connection succeeds, the message Successfully connected to the analytics extension displays; otherwise, an error message displays. Choose OK.

Figure 2. Analytics Extension Connection dialog box
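You can also exercise the endpoints outside Tableau. The sketch below uses only the Python standard library and assumes the API accepts the Amazon Cognito user name and password as HTTP Basic authentication, the way Tableau sends them. It also assumes host is the domain name from the SageMakerTableauApi output. build_request and call are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_request(host, username, password, path="/info", payload=None):
    """Build an HTTPS request to the deployment, authenticated with HTTP Basic auth.

    Without a payload this is a GET (for /info); with one it is a JSON POST (for /evaluate).
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        f"https://{host}{path}",
        data=data,
        method="GET" if payload is None else "POST",
    )
    request.add_header("Authorization", f"Basic {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def call(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call(build_request("tableauapi.example.com", user, password)) performs roughly the same check as Test Connection.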

After testing, verify that the inbound and outbound rules of the SolutionSG security group conform to your VPC security policies, and modify them as needed. The SolutionSG security group is listed on the Resources tab of the AWS CloudFormation console after you deploy the stack. For more information, see Work with security groups.
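As a starting point for that review, the following sketch flags inbound rules open to the whole internet. It assumes a response from boto3's ec2.describe_security_groups(GroupIds=[...]) for the SolutionSG group ID. open_ingress_rules is a hypothetical name, and outbound rules (IpPermissionsEgress) are not checked.

```python
def open_ingress_rules(describe_security_groups_response: dict) -> list:
    """Return (group_id, from_port, to_port, protocol) for ingress rules open to 0.0.0.0/0 or ::/0."""
    findings = []
    for group in describe_security_groups_response.get("SecurityGroups", []):
        for rule in group.get("IpPermissions", []):
            ipv4 = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            ipv6 = [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
            if "0.0.0.0/0" in ipv4 or "::/0" in ipv6:
                findings.append(
                    (group["GroupId"], rule.get("FromPort"), rule.get("ToPort"), rule.get("IpProtocol"))
                )
    return findings
```

Whether an open rule is acceptable depends on your VPC security policies; the function only surfaces candidates for review.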

Additional information

Best practices for using Tableau for Amazon SageMaker on AWS

When you use the Tableau for Amazon SageMaker deployment, follow Tableau Desktop and SageMaker best practices. You can use any ML model hosted by SageMaker. However, pass data from Tableau’s calculated fields to the analytics extension at the granularity the model expects (for example, row-level values with no aggregation or transformation).

You can call this deployment with the Tableau SCRIPT_REAL, SCRIPT_STR, SCRIPT_INT, and SCRIPT_BOOL functions. These table-calculation functions pass a script and a block of data to an external analytics engine. Use them with the following syntax:

SCRIPT_<type>('<SageMaker hosted endpoint name>', <fields in the dataset to pass to the model>)

Figure 3. Mapping a Tableau data source to the input schema of a SageMaker-hosted ML model

When you write the calculated field, note the following:
  • The function in your calculated field must match the type of data returned by the SageMaker model.

  • The SageMaker model must have a hosted endpoint.

  • Pass fields in the Tableau dataset in the order expected by the SageMaker model.
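Field order matters because the first SCRIPT_ argument arrives as the script (here, the endpoint name) and each remaining field arrives as a numbered column, _arg1, _arg2, and so on. The sketch below builds such a request body by hand, which can help when testing POST /evaluate without Tableau. It assumes the {"script": ..., "data": {...}} body shape used by TabPy-compatible analytics extensions; evaluate_payload and the endpoint name in the example are hypothetical.

```python
def evaluate_payload(endpoint_name, *columns):
    """Build an analytics-extension /evaluate body: the script plus _arg1.._argN columns.

    Each column is one Tableau field; all columns must have the same number of rows.
    """
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same number of rows")
    return {
        "script": endpoint_name,
        "data": {f"_arg{i}": list(col) for i, col in enumerate(columns, start=1)},
    }
```

For example, evaluate_payload("my-autopilot-endpoint", [37, 40], ["services", "admin."]) maps age to _arg1 and job to _arg2, the same positions the model sees in its input schema.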

Customization

We recommend that you use Autopilot-trained ML models with this deployment. To use models that are not trained by Autopilot, you may need to customize the deployment. Tableau sends data to the analytics extension in the following format:

Tableau analytics extension data format
{'_arg1': [37, 40, 56, 45, 46, 55, 52, 45], '_arg2': ['services', 'admin.', 'services', 'services', 'blue-collar', 'retired', 'technician', 'blue-collar'], '_arg3': ['married', 'married', 'married', 'married', 'married', 'single', 'married', 'married'], '_arg4': ['high.school', 'basic.6y', 'high.school', 'basic.9y', 'basic.6y', 'high.school', 'basic.9y', 'basic.9y'], '_arg5': ['no', 'no', 'no', 'unknown', 'unknown', 'no', 'no', 'no'], '_arg6': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes'], '_arg7': ['no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'no'], '_arg8': ['telephone', 'telephone', 'telephone', 'telephone', 'telephone', 'telephone', 'telephone', 'telephone'], '_arg9': ['may', 'may', 'may', 'may', 'may', 'may', 'may', 'may'], '_arg10': ['mon', 'mon', 'mon', 'mon', 'mon', 'mon', 'mon', 'mon'], '_arg11': [226, 151, 307, 198, 440, 342, 1666, 225], '_arg12': [1, 1, 1, 1, 1, 1, 1, 2], '_arg13': [999, 999, 999, 999, 999, 999, 999, 999], '_arg14': [0, 0, 0, 0, 0, 0, 0, 0], '_arg15': ['nonexistent', 'nonexistent', 'nonexistent', 'nonexistent', 'nonexistent', 'nonexistent', 'nonexistent', 'nonexistent'], '_arg16': [1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1], '_arg17': [93.994, 93.994, 93.994, 93.994, 93.994, 93.994, 93.994, 93.994], '_arg18': [-36.4, -36.4, -36.4, -36.4, -36.4, -36.4, -36.4, -36.4], '_arg19': [4.857, 4.857, 4.857, 4.857, 4.857, 4.857, 4.857, 4.857], '_arg20': [5191, 5191, 5191, 5191, 5191, 5191, 5191, 5191]}

The evaluate endpoint’s AWS Lambda function contains a function named create_sagemaker_body, which transforms the Tableau JSON payload into the following comma-separated text, one line per row:

Formatted data for SageMaker Autopilot-trained model
37,services,married,high.school,no,yes,no,telephone,may,mon,226,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
40,admin.,married,basic.6y,no,no,no,telephone,may,mon,151,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
56,services,married,high.school,no,no,yes,telephone,may,mon,307,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
45,services,married,basic.9y,unknown,no,no,telephone,may,mon,198,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
46,blue-collar,married,basic.6y,unknown,yes,yes,telephone,may,mon,440,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
55,retired,single,high.school,no,yes,no,telephone,may,mon,342,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
52,technician,married,basic.9y,no,yes,no,telephone,may,mon,1666,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
45,blue-collar,married,basic.9y,no,yes,no,telephone,may,mon,225,2,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191
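The transformation is essentially a transpose of the column-oriented _argN lists into CSV rows. The sketch below is not the Quick Start’s create_sagemaker_body; it is an illustrative equivalent under a hypothetical name, tableau_to_csv. It orders columns by their numeric suffix, because a plain string sort would place _arg10 before _arg2.

```python
import csv
import io


def tableau_to_csv(data: dict) -> str:
    """Transpose {'_arg1': [...], '_arg2': [...], ...} into CSV text, one line per row."""
    # Sort by the integer after '_arg' so _arg10 follows _arg9, not _arg1.
    keys = sorted(data, key=lambda k: int(k[len("_arg"):]))
    columns = [data[k] for k in keys]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in zip(*columns):  # zip pairs the i-th value of every column
        writer.writerow(row)
    return buffer.getvalue()
```

Applied to the Tableau payload shown earlier, this produces the eight CSV lines above. The csv module also quotes any value that happens to contain a comma.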

The evaluate endpoint’s AWS Lambda function is found on the Resources tab of the AWS CloudFormation console after stack deployment is complete. It is authored in Python 3.7.
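For context, the call that sends this CSV text to SageMaker can be sketched with boto3’s sagemaker-runtime client. The sketch assumes the endpoint accepts text/csv and returns one prediction per line, as Autopilot inference containers typically do for CSV input. parse_predictions and predict are hypothetical names, and your own model’s response format may differ.

```python
def parse_predictions(body: str) -> list:
    """Split a newline-delimited prediction response into a list (blank lines dropped)."""
    return [line.strip() for line in body.strip().splitlines() if line.strip()]


def predict(endpoint_name: str, csv_body: str, region: str) -> list:
    """Send CSV rows to a SageMaker endpoint and return one prediction per row."""
    import boto3

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_body.encode("utf-8"),
    )
    return parse_predictions(response["Body"].read().decode("utf-8"))
```

Tableau expects one result per input row, so the number of predictions should match the number of CSV lines sent.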

You can extend this preprocessing logic with additional data transformations to integrate your customizations of this deployment. However, if your ML model needs additional transformations, we do not recommend modifying the AWS Lambda function code itself. The best practice is to package the preprocessing logic with the ML model as a SageMaker inference pipeline. For more information, see Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn.

References

Quick Start reference deployments

GitHub repository

You can visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.

FAQ

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, relaunch the template with Rollback on failure set to Disabled. This setting is under Advanced in the AWS CloudFormation console on the Configure stack options page. With this setting, the stack’s state is retained, and you can troubleshoot the issue.

When you set Rollback on failure to Disabled, you continue to incur AWS charges for this stack. Ensure that you delete the stack after troubleshooting.
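With rollback disabled, the earliest failure in the stack events usually points to the root cause. A minimal sketch, assuming a response from boto3’s cloudformation.describe_stack_events(StackName=...). failure_reasons is a hypothetical helper, and it reads only the first page of events.

```python
def failure_reasons(describe_stack_events_response: dict) -> list:
    """Return (logical_id, reason) for every *_FAILED event, oldest first."""
    events = describe_stack_events_response.get("StackEvents", [])
    failed = [
        (e["LogicalResourceId"], e.get("ResourceStatusReason", ""))
        for e in events
        if e.get("ResourceStatus", "").endswith("_FAILED")
    ]
    # DescribeStackEvents lists the most recent events first, so reverse them.
    return list(reversed(failed))
```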

For more information, see Troubleshooting AWS CloudFormation.

Q. I encountered a size-limitation error when I deployed the AWS CloudFormation templates.

A. Launch the Quick Start templates from the links in this guide or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a location other than an S3 bucket, you might encounter template-size limitations. For more information, see AWS CloudFormation quotas.

Q. How do I launch the Quick Start into AWS GovCloud (US)?

A. Although AWS GovCloud (US) isn’t listed as a supported Region, you can deploy this Quick Start into AWS GovCloud (US) with a few modifications to the workload template. In the SageMakerAPI section of the workload template, locate the following Domain configuration:

Domain:
  CertificateArn: !Ref CertificateARN
  DomainName: !Ref DomainName
  EndpointConfiguration: EDGE
  Route53:
    HostedZoneId: !Ref HostedZoneId

Make the following changes:

  • Change the EndpointConfiguration value from EDGE to REGIONAL.

  • Delete the two lines that immediately follow the EndpointConfiguration parameter: Route53: and HostedZoneId: !Ref HostedZoneId.

The modified configuration looks like this:

Domain:
  CertificateArn: !Ref CertificateARN
  DomainName: !Ref DomainName
  EndpointConfiguration: REGIONAL

These changes also alter the deployment steps: a Regional custom domain requires a certificate in the same Region as the API, so create your certificate in the Region where you deploy the Quick Start rather than in us-east-1.

After modifying the workload template, you can launch the Quick Start into your AWS account. After the stack is deployed, navigate to Route 53 and identify the alias record created for the custom domain (matching the output in your CloudFormation console). Manually modify this record to change it from an alias record to a CNAME record. For more information about AWS GovCloud (US) and Route 53, see Amazon Route 53.
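Because a CNAME record cannot coexist with another record of the same name, changing the alias record to a CNAME amounts to deleting the alias and creating the CNAME in a single Route 53 change batch. The sketch below builds that batch. It assumes the existing alias is an A record, and alias_to_cname_batch is a hypothetical name. The record name and AliasTarget values must be copied exactly from the existing record.

```python
def alias_to_cname_batch(record_name: str, alias_target: dict, ttl: int = 300) -> dict:
    """Build a Route 53 ChangeBatch that replaces an A alias record with a CNAME.

    `alias_target` is the AliasTarget dict copied from the existing record
    (HostedZoneId, DNSName, EvaluateTargetHealth).
    """
    target_dns = alias_target["DNSName"].rstrip(".")
    return {
        "Comment": "Replace API Gateway alias record with a CNAME",
        "Changes": [
            {
                # DELETE must match the existing record set exactly.
                "Action": "DELETE",
                "ResourceRecordSet": {"Name": record_name, "Type": "A", "AliasTarget": alias_target},
            },
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_dns}],
                },
            },
        ],
    }
```

You would apply the batch with route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch). Route 53 applies all changes in a batch together, so the name is never left without a record.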

With these modifications, the deployment should be accessible from AWS GovCloud (US). You can test this by calling the /info endpoint of your API using the custom domain name.

Customer responsibility

After you successfully deploy this Quick Start, confirm that your resources and services are updated and configured, including any required patches, to meet your security and other needs. For more information, see the AWS Shared Responsibility Model.

Parameter reference

Unless you are customizing the Quick Start templates for your own deployment projects, keep the default settings for the parameters labeled Quick Start S3 bucket name, Quick Start S3 bucket Region, and Quick Start S3 key prefix. Changing these parameter settings automatically updates code references to point to a new Quick Start location. For more information, see the AWS Quick Start Contributor’s Guide.

Launch into a new VPC

Table 1. Network configuration
Parameter label (name) | Default value | Description
Availability Zones (AvailabilityZones) | Requires input | List of Availability Zones to use for the subnets in the VPC. This deployment uses two Availability Zones.
VPC CIDR (VPCCIDR) | 10.0.0.0/16 | CIDR block for the VPC.
Private subnet 1 CIDR (PrivateSubnet1CIDR) | 10.0.0.0/19 | CIDR block for private subnet 1, located in Availability Zone 1.
Private subnet 2 CIDR (PrivateSubnet2CIDR) | 10.0.32.0/19 | CIDR block for private subnet 2, located in Availability Zone 2.
Public subnet 1 CIDR (PublicSubnet1CIDR) | 10.0.128.0/20 | CIDR block for public DMZ subnet 1, located in Availability Zone 1.
Public subnet 2 CIDR (PublicSubnet2CIDR) | 10.0.144.0/20 | CIDR block for public DMZ subnet 2, located in Availability Zone 2.
VPC tenancy (VPCTenancy) | default | Tenancy of instances launched into the VPC.
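If you change these CIDR values, each subnet must fall inside the VPC CIDR and no two subnets may overlap. A quick check with Python’s ipaddress module is sketched below. check_subnets is a hypothetical helper, and ip_network raises ValueError for a CIDR with host bits set (for example, 10.0.0.1/19).

```python
import ipaddress
from itertools import combinations


def check_subnets(vpc_cidr: str, subnet_cidrs: dict) -> list:
    """Return problems: subnets outside the VPC CIDR or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = [f"{name} {net} is outside {vpc}" for name, net in nets.items() if not net.subnet_of(vpc)]
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} {net_a} overlaps {b} {net_b}")
    return problems
```

With the defaults in Table 1 (10.0.0.0/19, 10.0.32.0/19, 10.0.128.0/20, and 10.0.144.0/20 inside 10.0.0.0/16), the function returns an empty list.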

Table 2. AWS Quick Start configuration
Parameter label (name) | Default value | Description
Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | Name of the S3 bucket for your copy of the Quick Start assets. Keep the default name unless you are customizing the template. Changing the name updates code references to point to a new Quick Start location. This name can include numbers, lowercase letters, uppercase letters, and hyphens, but it cannot start or end with a hyphen (-). See https://aws-quickstart.github.io/option1.html.
Quick Start S3 key prefix (QSS3KeyPrefix) | quickstart-tableau-sagemaker/ | S3 key prefix that is used to simulate a directory for your copy of the Quick Start assets. Keep the default prefix unless you are customizing the template. Changing this prefix updates code references to point to a new Quick Start location. This prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). See https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html and https://aws-quickstart.github.io/option1.html.
Quick Start S3 bucket Region (QSS3BucketRegion) | us-east-1 | AWS Region where the Quick Start S3 bucket (QSS3BucketName) is hosted. Keep the default Region unless you are customizing the template. Changing this Region updates code references to point to a new Quick Start location. When using your own bucket, specify the Region. See https://aws-quickstart.github.io/option1.html.
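The naming rules in these descriptions translate directly into regular expressions. The patterns below are written from the rules as stated in this table. They are an assumption and not necessarily the template’s exact AllowedPattern values; the function names are hypothetical.

```python
import re

# Letters, digits, and hyphens; must not start or end with a hyphen.
BUCKET_NAME = re.compile(r"[0-9A-Za-z]+(?:-+[0-9A-Za-z]+)*")
# Letters, digits, hyphens, and forward slashes (may be empty).
KEY_PREFIX = re.compile(r"[0-9A-Za-z/-]*")


def valid_bucket_name(name: str) -> bool:
    return BUCKET_NAME.fullmatch(name) is not None


def valid_key_prefix(prefix: str) -> bool:
    return KEY_PREFIX.fullmatch(prefix) is not None
```

Both defaults, aws-quickstart and quickstart-tableau-sagemaker/, pass these checks.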

Table 3. Domain configuration
Parameter label (name) | Default value | Description
Domain Name (DomainName) | Requires input | Route 53 hosted domain, with prefix. For example, tableauapi.domain.com.
Hosted Zone ID (HostedZoneId) | Requires input | Route 53 hosted zone ID of the domain.
Certificate ARN (CertificateARN) | Requires input | Amazon Resource Name (ARN) of the domain certificate.
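A common cause of deployment failures is a certificate that does not cover the chosen DomainName. The sketch below checks this, assuming you pass the certificate’s DomainName and SubjectAlternativeNames, which boto3’s acm.describe_certificate(CertificateArn=...) returns under "Certificate". certificate_covers is a hypothetical helper. It applies the standard rule that a wildcard covers exactly one additional label.

```python
def certificate_covers(cert_names, domain_name: str) -> bool:
    """Return True if any certificate name (exact or single-label wildcard) covers domain_name."""
    domain = domain_name.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == domain:
            return True
        if name.startswith("*."):
            # *.example.com covers api.example.com, but not example.com or a.b.example.com.
            suffix = name[1:]  # ".example.com"
            label, dot, rest = domain.partition(".")
            if dot and label and "." + rest == suffix:
                return True
    return False
```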

Launch into existing VPC

Table 4. Network configuration
Parameter label (name) | Default value | Description
Launch into VPC (LaunchToVpc) | Requires input | Choose Yes to deploy into a VPC. Choose No to deploy without a VPC.
VPC ID (VpcId) | Requires input | ID of the VPC to deploy into.
Subnet IDs (SubnetIds) | Requires input | IDs of the subnets to deploy into.
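When launching the existing-VPC template programmatically, the parameters in Tables 4 and 5 are passed as a CloudFormation Parameters list. The sketch below assumes that LaunchToVpc accepts the literal values Yes and No, and that SubnetIds is a list-type parameter. CloudFormation takes list-type parameter values as a single comma-delimited string. existing_vpc_parameters is a hypothetical helper.

```python
def existing_vpc_parameters(launch_into_vpc, vpc_id, subnet_ids,
                            domain_name, hosted_zone_id, certificate_arn):
    """Build the CloudFormation Parameters list for the existing-VPC template."""
    values = {
        "LaunchToVpc": "Yes" if launch_into_vpc else "No",
        "VpcId": vpc_id,
        # List-type parameters are passed as one comma-delimited string.
        "SubnetIds": ",".join(subnet_ids),
        "DomainName": domain_name,
        "HostedZoneId": hosted_zone_id,
        "CertificateARN": certificate_arn,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
```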

Table 5. Domain configuration
Parameter label (name) | Default value | Description
Domain name (DomainName) | Requires input | Route 53 hosted domain, with prefix. For example, tableauapi.domain.com.
Hosted zone ID (HostedZoneId) | Requires input | Route 53 hosted zone ID for the domain.
Certificate ARN (CertificateARN) | Requires input | Amazon Resource Name (ARN) of the domain certificate.

Table 6. AWS Quick Start configuration
Parameter label (name) | Default value | Description
Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | Name of the S3 bucket for your copy of the Quick Start assets. Keep the default name unless you are customizing the template. Changing the name updates code references to point to a new Quick Start location. This name can include numbers, lowercase letters, uppercase letters, and hyphens, but it cannot start or end with a hyphen (-). See https://aws-quickstart.github.io/option1.html.
Quick Start S3 key prefix (QSS3KeyPrefix) | quickstart-tableau-sagemaker/ | S3 key prefix that is used to simulate a directory for your copy of the Quick Start assets. Keep the default prefix unless you are customizing the template. Changing this prefix updates code references to point to a new Quick Start location. This prefix can include numbers, lowercase letters, uppercase letters, hyphens (-), and forward slashes (/). See https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html and https://aws-quickstart.github.io/option1.html.

Send us feedback

To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, see the Quick Start Contributor’s Guide.



Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for specific language governing permissions and limitations.