ClickHouse Cluster on the AWS Cloud

Quick Start Reference Deployment

October 2021
Wei Qiao, AWS Greater China Region Solution team
Troy Ameigh, AWS Integration & Automation team

Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.

This Quick Start was created by the AWS Greater China Region Solution team in collaboration with Amazon Web Services (AWS). Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start guide provides instructions for deploying a ClickHouse cluster on the AWS Cloud. ClickHouse is an open-source, column-oriented database management system (DBMS) that can be used for online analytical processing (OLAP) of queries.

This deployment is for customers who want to process analytical queries using a DBMS, such as MySQL, PostgreSQL, and Oracle Database. During the deployment, customers can configure the AWS CloudFormation templates to define the desired cluster nodes and settings.

Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on the Quick Start.

ClickHouse Cluster on AWS

ClickHouse is an SQL-based DBMS that customers can use to store, retrieve, and update data. Functions of a DBMS include the following:

  • User-defined catalog of metadata

  • Transaction support

  • Database recovery

  • Access management

  • Database constraints

AWS costs

You are responsible for the cost of the AWS services and any third-party licenses used while running this Quick Start. There is no additional cost for using the Quick Start.

The AWS CloudFormation templates for Quick Starts include configuration parameters that you can customize. Some of the settings, such as the instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

After you deploy the Quick Start, create AWS Cost and Usage Reports to deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. These reports provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, see What are AWS Cost and Usage Reports?

Software licenses

No additional licenses are required to deploy this Quick Start.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following ClickHouse environment in the AWS Cloud.

Figure 1. Quick Start architecture for ClickHouse on AWS

As shown in Figure 1, the Quick Start sets up the following:

  • A highly available architecture that spans two Availability Zones.*

  • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*

    • An internet gateway to allow internet access for bastion hosts.*

  • In the public subnets:

    • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*

    • A Linux bastion host in an Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon EC2 instances in public and private subnets.*

  • In the private subnets:

    • A ClickHouse client in an Auto Scaling group to allow administrators to connect to the ClickHouse cluster.

    • A ClickHouse database cluster that contains Amazon EC2 instances.

    • A ZooKeeper cluster that contains Amazon EC2 instances for storing metadata for ClickHouse replication. Each replica stores its state in ZooKeeper as the set of parts and their checksums. By default, the ZooKeeper cluster contains three instances.

  • Elastic Load Balancing for the ClickHouse cluster.

  • An Amazon S3 bucket for tiered storage of the ClickHouse cluster.

  • Amazon CloudWatch Logs to centralize ClickHouse logs and modify the log-retention policy.

  • Amazon Simple Notification Service (Amazon SNS) for sending email notifications when an alarm triggers.

  • AWS Secrets Manager to store dynamically generated passwords.

* The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
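
The default of three ZooKeeper instances follows from how ZooKeeper reaches agreement: an ensemble stays available only while a strict majority of its nodes can communicate. The following sketch works through that arithmetic. The function names are illustrative and aren't part of the Quick Start.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of nodes that can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures.
for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With three nodes, the ensemble tolerates one node failure. A fourth node raises the quorum to three without tolerating any additional failures, which is why odd ensemble sizes are the usual choice.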

Planning the deployment

Specialized knowledge

This deployment requires a moderate level of familiarity with AWS services. If you’re new to AWS, see the Getting Started Resource Center and AWS Training and Certification. These sites provide materials for learning how to design, deploy, and operate your infrastructure and applications on the AWS Cloud.

This Quick Start assumes familiarity with Amazon EC2, Amazon Virtual Private Cloud (Amazon VPC), and ClickHouse.

AWS account

If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

Technical requirements

Before you launch the Quick Start, review the following information and ensure that your account is properly configured. Otherwise, deployment might fail.

Resource quotas

If necessary, request service quota increases for the following resources. You might need to request increases if your existing deployment currently uses these resources and if this Quick Start deployment could result in exceeding the default quotas. The Service Quotas console displays your usage and quotas for some aspects of some services. For more information, see What is Service Quotas? and AWS service quotas.

Resource                                          This deployment uses

VPCs                                              1
Elastic IP addresses                              3
Security groups                                   3
AWS Identity and Access Management (IAM) roles    2
Auto Scaling groups                               2
Network Load Balancers                            1
Amazon CloudWatch dashboard                       1
S3 bucket                                         1
m5.xlarge instances (ClickHouse cluster)          2
m5.xlarge instances (ClickHouse client)           1
m5.large instances (ZooKeeper cluster)            3
t2.micro instances (bastion hosts)                1
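
Amazon EC2 On-Demand Instance quotas are expressed in vCPUs rather than in instance counts, so it can help to convert the instance rows above into a vCPU total before you request an increase. The following sketch does that arithmetic for the default deployment. The vCPU counts per instance type come from the Amazon EC2 instance specifications. The quota and usage values in the example are placeholders, so check the Service Quotas console for the actual values in your account.

```python
# vCPUs per instance type, from the Amazon EC2 instance specifications.
VCPUS = {"m5.xlarge": 4, "m5.large": 2, "t2.micro": 1}

# Default instance counts from the resource table above.
DEFAULT_INSTANCES = [
    ("ClickHouse cluster", "m5.xlarge", 2),
    ("ClickHouse client", "m5.xlarge", 1),
    ("ZooKeeper cluster", "m5.large", 3),
    ("Bastion host", "t2.micro", 1),
]


def total_vcpus(instances):
    """Sum the vCPUs that a list of (role, type, count) rows requires."""
    return sum(VCPUS[itype] * count for _, itype, count in instances)


def fits_quota(instances, quota_vcpus, in_use_vcpus=0):
    """Return True if the deployment fits in the remaining vCPU quota."""
    return in_use_vcpus + total_vcpus(instances) <= quota_vcpus


required = total_vcpus(DEFAULT_INSTANCES)
print(f"Default deployment needs {required} vCPUs")
# The quota and current usage below are hypothetical example values.
print(fits_quota(DEFAULT_INSTANCES, quota_vcpus=32, in_use_vcpus=16))
```

If you change the instance types or node counts in the template parameters, update the rows accordingly before comparing against your quota.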

Supported AWS Regions

For any Quick Start to work in a Region other than its default Region, all the services it deploys must be supported in that Region. You can launch a Quick Start in any Region and see if it works. If you get an error such as “Unrecognized resource type,” the Quick Start is not supported in that Region.

For an up-to-date list of AWS Regions and the AWS services they support, see AWS Regional Services.

Certain Regions are available on an opt-in basis. For more information, see Managing AWS Regions.

Amazon EC2 key pairs

Ensure that at least one Amazon EC2 key pair exists in your AWS account in the Region where you plan to deploy the Quick Start. Note the key-pair name because you will use it during deployment. To create a key pair, see Amazon EC2 key pairs and Linux instances.

For testing or proof-of-concept purposes, we recommend creating a new key pair instead of using one that’s already being used by a production instance.

IAM permissions

Before launching the Quick Start, you must sign in to the AWS Management Console with IAM permissions for the resources that the templates deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions. For more information, see AWS managed policies for job functions.

Deployment options

This Quick Start provides two deployment options:

  • Deploy ClickHouse into a new VPC. This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components. It then deploys ClickHouse into this new VPC.

  • Deploy ClickHouse into an existing VPC. This option provisions ClickHouse in your existing AWS infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and ClickHouse settings, as discussed later in this guide.

Deployment steps

Confirm your AWS account configuration

  1. Sign in to your AWS account at https://aws.amazon.com with an IAM user or role that has the necessary permissions. For details, see Planning the deployment earlier in this guide.

  2. Make sure that your AWS account is configured correctly, as discussed in the Technical requirements section.

Launch the Quick Start

If you’re deploying ClickHouse into an existing VPC, make sure that your VPC has two private subnets in different Availability Zones for the workload instances and that the subnets aren’t shared. This Quick Start doesn’t support shared subnets. These subnets require NAT gateways in their route tables to allow the instances to download packages and software without exposing them to the internet. Also make sure that the domain name option in the DHCP options is configured as explained in DHCP options sets. You provide your VPC settings when you launch the Quick Start.
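
The existing-VPC prerequisites above can be checked before launch. The following sketch assumes that you have already collected your subnet details into plain dictionaries (for example, by reading them from the Amazon VPC console). The field names are hypothetical and chosen for illustration, and the check covers only the Availability Zone, NAT route, and sharing requirements, not the DHCP options.

```python
def check_private_subnets(subnets):
    """Return a list of problems found in the candidate private subnets.

    Each subnet is a dict with these illustrative keys:
      id, availability_zone, has_nat_route (bool), shared (bool)
    """
    problems = []
    if len(subnets) < 2:
        problems.append("need two private subnets")
    zones = {s["availability_zone"] for s in subnets}
    if len(zones) < 2:
        problems.append("subnets must be in different Availability Zones")
    for s in subnets:
        if not s["has_nat_route"]:
            problems.append(f"{s['id']}: route table has no NAT gateway route")
        if s["shared"]:
            problems.append(f"{s['id']}: shared subnets are not supported")
    return problems


# Example subnet records; the IDs are placeholders.
candidates = [
    {"id": "subnet-aaaa", "availability_zone": "us-east-1a",
     "has_nat_route": True, "shared": False},
    {"id": "subnet-bbbb", "availability_zone": "us-east-1a",
     "has_nat_route": False, "shared": False},
]
for problem in check_private_subnets(candidates):
    print(problem)
```

An empty result means that the subnets pass these three checks.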

Each deployment takes about 60 minutes to complete.

  1. Sign in to your AWS account, and choose one of the following options to launch the AWS CloudFormation template. For help with choosing an option, see Deployment options earlier in this guide.

    Deploy ClickHouse into a new VPC on AWS

    View template

    Deploy ClickHouse into an existing VPC on AWS

    View template

  2. Check the AWS Region that’s displayed in the upper-right corner of the navigation bar, and change it if necessary. This Region is where the network infrastructure for ClickHouse is built. The template is launched in the us-east-1 Region by default. For other choices, see Supported AWS Regions earlier in this guide.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. Review the parameters for the template, and provide values for any parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.

  5. On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.

  7. Choose Create stack to deploy the stack.

  8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the ClickHouse deployment is ready.

  9. To view the created resources, see the values displayed in the Outputs tab for the stack.
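
The monitoring in step 8 can also be scripted. The following sketch polls a stack status until it reaches a terminal state. To keep the example self-contained, the status lookup is passed in as a function: `fetch_status` is a hypothetical callable that you would back with the AWS CLI or an AWS SDK. The status names are the ones that AWS CloudFormation reports during stack creation.

```python
import time

SUCCESS = {"CREATE_COMPLETE"}
FAILURE = {"CREATE_FAILED", "ROLLBACK_IN_PROGRESS",
           "ROLLBACK_COMPLETE", "ROLLBACK_FAILED"}


def wait_for_stack(fetch_status, interval_seconds=30, max_checks=240,
                   sleep=time.sleep):
    """Poll fetch_status() until the stack creation succeeds or fails.

    Returns the final status string. Raises TimeoutError if the stack is
    still in progress after max_checks polls.
    """
    for _ in range(max_checks):
        status = fetch_status()
        if status in SUCCESS or status in FAILURE:
            return status
        sleep(interval_seconds)
    raise TimeoutError("stack creation is still in progress")


# Simulated status sequence standing in for real API calls.
simulated = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
final = wait_for_stack(lambda: next(simulated), sleep=lambda seconds: None)
print(final)
```

The defaults allow about two hours of polling, which leaves headroom over the roughly 60 minutes that a deployment takes. The function treats the start of a rollback as a failure so that it returns as soon as the outcome is known.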

Log in to a ClickHouse server node

  1. Locate the private key file for the key pair that you specified during deployment (clickhouse.pem in this example). Run the following command to ensure that your key is not publicly viewable:

    chmod 400 ./clickhouse.pem

  2. Upload the private key file to the bastion host:

    scp -i "clickhouse.pem" ./clickhouse.pem ec2-user@ec2-11-11-11-11.compute-1.amazonaws.com:/home/ec2-user

  3. Sign in to the bastion host:

    ssh -i "clickhouse.pem" ec2-user@ec2-11-11-11-11.compute-1.amazonaws.com

  4. Obtain the private IP address of the ClickHouse client node from the Amazon EC2 console.

    1. Navigate to the Amazon EC2 console.

    2. On the Instances page, select the check box for your instance. In the Description tab, note the Private IPs, as shown in Figure 2.

    Figure 2. Private IP address for Amazon EC2 instance

  5. From the bastion host, ensure that the private key file is in the /home/ec2-user directory, and then log in to the client node:

    (ec2-user@ip-11-11-11-11) $ ssh -i "clickhouse.pem" ec2-user@ec2-22-22-22-22.compute-1.amazonaws.com
    
    
           __|  __|_  )
           _|  (     /   Amazon Linux 2 AMI
          ___|\___|___|
    
    https://aws.amazon.com/amazon-linux-2/
    No packages needed for security; 2 packages available
    Run "sudo yum update" to apply all updates.
    [ec2-user@ip-22-22-22-22 ~]$

  6. To query, manage, and diagnose issues, use the ClickHouse command line client.
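
The chmod 400 command in step 1 matters because the ssh client refuses to use a private key file that other users can read. The following sketch checks a key file's mode before you connect. It assumes a POSIX file system, and the file that it inspects is a temporary stand-in for clickhouse.pem.

```python
import os
import stat
import tempfile


def is_private(path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Create a throwaway file that stands in for the real key file.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    key_path = handle.name

os.chmod(key_path, 0o644)   # readable by group and others: too open
print(is_private(key_path))
os.chmod(key_path, 0o400)   # same effect as: chmod 400 ./clickhouse.pem
print(is_private(key_path))
os.remove(key_path)
```

To check your own key, call is_private with the path to your key file instead of the temporary file.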

Grafana web console

By default, the deployment installs the Grafana web console on the ClickHouse client host in the private subnets. Hosts in the private subnets cannot be reached directly from a browser. To access port 3000 on the private IP address of the Grafana server, set up an SSH tunnel through the bastion host, and then use the tunnel to access the web console.

  1. Connect to the bastion host using SSH. Replace port-number, key-pair.pem, user-name, and host-name with your values:

    ssh -qTfnN -D port-number -i "key-pair.pem" user-name@host-name

    For example:

    ssh -qTfnN -D 40011 -i "clickhouse.pem" ec2-user@ec2-54-223-36-247.cn-north-1.compute.amazonaws.com.cn

  2. Set up a proxy manager in your browser. Many proxy manager plugins are available; the following example uses Proxy SwitchyOmega.

    • Install Proxy SwitchyOmega for Microsoft Edge

    • Install Proxy SwitchyOmega for Mozilla Firefox

    • Install Proxy SwitchyOmega for Google Chrome

    1. Open the SwitchyOmega Options page, and choose New Profile in the left sidebar.

      Figure 3. Add a new SwitchyOmega profile

    2. Enter a name, and choose Create.

      Figure 4. Profile name

    3. Provide the protocol, server, and port for the proxy server. The port is the local port where you set up the SSH tunnel.

      Figure 5. Proxy servers

    4. Choose Apply Changes.

    5. Access SwitchyOmega through the extension in your browser. Choose your created profile in the proxy list. The browser then sends all traffic through the local tunnel port (40011 in the preceding example) to the bastion host.

      Figure 6. Proxy list

  3. To view the Grafana web console on the ClickHouse client host, navigate to http://10.0.xx.xx:3000, replacing 10.0.xx.xx with the private IP address of the client host. You can find the private IP address of the server named ClickHouseAdminClient in the Amazon EC2 console.

    Figure 7. Private IP address of the ClickHouse client host in the Amazon EC2 console

    Figure 8. Grafana web console

  4. The user name is admin. To retrieve the password for the Grafana web console, navigate to the AWS CloudFormation console, choose the Outputs tab for the stack, and find the DBPassword output.

    Figure 9. AWS CloudFormation outputs

  5. To view the password, navigate to the AWS Secrets Manager console, and choose Retrieve secret value.

    Figure 10. AWS Secrets Manager console
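
The tunnel command in step 1 can be assembled from its four parameters. The following sketch builds a command equivalent to the example in that step, with each ssh option annotated. It only prints the command and doesn't run it. The function name is illustrative.

```python
import shlex


def socks_tunnel_command(local_port, key_file, user, host):
    """Build the ssh command that opens a SOCKS proxy through the bastion host."""
    if not 1 <= local_port <= 65535:
        raise ValueError("local_port must be between 1 and 65535")
    return [
        "ssh",
        "-q",                    # quiet: suppress warnings and diagnostics
        "-T",                    # don't allocate a pseudo-terminal
        "-f",                    # go to the background after authenticating
        "-n",                    # redirect stdin from /dev/null
        "-N",                    # don't run a remote command, only forward
        "-D", str(local_port),   # dynamic (SOCKS) forwarding on this local port
        "-i", key_file,          # private key used for authentication
        f"{user}@{host}",
    ]


command = socks_tunnel_command(
    40011, "clickhouse.pem", "ec2-user",
    "ec2-54-223-36-247.cn-north-1.compute.amazonaws.com.cn",
)
print(shlex.join(command))
```

Because the -D option opens a SOCKS proxy on the local machine, the proxy profile in step 2 points at localhost and the same local port.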

Resources

ClickHouse server nodes

  • ClickHouse server installation directory: /etc/clickhouse-server

  • ClickHouse server data directory in local file storage: /home/clickhouse/data

  • ClickHouse server data directory in S3 bucket: clickhouse-data-vpcid

  • Deployment script installation log to troubleshoot error messages: /home/ec2-user/ch-install.log

ClickHouse client nodes

  • ClickHouse client installation directory: /etc/clickhouse-client

  • Deployment script installation log to troubleshoot error messages: /home/ec2-user/clickhouse-client-install.log

  • Grafana web console: /etc/grafana

ZooKeeper server nodes

  • Apache ZooKeeper installation directory: /usr/local/apache-zookeeper-3.5.9-bin/

  • Deployment script installation logs: /home/ec2-user/zk.log

FAQ

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, relaunch the template with Rollback on failure set to Disabled. This setting is under Advanced in the AWS CloudFormation console on the Configure stack options page. With this setting, the stack’s state is retained, and the instances keep running so that you can troubleshoot the issue. (For Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and C:\cfn\log. For the Linux instances in this deployment, see the installation logs listed in the Resources section.)

When you set Rollback on failure to Disabled, you continue to incur AWS charges for this stack. Delete the stack when you finish troubleshooting.

For more information, see Troubleshooting AWS CloudFormation.

Q. I encountered a size-limitation error when I deployed the AWS CloudFormation templates.

A. Launch the Quick Start templates from the links in this guide or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a location other than an S3 bucket, you might encounter template-size limitations. For more information, see AWS CloudFormation quotas.
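
The size limitation arises because AWS CloudFormation accepts a template body of at most 51,200 bytes when the body is passed directly in a request. Larger templates must be referenced from an S3 bucket. The following sketch checks a template against that request limit. The function name is illustrative, and the sample templates are stand-ins.

```python
# Maximum template body size, in bytes, when passed directly in a request.
MAX_INLINE_TEMPLATE_BYTES = 51_200


def needs_s3(template_text):
    """Return True if the template is too large to pass directly."""
    return len(template_text.encode("utf-8")) > MAX_INLINE_TEMPLATE_BYTES


small = "Resources: {}\n"
large = "# padding\n" * 6000   # 60,000 bytes of comment lines
print(needs_s3(small), needs_s3(large))
```

The check measures encoded bytes rather than characters because the limit applies to the size of the request body.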

Q. How can I access ClickHouse internal logs?

A. Internal ClickHouse and ZooKeeper logs are available in the Amazon CloudWatch dashboard. In the AWS CloudFormation console, choose the Quick Start stack that you created, and then choose CloudWatchDashboard on the Outputs tab.

Customer responsibility

After you successfully deploy this Quick Start, confirm that your resources and services are updated and configured — including any required patches — to meet your security and other needs. For more information, see the AWS Shared Responsibility Model.

Send us feedback

To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, see the Quick Start Contributor’s Guide.

GitHub repository

Visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.


Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for specific language governing permissions and limitations.