Prometheus for Amazon EKS

Quick Start Reference Deployment


April 2021
Sumit Joshi and Jay McConnell, AWS Quick Start team

Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.

This Quick Start was created by Amazon Web Services (AWS). Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This guide provides instructions for deploying Prometheus for Amazon Elastic Kubernetes Service (Amazon EKS).

This Quick Start deploys the open-source Prometheus monitoring system on Amazon Elastic Kubernetes Service (Amazon EKS). Prometheus collects metrics from containerized workloads, and you query those metrics with the Prometheus query language (PromQL), without having to set up the underlying monitoring infrastructure yourself. Prometheus uses the Kubernetes API server to discover targets and ingest metrics from Amazon EKS nodes and self-managed clusters. For visualizations, Prometheus features an expression browser and offers multiple modes of graphing and dashboard support.

In addition to the Prometheus server, this Quick Start configures monitoring for Amazon EKS nodes (node-exporter) and for the state of Kubernetes objects as reported by the Kubernetes API server (kube-state-metrics).

Amazon Web Services (AWS) customers can deploy Prometheus into a new virtual private cloud (VPC) and a new EKS cluster, an existing VPC and a new EKS cluster, or an existing VPC and an existing EKS cluster.

Architecture

Deploying this Quick Start with default parameters into an existing Amazon EKS cluster builds the following environment. For a diagram of the new virtual private cloud (VPC) and Amazon EKS cluster, see Amazon EKS on the AWS Cloud.

Figure 1. Quick Start architecture for Prometheus for Amazon EKS

As shown in Figure 1, the Quick Start sets up the following:

  • A Kubernetes namespace for Prometheus.

  • Node-exporter DaemonSet with a pod to monitor Amazon EKS nodes.

  • Pushgateway deployment with a pod that accepts metrics pushed from short-lived jobs and exposes them as an intermediary target that Prometheus can scrape.

  • Kube-state-metrics DaemonSet with a pod to monitor the Kubernetes API server.

  • Server StatefulSet with a pod and attached persistent volume (PV) to scrape and store time-series data. The pod uses persistent volume claims (PVCs) to request PV resources.

  • Alertmanager StatefulSet with a pod and attached PV for deduplication, grouping, and routing of alerts.

  • Amazon Elastic Block Store (Amazon EBS) General Purpose SSD (gp2) storage volume.

Planning the deployment

Specialized knowledge

This deployment guide requires a moderate level of familiarity with AWS services. If you’re new to AWS, see the Getting Started Resource Center and AWS Training and Certification. These sites provide materials for learning how to design, deploy, and operate your infrastructure and applications on the AWS Cloud.

This Quick Start assumes familiarity with Amazon EKS, AWS CloudFormation, and Kubernetes.

AWS account

If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using your phone’s keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

Amazon EKS cluster

If you deploy Prometheus into an existing Amazon EKS cluster that was not created by the Amazon EKS on the AWS Cloud Quick Start, you must configure your cluster to allow this Quick Start to manage it. For more information, see the Deployment steps section.

IAM permissions

Before launching the Quick Start, you must log in to the AWS Management Console with AWS Identity and Access Management (IAM) permissions for the resources and actions that each template deploys.

The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions.

Deployment options

This Quick Start provides three deployment options:

  • Deploy Prometheus into a new VPC (end-to-end deployment). This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, EKS cluster, a node group, and other infrastructure components. It then deploys Prometheus into this new EKS cluster.

  • Deploy Prometheus into a new EKS cluster of an existing VPC. This option builds a new Amazon EKS cluster, node group, and other infrastructure components into an existing VPC. It then deploys Prometheus into this new EKS cluster.

  • Deploy Prometheus into an existing EKS cluster. This option provisions Prometheus in your existing AWS infrastructure. Note that when deploying into an EKS cluster that was not created by the Amazon EKS on the AWS Cloud Quick Start, you must prepare the cluster as described in the Deployment steps section.

Deployment steps

Prepare an existing EKS cluster

This step is required only if you launch this Quick Start into an existing Amazon EKS cluster that was not created using the Amazon EKS on the AWS Cloud deployment. If you want to create a new EKS cluster with your deployment, skip to Launch the Quick Start.

  1. Sign in to your AWS account at https://aws.amazon.com with an IAM user role that has the necessary permissions. For details, see Planning the deployment, earlier in this guide.

  2. Launch the cluster preparation template.

  3. The template launches in the US East (Ohio) Region by default. To change the Region, choose another Region from the list in the upper-right corner of the navigation bar.

  4. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  5. On the Specify stack details page, change the stack name if needed. Enter the name of the Amazon EKS cluster you want to deploy to in addition to the subnet IDs and security group ID associated with the cluster. These can be obtained from the EKS cluster console.

  6. On the Options page, specify the key-value pairs for resources in your stack, and set advanced options. When you’re done, choose Next.

  7. On the Review page, review and confirm your template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.

  8. Choose Create stack to deploy the stack.

  9. Monitor the stack’s status until it is CREATE_COMPLETE.

  10. From the Outputs section of the stack, note the KubernetesRoleArn and HelmRoleArn roles.

  11. Add the roles to the aws-auth config map in your cluster, specifying system:masters for the groups. This allows the Quick Start to manage your cluster via AWS CloudFormation. For more information, see Managing users or IAM roles for your cluster.
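The role mapping in the last step can be sketched as follows. This is a minimal illustration, not the Quick Start's own tooling: it renders the two entries you would add under mapRoles in the aws-auth config map (for example, while running kubectl edit -n kube-system configmap/aws-auth). The role ARNs and user names below are placeholders; use the KubernetesRoleArn and HelmRoleArn values from your stack's Outputs section.

```python
from typing import Mapping


def map_roles_entry(role_arn: str, username: str) -> str:
    """Render one mapRoles list item that grants system:masters to a role."""
    return (
        f"- rolearn: {role_arn}\n"
        f"  username: {username}\n"
        f"  groups:\n"
        f"    - system:masters\n"
    )


def render_map_roles(roles: Mapping[str, str]) -> str:
    """Render mapRoles items for a mapping of username -> role ARN."""
    return "".join(map_roles_entry(arn, name) for name, arn in roles.items())


# Placeholder values (assumptions): replace with the ARNs from the stack outputs.
roles = {
    "quickstart-kubernetes": "arn:aws:iam::111122223333:role/ExampleKubernetesRole",
    "quickstart-helm": "arn:aws:iam::111122223333:role/ExampleHelmRole",
}

snippet = render_map_roles(roles)
print(snippet)
```

Append the printed items to any existing mapRoles entries rather than replacing them, so that current node and user mappings keep working.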

Unless you are customizing the Quick Start templates for your own deployment projects, we recommend that you keep the default settings for the parameters labeled Quick Start S3 bucket name, Quick Start S3 bucket Region, and Quick Start S3 key prefix. Changing these parameter settings automatically updates code references to point to a new Quick Start location. For more information, see the AWS Quick Start Contributor’s Guide.

Launch the Quick Start

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using this Quick Start. For full details, see the pricing pages for each AWS service used by this Quick Start. Prices are subject to change.
  1. Sign in to your AWS account, and choose one of the following options to launch the AWS CloudFormation template. For help with choosing an option, see the Deployment options section, earlier in this guide.

  • Deploy into a new VPC and new Amazon EKS cluster (View template)

  • Deploy into a new Amazon EKS cluster in an existing VPC (View template)

  • Deploy into an existing Amazon EKS cluster (View template)

New clusters take about 1.5 hours to deploy. Existing clusters take about 45-60 minutes to deploy.

If you deploy Prometheus into an existing VPC, ensure that any private subnets have NAT gateways in their route tables to allow the Quick Start to download packages and software. Also, ensure that the domain name in the DHCP options is configured. For more information, see DHCP options sets.

  2. Check the AWS Region that’s displayed in the upper-right corner of the navigation bar, and change it if necessary. This is where the network infrastructure for Prometheus is built. The template launches in the us-east-2 Region by default.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings, and customize them as necessary. For details on each parameter, see the Parameter reference section of this guide. When you finish reviewing and customizing the parameters, choose Next.

  5. On the Options page, specify the key-value pairs for resources in your stack, and set advanced options. When you’re done, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.

  7. Choose Create stack to deploy the stack.

  8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the Prometheus deployment is ready.

  9. Use the values displayed in the Outputs tab for the stack, as shown in the following figure.

Figure 2. Prometheus outputs after successful deployment
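If you prefer to watch the stack from a script instead of the console, the monitoring step can be sketched as a simple polling loop. This is an illustrative sketch under stated assumptions: wait_for_stack is a hypothetical helper, and the status source is passed in as a function so the loop can be tried without AWS access. With boto3, the status could come from boto3.client("cloudformation").describe_stacks(StackName=name)["Stacks"][0]["StackStatus"].

```python
import time
from typing import Callable


def wait_for_stack(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 240,
) -> str:
    """Poll a stack status until it is CREATE_COMPLETE.

    Raises RuntimeError on a failed or rolled-back creation, and
    TimeoutError if the stack never completes within max_polls.
    The defaults allow about two hours, which covers the deployment
    times quoted in this guide.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "CREATE_COMPLETE":
            return status
        if status.endswith("_FAILED") or "ROLLBACK" in status:
            raise RuntimeError(f"stack creation did not succeed: {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("stack did not reach CREATE_COMPLETE in time")


# Demonstration with a fake status sequence instead of a real stack.
fake_statuses = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
result = wait_for_stack(lambda: next(fake_statuses), poll_seconds=0)
print(result)
```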

Test the deployment

Test the deployment from a network that has access to the Kubernetes API, as configured by the Amazon EKS public access endpoint and Kubernetes API public access CIDR parameters. If you enabled the optional bastion host, you can connect to it using SSH (Secure Shell) with the key pair that you specified during deployment and the IP address from the Outputs tab of the AWS CloudFormation stack. The bastion host already has kubectl installed and configured to connect to the cluster. To test from another machine, install kubectl first. For more information, see Installing kubectl.

  1. Configure the kubectl command line utility to connect to your Amazon EKS cluster according to Cluster authentication.

  2. Forward local traffic to the Prometheus server by running the following command:

kubectl port-forward -n prometheus sts/prometheus-server 8080:9090

  3. Navigate to http://localhost:8080/targets/ in your web browser. You should see a page like the one shown in Figure 3:

Figure 3. Prometheus web interface
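The same check can be scripted against the Prometheus HTTP API while the port-forward is running. The sketch below assumes the local port 8080 from the command above. It builds an instant-query URL for the built-in up metric and lists the targets that report 0 (down). The sample payload, including its job and instance labels, is made up for illustration and only mirrors the documented response shape.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # local end of the kubectl port-forward


def query_url(expr: str, base_url: str = BASE_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def fetch(expr: str) -> dict:
    """Run an instant query; requires the port-forward to be active."""
    with urllib.request.urlopen(query_url(expr), timeout=10) as resp:
        return json.load(resp)


def down_targets(payload: dict) -> list:
    """Return the label sets of series in an `up` query whose value is 0."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return [
        series["metric"]
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


# Illustrative payload (made-up labels) in the shape of an instant-query response.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "example-nodes", "instance": "10.0.0.1:9100"},
             "value": [1617235200.0, "1"]},
            {"metric": {"job": "example-nodes", "instance": "10.0.0.2:9100"},
             "value": [1617235200.0, "0"]},
        ],
    },
}

down = down_targets(sample)
print(query_url("up"))
print(down)
# Against a live server: down_targets(fetch("up"))
```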

Best practices for using Prometheus on EKS

Best practices for alerting, naming, instrumentation, and more can be found in the Prometheus documentation.

Security

Prometheus contains many components and integrations with other systems, and some configurations may enable additional attack vectors. For more information, see the Prometheus security model.

Deployment customization

To customize the Quick Start, you can create a file with custom values. During stack deployment, enter the file URI in the Override values parameter. The file must be in YAML format and placed in a public HTTPS location or an S3 bucket. If you use an S3 bucket, ensure that the AWS Identity and Access Management (IAM) role awsqs-kubernetes-helm has read permissions to the file. For a list of configurable options, see the prometheus-community/helm-charts GitHub page.

Scalability

By default, this deployment uses General Purpose SSD (gp2) volumes for storage. Large deployments may need more I/O performance or storage space. To increase I/O performance, provide a custom-values file that points the Prometheus server disk to a StorageClass backed by Amazon EBS Provisioned IOPS volumes. To increase storage size, set the server.persistentVolume.size and alertmanager.persistentVolume.size parameters in your custom-values file.
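As a rough sketch of such a custom-values file, the following script renders the nested keys named above into YAML using only the standard library. The sizes and the StorageClass name io1-example are hypothetical, and the server.persistentVolume.storageClass key is an assumption about the chart version in use; confirm the key names against the values listed on the prometheus-community/helm-charts GitHub page, and make sure the StorageClass exists in your cluster.

```python
def to_yaml(mapping: dict, indent: int = 0) -> str:
    """Render a nested dict of plain string values as block-style YAML."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# Hypothetical overrides: larger volumes and a Provisioned IOPS StorageClass.
overrides = {
    "server": {
        "persistentVolume": {
            "size": "100Gi",
            "storageClass": "io1-example",  # assumed key and made-up class name
        }
    },
    "alertmanager": {
        "persistentVolume": {"size": "10Gi"},
    },
}

values_text = to_yaml(overrides) + "\n"
print(values_text)
```

Save the output to a file in a public HTTPS location or an S3 bucket, and enter its URI in the Override values parameter.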

FAQ

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the template with Rollback on failure set to Disabled. (This setting is under Advanced in the AWS CloudFormation console, Options page.) With this setting, the stack’s state is retained, and the workload remains running so you can troubleshoot the issue.

When you set Rollback on failure to Disabled, you continue to incur AWS charges for the stack. Ensure that you delete the stack after troubleshooting.

Troubleshooting

Parameter reference

Deploy into a new VPC and new Amazon EKS cluster

The full list of parameters for this entry point is documented in Amazon EKS on the AWS Cloud.

Deploy into a new Amazon EKS cluster in an existing VPC

The full list of parameters for this entry point is documented in Amazon EKS on the AWS Cloud.

Launch into an existing Amazon EKS cluster

Table 1. Prometheus for Amazon EKS configuration

Parameter label (name) | Default value | Description
---------------------- | ------------- | -----------
Amazon EKS cluster name (KubeClusterName) | Requires input | Name of the Amazon EKS cluster to deploy Prometheus into.
Namespace (Namespace) | prometheus | (Optional) Kubernetes namespace to deploy Prometheus into.
Override values (OverrideValues) | Blank string | (Optional) URI to a file containing custom values to pass to the Helm install. Can be http(s):// or s3://.
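For scripted launches, the three parameters in Table 1 can be assembled in the ParameterKey/ParameterValue form that the CloudFormation API accepts. The sketch below is illustrative only: stack_parameters is a hypothetical helper, and the cluster name and S3 URI are placeholders. The resulting list could be passed as Parameters to a boto3 create_stack call, together with capabilities such as CAPABILITY_IAM and CAPABILITY_AUTO_EXPAND; the exact capabilities a template needs are an assumption here, so check what the console asks you to acknowledge.

```python
from typing import Dict, List

ALLOWED_SCHEMES = ("http://", "https://", "s3://")


def stack_parameters(
    cluster_name: str,
    namespace: str = "prometheus",
    override_values: str = "",
) -> List[Dict[str, str]]:
    """Build the parameter list for deploying into an existing EKS cluster."""
    if not cluster_name:
        raise ValueError("KubeClusterName requires input")
    if override_values and not override_values.startswith(ALLOWED_SCHEMES):
        raise ValueError("OverrideValues must be an http(s):// or s3:// URI")
    values = {
        "KubeClusterName": cluster_name,
        "Namespace": namespace,
        "OverrideValues": override_values,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


# Placeholder inputs (assumptions): a cluster name and an S3 values file.
params = stack_parameters(
    "example-cluster",
    override_values="s3://example-bucket/prometheus-values.yaml",
)
print(params)
```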

Send us feedback

To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. If you want to submit code, review the Quick Start Contributor’s Guide.

Quick Start reference deployments

GitHub repository

See the GitHub repository to download the templates and scripts for this Quick Start, post comments, and share customizations with others.


Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for specific language governing permissions and limitations.