AWS ECS Cluster by HTTP
Overview
This template monitors an AWS ECS cluster over HTTP with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
NOTE: This template uses GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.
Additional information about the metrics and used API methods:
- Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics-ECS.html
Requirements
Zabbix version: 6.4 and higher.
Tested versions
This template has been tested on:
- AWS ECS Cluster by HTTP
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.
Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.
Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "ecs:ListServices",
        "ecs:ListTasks"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
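As a quick sanity check before attaching the policy, the action names can be validated against the service prefixes the template is expected to need. The sketch below is illustrative only: `check_policy_actions` and `EXPECTED_PREFIXES` are hypothetical names, not part of Zabbix or any AWS tooling, and the set of expected prefixes is an assumption based on the permissions listed above.

```python
import json

# Assumption: metric collection needs only CloudWatch and ECS actions.
EXPECTED_PREFIXES = {"cloudwatch", "ecs"}

POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "ecs:ListServices",
        "ecs:ListTasks"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
"""


def check_policy_actions(policy_text, expected_prefixes):
    """Return the actions whose service prefix is not in expected_prefixes."""
    policy = json.loads(policy_text)
    unexpected = []
    for statement in policy["Statement"]:
        actions = statement["Action"]
        # IAM allows "Action" to be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            prefix = action.split(":", 1)[0]
            if prefix not in expected_prefixes:
                unexpected.append(action)
    return unexpected


if __name__ == "__main__":
    # An empty list means every action uses an expected service prefix.
    print(check_policy_actions(POLICY, EXPECTED_PREFIXES))
```

A mistyped service prefix in an action name would show up in the returned list, which is easier to spot than a silently missing permission at runtime.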
If you are using role-based authorization, set the appropriate permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "ecs:ListServices",
        "ecs:ListTasks",
        "ec2:AssociateIamInstanceProfile",
        "ec2:ReplaceIamInstanceProfileAssociation"
      ],
      "Resource": "*"
    }
  ]
}
Set the following macros: "{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}".
If you are using access key-based authorization, also set the following macros: "{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}".
For more information about managing access keys, see the official documentation.
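With access key-based authorization, requests to AWS APIs are signed with AWS Signature Version 4, which derives a signing key from the secret access key, the date, the Region, and the service name. The sketch below shows only that key derivation step; it is not the template's own code, the function names are hypothetical, and the secret and date are placeholder values. The service name `monitoring` is the signing name used for CloudWatch.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_access_key: str, date_stamp: str,
                       region: str, service: str) -> bytes:
    """Derive a Signature Version 4 signing key.

    date_stamp is the request date in YYYYMMDD form (UTC).
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    # Placeholder credentials; in Zabbix these would come from
    # {$AWS.SECRET.ACCESS.KEY} and {$AWS.REGION}.
    key = derive_signing_key("EXAMPLE_SECRET", "20240101", "us-west-1", "monitoring")
    print(key.hex())
```

Because the Region is part of the key, a wrong `{$AWS.REGION}` value typically shows up as a signature error rather than as an empty result.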
Refer to the Macros section for a list of macros used for LLD filters.
Additional information about the metrics and used API methods:
- Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html
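To give a feel for what a bulk GetMetricData call looks like, the sketch below builds a request body for several metrics of one cluster in a single call. It is an approximation under assumptions, not the template's actual script: the helper name `build_get_metric_data_request` is hypothetical, the `AWS/ECS` namespace with the `ClusterName` dimension and the `CPUUtilization` and `MemoryUtilization` metric names are taken from the AWS documentation linked above, and timestamps are shown as epoch seconds for simplicity.

```python
from datetime import datetime, timedelta, timezone


def build_get_metric_data_request(cluster_name, metric_names, now=None,
                                  minutes=10, period=300,
                                  namespace="AWS/ECS", stat="Average"):
    """Build a GetMetricData-shaped request covering several metrics at once."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    queries = []
    for index, metric_name in enumerate(metric_names):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": "m%d" % index,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "ClusterName", "Value": cluster_name},
                    ],
                },
                "Period": period,   # seconds per data point
                "Stat": stat,
            },
        })
    return {
        "StartTime": int(start.timestamp()),
        "EndTime": int(end.timestamp()),
        "MetricDataQueries": queries,
    }


if __name__ == "__main__":
    request = build_get_metric_data_request(
        "demo-cluster", ["CPUUtilization", "MemoryUtilization"])
    print(len(request["MetricDataQueries"]), "queries in one request")
```

Packing every metric into one `MetricDataQueries` list is what lets a single master item feed many dependent items.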
Macros used
Name | Description | Default |
---|---|---|
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty then no proxy is used. | |
{$AWS.ACCESS.KEY.ID} | Access key ID. | |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. | |
{$AWS.REGION} | Amazon ECS Region code. | us-west-1 |
{$AWS.AUTH_TYPE} | Authorization method. Possible values: role_base, access_key. | access_key |
{$AWS.ECS.CLUSTER.NAME} | ECS cluster name. | |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .* |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .* |
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED |
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. | .* |
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. | CHANGE_IF_NEEDED |
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | The warning threshold of the cluster CPU utilization expressed in %. | 70 |
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | The warning threshold of the cluster memory utilization expressed in %. | 70 |
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | The warning threshold of the cluster service CPU utilization expressed in %. | 80 |
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | The warning threshold of the cluster service memory utilization expressed in %. | 80 |
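The MATCHES and NOT_MATCHES macros work as a pair of regular expressions. A rough model of their combined effect is sketched below, assuming that both conditions must hold for an entity to be discovered and that matching is a partial (search-style) regex match; the function names are hypothetical and Python's `re` module stands in for the regex engine Zabbix uses.

```python
import re


def passes_lld_filter(value, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Keep value if it matches `matches` and does not match `not_matches`."""
    return (re.search(matches, value) is not None
            and re.search(not_matches, value) is None)


def filter_discovered(names, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Return the names that would survive the filter pair."""
    return [name for name in names
            if passes_lld_filter(name, matches, not_matches)]


if __name__ == "__main__":
    services = ["web", "web-test", "worker"]
    # With the defaults, nothing is excluded.
    print(filter_discovered(services))
    # Excluding test services by suffix (example pattern, not a default).
    print(filter_discovered(services, not_matches="-test$"))
```

The default NOT_MATCHES value `CHANGE_IF_NEEDED` is simply a string that is unlikely to occur in real names, so it excludes nothing until it is replaced with a real pattern.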
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster: Get cluster metrics | Get cluster metrics. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html | Script | aws.ecs.get_metrics |
AWS ECS Cluster: Get cluster services | Get cluster services. Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html | Script | aws.ecs.get_cluster_services |
AWS ECS Cluster: Get alarms data | Get alarms data. DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html | Script | aws.ecs.get_alarms |
AWS ECS Cluster: Get metrics check | Data collection check. | Dependent item | aws.ecs.metrics.check |
AWS ECS Cluster: Get alarms check | Data collection check. | Dependent item | aws.ecs.alarms.check |
AWS ECS Cluster: Container Instance Count | The number of EC2 instances running the Amazon ECS agent that are registered with a cluster. | Dependent item | aws.ecs.container_instance_count |
AWS ECS Cluster: Task Count | The number of tasks running in the cluster. | Dependent item | aws.ecs.task_count |
AWS ECS Cluster: Service Count | The number of services in the cluster. | Dependent item | aws.ecs.service_count |
AWS ECS Cluster: CPU Reserved | The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. | Dependent item | aws.ecs.cpu_reserved |
AWS ECS Cluster: CPU Utilization | Cluster CPU utilization. | Dependent item | aws.ecs.cpu_utilization |
AWS ECS Cluster: Memory Utilization | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.memory_utilization |
AWS ECS Cluster: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.network.rx |
AWS ECS Cluster: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.network.tx |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster: Failed to get metrics data | | length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0 | Warning | |
AWS ECS Cluster: Failed to get alarms data | | length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0 | Warning | |
AWS ECS Cluster: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | Warning | |
AWS ECS Cluster: High memory utilization | The system is running out of free memory. | min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | Warning | |
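The utilization triggers use `min(...,15m)` rather than the last value, so a short spike does not fire them: every sample in the window has to be above the threshold. The sketch below models that behaviour in plain Python. It is a simplification with hypothetical function names; in particular, it treats an empty window as "no problem", whereas Zabbix has its own handling for functions that cannot be evaluated.

```python
def min_over_window(samples, now, window_seconds):
    """Return the lowest value among samples inside the window, or None.

    samples is an iterable of (unix_timestamp, value) pairs.
    """
    recent = [value for timestamp, value in samples
              if now - window_seconds < timestamp <= now]
    return min(recent) if recent else None


def high_utilization(samples, now, threshold, window_seconds=15 * 60):
    """True only if every sample in the window exceeds the threshold."""
    lowest = min_over_window(samples, now, window_seconds)
    return lowest is not None and lowest > threshold


if __name__ == "__main__":
    # Threshold 70 mirrors the default of {$AWS.ECS.CLUSTER.CPU.UTIL.WARN}.
    sustained = [(300, 90), (600, 72), (900, 85)]
    spike = [(300, 90), (600, 65), (900, 85)]
    print(high_utilization(sustained, now=900, threshold=70))
    print(high_utilization(spike, now=900, threshold=70))
```

Raising the threshold macro or lengthening the window both make the trigger less sensitive; the former tolerates higher load, the latter tolerates longer bursts.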
LLD rule Cluster Alarms discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Alarms discovery | Discovers cluster alarms. | Dependent item | aws.ecs.alarms.discovery |
Item prototypes for Cluster Alarms discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: Get metrics | Get alarm metrics about the state and its reason. | Dependent item | aws.ecs.alarm.get_metrics["{#ALARM_NAME}"] |
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: State reason | An explanation for the alarm state, in text format. Alarm description: {#ALARM_DESCRIPTION} | Dependent item | aws.ecs.alarm.state_reason["{#ALARM_NAME}"] |
AWS ECS Cluster Alarms: [{#ALARM_NAME}]: State | The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM). Alarm description: {#ALARM_DESCRIPTION} | Dependent item | aws.ecs.alarm.state["{#ALARM_NAME}"] |
Trigger prototypes for Cluster Alarms discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster Alarms: [{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. | last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0 | Average | |
AWS ECS Cluster Alarms: [{#ALARM_NAME}] has 'Insufficient data' state | | last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1 | Info | |
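The alarm items and trigger prototypes together amount to a small mapping from a CloudWatch alarm state to a Zabbix severity. The sketch below restates that logic in Python as a reading aid; the numeric codes and severities come from the tables above, while the names `ALARM_STATES` and `classify_alarm` are hypothetical and the real conversion happens in the template's preprocessing.

```python
# CloudWatch state strings mapped to the numeric values stored by the State item.
ALARM_STATES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}


def classify_alarm(state_value, state_reason):
    """Return the severity of the trigger that would fire, or None."""
    state = ALARM_STATES[state_value]
    # 'Alarm' state trigger: state is 2 and a non-empty reason is present.
    if state == 2 and len(state_reason) > 0:
        return "Average"
    # 'Insufficient data' state trigger: state is 1.
    if state == 1:
        return "Info"
    return None


if __name__ == "__main__":
    print(classify_alarm("ALARM", "Threshold Crossed"))
    print(classify_alarm("INSUFFICIENT_DATA", ""))
    print(classify_alarm("OK", ""))
```

Note that an alarm in the ALARM state with an empty reason does not fire the 'Alarm' trigger under this reading of the expression.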
LLD rule Cluster Services discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster Services discovery | Discovers the services of the {$AWS.ECS.CLUSTER.NAME} cluster. | Dependent item | aws.ecs.services.discovery |
Item prototypes for Cluster Services discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Running Task | The number of tasks currently in the RUNNING state. | Dependent item | aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Pending Task | The number of tasks currently in the PENDING state. | Dependent item | aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Desired Task | The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service. | Dependent item | aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Task Set | The number of task sets in the {#AWS.ECS.SERVICE.NAME} service. | Dependent item | aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: CPU Reserved | The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. | Dependent item | aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: CPU Utilization | The number of CPU units used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined CPU reservation in their task definition. | Dependent item | aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory utilized | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory utilization | The memory being used by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Memory reserved | The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is only collected for tasks that have a defined memory reservation in their task definition. | Dependent item | aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Network rx bytes | The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Network tx bytes | The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is only available for containers in tasks using the awsvpc or bridge network modes. | Dependent item | aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"] |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: Get metrics | Get metrics of ECS services. Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html | Script | aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"] |
Trigger prototypes for Cluster Services discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | Warning | |
AWS ECS Cluster Service: [{#AWS.ECS.SERVICE.NAME}]: High memory utilization | The system is running out of free memory. | min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | Warning | |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.