Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

Available solutions

This template is for Zabbix version: 5.4
Also available for: 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/hadoop_http?at=release/5.4

Hadoop by HTTP

Overview

For Zabbix version: 5.4 and higher
The template monitors Hadoop over HTTP and works without any external scripts.
It collects metrics by polling the Hadoop API remotely using an HTTP agent and JSONPath preprocessing.
The Zabbix server (or proxy) sends requests directly to the ResourceManager, NodeManager, NameNode, and DataNode APIs.
All metrics are collected at once, thanks to Zabbix bulk data collection.
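
In practice, each "Get ... stats" HTTP agent item requests the daemon's JMX JSON servlet once, and every dependent item picks its own attribute out of the returned beans array with JSONPath. The snippet below is only a minimal sketch of that flow (it is not part of the template); the hostname and port are placeholders, and the /jmx path is an assumption based on the stock Hadoop web UI, so adjust both for your deployment.

    // Minimal sketch (Node.js 18+): poll the ResourceManager JMX servlet once and
    // read one attribute the same way the template's JSONPath step
    // $.beans[?(@.name=='java.lang:type=Runtime')].Uptime.first() does.
    const RM_URL = 'http://resourcemanager.example.com:8088/jmx'; // placeholder host and port

    async function main() {
      const response = await fetch(RM_URL);          // one bulk request per daemon
      const jmx = await response.json();             // { "beans": [ { "name": ..., ... }, ... ] }

      const runtime = jmx.beans.find(b => b.name === 'java.lang:type=Runtime');
      if (!runtime) throw new Error('Runtime bean not found in the JMX payload');

      const uptimeSeconds = runtime.Uptime * 0.001;  // mirrors the MULTIPLIER: 0.001 step
      console.log('ResourceManager uptime, s:', uptimeSeconds);
    }

    main().catch(console.error);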

This template was tested on:

  • Zabbix, version 5.0 and later
  • Hadoop, version 3.1 and later

Setup

See Zabbix template operation for basic instructions.

Define the IP address (or FQDN) and the Web-UI port of the ResourceManager in the {$HADOOP.RESOURCEMANAGER.HOST} and {$HADOOP.RESOURCEMANAGER.PORT} macros, and of the NameNode in the {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} macros. Macros can be set in the template or overridden at the host level.
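
For example, the macro values for a cluster whose ResourceManager web UI listens on rm1.example.com:8088 and whose NameNode web UI listens on nn1.example.com:9870 could look like this (the hostnames are placeholders; the ports shown are the Hadoop 3.x defaults and match the template's default macro values):

    {$HADOOP.RESOURCEMANAGER.HOST} = rm1.example.com
    {$HADOOP.RESOURCEMANAGER.PORT} = 8088
    {$HADOOP.NAMENODE.HOST}        = nn1.example.com
    {$HADOOP.NAMENODE.PORT}        = 9870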

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name | Description | Default

{$HADOOP.CAPACITY_REMAINING.MIN.WARN}

The minimum remaining Hadoop cluster capacity, in percent, used in the trigger expression.

Default: 20

{$HADOOP.NAMENODE.HOST}

The Hadoop NameNode host IP address or FQDN.

Default: NameNode

{$HADOOP.NAMENODE.PORT}

The Hadoop NameNode Web-UI port.

Default: 9870

{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN}

The maximum acceptable Hadoop NameNode API page response time, in seconds, used in the trigger expression.

Default: 10s

{$HADOOP.RESOURCEMANAGER.HOST}

The Hadoop ResourceManager host IP address or FQDN.

Default: ResourceManager

{$HADOOP.RESOURCEMANAGER.PORT}

The Hadoop ResourceManager Web-UI port.

Default: 8088

{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN}

The maximum acceptable Hadoop ResourceManager API page response time, in seconds, used in the trigger expression.

Default: 10s

Template links

There are no template links in this template.

Discovery rules

Name | Description | Type | Key and additional info
Node manager discovery

-

HTTP_AGENT hadoop.nodemanager.discovery

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

Data node discovery

-

HTTP_AGENT hadoop.datanode.discovery

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.
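
The discovery scripts themselves ship with the template (their text is truncated in the table above). Conceptually, they turn the node lists returned by the raw "Get NodeManagers states" and "Get DataNodes states" items into Zabbix low-level discovery JSON with a {#HOSTNAME} macro per node. A rough, hypothetical sketch of that transformation (not the template's actual script, and with an invented sample input) looks like this:

    // Rough sketch: build LLD rows with a {#HOSTNAME} macro from an array of node objects.
    function buildDiscovery(value) {
      const nodes = JSON.parse(value);   // e.g. the already-extracted node list
      return JSON.stringify(nodes.map(node => ({ '{#HOSTNAME}': node.HostName })));
    }

    // Invented example input shaped like the node objects referenced by the items below.
    const input = JSON.stringify([
      { HostName: 'worker-1.example.com', State: 'RUNNING' },
      { HostName: 'worker-2.example.com', State: 'RUNNING' }
    ]);

    console.log(buildDiscovery(input));
    // [{"{#HOSTNAME}":"worker-1.example.com"},{"{#HOSTNAME}":"worker-2.example.com"}]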

Items collected

Group | Name | Description | Type | Key and additional info
Hadoop ResourceManager: Service status

Hadoop ResourceManager API port availability.

SIMPLE net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Hadoop ResourceManager: Service response time

Hadoop ResourceManager API performance.

SIMPLE net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"]

Hadoop ResourceManager: Uptime

DEPENDENT hadoop.resourcemanager.uptime

Preprocessing:

- JSONPATH: $.beans[?(@.name=='java.lang:type=Runtime')].Uptime.first()

- MULTIPLIER: 0.001

Hadoop ResourceManager: RPC queue & processing time

Average time spent on processing RPC requests.

DEPENDENT hadoop.resourcemanager.rpc_processing_time_avg

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=RpcActivityForPort8031')].RpcProcessingTimeAvgTime.first()

Hadoop ResourceManager: Active NMs

Number of Active NodeManagers.

DEPENDENT hadoop.resourcemanager.num_active_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumActiveNMs.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop ResourceManager: Decommissioning NMs

Number of Decommissioning NodeManagers.

DEPENDENT hadoop.resourcemanager.num_decommissioning_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumDecommissioningNMs.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop ResourceManager: Decommissioned NMs

Number of Decommissioned NodeManagers.

DEPENDENT hadoop.resourcemanager.num_decommissioned_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumDecommissionedNMs.first()

Hadoop ResourceManager: Lost NMs

Number of Lost NodeManagers.

DEPENDENT hadoop.resourcemanager.num_lost_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumLostNMs.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop ResourceManager: Unhealthy NMs

Number of Unhealthy NodeManagers.

DEPENDENT hadoop.resourcemanager.num_unhealthy_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumUnhealthyNMs.first()

Hadoop ResourceManager: Rebooted NMs

Number of Rebooted NodeManagers.

DEPENDENT hadoop.resourcemanager.num_rebooted_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumRebootedNMs.first()

Hadoop ResourceManager: Shutdown NMs

Number of Shutdown NodeManagers.

DEPENDENT hadoop.resourcemanager.num_shutdown_nm

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumShutdownNMs.first()

Hadoop NameNode: Service status

Hadoop NameNode API port availability.

SIMPLE net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 10m

Hadoop NameNode: Service response time

Hadoop NameNode API performance.

SIMPLE net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"]

Hadoop NameNode: Uptime

DEPENDENT hadoop.namenode.uptime

Preprocessing:

- JSONPATH: $.beans[?(@.name=='java.lang:type=Runtime')].Uptime.first()

- MULTIPLIER: 0.001

Hadoop NameNode: RPC queue & processing time

Average time spent on processing RPC requests.

DEPENDENT hadoop.namenode.rpc_processing_time_avg

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=RpcActivityForPort9000')].RpcProcessingTimeAvgTime.first()

Hadoop NameNode: Block Pool Renaming

DEPENDENT hadoop.namenode.percent_block_pool_used

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=NameNodeInfo')].PercentBlockPoolUsed.first()

Hadoop NameNode: Transactions since last checkpoint

Total number of transactions since last checkpoint.

DEPENDENT hadoop.namenode.transactions_since_last_checkpoint

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].TransactionsSinceLastCheckpoint.first()

Hadoop NameNode: Percent capacity remaining

Available capacity in percent.

DEPENDENT hadoop.namenode.percent_remaining

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=NameNodeInfo')].PercentRemaining.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop NameNode: Capacity remaining

Available capacity.

DEPENDENT hadoop.namenode.capacity_remaining

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].CapacityRemaining.first()

Hadoop NameNode: Corrupt blocks

Number of corrupt blocks.

DEPENDENT hadoop.namenode.corrupt_blocks

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].CorruptBlocks.first()

Hadoop NameNode: Missing blocks

Number of missing blocks.

DEPENDENT hadoop.namenode.missing_blocks

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].MissingBlocks.first()

Hadoop NameNode: Failed volumes

Number of failed volumes.

DEPENDENT hadoop.namenode.volume_failures_total

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].VolumeFailuresTotal.first()

Hadoop NameNode: Alive DataNodes

Count of alive DataNodes.

DEPENDENT hadoop.namenode.num_live_data_nodes

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].NumLiveDataNodes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop NameNode: Dead DataNodes

Count of dead DataNodes.

DEPENDENT hadoop.namenode.num_dead_data_nodes

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].NumDeadDataNodes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop NameNode: Stale DataNodes

DataNodes that do not send a heartbeat within 30 seconds are marked as "stale".

DEPENDENT hadoop.namenode.num_stale_data_nodes

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].StaleDataNodes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop NameNode: Total files

Total count of files tracked by the NameNode.

DEPENDENT hadoop.namenode.files_total

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].FilesTotal.first()

Hadoop NameNode: Total load

The current number of concurrent file accesses (read/write) across all DataNodes.

DEPENDENT hadoop.namenode.total_load

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].TotalLoad.first()

Hadoop NameNode: Blocks allocable

Maximum number of blocks allocable.

DEPENDENT hadoop.namenode.block_capacity

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].BlockCapacity.first()

Hadoop NameNode: Total blocks

Count of blocks tracked by NameNode.

DEPENDENT hadoop.namenode.blocks_total

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].BlocksTotal.first()

Hadoop NameNode: Under-replicated blocks

The number of blocks with insufficient replication.

DEPENDENT hadoop.namenode.under_replicated_blocks

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NameNode,name=FSNamesystem')].UnderReplicatedBlocks.first()

Hadoop {#HOSTNAME}: RPC queue & processing time

Average time spent on processing RPC requests.

DEPENDENT hadoop.nodemanager.rpc_processing_time_avg[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NodeManager,name=RpcActivityForPort8040')].RpcProcessingTimeAvgTime.first()

Hadoop {#HOSTNAME}: Container launch avg duration

DEPENDENT hadoop.nodemanager.container_launch_duration_avg[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NodeManager,name=NodeManagerMetrics')].ContainerLaunchDurationAvgTime.first()

Hadoop {#HOSTNAME}: JVM Threads

The number of JVM threads.

DEPENDENT hadoop.nodemanager.jvm.threads[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='java.lang:type=Threading')].ThreadCount.first()

Hadoop {#HOSTNAME}: JVM Garbage collection time

The JVM garbage collection time in milliseconds.

DEPENDENT hadoop.nodemanager.jvm.gc_time[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NodeManager,name=JvmMetrics')].GcTimeMillis.first()

Hadoop {#HOSTNAME}: JVM Heap usage

The JVM heap usage in MBytes.

DEPENDENT hadoop.nodemanager.jvm.mem_heap_used[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=NodeManager,name=JvmMetrics')].MemHeapUsedM.first()

Hadoop {#HOSTNAME}: Uptime

DEPENDENT hadoop.nodemanager.uptime[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='java.lang:type=Runtime')].Uptime.first()

- MULTIPLIER: 0.001

Hadoop {#HOSTNAME}: State

State of the node - valid values are: NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN.

DEPENDENT hadoop.nodemanager.state[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $[?(@.HostName=='{#HOSTNAME}')].State.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop {#HOSTNAME}: Version

DEPENDENT hadoop.nodemanager.version[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $[?(@.HostName=='{#HOSTNAME}')].NodeManagerVersion.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop {#HOSTNAME}: Number of containers

DEPENDENT hadoop.nodemanager.numcontainers[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $[?(@.HostName=='{#HOSTNAME}')].NumContainers.first()

Hadoop {#HOSTNAME}: Used memory

DEPENDENT hadoop.nodemanager.usedmemory[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $[?(@.HostName=='{#HOSTNAME}')].UsedMemoryMB.first()

Hadoop {#HOSTNAME}: Available memory

DEPENDENT hadoop.nodemanager.availablememory[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $[?(@.HostName=='{#HOSTNAME}')].AvailableMemoryMB.first()

Hadoop {#HOSTNAME}: Remaining

Remaining disk space.

DEPENDENT hadoop.datanode.remaining[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=DataNode,name=FSDatasetState')].Remaining.first()

Hadoop {#HOSTNAME}: Used

Used disk space.

DEPENDENT hadoop.datanode.dfs_used[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=DataNode,name=FSDatasetState')].DfsUsed.first()

Hadoop {#HOSTNAME}: Number of failed volumes

Number of failed storage volumes.

DEPENDENT hadoop.datanode.numfailedvolumes[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=DataNode,name=FSDatasetState')].NumFailedVolumes.first()

Hadoop {#HOSTNAME}: JVM Threads

The number of JVM threads.

DEPENDENT hadoop.datanode.jvm.threads[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='java.lang:type=Threading')].ThreadCount.first()

Hadoop {#HOSTNAME}: JVM Garbage collection time

The JVM garbage collection time in milliseconds.

DEPENDENT hadoop.datanode.jvm.gc_time[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=DataNode,name=JvmMetrics')].GcTimeMillis.first()

Hadoop {#HOSTNAME}: JVM Heap usage

The JVM heap usage in MBytes.

DEPENDENT hadoop.datanode.jvm.mem_heap_used[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='Hadoop:service=DataNode,name=JvmMetrics')].MemHeapUsedM.first()

Hadoop {#HOSTNAME}: Uptime

DEPENDENT hadoop.datanode.uptime[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.beans[?(@.name=='java.lang:type=Runtime')].Uptime.first()

- MULTIPLIER: 0.001

Hadoop {#HOSTNAME}: Version

DataNode software version.

DEPENDENT hadoop.datanode.version[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.[?(@.HostName=='{#HOSTNAME}')].version.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop {#HOSTNAME}: Admin state

Administrative state.

DEPENDENT hadoop.datanode.admin_state[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.[?(@.HostName=='{#HOSTNAME}')].adminState.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Hadoop {#HOSTNAME}: Oper state

Operational state.

DEPENDENT hadoop.datanode.oper_state[{#HOSTNAME}]

Preprocessing:

- JSONPATH: $.[?(@.HostName=='{#HOSTNAME}')].operState.first()

- DISCARD_UNCHANGED_HEARTBEAT: 6h

Zabbix_raw_items: Get ResourceManager stats

-

HTTP_AGENT hadoop.resourcemanager.get

Zabbix_raw_items: Get NameNode stats

-

HTTP_AGENT hadoop.namenode.get

Zabbix_raw_items: Get NodeManagers states

-

HTTP_AGENT hadoop.nodemanagers.get

Preprocessing:

- JAVASCRIPT: return JSON.stringify(JSON.parse(JSON.parse(value).beans[0].LiveNodeManagers))

Zabbix_raw_items: Get DataNodes states

-

HTTP_AGENT hadoop.datanodes.get

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

Zabbix_raw_items: Hadoop NodeManager {#HOSTNAME}: Get stats

HTTP_AGENT hadoop.nodemanager.get[{#HOSTNAME}]

Zabbix_raw_items: Hadoop DataNode {#HOSTNAME}: Get stats

HTTP_AGENT hadoop.datanode.get[{#HOSTNAME}]
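
To make the dependent-item pattern above concrete, the following sketch emulates one JSONPath step, the "Active NMs" extraction, in plain JavaScript, the way a preprocessing step sees the raw "Get ResourceManager stats" payload in its value argument. The sample payload is invented for illustration and only mimics the shape of the JMX output.

    // Sketch only: equivalent of the JSONPath
    // $.beans[?(@.name=='Hadoop:service=ResourceManager,name=ClusterMetrics')].NumActiveNMs.first()
    function extractActiveNMs(value) {
      const jmx = JSON.parse(value);
      const clusterMetrics = jmx.beans.find(
        b => b.name === 'Hadoop:service=ResourceManager,name=ClusterMetrics'
      );
      return clusterMetrics ? clusterMetrics.NumActiveNMs : null;
    }

    // Invented sample payload mimicking the ResourceManager JMX output shape.
    const sample = JSON.stringify({
      beans: [
        { name: 'Hadoop:service=ResourceManager,name=ClusterMetrics', NumActiveNMs: 3, NumUnhealthyNMs: 0 }
      ]
    });

    console.log(extractActiveNMs(sample)); // 3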

Triggers

Name | Description | Expression | Severity | Dependencies and additional info
ResourceManager: Service is unavailable

-

{TEMPLATE_NAME:net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"].last()}=0 AVERAGE

Manual close: YES

ResourceManager: Service response time is too high (over {$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} for 5m)

-

{TEMPLATE_NAME:net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"].min(5m)}>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} WARNING

Manual close: YES

Depends on:

- ResourceManager: Service is unavailable

ResourceManager: Service has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:hadoop.resourcemanager.uptime.last()}<10m INFO

Manual close: YES

ResourceManager: Failed to fetch ResourceManager API page (or no data for 30m)

Zabbix has not received data for items for the last 30 minutes.

{TEMPLATE_NAME:hadoop.resourcemanager.uptime.nodata(30m)}=1 WARNING

Manual close: YES

Depends on:

- ResourceManager: Service is unavailable

ResourceManager: Cluster has no active NodeManagers

Cluster is unable to execute any jobs without at least one NodeManager.

{TEMPLATE_NAME:hadoop.resourcemanager.num_active_nm.max(5m)}=0 HIGH
ResourceManager: Cluster has unhealthy NodeManagers

YARN considers any node whose disk utilization exceeds the value of the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage property (in yarn-site.xml) to be unhealthy. Ample disk space is critical for uninterrupted operation of a Hadoop cluster, so a large number of unhealthy nodes (the number worth alerting on depends on the size of your cluster) should be investigated and resolved quickly.

{TEMPLATE_NAME:hadoop.resourcemanager.num_unhealthy_nm.min(15m)}>0 AVERAGE
NameNode: Service is unavailable

-

{TEMPLATE_NAME:net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"].last()}=0 AVERAGE

Manual close: YES

NameNode: Service response time is too high (over {$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} for 5m)

-

{TEMPLATE_NAME:net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"].min(5m)}>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} WARNING

Manual close: YES

Depends on:

- NameNode: Service is unavailable

NameNode: Service has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:hadoop.namenode.uptime.last()}<10m INFO

Manual close: YES

NameNode: Failed to fetch NameNode API page (or no data for 30m)

Zabbix has not received data for items for the last 30 minutes.

{TEMPLATE_NAME:hadoop.namenode.uptime.nodata(30m)}=1 WARNING

Manual close: YES

Depends on:

- NameNode: Service is unavailable

NameNode: Cluster capacity remaining is low (below {$HADOOP.CAPACITY_REMAINING.MIN.WARN}% for 15m)

A good practice is to ensure that disk use never exceeds 80 percent capacity.

{TEMPLATE_NAME:hadoop.namenode.percent_remaining.max(15m)}<{$HADOOP.CAPACITY_REMAINING.MIN.WARN} WARNING
NameNode: Cluster has missing blocks

A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica.

{TEMPLATE_NAME:hadoop.namenode.missing_blocks.min(15m)}>0 AVERAGE
NameNode: Cluster has volume failures

HDFS allows disks to fail in place, without affecting DataNode operations, until a threshold is reached. The threshold is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode. On a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is usually best practice.

{TEMPLATE_NAME:hadoop.namenode.volume_failures_total.min(15m)}>0 AVERAGE
NameNode: Cluster has DataNodes in Dead state

The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes.

{TEMPLATE_NAME:hadoop.namenode.num_dead_data_nodes.min(5m)}>0 AVERAGE
{#HOSTNAME}: Service has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:hadoop.nodemanager.uptime[{#HOSTNAME}].last()}<10m INFO

Manual close: YES

{#HOSTNAME}: Failed to fetch NodeManager API page (or no data for 30m)

Zabbix has not received data for items for the last 30 minutes.

{TEMPLATE_NAME:hadoop.nodemanager.uptime[{#HOSTNAME}].nodata(30m)}=1 WARNING

Manual close: YES

Depends on:

- {#HOSTNAME}: NodeManager has state {ITEM.VALUE}.

{#HOSTNAME}: NodeManager has state {ITEM.VALUE}.

The state is different from normal.

{TEMPLATE_NAME:hadoop.nodemanager.state[{#HOSTNAME}].last()}<>"RUNNING" AVERAGE
{#HOSTNAME}: Service has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:hadoop.datanode.uptime[{#HOSTNAME}].last()}<10m INFO

Manual close: YES

{#HOSTNAME}: Failed to fetch DataNode API page (or no data for 30m)

Zabbix has not received data for items for the last 30 minutes.

{TEMPLATE_NAME:hadoop.datanode.uptime[{#HOSTNAME}].nodata(30m)}=1 WARNING

Manual close: YES

Depends on:

- {#HOSTNAME}: DataNode has state {ITEM.VALUE}.

{#HOSTNAME}: DataNode has state {ITEM.VALUE}.

The state is different from normal.

{TEMPLATE_NAME:hadoop.datanode.oper_state[{#HOSTNAME}].last()}<>"Live" AVERAGE

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at the ZABBIX forums.

References

https://hadoop.apache.org/docs/current/
