AS/400 Monitoring solutions
Hi Kos and anyone else with an idea :-)
I'm looking for a smart way to catch jobs using high CPU over a period of time.
Many systems run for long periods without an IPL, so some "OK" jobs can accumulate a high total CPU time (CPU seconds) without being a problem.
What I'm trying to detect is jobs that suddenly consume a lot of CPU because of a loop, a bad user request (typically an SQL statement doing something the user didn't anticipate), etc. I would like the trigger to include the exact job name/number/user.
Has anyone created something like this that they can share?
I have looked at:
Zabbix agent metrics:
proc.cpu.util[name,user,type,subsystem,mode,jobnum]
Here, if I read the documentation correctly, I have to know each job before the problem occurs, which isn't practical.
proc.cpu.util.discovery[seconds]
After some time, this will almost always return the same "OK" jobs that I want to ignore.
IBM i Services:
ACTIVE_JOB_INFO (table function - https://www.ibm.com/docs/en/i/7.6.0?...table-function)
I thought of creating a discovery rule that creates an item for each job consuming over x% CPU during a specific interval. I could use RESET_STATISTICS = YES on the discovery rule, and set it to NO on the prototype items querying the CPU of a single job. I'm a bit worried that I will end up with too many items and put unnecessary load on the system.
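To make the interval idea concrete, here is a minimal sketch of the logic I have in mind, not a working Zabbix or IBM i integration. It assumes two snapshots of cumulative CPU time per job, taken a known number of seconds apart; how those snapshots are collected (for example from ACTIVE_JOB_INFO) is not shown. The function name `find_cpu_spikes`, the dictionary layout, the sample job names, and the choice to express the result as a percentage of one processor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSpike:
    job: str            # qualified job name, e.g. "456789/QUSER/QZDASOINIT"
    cpu_percent: float  # share of ONE processor used during the interval (assumption)


def find_cpu_spikes(previous, current, interval_seconds, threshold_percent):
    """Return jobs whose CPU use within the interval meets the threshold.

    previous / current: dict mapping qualified job name -> cumulative CPU
    time in milliseconds (hypothetical layout, not a Zabbix or IBM i format).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    interval_ms = interval_seconds * 1000.0
    spikes = []
    for job, cpu_ms in current.items():
        # A job missing from the previous snapshot started during the
        # interval, so all of its CPU time was consumed within it.
        delta_ms = cpu_ms - previous.get(job, 0)
        percent = 100.0 * delta_ms / interval_ms
        if percent >= threshold_percent:
            spikes.append(CpuSpike(job, percent))
    # Heaviest consumers first.
    return sorted(spikes, key=lambda s: s.cpu_percent, reverse=True)


# Made-up sample data: one long-running "OK" job and one job that spikes.
previous = {"000123/QSYS/QSYSARB": 9_000_000, "456789/QUSER/QZDASOINIT": 1_000}
current = {"000123/QSYS/QSYSARB": 9_000_500, "456789/QUSER/QZDASOINIT": 46_000}

for spike in find_cpu_spikes(previous, current, interval_seconds=60, threshold_percent=50):
    print(spike.job, round(spike.cpu_percent, 1))
```

Because only the difference between two snapshots is compared, the long-running job with a huge total CPU time is ignored, and only jobs above the threshold would need an item or an alert.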
Any other ideas?
