AS/400 Monitoring solutions

  • sancho
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Mar 2015
    • 295

    #121
    Hello again Kos

    First of all, thanks for your help.

    I have tested the item as indicated, with the name of the subsystem.

    proc.num[,,,GRPALM04] (I changed the subsystem because 03 had no jobs in the queue today).

    The value it gives me is 1, although this queue currently has 6 jobs. I think it returns 1 because it is detecting the name of the subsystem to which this queue belongs.

    Yes, the item you set up with GRPALM03 also gives me 1, although its job queue is empty.


    • sancho
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Mar 2015
      • 295

      #122
      Hi Kos


      I hope everything goes well.

      I have edited my message to try to explain myself better.

      The idea is to use a discovery item to obtain the users with jobs in RUN or LCKW status, for example.

      Would this be possible?

      Sorry if I can't explain myself better.
      Last edited by sancho; 24-02-2020, 21:06.


      • guille_pm
        Junior Member
        • Mar 2020
        • 16

        #123
        Hi Kos


        I'm doing some tests with Zabbix and some internal IBM i systems, so far so good! Thanks for all your hard work.

        I'm having a problem with one of the items, as400.outputqueue.size[QPRINT,QGPL]. The agent is writing the following to the log:

        32:20200413:034047.315 active check "as400.outputqueue.size[QPRINT,QGPL]" is not supported: com.ibm.as400.access.AS400Exception: CPF34C4 List is too large for user space QNPSLIST.
        32:20200413:035154.419 As400Metric.process() error: com.ibm.as400.access.AS400Exception: CPF34C4 List is too large for user space QNPSLIST.
        32:20200413:040656.018 As400Metric.process() error: com.ibm.as400.access.AS400Exception: CPF34C4 List is too large for user space QNPSLIST.

        QGPL/QPRINT currently has 45033 spool files.


        • Kos
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Aug 2015
          • 3404

          #124
          Hi guille_pm ,

          unfortunately, I can't help in this situation.
          Our Zabbix agent emulator just relays the error message from the operating system when an error occurs.
          I do not explicitly use any user space or IBM i API; in this case the agent uses the standard Java class SpooledFileList (a wrapper over the IBM i API), sets two filters, by user (*ALL) and by library/queue (using the methods setUserFilter() and setQueueFilter()), and then calls openSynchronously() to get the size of the result. This error probably occurs at the last step (the openSynchronously() call), but that is out of my control :-(

          Is it possible to reduce the size of this queue? We have queues of a little over 10000 files in our environment, and it works without problems.


          • guille_pm
            Junior Member
            • Mar 2020
            • 16

            #125
            Yes, I can probably clean it up a bit, but that won't work for production systems, where the count can go up to 500k. That's fine though; I can do a database monitor check for output queues.
            By the way, I just want to share some enhancements IBM has been making for IBM i. Starting with V7R2, they have implemented what they call IBM i Services: a set of SQL views and UDTFs that retrieve data easily, without the need to call APIs. So, for the spool files in an output queue, I can do SELECT NUMBER_OF_FILE from QSYS2.OUTPUT_QUEUE_INFO where OUTPUT_QUEUE_LIBRARY = 'your output queue library' and OUTPUT_QUEUE_NAME = 'your output queue name'.
            It may make some checks easier to do.
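            As a minimal sketch of the same idea outside the agent (for example, in an external script), the query above can be run with parameter markers from Python. It assumes a DB-API 2.0 connection to the IBM i that uses "?" markers (for instance one opened with pyodbc and the IBM i Access ODBC driver; that is an assumption, not something the agent provides). The function name is hypothetical, and the column names are copied from the post, so verify them against IBM's OUTPUT_QUEUE_INFO documentation for your release.

```python
# Sketch: read the spool file count of one output queue via IBM i Services.
# Assumptions (not part of the agent emulator):
#   * 'conn' is a DB-API 2.0 connection to the IBM i using "?" parameter
#     markers (e.g. pyodbc with the IBM i Access ODBC driver);
#   * column names are copied from the forum post; check them for your release.

QUERY = (
    "SELECT NUMBER_OF_FILE FROM QSYS2.OUTPUT_QUEUE_INFO "
    "WHERE OUTPUT_QUEUE_LIBRARY = ? AND OUTPUT_QUEUE_NAME = ?"
)


def output_queue_file_count(conn, library, queue):
    """Return the number of spool files in library/queue, or None if not found."""
    cur = conn.cursor()
    try:
        # Parameter markers avoid quoting problems with the object names;
        # IBM i object names are normally upper case.
        cur.execute(QUERY, (library.upper(), queue.upper()))
        row = cur.fetchone()
    finally:
        cur.close()
    return None if row is None else int(row[0])
```

            For the queue from the original question this would be called as output_queue_file_count(conn, "QGPL", "QPRINT").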


            • guille_pm
              Junior Member
              • Mar 2020
              • 16

              #126
              Hi Kos
              Another quick question: is there any way to get the fully qualified job name, like NUMBER/USER/JOBNAME, from eventlog? I used the ITEM.LOG.SOURCE macro, but it just returns the job name.
              Thanks!


              • Kos
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Aug 2015
                • 3404

                #127
                Hi, guille_pm !

                Thank you for the information; it is probably reasonable to obtain the queue size some other way (IBM i Services is one possible method).

                Regarding the fully qualified job name from the event log: unfortunately, the direct answer is no, there is no such feature now.
                However, as you wrote, you can obtain just the job name via the macro.
                Additionally, you can use the "as400UserAsMessagePrefix=1" option in the agent's config file: it adds the job's user name as a prefix to every message (i.e. to the item value), which can easily be extracted with a regular expression and macro functions.
                Unfortunately, it's impossible to get the job number in the current implementation; however, it is not very hard to add this feature (using the same method as for the job's user name, i.e. as an additional prefix to the message body). There was just no need for it until now :-)
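                A minimal sketch of that extraction, assuming the prefix is the IBM i user profile name followed by a fixed separator. The exact format the agent writes is not shown in this thread, so the separator is a parameter and the sample value used below is hypothetical; check a real item value first. Once the pattern is confirmed, the same regular expression could be reused in a macro function such as regsub().

```python
import re

# IBM i user profile names: up to 10 characters from A-Z, 0-9, $, #, @ and _.
_USER = r"[A-Z0-9$#@_]{1,10}"


def split_user_prefix(value, separator):
    """Split an eventlog item value into (user, message).

    Hypothetical helper: 'separator' is whatever the agent puts between the
    user name and the message text when as400UserAsMessagePrefix=1 is set;
    inspect a real value to find it. Returns (None, value) if no prefix matches.
    """
    pattern = re.compile(
        r"^(?P<user>" + _USER + r")" + re.escape(separator) + r"(?P<msg>.*)\Z",
        re.DOTALL,
    )
    match = pattern.match(value)
    if match is None:
        return None, value
    return match.group("user"), match.group("msg")
```

                For example, with a hypothetical value "QSYSOPR: Job ended abnormally." and separator ": ", the helper returns ("QSYSOPR", "Job ended abnormally.").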


                • Fabian
                  Junior Member
                  • Apr 2020
                  • 2

                  #128
                  Good morning.

                  Thank you for this agent and for keeping this thread alive; I found it while researching an error we are having.

                  Our agent responds to passive checks, but not to active checks.

                  Our AS400 agent is behind a Zabbix proxy. Both server and proxy are on version 4.2.8.

                  We have enabled debug logging. When we start the agent, we get the following log:

                  Code:
                     34:20200422:101158.448  Agent hostname: 'CLIENT_APMTEST', System info: IBM OS/400 APMTEST V7R2M0, IBM Corporation IBM J9 VM (v1.8.0_201)
                      39:20200422:101158.479 agent #1 started [collector]
                      40:20200422:101158.521 agent #2 (10.129.240.13:10051) started [active checks #3]
                      42:20200422:101158.545 agent #3 started[listener #1]
                      44:20200422:101158.553 agent #5 started[listener #3]
                      43:20200422:101158.556 agent #4 started[listener #2]
                      34:20200422:101801.885 Starting Zabbix Agent v0.7.7
                      34:20200422:101801.892 using configuration file: /home/ZABBIX/agentd/zabbix_agentd.conf
                      34:20200422:101801.893 agent #-1 started [ZabbixAgent config]
                      34:20200422:101801.893 IBM Corporation Java version "1.8.0_201"
                      34:20200422:101801.894  Java(TM) SE Runtime Environment (build 8.0.5.30 - pap3280sr5fp30-20190207_01(SR5 FP30))
                      34:20200422:101801.894  IBM J9 VM (build 2.9, JRE 1.8.0 OS/400 ppc-32-Bit 20190124_408237 (JIT enabled, AOT enabled)
                  OpenJ9   - 9c77d86
                  OMR      - dad8ba7
                  IBM      - e2996d1)
                      34:20200422:101801.913  Open Source Software, JTOpen 9.4, codebase 5770-SS1 V7R3M0.00 built=20170816 @U4
                      34:20200422:101801.922 ZbxMetric() constructor: metric 'log' successfully added
                      34:20200422:101801.923 ZbxMetric() constructor: metric 'logrt' successfully added
                      34:20200422:101801.923 ZbxMetric() constructor: metric 'eventlog' successfully added
                      34:20200422:101801.925 ZbxMetric() constructor: metric 'agent.exit' successfully added
                      34:20200422:101801.926 ZbxMetric() constructor: metric 'agent.hostname' successfully added
                      34:20200422:101801.927 ZbxMetric() constructor: metric 'agent.ping' successfully added
                      34:20200422:101801.929 ZbxMetric() constructor: metric 'agent.version' successfully added
                      34:20200422:101801.938 ZbxMetric() constructor: metric 'system.hostname' successfully added
                      34:20200422:101801.940 ZbxMetric() constructor: metric 'system.uname' successfully added
                      34:20200422:101801.943 ZbxMetric() constructor: metric 'system.localtime' successfully added
                      34:20200422:101801.944 ZbxMetric() constructor: metric 'system.cpu.num' successfully added
                      34:20200422:101801.945 ZbxMetric() constructor: metric 'as400.cpu.capacity' successfully added
                      34:20200422:101801.946 ZbxMetric() constructor: metric 'system.users.num' successfully added
                      34:20200422:101801.947 ZbxMetric() constructor: metric 'proc.num' successfully added
                      34:20200422:101801.949 ZbxMetric() constructor: metric 'as400.subsystem' successfully added
                      34:20200422:101801.950 ZbxMetric() constructor: metric 'as400.outputqueue.size' successfully added
                      34:20200422:101801.951 ZbxMetric() constructor: metric 'as400.services' successfully added
                      34:20200422:101801.952 ZbxMetric() constructor: metric 'vfs.fs.discovery' successfully added
                      34:20200422:101801.953 ZbxMetric() constructor: metric 'vfs.fs.size' successfully added
                      34:20200422:101801.954 ZbxMetric() constructor: metric 'vfs.fs.state' successfully added
                      34:20200422:101801.955 ZbxMetric() constructor: metric 'as400.disk.discovery' successfully added
                      34:20200422:101801.956 ZbxMetric() constructor: metric 'as400.disk.size' successfully added
                      34:20200422:101801.957 ZbxMetric() constructor: metric 'as400.disk.state' successfully added
                      34:20200422:101801.958 ZbxMetric() constructor: metric 'as400.disk.asp' successfully added
                      34:20200422:101801.959 ZbxMetric() constructor: metric 'as400.systemPool.discovery' successfully added
                      34:20200422:101801.961 ZbxMetric() constructor: metric 'as400.systemPool.state' successfully added
                      34:20200422:101801.962 ZbxMetric() constructor: metric 'proc.cpu.util.discovery' successfully added
                      34:20200422:101801.963 ZbxMetric() constructor: metric 'proc.cpu.util' successfully added
                      34:20200422:101802.011 constuctor AgentRequest(): str='system.uname', key_name='system.uname'
                      34:20200422:101802.011  parameters list: null
                      34:20200422:101802.012 in ZabbixAgent.process(): key_name='system.uname', full key='system.uname'
                      34:20200422:101802.959 As400Metric.process() is OK for system.uname: 'IBM OS/400 APMTEST V7R2M0, IBM Corporation IBM J9 VM (v1.8.0_201)'
                      34:20200422:101802.962 end of ZabbixAgent.process()
                      34:20200422:101802.963  Agent hostname: 'CLIENT_APMTEST', System info: IBM OS/400 APMTEST V7R2M0, IBM Corporation IBM J9 VM (v1.8.0_201)
                      34:20200422:101802.985 agent #-1 stopped [ZabbixAgent config]
                      38:20200422:101802.994 agent #0 started [Cache Controller thread]
                      39:20200422:101802.996 agent #1 started [collector]
                      40:20200422:101803.008 agent #2 (10.129.240.13:10051) started [active checks #3]
                       1:20200422:101803.009 PassiveCheck.init(): starting
                      40:20200422:101803.010 in refreshActiveChecks(): host:10.129.240.13, port:10051
                      42:20200422:101803.012 agent #3 started[listener #1]
                      43:20200422:101803.019 agent #4 started[listener #2]
                      44:20200422:101803.028 agent #5 started[listener #3]
                      40:20200422:101803.036 in send() to server '10.129.240.13:10051'
                      39:20200422:101803.445  Procstat.updateJobinfoList() error: com.ibm.as400.access.AS400Exception: CPF3C53 No encontrado trabajo 979274/QUSER/QZRCSRVS.
                      40:20200422:101806.067 End of send()
                      40:20200422:101806.067  active check configuration update from [10.129.240.13:10051] started to fail (java.net.SocketTimeoutException: connect timed out)
                      40:20200422:101806.068 end of refreshActiveChecks(): false
                      40:20200422:101806.068 in processActiveChecks() server:'10.129.240.13' port:10051
                      40:20200422:101806.069 End of processActiveChecks()
                       1:20200422:101848.585  Thread [main]: got incoming connection from 10.129.240.13
                       1:20200422:101848.585  connection will be processed by thread 44[listener #3]
                      44:20200422:101848.586 in PassiveCheck.checkConnection()
                      44:20200422:101848.587 end of PassiveCheck.checkConnection(): true
                      44:20200422:101848.587 in PassiveCheck.process(), #3
                      44:20200422:101848.587 ZBXD header is OK, data length=10
                      44:20200422:101848.588 PassiveCheck.process(): request is: 'agent.ping'
                      44:20200422:101848.589 constuctor AgentRequest(): str='agent.ping', key_name='agent.ping'
                      44:20200422:101848.589  parameters list: null
                      44:20200422:101848.589 in ZabbixAgent.process(): key_name='agent.ping', full key='agent.ping'
                      44:20200422:101848.590 GenericMetric.process() is OK for agent.ping
                      44:20200422:101848.590 end of ZabbixAgent.process()
                      44:20200422:101848.591 PassiveCheck.process(): sending result: '1'
                  Could you point us to where we could find the solution to this error? I have found some references to CPF3C53 in this thread, but no conclusive solution.

                  Thanks in advance.


                  • guille_pm
                    Junior Member
                    • Mar 2020
                    • 16

                    #129
                    About your

                    active check configuration update from [10.129.240.13:10051] started to fail

                    I saw something similar, but with one of our VIO servers. I think it means that the timeout was reached before the check finished. Check the timeout value in your agent config file, and also in your server config file. The server timeout should always be larger than the agent timeout; I ended up setting the agent to 7 and the server to 9.
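                    A small sketch of that rule: read the Timeout parameter from the agent config and from the server config (or zabbix_proxy.conf, when the agent reports through a proxy) and check that the server side waits longer. It assumes the emulator's config file uses the same "Timeout=N" syntax as the standard agent; the parsing is simplified and the function names are hypothetical. If Timeout is not set explicitly, it raises an error instead of guessing a default.

```python
def read_timeout(conf_text):
    """Return the value of the last uncommented Timeout=N line, or None if absent."""
    timeout = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Timeout":
            timeout = int(value.strip())
    return timeout


def server_waits_longer(agent_conf_text, server_conf_text):
    """True if the server (or proxy) Timeout is strictly larger than the agent's."""
    agent = read_timeout(agent_conf_text)
    server = read_timeout(server_conf_text)
    if agent is None or server is None:
        # Defaults might differ between the emulator and Zabbix, so don't guess.
        raise ValueError("Timeout is not set explicitly in one of the files")
    return server > agent
```

                    With the values from this post (agent 7, server 9), server_waits_longer("Timeout=7", "Timeout=9") is True.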


                    • Kos
                      Senior Member
                      Zabbix Certified Specialist, Zabbix Certified Professional
                      • Aug 2015
                      • 3404

                      #130
                      Fabian, sorry, I did not see your message in time.

                      In fact, the CPF3C53 message from the collector thread can safely be ignored: it just means that some job disappeared while the job list was being processed.

                      What is more important is the message noted by guille_pm: it really does mean that the agent could not connect to the Zabbix server (or proxy, in your case).
                      Please check that no firewall rules prevent your Zabbix proxy (10.129.240.13) from accepting TCP connections on port 10051 from your CLIENT_APMTEST host.
                      The log file shows that there are no problems with communication in the reverse direction (10.129.240.13 -> CLIENT_APMTEST:10050 is OK).
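                      To test this direction independently of the agent, a plain TCP connection attempt to the proxy port is enough. A minimal Python sketch, assuming Python is available on the host you test from (ideally the IBM i itself, since the firewall path matters); the function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within 'timeout' seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and unreachable networks.
        return False
```

                      If can_connect("10.129.240.13", 10051) returns False from CLIENT_APMTEST while passive checks still work, that points to the same firewall problem.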


                      • Fabian
                        Junior Member
                        • Apr 2020
                        • 2

                        #131
                        Thanks a lot to you both. That was the problem: iptables was misconfigured on our Zabbix proxy. It works perfectly now.


                        • guille_pm
                          Junior Member
                          • Mar 2020
                          • 16

                          #132
                          Hi! Just wanted to share something, and also ask something...

                          So, I've been working with a friend on a way to get the job names of jobs in MSGW. We've come up with something that works pretty well.
                          We recommend IBM i 7.3 for it.
                          For this, you first need to create a web services server on your IBM i. Open a browser and go to http://yourserver:2001/HTTPAdmin (or https if you've secured it), then on the left click Create a Web Services Server. Follow the instructions (make sure to select an unused port), and once it is created and shown on your screen, select it and click Deploy. Select REST as the type and *SQL as the implementation, then fill in the prompts as described below:

                          Procedure name: Get_Job_Status
                          SQL Statement: SELECT
                          job_name as fulljob,
                          substr(job_name, 1, LOCATE('/',job_name)-1) as numjob,
                          substr(job_name, LOCATE('/',job_name)+1, LOCATE('/',job_name, 8) -locate('/', job_name)-1) as user,
                          substr(job_name, LOCATE('/', job_name,LOCATE('/',job_name)+1)+1) as jobname,
                          subsystem as sbs
                          FROM table(active_job_info())
                          where job_status = ?
                          HTTP request method: GET
                          URI path template for method: /{job_status}
                          SQL result type: Multi-row result set
                          Trim mode for output fields: Trailing
                          SQL state information in response: On errors
                          Treat warnings as SQL Errors: Yes
                          User-defined error message:
                          HTTP status code on SQL success: 200
                          HTTP status code on SQL failure: 500
                          HTTP header information:
                          Allowed input media types: *ALL
                          Returned output media types: *JSON
                          Input parameter mappings:
                          job_status VARCHAR *PATH_PARAM job_status *NONE
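                          The substr/LOCATE expressions in the SQL statement above split the qualified job name NUMBER/USER/JOBNAME into its three parts. Note that LOCATE('/',job_name, 8) starts searching at position 8, so it relies on the job number occupying positions 1 to 6, which holds because IBM i job numbers are always six digits. A Python sketch of the same split (a hypothetical helper, e.g. for post-processing the JSON in a script) that simply splits on '/':

```python
def split_qualified_job(full_job):
    """Split 'NUMBER/USER/JOBNAME' into a dict keyed like the SQL aliases above."""
    parts = full_job.strip().split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("not a qualified job name: %r" % full_job)
    number, user, name = parts
    return {"NUMJOB": number, "USER": user, "JOBNAME": name}
```

                          For example, split_qualified_job("979274/QUSER/QZRCSRVS") returns {"NUMJOB": "979274", "USER": "QUSER", "JOBNAME": "QZRCSRVS"}.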
                          Once done, start your web services server, and go to http://yourhost:yourport/web/service...ix_Checks/MSGW

                          If you open that in Firefox, you'll see nicely formatted JSON. With that done, you only need to create a discovery rule of type HTTP agent. Make sure to add a preprocessing step of type JSONPath, and then define the LLD macros to hold the values from the returned JSON.
                          A trigger can then be created using, for instance, proc.num for the job: if the agent returns 1, you have your problem.
                          Since MSGW is a variable in the PATH, you could really use it for any other status, like LCKW.
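                          A rough sketch of what the JSONPath step and the LLD macros do with the response: each row becomes one set of LLD macros. It assumes the column aliases from the SQL statement (FULLJOB, NUMJOB, USER, JOBNAME, SBS; Db2 folds unquoted aliases to upper case) and that the rows are either the top-level array or the only array inside a wrapper object. The exact JSON shape depends on your web services server, so check a real response before writing the JSONPath; the macro names are hypothetical.

```python
import json

# Hypothetical LLD macro names mapped to the SQL column aliases.
MACRO_COLUMNS = {
    "{#FULLJOB}": "FULLJOB",
    "{#NUMJOB}": "NUMJOB",
    "{#USER}": "USER",
    "{#JOBNAME}": "JOBNAME",
    "{#SBS}": "SBS",
}


def to_lld_rows(body):
    """Turn the web service's JSON body into one dict of LLD macros per job."""
    doc = json.loads(body)
    # Assumption: rows are the top-level array or the only array in a wrapper.
    if isinstance(doc, dict):
        arrays = [v for v in doc.values() if isinstance(v, list)]
        if len(arrays) != 1:
            raise ValueError("could not find the row array in the response")
        doc = arrays[0]
    if not isinstance(doc, list):
        raise ValueError("unexpected JSON shape")
    rows = []
    for row in doc:
        # Case-insensitive lookup, since the key case in the JSON is not certain.
        upper = {str(k).upper(): v for k, v in row.items()}
        rows.append({m: str(upper.get(c, "")) for m, c in MACRO_COLUMNS.items()})
    return rows
```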


                          Now to the question part, I'm trying to do some filtering on the eventlog metric. I tried using the logseverity function, but then realized that most messages come with "Unknown" severity, or even no value at all. Kos, is there any way to include the severity in the values returned?

                          Thanks a lot!


                          • hab
                            Hi guille_pm. I've been away for a while. Not sure if you considered this, but an approach that requires no work on the host is to use a discovery rule for jobs in MSGW/LCKW etc. They will then become individual items/triggers including job number/name/user etc.
                            It has worked very well for us for quite some time. We monitor around 80 partitions this way.
                            We're very happy with Zabbix and Kos' great agent emulator.
                        • Kos
                          Senior Member
                          Zabbix Certified Specialist, Zabbix Certified Professional
                          • Aug 2015
                          • 3404

                          #133
                          guille_pm, first of all, many thanks: a very interesting approach!

                          Originally posted by guille_pm
                          Now to the question part, I'm trying to do some filtering on the eventlog metric. I tried using the logseverity function, but then realized that most messages come with "Unknown" severity, or even no value at all. Kos, is there any way to include the severity in the values returned?
                          I'd like to clarify this point.
                          The original purpose of this attribute in the Zabbix database was to store the severity of messages received from the Windows Event Log. The field is numeric in Zabbix's internal database, but for readability it is translated into text for the standard Windows Event Log levels, as described for the "logseverity()" trigger function (see the documentation). My Zabbix "agent" for AS/400 just transmits the original severity level from the AS/400 (IBM i) message queue "as is", without any transformation. Accordingly, it is also stored in the Zabbix database in its original form (as a number). However, this number usually differs from the values used for the standard Windows Event Log levels (1 - 10), so the Zabbix web interface displays it as "Unknown". The same is probably true if you use the "{ITEM.LOG.SEVERITY<N>}" macro in notification templates: it will be resolved to "UNKNOWN" (or something similar). However, you can use the "{ITEM.LOG.NSEVERITY<N>}" macro (with the letter "N" before "SEVERITY"), which substitutes the original numeric value. The "logseverity()" trigger function (which returns a number) should also work correctly.


                          • guille_pm
                            Junior Member
                            • Mar 2020
                            • 16

                            #134
                            Thanks a lot! I tested it and it works like a charm; now I get sev10 messages as Warning and the rest as Average.

                            I should have tested it before asking, sorry.


                            • bigredau15
                              Junior Member
                              • Aug 2020
                              • 4

                              #135
                              Hi,

                              We've recently implemented the AS400 / IBM i Zabbix agent, and it's working fine. However, in the Zabbix Portal under Problems we've got the alert:
                              'ASP1 used more than 65%'

                              How do we modify that percentage value? When we click to modify the Trigger, the expression shows:
                              {LPAR1:vfs.fs.size[1,pused].min(#2)}>{$MAX_DISK_PUSED:"1"} and {LPAR1:vfs.fs.state[1].count(#2,1)}=0 and {LPAR1:vfs.fs.state[1].count(#2,2)}=0

                              Where do we set the percentage to be, say 80%?

                              Thanks for your time.
