AS/400 Monitoring solutions

  • grantd
    Junior Member
    • Jun 2015
    • 6

    #61
    The issue for me ended up being that I was specifying User=zabbix, when I should have left that line out, since it defaults to zabbix.

    Comment

    • Kos
      Senior Member
      Zabbix Certified SpecialistZabbix Certified Professional
      • Aug 2015
      • 3404

      #62
      Version 0.7.0 was released.

      Gerald, please try this version. I've seriously reworked the auto-reconnection mechanism. Unfortunately, it had not been tested enough before (and really worked very badly). I hope it works better now :-)

      Other enhancements:
      • metrics vfs.fs.state and as400.disk.asp added;
      • metric as400.disk.state improved (returns a fictitious value “4294967295” if the disk unit is NOT owned by the system);
      • Initialization and output were improved. All messages now go to the log file, or to stdout if the log file cannot be accessed or created. The presence of the required libraries is checked explicitly. The metrics agent.hostname and system.uname are written to the log during initialization. If any problem occurs during initialization, the program stops immediately.
      Last edited by Kos; 25-05-2017, 20:49. Reason: misspelling

      Comment

      • RohrbaGe
        Senior Member
        • Aug 2005
        • 167

        #63
        Hi Kos,

        just installed the new version.
        As we shut down TCP/IP today at 15:00, I hope we will see the first results this evening.

        Comment

        • Kos
          Senior Member
          Zabbix Certified SpecialistZabbix Certified Professional
          • Aug 2015
          • 3404

          #64
          Hi Gerald,
          are there any results from your testing?
          I'm curious whether the auto-reconnection finally works ;-)
          Thanks in advance!

          Comment

          • ElTestigo
            Junior Member
            • Apr 2017
            • 1

            #65
            Hi Kos, thanks for your work.

            I will test Agent 0.7 in my QA environment and post the feedback.

            Comment

            • KevC
              Junior Member
              • Aug 2016
              • 7

              #66
              Error: 27730:20170609:143008.788 cannot send list of active checks to "x.x.x.96"

              Hi KOS,
              Thanks for coding this emulator. We were able to get it working today and we are successfully collecting data. We have one problem we are trying to figure out.
              Our machine uses a virtual IP address, x.x.x.6 (x.x.x left out for security reasons),
              and we have two physical IPs: x.x.x.86 and x.x.x.96.

              In Zabbix, the agent is defined as x.x.x.6 and we are successfully collecting data.
              We plan to use the eventlog function to obtain QSYSOPR messages, so we need active checks working.

              Any ideas ?

              Thanks

              Comment

              • RohrbaGe
                Senior Member
                • Aug 2005
                • 167

                #67
                Kos,

                unfortunately there is no improvement.

                There are no really helpful messages. There are not many messages at all.

                I have now switched on the debug level and restarted the client.
                So next weekend I should get more details.





                13:20170611:150519.487 current cpu_used=0, ji.cpu_used=33
                13:20170611:150522.437 Procstat.updateJobinfoList() communication to AS/400 error: java.net.SocketException: Connection reset
                13:20170611:172406.794 Procstat.updateJobinfoList() error: com.ibm.as400.access.AS400Exception: CPF3C53 Job 688412/QLWISVR/QP0ZSPWP nicht gefunden.
                13:20170611:172409.570 Procstat.updateJobinfoList() communication to AS/400 is working again
                13:20170611:172439.523 Procstat.updateJobinfoList() error: com.ibm.as400.access.AS400Exception: CPF3C53 Job 688629/QCPMGTDIR/QP0ZSPWP nicht gefunden.
                13:20170612:052507.478 Procstat.updateJobinfoList() error: com.ibm.as400.access.AS400Exception: CPF3C53 Job 688862/QUSER/ZDASOINIT nicht gefunden.
                root@DEZABBIX:/GRprog/as400#

                Comment

                • Kos
                  Senior Member
                  Zabbix Certified SpecialistZabbix Certified Professional
                  • Aug 2015
                  • 3404

                  #68
                  Thank you for the response, even though it is not good news for me.
                  It is a bit strange to me that there is no improvement.
                  There should be messages (at "WARNING" level) from the different Java threads (such as "active checks" or "listener") about communication errors and then about reconnections. Something like the following:
                  Code:
                      14:20170525:192326.040 agent #1 started [collector]
                      15:20170525:192326.040 agent #2 (zabbix.*******.***) started [active checks #3]
                      17:20170525:192326.040 agent #4 started[listener #2]
                      16:20170525:192326.040 agent #3 started[listener #1]
                      18:20170525:192326.040 agent #5 started[listener #3]
                      14:20170525:192525.511  Procstat.updateJobinfoList() communication to AS/400 error: java.net.SocketException: Software caused connection abort: recv failed
                      18:20170525:192533.873  ZabbixAgent.process(): 'vfs.fs.state[33]' communication error: java.net.SocketException: Software caused connection abort: recv failed, trying to reconnect...
                      18:20170525:192535.386   ZabbixAgent.process() 'vfs.fs.state[33]' communication error: java.net.SocketTimeoutException: connect timed out
                      15:20170525:192545.589  ZabbixAgent.process(): 'system.users.num' communication error: java.net.SocketException: Software caused connection abort: recv failed, trying to reconnect...
                      15:20170525:192547.102   ZabbixAgent.process() 'system.users.num' communication error: java.net.SocketTimeoutException: connect timed out
                      15:20170525:192650.969 active check "system.users.num" is not supported: java.net.SocketTimeoutException: connect timed out
                      15:20170525:192826.332  ZabbixAgent.process() 'vfs.fs.size[33,pfree]' communication to AS/400 is working again
                      14:20170525:192827.455  Procstat.updateJobinfoList() communication to AS/400 is working again
                      18:20170525:192911.816  ZabbixAgent.process() 'system.hostname' communication to AS/400 is working again
                  The first part of each line (before the colon and timestamp) is the Java thread number. So in this example we can see communication errors in threads 14 (collector), 18 (listener, i.e. the thread for passive checks) and 15 (active checks). Each of these threads reconnected once communication with the AS/400 host was restored.
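To see at a glance which threads reported anything, the log can be grouped by that leading thread number. The following is a minimal Python sketch, not part of the emulator; the line layout (`<thread>:<YYYYMMDD>:<HHMMSS.mmm> <message>`) is an assumption inferred from the sample above, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred from the sample log in this thread:
#   <thread>:<YYYYMMDD>:<HHMMSS.mmm> <message>
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s+(.*)$")


def parse_line(line):
    """Return (thread, date, time, message), or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    thread, date, time, message = match.groups()
    return int(thread), date, time, message


def messages_by_thread(lines):
    """Group log messages by Java thread number, skipping unparsable lines."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            thread, _date, _time, message = parsed
            grouped[thread].append(message)
    return dict(grouped)


sample = [
    "14:20170525:192326.040 agent #1 started [collector]",
    "18:20170525:192326.040 agent #5 started[listener #3]",
    "14:20170525:192827.455  Procstat.updateJobinfoList() communication to AS/400 is working again",
]
grouped = messages_by_thread(sample)
print(sorted(grouped))  # thread numbers that logged at least one message
```

A log in which only one thread number ever appears (as with thread 13 in the excerpt posted earlier) would show up here as a single key.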

                  In your case I can see only messages from thread #13 (the "collector" thread), and it worked as designed. However, I don't see any messages from the threads that should process the real metrics (active or passive checks); they should look like: "ZabbixAgent.process() '<some metric>' communication error [...]". I don't understand what's occurring.

                  It might really be useful to set "DebugLevel=4" for a weekend and then send me your logs for analysis. Please note that the log will be much larger, so you will also need to set the "LogFileSize=100" parameter (I hope that 100 MB will be enough). Of course, you need to restart the program for the modified settings to take effect.
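As a configuration fragment, the two settings named above would look roughly like this. Only the parameter names and values come from the post; the file they belong in is not named in the thread, and the `#` comment syntax is assumed to match the standard Zabbix agent configuration format.

```
# Assumption: these lines go into the emulator's configuration file
DebugLevel=4
LogFileSize=100
```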

                  Comment

                  • Kos
                    Senior Member
                    Zabbix Certified SpecialistZabbix Certified Professional
                    • Aug 2015
                    • 3404

                    #69
                    Hello KevC,
                    your message appeared only now (probably it has just been approved by a moderator).

                    When you use active checks, the connections are established from the Zabbix agent to the Zabbix server. So you need to set the "ServerActive=" parameter correctly, pointing to your Zabbix server. The IP address of the Zabbix agent does not matter.

                    The message
                    Error: 27730:20170609:143008.788 cannot send list of active checks to "x.x.x.96"
                    suggests that the address "x.x.x.96" is probably not your Zabbix server. Please check that your Zabbix server really listens on that IP address and uses the appropriate port (the default is 10051 for incoming connections).
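One quick way to check whether something is listening on the server port is to open a TCP connection to it from the machine running the agent. This is a minimal Python sketch using only the standard library; the function name is made up, and a successful connection shows only that the port is open, not that a Zabbix server is answering on it.

```python
import socket


def can_connect(host, port=10051, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Usage (the address is a placeholder, not a host from this thread):
#   can_connect("192.0.2.10")
```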

                    Comment

                    • KevC
                      Junior Member
                      • Aug 2016
                      • 7

                      #70
                      Thanks Kos. We do have the Zabbix server defined in the ServerActive= parameter, but ListenPort= may be missing; we use 10050 as the ListenPort. Our AS400 guru is checking on that, and I will keep you posted. We are so close. I appreciate all your work on this emulator.

                      Comment

                      • KevC
                        Junior Member
                        • Aug 2016
                        • 7

                        #71
                        Hi Kos,
                        We were able to get this working by making sure the hostname on the ServerActive= parameter matches the DNS name in Zabbix.
                        Thanks for your help

                        Comment

                        • KevC
                          Junior Member
                          • Aug 2016
                          • 7

                          #72
                          Hi Kos,
                          Now that we have the emulator working, we have a couple of challenges and would like to know if you have any thoughts on a solution.

                          To meet our customers' requirements, we need to filter on the following from QSYSOPR:

                          Full Message ID: We need to exclude specific message IDs, such as CPA3138. The trouble is that there is also a CPF3138, so excluding based on the number would exclude additional messages. We could use the trigger to do this, but that would bring extra messages into the Zabbix DB that are not needed, and there could be 100 or more per day. One solution might be to put the message ID in the source (we don't need to filter on the job name; not sure about others).

                          Users: We need to filter out messages from specific users (for example, DEV testers), but this field is not provided. Any thoughts?

                          Severity 50 or greater - This is provided and works great

                          For MIMIX Queue:
                          We need to monitor for messages in library QUSRSYS with severity 50 or greater.
                          We don't believe the library is provided, but we are not sure, since we are not yet able to generate a MIMIX message in test; we are working on that.


                          Please let me know if you have any thoughts on how we might be able to handle the above requirements. Appreciate all your work on this.

                          Comment

                          • Kos
                            Senior Member
                            Zabbix Certified SpecialistZabbix Certified Professional
                            • Aug 2015
                            • 3404

                            #73
                            Hello KevC,
                            I'm glad that my work is useful to somebody :-)

                            Regarding your questions:

                            Full Message id:
                            You can filter by the full EventID field using a regular expression if you put it in the item's key. In this case the filtering is performed on the agent's side, where the full EventID is available (in contrast to filtering in a trigger on the server side, where only the numeric part of the EventID is available). By the way, the documentation has an example of this trick.
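To illustrate why matching on the full ID matters, here is a small Python sketch of a pattern that rejects CPA3138 but still accepts CPF3138. It is an illustration only: the exclusion list and function name are hypothetical, the item key syntax is not shown in this thread, and whether the emulator's regular-expression engine supports negative lookahead is an assumption.

```python
import re

# Hypothetical exclusion list; CPA3138 is the ID mentioned in the thread.
EXCLUDED_IDS = ["CPA3138"]

# Negative lookahead: accept any ID that is not exactly one of the excluded ones.
KEEP = re.compile(
    r"^(?!(?:" + "|".join(map(re.escape, EXCLUDED_IDS)) + r")$)\w+$"
)


def keep_message(event_id):
    """Return True if a message with this full ID should be collected."""
    return KEEP.match(event_id) is not None
```

A filter on the numeric part alone (3138) could not tell these two IDs apart, which is the problem described in the question.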

                            Users:
                            Unfortunately, there is no such functionality at the moment. In theory, the API provides the methods getUser() (returns the sender job's user) and getCurrentUser() (returns the current user name). However, in the general case both methods can return an empty string (""). Additionally, there is no way to transfer these values to the Zabbix server (the only way I see is to concatenate this field to some other existing field, such as the message text). Finally, it would require extending the item's key to accept an additional parameter. So at the moment I'm not sure whether it is really needed. Maybe filtering by the job name would be enough in your case?

                            Library:
                            As I understand it, the library name is part of the IFS path name of the message queue (again, see the example in the documentation). You could consult your AS/400 admin for details, but I suppose you could use something like this as the queue name: "/QUSRSYS.LIB/MIMIX.MSGQ".
                            Last edited by Kos; 06-07-2017, 11:58. Reason: mis-spelling corrected

                            Comment

                            • RohrbaGe
                              Senior Member
                              • Aug 2005
                              • 167

                              #74
                              As400 monitoring

                              Kos,

                              attached is the compressed log file.
                              This was already with debug level four, but I forgot to
                              switch on the larger file size...
                              I did this today, so after next weekend I should be able to provide more details.
                              But maybe the small file is already helpful.

                              Our AS400 makes a full system backup on Sundays at 15:00.

                              Regards
                              Attached Files

                              Comment

                              • Kos
                                Senior Member
                                Zabbix Certified SpecialistZabbix Certified Professional
                                • Aug 2015
                                • 3404

                                #75
                                Thanks, Gerald, for the debug log; it is really useful.

                                If I understand correctly, this time our Zabbix agent emulator just stopped gracefully after the problem instead of staying up and trying to reconnect later.
                                However, it collected a good dying trace (including the stack trace of the Java exception that caused it to stop).

                                The root cause of this problem is the following: some of the API calls wrap the original exception ("java.net.NoRouteToHostException" in this case, a subclass of "java.io.IOException") in another type of exception ("java.lang.RuntimeException" in this case). My program tries to process all communication errors correctly by catching IOException. However, in this case it receives the RuntimeException instead of the IOException. It doesn't know what to do in this situation and simply stops as a result.
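One common way to deal with wrapped exceptions is to walk the chain of causes and look for the underlying I/O error before deciding how to react. The sketch below shows the idea in Python rather than Java, so it is an analogy and not the fix used in the emulator: `api_call` and `poll_once` are hypothetical stand-ins, Python's `OSError` plays the role of `java.io.IOException`, and `__cause__` plays the role of Java's `Throwable.getCause()`. It also assumes the wrapping exception actually records its cause.

```python
def find_cause(exc, wanted):
    """Walk the chain of causes; return the first exception of type `wanted`, or None."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, wanted):
            return exc
        seen.add(id(exc))  # guard against cyclic chains
        exc = exc.__cause__ or exc.__context__
    return None


def api_call():
    """Hypothetical stand-in for a library call that wraps an I/O error."""
    try:
        raise OSError("No route to host")
    except OSError as err:
        raise RuntimeError("wrapped") from err


def poll_once():
    """Return "ok", or "reconnect" when a communication error is found, even if wrapped."""
    try:
        api_call()
        return "ok"
    except Exception as exc:
        if find_cause(exc, OSError) is not None:
            return "reconnect"
        raise  # not a communication error: let it propagate
```

With this approach a wrapped communication error leads to the same reconnect path as a direct one, while unrelated runtime errors still surface.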

                                By the way, this is much better behaviour than in previous versions: 1) it clearly signals that something went wrong; 2) it saves the current log, preventing it from being overwritten; 3) there is no longer a situation where some threads die unexpectedly while the others keep running.

                                I need to think about how to fix this problem.

                                Comment
