Rabbit server - Zabbix graph missing data

  • karmukis
    Member
    • Aug 2014
    • 37

    #1

    Hello guys,

    I'm having a strange issue where only 2 graphs (2 items) from the same server and the same template show holes (missing data): "RabbitMQ Message Deliver Rates" and "RabbitMQ Message receive Rates". We ran the check manually and it works fine. We also created 2 cron jobs on the server that collect the same data at the same update interval, and their logs show the data is there, while the graphs on the Zabbix server show missing data:

    [Image: deliver.png]




    And now the other...
    [Image: receive.png]


    The RabbitMQ server runs Linux. The Zabbix server is version 3.0.14 and the agent is 3.0.16.
    Both checks run as "Zabbix agent (active)".
    The RabbitMQ server is monitored through a proxy, but we already tested without it and the problem remains.

    Any ideas would be really appreciated.
    karina




  • dimir
    Zabbix developer
    • Apr 2011
    • 1080

    #2
    Looks like the issue could be the Timeout setting in the agent configuration, which defaults to 3 seconds:
    Code:
    ### Option: Timeout
    #       Spend no more than Timeout seconds on processing
    #
    # Mandatory: no
    # Range: 1-30
    # Default:
    # Timeout=3
    You could try increasing this value, but it would affect all the checks on the agent. A configurable per-item Timeout is currently planned for 4.2:



    You can vote for this feature request by clicking the "Vote" button at the top right to give it more attention.


    • karmukis
      Member
      • Aug 2014
      • 37

      #3
      I need to clarify that the no-proxy test we performed was wrong: the autodiscovery function kept changing the host configuration back to "monitored by proxy xxx". So we still have to run that test, but we are not sure how to; autodiscovery has become an issue.


      • karmukis
        Member
        • Aug 2014
        • 37

        #4
        dimir, the agent's Timeout is currently set to 30 seconds:
        Timeout=30
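One way to double-check that the Timeout line is really uncommented in the file the agent reads is sketched below. The function name `show_timeout` is hypothetical, and the sketch only looks at the one file it is given (it does not follow Include directives). The default of 3 seconds is taken from the config comment quoted earlier in the thread.

```shell
# Print every uncommented Timeout= line (with its line number) from a
# Zabbix agent config file, or a note that the default applies.
# Usage: show_timeout CONFIG_FILE
show_timeout() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    grep -nE '^[[:space:]]*Timeout[[:space:]]*=' "$1" \
        || echo "Timeout not set in $1; the agent default of 3 seconds applies"
}
```

For example, `show_timeout /etc/zabbix/zabbix_agentd.conf`, assuming that is where the agent config lives on this host. The agent has to be restarted after a change for the new value to take effect.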


        • dimir
          Zabbix developer
          • Apr 2011
          • 1080

          #5
          Looks like this is a UserParameter with an Update interval of around 3 minutes, but the values are not received exactly every 3 minutes because of the timeout. That gap is probably one value missing due to such a timeout.
          Can you run that UserParameter command manually on the agent host from the command line, measuring the time:
          Code:
          { time <UserParameter command>; } |& egrep '^([0-9\.]|real)'
          for example:
          Code:
          { time message-receive.py; } |& egrep '^([0-9\.]|real)'
          a few times.
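Repeating the measurement by hand gets tedious, so here is a minimal sketch of a helper that runs a command several times and prints the wall-clock time of each run. The function name `time_runs` is made up for illustration, and the sketch assumes GNU date (for `%N`), which is what a typical Linux host provides.

```shell
# Run a command several times and print the wall-clock time of each run.
# Usage: time_runs COUNT COMMAND [ARGS...]
# Assumes GNU date (%N = nanoseconds), as found on typical Linux hosts.
time_runs() (
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1      # discard the check's own output
        end=$(date +%s.%N)
        awk -v i="$i" -v s="$start" -v e="$end" \
            'BEGIN { printf "run %d: %.3f s\n", i, e - s }'
        i=$((i + 1))
    done
)
```

For example, `time_runs 10 message-receive.py` (the example command name from above). Any run that comes close to the agent's Timeout value would be a candidate for a dropped value.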


          • karmukis
            Member
            • Aug 2014
            • 37

            #6
            dimir I'll try your suggestion, and see what I get. I'll get back to you as soon as I have info.


            • karmukis
              Member
              • Aug 2014
              • 37

              #7
              dimir, I created a small script using my check and your line, and the times I get are actually not bad. Here is an example for RABBITMQ MESSAGE RECEIVE RATES: you can see the gap in the graph and the data on the server. The script runs every minute.
              The script runs on the zabbix-proxy that checks the RabbitMQ server, in order to simulate the real item. As you can see, the times are not that bad, nor are they much different from other times when we see no gaps in the graph.
              Any ideas?

              [Image: zabbix-rabbir.png]


              • dimir
                Zabbix developer
                • Apr 2011
                • 1080

                #8
                Looks like the issue is not related to getting the metric data. I'd advise setting DebugLevel=4 on the agent side and searching the log for the point where the data point goes missing. What is the Update interval value for these items?
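To know which time windows to search in the log, it can help to list the gaps between consecutive values first. The sketch below assumes you have the value timestamps as Unix epoch seconds, one per line and sorted ascending (for example exported from the item history or from the cron job log). The function name `find_gaps` and the 1.5 tolerance factor are arbitrary choices for illustration.

```shell
# Print every pair of consecutive timestamps that are further apart than
# FACTOR * INTERVAL seconds. Reads Unix epoch timestamps, one per line,
# sorted ascending, from standard input.
# Usage: find_gaps INTERVAL [FACTOR] < timestamps.txt
find_gaps() {
    awk -v interval="$1" -v factor="${2:-1.5}" '
        NR > 1 && ($1 - prev) > interval * factor {
            printf "gap of %d s between %d and %d\n", $1 - prev, prev, $1
        }
        { prev = $1 }
    '
}
```

For example, `find_gaps 90 < timestamps.txt` for an item with a 90-second update interval, where `timestamps.txt` is a hypothetical file name. Each reported pair marks a window to look up in the agent and proxy logs.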


                • karmukis
                  Member
                  • Aug 2014
                  • 37

                  #9
                  We already tried setting DebugLevel=4 and found nothing, which was really annoying. Let me check the log and see if I can paste something here (I need to remove some data from it).
                  The update interval is now set to 90 sec. Originally it was 60 sec, with the same issue; we tried 30 sec and it got worse. Setting it to 90 made it better, but we still see the issue. We are thinking about setting it to 180 sec or more, at least to give it a try.


                  • dimir
                    Zabbix developer
                    • Apr 2011
                    • 1080

                    #10
                    You could provide the part of the log where it executes the check and gets the value.


                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by dimir
                      You could provide the part of the log where it executes the check and gets the value.
                      No logs are needed if someone is using a passive agent and passive items.
                      If that is the case, the only solution is switching to active monitoring, as passive monitoring does not scale beyond a certain size of monitored environment.
                      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
                      https://kloczek.wordpress.com/
                      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
                      My zabbix templates https://github.com/kloczek/zabbix-templates


                      • dimir
                        Zabbix developer
                        • Apr 2011
                        • 1080

                        #12
                        No logs are needed if someone is using a passive agent and passive items.
                        If that is the case, the only solution is switching to active monitoring, as passive monitoring does not scale beyond a certain size of monitored environment.
                        Here's the relevant part of the log for a passive check:

                        Code:
                         20698:20181227:135800.766 Requested [system.cpu.util[,iowait]]
                         20698:20181227:135800.767 Sending back [0.232310]
                        I can tell that it took 0.001 seconds to execute the check.
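The same subtraction can be scripted for longer excerpts. This is a rough sketch that assumes the `PID:YYYYMMDD:HHMMSS.mmm message` prefix seen in the two log lines above and that all lines fall on the same day. The function name `log_elapsed` is hypothetical.

```shell
# Print the time elapsed, in seconds, between the first and the last
# Zabbix agent log line read from standard input.
# Assumes each line starts with "PID:YYYYMMDD:HHMMSS.mmm" and that all
# lines belong to the same day.
log_elapsed() {
    awk '
        {
            split($1, part, ":")              # part[3] is HHMMSS.mmm
            h = substr(part[3], 1, 2)
            m = substr(part[3], 3, 2)
            s = substr(part[3], 5)
            t = h * 3600 + m * 60 + s         # seconds since midnight
            if (NR == 1) first = t
            last = t
        }
        END { if (NR > 0) printf "%.3f\n", last - first }
    '
}
```

Feeding it the "Requested" and "Sending back" lines of one check (for example via `grep` on the PID) prints the execution time of that check.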


                        • kloczek
                          Senior Member
                          • Jun 2006
                          • 1771

                          #13
                          I don't think anyone monitoring RabbitMQ is using the CPU iowait metric.


                          • dimir
                            Zabbix developer
                            • Apr 2011
                            • 1080

                            #14
                            This is just an example. I assume they are using UserParameter.


                            • karmukis
                              Member
                              • Aug 2014
                              • 37

                              #15
                              Hi guys, sorry for the delay... got married.
                              About this: still having the issue, so annoying.

                              Both checks are ACTIVE.
