queue problem

  • perun.84
    Member
    • May 2016
    • 73

    #1

    queue problem

    Hello. I'm using Zabbix to monitor my network, which consists of about 1500 devices. I'm going to monitor them via multiple proxies.

    I have autodiscovery rules and actions that apply the appropriate SNMP template to discovered devices.

    I have one problem: my queue has a lot of "more than 10 minutes" entries. The strange thing is that all of them are Alias or Description items of interfaces (no other monitored items are delayed).

    Do you know what is going wrong?

    Thanks in advance...
  • mortuletti
    Member
    • May 2016
    • 76

    #2
    Hi!
    If you have a large number of SNMP items in the queue, first try to narrow down the list of suspects:
    1. Is it related to one device or network segment?
    2. Is it related to a specific item inside the template?

    If it is one device or segment, check the connection.
    If it affects all devices but one of the checks is very slow, check that item.
    If different devices and different item checks are affected, it can be an overall Zabbix server or network performance issue, in which case you can:
    1. reduce the number of checks
    2. switch from SNMPv3 to SNMPv2 (SNMPv3 is much slower)
    3. tune/improve the performance of the Zabbix server.
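
The narrowing-down steps above can be sketched in a few lines of Python. This is only an illustration: the queue rows, host names, and item keys below are made up, and how you export the queue from your own installation is left open.

```python
from collections import Counter

# Hypothetical queue rows: (host, item key, delay in seconds).
# In practice these would come from the Zabbix queue view; the values
# below are invented for illustration.
queue = [
    ("switch-01", "ifAlias[1]", 720),
    ("switch-01", "ifDescr[1]", 700),
    ("switch-02", "ifAlias[3]", 910),
    ("switch-03", "ifAlias[2]", 650),
]

def summarize(rows):
    """Count delayed items per host and per base item key."""
    by_host = Counter(host for host, _, _ in rows)
    # Strip the "[...]" parameter so ifAlias[1] and ifAlias[3] group together.
    by_key = Counter(key.split("[", 1)[0] for _, key, _ in rows)
    return by_host, by_key

by_host, by_key = summarize(queue)
print("Delayed items per host:", by_host.most_common())
print("Delayed items per key:", by_key.most_common())
```

If one host dominates the first count, suspect the device or its segment; if one key dominates the second, suspect that item in the template.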

    Hope this information was helpful.
    Regards,
    Alexander


    • perun.84
      Member
      • May 2016
      • 73

      #3
      Thanks for the answer.

      I'm already using SNMPv2. I've just changed the SNMP template: I changed the update interval of some items from 1 minute to 6 hours. I'll see if it gives a good result...
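
As a rough sketch of why a longer update interval helps, the load from a group of items is approximately the item count divided by the interval. The interface count per device below is an assumption made only for the arithmetic; the thread does not state it.

```python
def nvps(item_count: int, interval_seconds: int) -> float:
    """Approximate new values per second contributed by a set of items."""
    return item_count / interval_seconds

# Hypothetical: 610 devices with 48 interfaces each, and two text items
# (alias and description) per interface.
items = 610 * 48 * 2

before = nvps(items, 60)            # polled every minute
after = nvps(items, 6 * 60 * 60)    # polled every 6 hours
print(f"{items} items: {before:.1f} vps at 1 min, {after:.2f} vps at 6 h")
```

Going from 1 minute to 6 hours divides the load from those items by 360, whatever the real item count is.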


      • perun.84
        Member
        • May 2016
        • 73

        #4
        I forgot: my proxy server is a VMware ESXi VM with 8G RAM and 4 CPUs (without SSD). Current performance is about 1K vps. It monitors 610 devices (mostly L2 and L3 switches). What do you think, is that OK?


        • mortuletti
          Member
          • May 2016
          • 76

          #5
          For 1K NVPS, 8G RAM and 4 CPUs for a proxy is more than enough. I think you can reduce the CPUs to 1 and will not get any performance degradation.

          It is very important to get information about how busy the data gathering processes on the proxy are.

          To get more information, it is highly recommended to switch on Zabbix internal monitoring. If it is not done yet, create a host for the proxy in the host list and apply Template App Zabbix Proxy. In "Monitored by proxy", select the same proxy. As a result, you will get status and graphs for internal processes and will be able to tune performance using the zabbix_proxy.conf file.
          Br, Alexander


          • perun.84
            Member
            • May 2016
            • 73

            #6
            I already did that (I'm watching how busy the internal processes are). Now I have a problem with "Zabbix history syncer processes more than 75% busy". It is shown for the proxy and for the server as well. When I restart the proxy and server processes, everything is OK for about 20 minutes, and after that my graphs are lost. I also saw a lot of slow query messages for "insert into history_uint (itemid,clock,ns,value) values " on MySQL and on the Zabbix server.

            Thanks for your time...


            • mortuletti
              Member
              • May 2016
              • 76

              #7
              Please check how many syncers are started.
              In the /etc/zabbix/zabbix_server.conf file: StartDBSyncers=X (default 4); the location of the file can be different.
              You can try to increase the number of syncers by 2-4, but not too much. Theoretically each syncer can manage up to 1000 NVPS, but it depends on the item types.

              It looks like the syncers sometimes cannot save data to the database. The issue could be related to database performance.
              Is it MySQL?
              Can you paste your zabbix_server.conf parameters here?
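
A minimal sketch for checking the effective StartDBSyncers value, assuming the configuration uses plain Key=Value lines with # comments. The helper name `read_conf` and the sample text are illustrative; point it at the contents of your own file.

```python
def read_conf(text: str) -> dict:
    """Parse Zabbix-style 'Key=Value' lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

# Illustrative sample; StartDBSyncers is commented out here on purpose.
sample = """
# StartDBSyncers=4
StartPollers=128
LogSlowQueries=3000
"""

conf = read_conf(sample)
# Fall back to 4, the default mentioned in the post, when the line is absent
# or commented out.
syncers = int(conf.get("StartDBSyncers", 4))
print("StartDBSyncers =", syncers)
```

A commented-out line means the default is in effect, which is easy to miss when reading the file by eye.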

              To make it faster without investments:
              1. Review housekeeper settings and the item history storage period;
              2. Check calculated items and triggers. If they look too far back in the database, that creates additional load;
              3. Performance can be improved by implementing partitioning, but read the documentation very carefully! This solution is not 100% approved and supported by Zabbix because it concerns the database only. But it makes performance much better.

              Br, Alexander


              • perun.84
                Member
                • May 2016
                • 73

                #8
                Yes, it looks like a DB problem. I've tried increasing the number of DB syncers, but the situation was the same.

                I already partitioned the MySQL tables and disabled the housekeeper. I've also tried partitioning the proxy_history table in the proxy database.

                I forgot: the history syncer was at 100% from yesterday until this morning. Since 5 o'clock (GMT+1) everything has been working well. The old data is there and the graphs are full. But I don't know why it happened.

                One more thing: I've noticed that the history syncer process was at 100% on the Zabbix server as well as on the Zabbix proxies.

                Zabbix server configuration:
                Code:
                LogFile=/var/log/zabbix/zabbix_server.log
                LogFileSize=0
                DebugLevel=3
                PidFile=/var/run/zabbix/zabbix_server.pid
                
                DBHost=xxx.xxx.xxx.xxx
                DBName=xxxxxx
                DBUser=xxxxxxx
                DBPassword=xxxxxx
                
                StartPollers=128
                
                StartDiscoverers=9
                
                SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
                
                HousekeepingFrequency=0
                
                CacheSize=1G
                
                HistoryCacheSize=256M
                
                HistoryIndexCacheSize=16M
                
                TrendCacheSize=1G
                
                ValueCacheSize=1G
                
                Timeout=15
                
                AlertScriptsPath=/usr/lib/zabbix/alertscripts
                
                ExternalScripts=/usr/lib/zabbix/externalscripts
                
                LogSlowQueries=3000

                Thanks in advance.
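
For reference, a small sketch that adds up the cache sizes from the configuration above, assuming the K/M/G suffixes are binary multiples. It only sums what the file requests; it does not measure actual usage.

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}

def to_bytes(size: str) -> int:
    """Convert a size such as '256M' or '1G' to bytes."""
    size = size.strip()
    if size[-1].upper() in UNITS:
        return int(size[:-1]) * UNITS[size[-1].upper()]
    return int(size)

# Cache values copied from the posted zabbix_server.conf.
caches = {
    "CacheSize": "1G",
    "HistoryCacheSize": "256M",
    "HistoryIndexCacheSize": "16M",
    "TrendCacheSize": "1G",
    "ValueCacheSize": "1G",
}

total = sum(to_bytes(v) for v in caches.values())
print(f"Configured caches: {total // 2**20} MiB in total")
```

Comparing this total with the RAM of the server machine shows how much is left for everything else running on it.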


                • perun.84
                  Member
                  • May 2016
                  • 73

                  #9
                  Here are the graphs of internal process busy rates on the server and the proxy.

                  Server:


                  Proxy:


                  • mortuletti
                    Member
                    • May 2016
                    • 76

                    #10
                    Hi!
                    Glad to hear that performance is much better today.
                    It could be that the history syncers were not able to write some item into the database.
                    The next time it happens, we need to identify what the syncers are actually doing.
                    At the least, you can run "ps aux | grep zabbix" to identify the PID of the process that is taking a lot of time, and then use a tool like "strace" to identify where the process is stuck.
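
The "ps aux" step can be sketched as below. The sample lines are illustrative only (real PIDs and process titles will differ), and the function name `busiest_syncers` is made up for this example.

```python
def busiest_syncers(ps_output: str, top: int = 3):
    """Return (pid, cpu_percent, command) for history syncer processes,
    sorted by CPU usage, from `ps aux` text."""
    found = []
    for line in ps_output.splitlines():
        if "history syncer" not in line:
            continue
        # ps aux has 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        found.append((int(fields[1]), float(fields[2]), fields[10]))
    return sorted(found, key=lambda row: row[1], reverse=True)[:top]

# Illustrative sample only; real PIDs and process titles will differ.
sample = """\
zabbix 2101 1.2 0.5 123456 7890 ? S 09:00 0:10 zabbix_server: history syncer #1
zabbix 2102 97.5 0.5 123456 7890 ? R 09:00 9:59 zabbix_server: history syncer #2
zabbix 2103 0.3 0.1 123456 7890 ? S 09:00 0:02 zabbix_server: poller #1
"""

for pid, cpu, command in busiest_syncers(sample):
    print(pid, cpu, command)
```

On a live system the text could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`, and the top PID is the one to attach to with `strace -p`.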

                    As for the settings, they look good. You could just set HistoryIndexCacheSize=64M. I don't think this is the reason for the problem you had yesterday.

                    Anyway, check from time to time what is going on with the caches and how busy the processes are. It can help to identify the cause.

                    In the worst case, I would recommend asking Zabbix support for help. They can hold a WebEx session, check what is going on, identify the cause, and make recommendations for the current situation and the future. Of course this is not free, but it can be done without any subscription.
                    Regards,
                    Alexander


                    • mortuletti
                      Member
                      • May 2016
                      • 76

                      #11
                      Do the server and the proxy use the same database server, or different ones?


                      • perun.84
                        Member
                        • May 2016
                        • 73

                        #12
                        Thanks a lot for your efforts. I'll try increasing the HistoryIndexCacheSize value.
                        Best regards,
                        Aleksandar


                        • mortuletti
                          Member
                          • May 2016
                          • 76

                          #13
                          You're welcome!
                          I hope this was helpful.
                          Br, Alex


                          • mortuletti
                            Member
                            • May 2016
                            • 76

                            #14
                            Aleksandar,
                            One more thing.
                            If you have the same graphs for the server and the proxy, the proxy host may be monitored incorrectly.
                            The proxy host should be monitored by the proxy itself (not by the primary server). Check it.
                            Br, Alexander

                            • perun.84
                              Member
                              • May 2016
                              • 73

                              #15
                              Yes. Both of my proxies were monitored by the Zabbix server. I've just changed that. But I have the problem with the syncer again, and again it lasted until noon...

                              I've checked the history syncer processes and none of them is using more CPU time than the others. Also, since the history syncer process jumped to 100%, the trigger "Disk overloaded on db" has been firing...
