1.8.3 Appliance "Server is unreachable for 3 minutes"

  • RaymondH
    Junior Member
    Zabbix Certified Specialist
    • Dec 2010
    • 24

    #1

    1.8.3 Appliance "Server is unreachable for 3 minutes"

    I'm extremely new to Zabbix (and Linux in general), but I was able to get the VM appliance up and running without much fuss. It had been running smooth as butter for about 3 weeks, until today.

    Today every single server I monitor started throwing the error "Server is unreachable for 3 minutes". I've restarted the Zabbix server and the agent on all computers (what a pain), and still nothing has worked.

    Right now I have the trigger disabled, because I'm tired of getting spammed (got about 10000 emails in 3 hours).

    I've included a screenshot of the error in action from one of the servers being monitored (ironically, it's the Zabbix server) and also from a Windows server.

    Does anyone have a suggestion for fixing this issue?
  • RaymondH
    Junior Member
    Zabbix Certified Specialist
    • Dec 2010
    • 24

    #2
    Whoops, I forgot to say that the trigger in question is the default:

    Template_Zabbix_Agent
    >> Server {HOSTNAME} is unreachable for 3 minutes
    >> {Template_Zabbix_Agent:agent.ping.nodata(3m)}=1
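For readers unfamiliar with the expression above, here is a minimal sketch of what a nodata(3m)=1 check amounts to. It is a simplified model, not Zabbix's actual implementation; the function names and the timestamps are hypothetical. The point it illustrates is that the check looks at when the server last recorded a value for agent.ping, so it can report "unreachable" whenever values stop being recorded, whatever the cause.

```python
def nodata(last_value_time, now, period_seconds):
    """Return 1 if no value was recorded within the last period_seconds, else 0.

    Simplified model: times are plain seconds; None means no value was ever stored.
    """
    if last_value_time is None:
        return 1
    return 1 if now - last_value_time > period_seconds else 0


def trigger_fires(last_value_time, now):
    # Mirrors the shape of {Template_Zabbix_Agent:agent.ping.nodata(3m)}=1
    return nodata(last_value_time, now, 3 * 60) == 1


# Hypothetical timestamps: the last ping was stored at t=0.
print(trigger_fires(last_value_time=0, now=60))   # value is 1 minute old
print(trigger_fires(last_value_time=0, now=181))  # value is just over 3 minutes old
```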

    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2005
      • 3112

      #3
      My first suspicion: an overloaded Zabbix server. How many new values per second are you getting? How high is iowait on that system? (Run vmstat 1 2 and look at the values in the 'wa' column. Be careful, the columns can be shifted.)
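Because the vmstat columns can shift when values get wide, one option is to locate 'wa' by its header name instead of by eye. The sketch below does that on a pasted sample. The sample numbers are made up for illustration and are not from this thread, and iowait_samples is a hypothetical helper name. Note that with vmstat 1 2 the first row is an average since boot, so the second row is the one that reflects the current load.

```python
# Made-up vmstat output for illustration only.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  3      0 102400  20480 512000    0    0   120   900  300  500  5  3 40 52
 0  4      0 101200  20480 512300    0    0    10  1500  320  540  4  2 30 64
"""


def iowait_samples(vmstat_output):
    """Return the 'wa' value from every data row, located by header name."""
    wa_index = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if wa_index is None:
            # Still looking for the header row that names the columns.
            if "wa" in fields:
                wa_index = fields.index("wa")
            continue
        # Data rows: take the field that sits under the 'wa' header.
        if len(fields) > wa_index and fields[wa_index].isdigit():
            samples.append(int(fields[wa_index]))
    return samples


samples = iowait_samples(SAMPLE_VMSTAT)
print("iowait since boot:", samples[0])
print("iowait over the last second:", samples[-1])
```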
      Zabbix 3.0 Network Monitoring book

      • RaymondH
        Junior Member
        Zabbix Certified Specialist
        • Dec 2010
        • 24

        #4
        Thanks for the reply!

        My thought was the same thing. As you can see in the IO Image there was a time when the Zabbix Server stopped and nothing was being reported. I'm unsure of why it stopped - but this issue started happening after I restarted the server (both the process and the VM). This server is by no means "Production" - although it is monitoring all of my systems (minus desktops), switches, and firewalls. Plus I really want to get rid of this Nagios "tool" the company is using.

        I believe I have gathered all the info you need. If not, could you spell it out for me please; as I said my experience with Linux/Zabbix is that of a 1st grader.

        Thanks again!

        [Attached images: Image 1.jpg (45.0 KB), Image 2.jpg (116.9 KB), Image 3.jpg (19.9 KB)]

        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          Well, it's clearly overloaded: the VM can't handle all the values per second you want to push into it. Either migrate to a better-performing DB installation, or reduce the number of items you are monitoring and increase their intervals.
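To see why fewer items and longer intervals help, a rough estimate of new values per second is the sum of 1/interval over all enabled items. The sketch below works that out for two item mixes. The item counts and intervals are hypothetical and are not taken from this thread, and the estimate ignores anything beyond plain polled items.

```python
def new_values_per_second(items):
    """Estimate incoming values per second.

    items: iterable of (item_count, update_interval_seconds) pairs.
    """
    return sum(count / interval for count, interval in items)


# Hypothetical item mixes, for illustration only.
before = [(2000, 30), (3000, 60), (1000, 300)]
after = [(1000, 60), (2000, 300), (500, 3600)]

print(f"before: {new_values_per_second(before):.2f} values/s")
print(f"after:  {new_values_per_second(after):.2f} values/s")
```

Halving the number of items or doubling an interval each cuts that item group's contribution in half, so doing both compounds.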
          Zabbix 3.0 Network Monitoring book

          • RaymondH
            Junior Member
            Zabbix Certified Specialist
            • Dec 2010
            • 24

            #6
            Good to know -

            So is this a case of the pre-configured VM not having enough resources? If so, could I just give it more CPU/memory? Short of your other recommendations (which I will work on), would this help increase the performance of Zabbix?


            This build is the 1.8.3 Appliance that was available on the website.

            • richlv
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Oct 2005
              • 3112

              #7
              It's the DB that is not keeping up. You could try giving it more memory first (increasing the VM memory in the process if needed); try increasing the MySQL InnoDB buffer.
              Zabbix 3.0 Network Monitoring book

              • RaymondH
                Junior Member
                Zabbix Certified Specialist
                • Dec 2010
                • 24

                #8
                Yeah, I see that now. I'm trying to get rid of the templates and start fresh (without starting over), but this thing is just crawling...

                I went from 1 CPU to 4 CPUs and from 2 GB to 4 GB, with no restrictions on the VM. Looking at vCenter, the load really isn't anything special, but I'm starting to get a ton of MySQL "database busy" errors. Basically, at this point I'm just trying to unlink all the templates... and I'm being forced to unlink one server at a time and then reboot the Zabbix server.

                I guess a better question is: what am I doing wrong here to overload the DB? I'm only monitoring 86 systems using the Windows / Linux / Agent / SNMP templates. I've read posts from people who have hundreds or thousands of monitored systems using Zabbix, so what is the difference?

                Using the screenshots above, what numbers should I be shooting for?

                Like I said, this tool is new to me, so forgive the silly question, I was just under the impression that this was an "Enterprise Solution". I also just want to thank you for sticking with me. At the very least, maybe someone else will be able to use this post as a reference point.

                Thanks again for the continued support and sorry for being such a noob!

                • richlv
                  Senior Member
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Oct 2005
                  • 3112

                  #9
                  1. You have 168.48 new values per second. That is, let's see... 10,108.8 new values per minute. So each minute Zabbix is expected to stuff more than 10 thousand values into the database, all while calculating triggers, generating events, and storing all that information in the database as well. That's the source of the problem.

                  2. You have a VM, which quite likely isn't the top-performing I/O environment. That's another problem factor.

                  3. The VM DB is supposed to be used as an example. People who monitor lots of systems with Zabbix don't do that with a testing/demo virtual machine.

                  So, the summary: get a faster DB, fine-tune the DB, and match that with your new values per second. And, most likely, don't run any serious production setup on the appliance, at least not yet; see the note in the download section for it as well.
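The arithmetic in point 1 can be checked and extended with a few lines. The 168.48 figure is the one quoted in this reply and the count of 86 systems comes from the earlier post; the per-day and per-host breakdowns are simple derived numbers, assuming the rate stays constant and is spread evenly across hosts, which it almost certainly is not in practice.

```python
NEW_VALUES_PER_SECOND = 168.48  # figure quoted in the reply above
MONITORED_SYSTEMS = 86          # host count mentioned earlier in the thread

values_per_minute = NEW_VALUES_PER_SECOND * 60
values_per_day = NEW_VALUES_PER_SECOND * 86_400
# Assumes an even spread across hosts, which is only a rough average.
values_per_host_per_second = NEW_VALUES_PER_SECOND / MONITORED_SYSTEMS

print(f"{values_per_minute:,.1f} values per minute")
print(f"{values_per_day:,.0f} values per day")
print(f"{values_per_host_per_second:.2f} values per host per second")
```

At roughly two values per host every second, the templates are polling far more often than an appliance database on a VM can comfortably absorb, which is consistent with the advice to cut items and lengthen intervals.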
                  Zabbix 3.0 Network Monitoring book

                  • RaymondH
                    Junior Member
                    Zabbix Certified Specialist
                    • Dec 2010
                    • 24

                    #10
                    All makes sense.

                    As a note: I know I'm not supposed to use the appliance for production; I absolutely saw the note before I downloaded it. I was just having serious issues working through the install from scratch. My Linux abilities aren't up to snuff with the documentation, so I was running into issues where the instructions implied things that I didn't know how to do. The appliance was a quick and dirty way for me to get a working system online so I could test it.

                    Thanks again for all your time, I really appreciate it!

                    • RaymondH
                      Junior Member
                      Zabbix Certified Specialist
                      • Dec 2010
                      • 24

                      #11
                      Just wanted to follow up "officially", because I hate searching for problems on community forums where the OP doesn't "close" the issue.

                      In any case, I was able to clean up my copy of Zabbix by drastically reducing the number of items being checked. I still have some issues to address with the DB that came with the appliance. As it turns out, this appliance doesn't appear to be tuned for optimal performance at anything (but we knew that). For now it's running, even if it's not great.

                      Being so new to Linux and Zabbix, I'm inclined to just leave well enough alone until I can get my head wrapped around how I can better optimize this thing.

                      So this issue is "Closed", and hopefully the suggestions provided by richlv can help out others with this issue.

                      Thanks again!
