
**venquessa** · 08-09-2023, 14:28

Probably a familiar story to you all. I installed Zabbix, turned the discovery agents up to max and applied templates left, right and centre.

Network summary: "3 bedroom house". One big flat LAN, 3 side VLANs. Wired physical backbone, multi AP. 5 or 6 part-time user clients. 1 Proxmox central server. 1 Backup server. 5 switches. 4 Access points. 40 or so total hosts. Split horizon DNS. ISC OpenWRT controlled.

Zabbix correctly pointed out that a lot of my network is a-ok, but a lot of it is utter trash.

Having been made aware of this already known fact, I looked for ways to explain to Zabbix that I either don't care or I care much less than it thinks I should.

Examples: High ping time / loss on random Wifi dynamic IoT end points. HTTP Endpoint unreachable. Host not reachable.

I spent 2 evenings making customised "IoT slow" and "IoT don't care" style templates. However, I then discovered that host uniqueness was by IP, so half of the High warnings about hosts disappearing were because they did disappear... at that IP. Then they got auto-discovered again on a different IP and started emitting the same warnings they did before.
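To make the IP-uniqueness problem concrete, here is a minimal Python sketch. It is not Zabbix code: the `Sighting` record, the MAC and IP addresses and the two helper functions are all made up for illustration. It only shows why keying hosts by IP turns one device with a new DHCP lease into two "hosts", while keying by something stable (a MAC address is assumed here) keeps it as one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One discovery hit: which device answered, and at which IP (hypothetical record)."""
    mac: str
    ip: str


def hosts_by_ip(sightings):
    # Uniqueness by IP: every new lease looks like a brand new host.
    return {s.ip for s in sightings}


def hosts_by_mac(sightings):
    # Uniqueness by a stable identifier: a new lease is still the same host.
    return {s.mac for s in sightings}


sightings = [
    Sighting("aa:bb:cc:00:00:01", "192.168.1.50"),
    Sighting("aa:bb:cc:00:00:01", "192.168.1.73"),  # same gadget, new DHCP lease
    Sighting("aa:bb:cc:00:00:02", "192.168.1.51"),
]

print(len(hosts_by_ip(sightings)), "hosts when keyed by IP")
print(len(hosts_by_mac(sightings)), "hosts when keyed by MAC")
```

The old IP entry is the one that keeps firing "host unreachable" after the gadget has moved.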

DROP DATABASE zabbix; Reimport the init script.

So I have decided to implement the discovery in phases this time. The Zabbix agents were already installed everywhere, so all I had to do this time was divide them into "physical, vm, container" host categories and apply altered, cloned Zabbix templates to read the ct.* memory and cpu stats.

I am focusing on these hosts, tuning and tweaking them till they are just right... for now.

Next I plan on discovering by the SNMP public sysDesc OID, which should find all the switches and routers. Then I will focus on tuning those up.

Questions:

* On "tuning up". I assume best practice is to only override per-host when it's absolutely necessary (credentials) or when you are 100% sure you will only need a single host customisation, now and in the future. (Example: I set a 23 hour 'silence' window for alerts from the backup node which is only online between 3am and 4am for service.) If you have or will have 2 or more of those hosts, then clone their primary template and reconfigure/reassign the template, not the host.

* Items and graphs: When you have hosts which auto-discover a large number of items for interfaces and disks, should I manually remove items from the hosts or again create custom templates and discovery settings? I have not explored the autodiscovery features within templates yet. An example might be that only one of your switches supports a certain SNMP OID the generic template requests. On the others the item doesn't disappear and remains as a red flag. So do you just delete the items, triggers and graphs from that host? I suppose it's the same as host templates?
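For the backup node example in the first question, the logic I am after is roughly the following. This is a plain Python sketch of the idea, not how Zabbix implements maintenance periods; `ONLINE_START`, `ONLINE_END` and `alerts_silenced` are names I made up, and it assumes the node's local clock.

```python
from datetime import datetime, time

# Assumed service hour for the backup node: online 03:00 to 04:00 only.
ONLINE_START = time(3, 0)
ONLINE_END = time(4, 0)


def alerts_silenced(now: datetime) -> bool:
    """Return True when 'now' is outside the service hour, i.e. inside the 23 hour silence window."""
    return not (ONLINE_START <= now.time() < ONLINE_END)


print(alerts_silenced(datetime(2023, 9, 8, 14, 28)))  # mid-afternoon, node is expected to be off
print(alerts_silenced(datetime(2023, 9, 8, 3, 30)))   # inside the service hour
```

If a second part-time host turns up, the window would belong on a shared template rather than being copied per host.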

Bigger questions:
Dynamic hosts. IoT gadgets. Hosts not unique by IP. Part-time hosts. Are there any good guidebooks, cookbook entries or case studies you can refer me to?


New user - guidance and reassurance on the start of my Zabbix journey.
