60k network devices environment

  • O.C.
    Junior Member
    • Jul 2019
    • 3

    #1

    60k network devices environment

    Hi everyone,

    I searched for a while for documents or information about a similar setup but found nothing.

    I have some goals that I think I can meet with a Zabbix infrastructure,
    and I would appreciate help from your experience to guide me through the architecture design.

    This is an ISP environment in which a large number of devices must be discovered and registered in the inventory.

    #GOALS:
    .1 Discover devices per configured subnet, each subnet linked to its own proxy (add new hosts that come up; remove old hosts that go down in DHCP subnets).
    .2 Apply minimal rules (e.g. ping plus SNMP, or plus telnet/ssh) and minimal actions (add to group, link a ping template).
    .3 Collect only key information (model, serial number, version, etc.).
    .4 Run a full discovery of each subnet once a day (or over a longer period if 24 h is not enough).

    #CLIENTS:
    From SMB network devices to enterprise network devices

    #ENVIRONMENT:
    A large number of subnets segregated by area (as is standard for an ISP), so at least 1 proxy per area (I think 5 instances at the moment).
    Zabbix server and proxies run on OL 7.7 on VMware 6.7
    Available RAM: 400 GB (max preferred)
    Storage type: SSD
    vCPU: 24 (max preferred)

    #QUESTIONS:
    I'm not experienced enough to work out how many proxies I need to meet these goals properly, especially how many devices a single proxy can handle in this setup.
    I also don't know how powerful the central instance needs to be in this case.

    How many proxies do I need? (e.g. X devices per proxy or a similar rule of thumb)
    How many resources per proxy? (e.g. 2 vCPU and 24 GB RAM per 1000 devices or a similar rule of thumb)
    Are there specific tunings (OS/kernel, Zabbix config, proxy DB) to get the best performance from the discovery and ping processes? (That is what I want to focus on most.)


    ----------
    I'm happy to provide any additional info or answer questions!
    Thank you in advance

    Omar C.
    ----------
  • vendrusculo
    Junior Member
    • Jul 2015
    • 26

    #2
    Hi!

    Usually the problem in big environments is not the number of devices but the "vps" you will need, I mean values per second. If you intend to collect values every 30, 60 or 360 seconds, the behaviour is very different.
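To make the values-per-second point concrete, here is a minimal sketch of the arithmetic: each item produces one value per update interval, so a group of items contributes its count divided by its interval. The device and item counts below are illustrative assumptions, not figures from this thread, and `ItemGroup` is a hypothetical helper, not part of Zabbix.

```python
from dataclasses import dataclass


@dataclass
class ItemGroup:
    """A set of items that share one update interval (hypothetical grouping)."""
    name: str
    item_count: int
    interval_seconds: float


def values_per_second(groups):
    """Each item yields one value per interval, so a group adds count / interval."""
    return sum(g.item_count / g.interval_seconds for g in groups)


# Illustrative numbers only: 60,000 devices with one ICMP ping item each,
# plus 5 SNMP inventory items per device polled once an hour.
groups_60s = [
    ItemGroup("icmp ping", 60_000, 60),
    ItemGroup("snmp inventory", 300_000, 3600),
]
groups_30s = [
    ItemGroup("icmp ping", 60_000, 30),
    ItemGroup("snmp inventory", 300_000, 3600),
]

vps_60s = values_per_second(groups_60s)
vps_30s = values_per_second(groups_30s)
print(f"ping every 60 s: {vps_60s:.1f} values/s")
print(f"ping every 30 s: {vps_30s:.1f} values/s")
```

Under these assumptions, halving the ping interval nearly doubles the load while the device count stays the same, which is why the interval matters more than the number of hosts.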

    About your goals:

    #GOALS:
    .1 Discover devices per configured subnet, each subnet linked to its own proxy (add new hosts that come up; remove old hosts that go down in DHCP subnets).
    Yes! You can assign each discovery rule to a proxy.

    .2 Apply minimal rules (e.g. ping plus SNMP, or plus telnet/ssh) and minimal actions (add to group, link a ping template).
    The simplest check is ICMP ping, but you can add others as you need.

    .3 Collect only key information (model, serial number, version, etc.).
    I believe SNMP is the best option you have. Based on the SNMP results you can assign hosts to groups and link them to a specific template as you want.

    .4 Run a full discovery of each subnet once a day (or over a longer period if 24 h is not enough).
    This is a possible problem. When you discover a subnet, you need to consider that the "timeout" set in the server config file can affect discovery. For example, with a 30-second timeout configured you will "lose" 30 seconds for each device that does not respond to discovery, because Zabbix waits up to 30 seconds for an answer from a single IP.
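The timeout effect described above can be estimated with a rough upper bound. This sketch takes the reply's description at face value and assumes that addresses within one rule are checked one at a time and that every silent address costs the full timeout; the function name, the example subnets and the 50% silent share are assumptions for illustration.

```python
import ipaddress


def worst_case_scan_seconds(cidr, timeout_seconds, unresponsive_fraction):
    """Upper bound on one sequential pass over a subnet.

    Assumes addresses are checked one at a time and that every silent
    address costs the full timeout; responsive addresses are treated as free.
    """
    network = ipaddress.ip_network(cidr)
    # hosts() skips the network and broadcast addresses for ordinary subnets
    host_count = sum(1 for _ in network.hosts())
    return host_count * unresponsive_fraction * timeout_seconds


# Example subnets and a 50% silent share are illustrative assumptions
for cidr in ("10.0.0.0/24", "10.0.0.0/16"):
    seconds = worst_case_scan_seconds(cidr, timeout_seconds=30, unresponsive_fraction=0.5)
    print(f"{cidr}: up to {seconds / 3600:.1f} hours per pass")
```

If the bound for a subnet exceeds the 24-hour budget from goal .4, either the timeout has to come down or the range has to be split across more rules.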

    How many proxies do I need? Again, that depends more on VPS than on the number of devices.
    How many resources per proxy? I have a proxy with 6 GB RAM monitoring about 1600 devices and effectively using less than 4 GB, but it will depend on VPS...

    Are there specific tunings (OS/kernel, Zabbix config, proxy DB) to get the best performance from the discovery and ping processes? (That is what I want to focus on most.)
    Monitoring the proxy with the proxy template can help you do this tuning and find the correct values. My recommendation is to have one discovery rule per subnet to be discovered; this way you parallelize the discovery process.
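One way to prepare the "one discovery rule per subnet" layout is to split each area's address space into rule-sized pieces first. The sketch below only builds a plan as a list of tuples; the proxy names, the address ranges and the /24 rule size are assumptions, and creating the rules in Zabbix itself is left out.

```python
import ipaddress


def plan_discovery_rules(area_networks, rule_prefix=24):
    """Return one (rule name, proxy, IP range) tuple per rule-sized subnet."""
    rules = []
    for proxy, cidrs in area_networks.items():
        for cidr in cidrs:
            network = ipaddress.ip_network(cidr)
            # Networks already as small as rule_prefix (or smaller) are kept whole
            if network.prefixlen >= rule_prefix:
                pieces = [network]
            else:
                pieces = network.subnets(new_prefix=rule_prefix)
            for piece in pieces:
                rules.append((f"discover {piece}", proxy, str(piece)))
    return rules


# Hypothetical proxies and private address ranges
areas = {
    "proxy-north": ["10.10.0.0/22"],
    "proxy-south": ["10.20.0.0/23", "10.20.8.0/25"],
}
rules = plan_discovery_rules(areas)
for name, proxy, ip_range in rules:
    print(f"{proxy:12} {name:24} {ip_range}")
```

Smaller pieces mean more rules to manage but shorter passes per rule, so the rule size is a trade-off rather than a fixed value.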

    Hope that helps,
    Leo



    • omar.cacciotti
      Hi Leo,
      a belated thanks, I lost my old account.

      After 2 years I reached my goal, but in the last week I started experiencing issues.

      This is my current setup:

      Zabbix 4.4.10
      12 vCPU, 80 GB RAM
      Server & DB on the same VM
      2 GUIs
      8 proxies

      Any suggestions?
      Thank you very much

      Parameter                                            Value     Details
      Zabbix server is running                             Yes       xxxxxxxx:10051
      Number of hosts (enabled/disabled/templates)         61670     61528 / 0 / 142
      Number of items (enabled/disabled/not supported)     1773444   1461377 / 6094 / 305973
      Number of triggers (enabled/disabled [problem/ok])   960222    953702 / 6520 [19200 / 934502]
      Number of users (online)                             244       6
      Required server performance, new values per second   8181.42
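A few ratios derived from the status figures above can help put the load in perspective. This is only arithmetic on the posted numbers; treating the required new values per second as driven by the enabled items alone is an assumption, and the variable names are illustrative.

```python
# Figures copied from the status report above
hosts_enabled = 61_528
items_total = 1_773_444
items_enabled = 1_461_377
items_not_supported = 305_973
required_nvps = 8181.42

# Average number of enabled items on each enabled host
items_per_host = items_enabled / hosts_enabled

# Mean update interval implied by the NVPS figure
# (assumption: only enabled items contribute to it)
mean_interval_seconds = items_enabled / required_nvps

# Share of all items currently in the "not supported" state
not_supported_share = items_not_supported / items_total

print(f"items per host:        {items_per_host:.1f}")
print(f"mean update interval:  {mean_interval_seconds:.0f} s")
print(f"not supported share:   {not_supported_share:.1%}")
```

With these inputs the result is roughly 24 items per host, a mean interval of about three minutes, and about 17% of items not supported.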

      And I'm quite lost on how to fix the history syncer and configuration syncer being at 100%.

      Right now I'm trying to tune the DB, which seems responsive to every manual command I give it but really slow for Zabbix.
      MySQL tuner runs in less than 1 sec.

      This is the 3rd test I did in 3 days (I read tons of posts, Percona's blog too, but never solved it):

      [mariadb-5.5]
      #Tuning 30/10/2020 (06/11/2020 after analysis)
      max_connections = 110
      slow_query_log=1

      innodb_buffer_pool_size=40G
      innodb_flush_method=O_DIRECT

      #Tuning 06/06/2020
      innodb_file_per_table = ON
      innodb_log_file_size = 1G
      #Tuning 30/07/2021 - Set to 12
      innodb_buffer_pool_instances = 12
      innodb_stats_on_metadata = OFF

      #Tuning 06/06/2020
      thread_cache_size = 128
      query_cache_size = 0
      query_cache_type = 0

      #Tuning 22/06/2020 - From 16M to 128M
      #Tuning 30/07/2021 - Set to default (Percona)
      #join_buffer_size = 512M

      #FIX BLOB query >1M 30/10/2020
      max_allowed_packet = 16M

      #Tuning 05/11/2020
      tmp_table_size = 32M
      max_heap_table_size = 32M

      #Tuning 05/11/2020
      innodb_io_capacity=2500


      #Tuning for caches: table_open_cache, open_files, etc
      open_files_limit=65535
      table_open_cache=10240

      #last
      innodb_old_blocks_time = 1000
      innodb_flush_log_at_trx_commit = 0
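Because the server and the database share one VM, it can help to see how the memory settings in the listing above relate to the 80 GB of RAM stated in the post. The sketch below parses a small fragment copied from that listing and computes two ratios; `parse_size` is a hypothetical helper, and nothing here is a tuning recommendation.

```python
import configparser

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert values such as '40G' or '16M' to bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


# Only the settings needed for the calculation, copied from the listing above
FRAGMENT = """
[mariadb-5.5]
innodb_buffer_pool_size=40G
innodb_buffer_pool_instances = 12
"""

parser = configparser.ConfigParser()
parser.read_string(FRAGMENT)
section = parser["mariadb-5.5"]

vm_ram = parse_size("80G")  # VM size stated in the post
pool = parse_size(section["innodb_buffer_pool_size"])
instances = int(section["innodb_buffer_pool_instances"])

# Fraction of the VM's memory reserved for the InnoDB buffer pool
pool_share = pool / vm_ram
# Size of each buffer pool instance in GiB
per_instance_gib = pool / instances / UNITS["G"]

print(f"buffer pool share of RAM: {pool_share:.0%}")
print(f"size per pool instance:   {per_instance_gib:.2f} GiB")
```

Whatever the buffer pool does not take has to cover the Zabbix server's own caches, the other MariaDB buffers and the operating system.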