Zbx 6LTS to 6.4 Docker Zwazing host item interval rates. /StartTimers

  • Wolfsbane2k
    Member
    • Nov 2022
    • 48

    #1

    Hi.

    We've just migrated from Zabbix 6.0 LTS running natively on Ubuntu with MySQL to a 6.4 Docker-based setup, also using MySQL. We've carried over the MySQL database and most of our server.conf parameters (poller counts etc.), and everything appears to work except for three items on one host. The one parameter we didn't copy over is StartTimers, which we had set to 5 on a 'try stuff' basis. We reverted it to the default of 1 because, from the manual, it seems to affect only maintenance tasks, not active polling. Host memory, processor speed and disk access rates have all improved massively on the new host too.
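Since most of the zabbix_server.conf parameters were carried over by hand, one quick way to confirm exactly which settings differ between the old and new install (StartTimers, or anything else that slipped through) is to diff the active `Key=Value` lines. This is a minimal sketch, assuming both configs use plain `Key=Value` lines with `#` comments; the excerpts below are hypothetical, not our real configs. In the Docker images the settings may be supplied as environment variables rather than edited into the file, so the effective values might need to be collected from the container first.

```python
def parse_conf(text):
    """Return {key: value} for active (uncommented) Key=Value lines.

    Repeated keys (e.g. several Include= lines) keep only the last value.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def diff_conf(old_text, new_text):
    """Return {key: (old_value, new_value)} for keys that differ; None means unset."""
    old, new = parse_conf(old_text), parse_conf(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Hypothetical excerpts: the old native install vs. the new Docker install.
OLD_CONF = """
# zabbix_server.conf (6.0 LTS, native)
StartPollers=20
StartTimers=5
Timeout=3
"""

NEW_CONF = """
# zabbix_server.conf (6.4, Docker)
StartPollers=20
Timeout=3
"""

for key, (old_val, new_val) in diff_conf(OLD_CONF, NEW_CONF).items():
    print(f"{key}: {old_val} -> {new_val}")  # prints: StartTimers: 5 -> None
```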

    Those three items are UserParameters that run a custom DOS batch file on the host, which typically takes less than 1 second to complete.

    The items were polled with a simple update interval of 20 seconds and worked reliably before the migration. Since the migration, a random one of them will often sit unsampled for 2 minutes while the others update about every 20 s.
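To put numbers on "sits unsampled for 2 minutes", one option is to take the item's history timestamps (the `clock` values, in Unix seconds) and flag gaps that are much longer than the 20 s interval. A minimal sketch; the timestamps below are made up, and the 2x threshold is an arbitrary choice.

```python
def find_gaps(clocks, interval, factor=2.0):
    """Return (start, end, length) for gaps between consecutive samples
    that exceed factor * interval seconds."""
    ordered = sorted(clocks)
    limit = factor * interval
    return [
        (prev, cur, cur - prev)
        for prev, cur in zip(ordered, ordered[1:])
        if cur - prev > limit
    ]


# Made-up sample times for one item polled every 20 s, with one ~2 minute hole.
clocks = [0, 20, 40, 60, 180, 200, 220]
for start, end, length in find_gaps(clocks, interval=20):
    print(f"no value between t={start} and t={end} ({length} s)")
# prints: no value between t=60 and t=180 (120 s)
```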

    Server and agent logs at DebugLevel=3 don't show these items failing; they just never seem to be queried at all. I've tried setting the interval to s0-59/20, among other values, and it's had no effect.
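For reference, the custom scheduling interval `s0-59/20` should fire at seconds 0, 20 and 40 of every minute: the same cadence as a 20 s interval, just aligned to the clock. Since it only changes *when* the item is due, it wouldn't be expected to help if the checks are being skipped for some other reason. A minimal sketch that expands only the simple `s<from>-<to>/<step>` form; the full Zabbix syntax also allows `md`, `wd`, `h` and `m` filters, lists and omitted bounds, none of which are handled here.

```python
import re

# Matches only the simple "s<from>-<to>/<step>" form, e.g. "s0-59/20".
_SECONDS_RANGE = re.compile(r"^s(\d{1,2})-(\d{1,2})/(\d+)$")


def expand_seconds_schedule(expr):
    """Return the seconds of each minute at which the item is due."""
    match = _SECONDS_RANGE.match(expr)
    if match is None:
        raise ValueError(f"unsupported schedule: {expr!r}")
    start, end, step = (int(g) for g in match.groups())
    if not (0 <= start <= end <= 59) or step < 1:
        raise ValueError(f"out-of-range schedule: {expr!r}")
    return list(range(start, end + 1, step))


print(expand_seconds_schedule("s0-59/20"))  # [0, 20, 40]
```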

    I'm going to raise StartTimers back to 5 in the morning, just to rule it out, and increase DebugLevel to 4, but is there anything else I should be looking at?

    (We'll likely move to Zabbix 7.0 LTS via Docker if this all works out.)

    Ta!
  • Wolfsbane2k
    Member
    • Nov 2022
    • 48

    #2
    Digging through the logs, I found what appears to be the problem: it's related to timeouts, and possibly to something else going on behind the scenes.

    We'd jumped from 6.0 LTS to 6.4.11 without touching the agents, so that may be related.

    Details:

    An entirely different item, which monitors another piece of software on the remote host (monitored by Zabbix agent) via a cmd shell script, was timing out because that software wasn't running (it wasn't running before the migration either). The script takes a while to time out, and when it does, the entire host is put into "offline" mode for 15 seconds. That prevents any other items on that host from being queried, and the timed-out item then appears to be the first one polled on that host once the 15 seconds are up, so the cycle of silence repeats.
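To make that feedback loop concrete, here is a toy model of one host. It is a simplified sketch under stated assumptions, not Zabbix's actual poller logic: every attempt at the broken item costs a hypothetical 3 s timeout plus a 15 s block on the whole host (plausibly the server's default `UnreachableDelay=15`, though that link is an assumption), the healthy checks are instantaneous, and due items are served most-overdue first. The item names are made up.

```python
def simulate(duration, broken=True, interval=20, timeout=3, unreachable_delay=15):
    """Toy model of one host: count successful polls per item over `duration` s.

    "broken" always hangs until the timeout; the other (hypothetical) items
    return instantly. All items share the same update interval.
    """
    items = (["broken"] if broken else []) + ["item_a", "item_b", "item_c"]
    next_due = {name: 0 for name in items}
    polled = {name: 0 for name in items}
    t = 0
    while t < duration:
        due = [name for name in items if next_due[name] <= t]
        if not due:
            t += 1  # idle: advance the clock one second
            continue
        name = min(due, key=lambda n: next_due[n])  # most overdue first
        if name == "broken":
            # Wait out the timeout, then the whole host is skipped for a while.
            t += timeout + unreachable_delay
            # Assumed: the broken item is due again as soon as the host is back.
            next_due[name] = t
        else:
            polled[name] += 1
            next_due[name] = t + interval
    return polled


print(simulate(120, broken=False))  # each healthy item: 6 polls
print(simulate(120))                # each healthy item: 3 polls
```

With these numbers, over a 120 s window each healthy item gets polled 3 times instead of 6, i.e. roughly every 36 s instead of every 20 s, purely because of one item on the same host that never answers in time.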

    What I don't understand is why that item has suddenly become a problem after 2 years when the "only" change was the Zabbix server version, unless Zabbix has changed its behaviour?

    I've fixed it with a band-aid: I increased Timeout in the server config by 1 second and decreased Timeout on the agent by 1 second. Zabbix now reports "Item not supported" rather than "TCP errors" for that item, so the host is no longer paused for those 15 seconds.
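The reasoning behind the band-aid, as we understand it: if the agent gives up on the script before the server gives up waiting, the agent sends back an error, which the server records as "not supported" for that one item; if the server's Timeout expires first, it sees a network error and treats the whole host as unreachable. A minimal sketch of that comparison; the function name and the numbers are hypothetical, and it ignores network latency.

```python
def classify_timeout(script_seconds, agent_timeout, server_timeout):
    """Simplified model of which side gives up first on a slow agent check.

    Returns "ok", "not supported" (agent aborted the script and replied with
    an error) or "network error" (server stopped waiting; host treated as
    unreachable).
    """
    if script_seconds < min(agent_timeout, server_timeout):
        return "ok"
    if agent_timeout < server_timeout:
        return "not supported"
    # Ties go to the server: its clock starts before the agent runs the script.
    return "network error"


# Hypothetical values: equal timeouts vs. server +1 s and agent -1 s.
print(classify_timeout(10, agent_timeout=3, server_timeout=3))  # network error
print(classify_timeout(10, agent_timeout=2, server_timeout=4))  # not supported
```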

    We'll see what happens when we jump to 7.0 LTS.
    Last edited by Wolfsbane2k; 14-06-2024, 14:19.
