Ad Widget

**ingus.vilnis** · 25-11-2015, 16:09

Hello and welcome to Zabbix forums!

There might be various reasons for deadlocks but most likely they are only consequences of some other issues.

1. Please tell what exact versions do you have for Zabbix server and MySQL.

2. Make sure you have LogSlowQueries=3000 set in your zabbix_server.con file. If you have, check zabbix_server.log file for any slow queries.

3. Setting Zabbix server to DebugLevel=4 will not help much here. Level 3 is absolutely enough here.

4. How big is your environment? What do you monitor?

5. Show Zabbix server performance graphs as described here. https://www.zabbix.com/forum/showthread.php?t=47781

6. Show zabbix_server.conf

7. Show MySQL config my.cnf

8. "MySQL server has gone away" errors is a sign of misconfiguration. Here is a good article on MySQl tuning for Zabbix https://www.percona.com/blog/2014/11...-mysql-zabbix/

9. By the way, MySQL wait_timeout normally should be left at default 28800 seconds.

Best Regards,
Ingus

**GeoffE** · 25-11-2015, 17:02

Thank you for your reply!

Originally posted by ingus.vilnis

1. Please tell what exact versions do you have for Zabbix server and MySQL.

Zabbix server is version 2.4.6.
MySQL is "mysql Ver 15.1 Distrib 5.5.44-MariaDB"

Originally posted by ingus.vilnis

2. Make sure you have LogSlowQueries=3000 set in your zabbix_server.con file. If you have, check zabbix_server.log file for any slow queries.

Enabled that option, and will keep an eye on the logs.

Originally posted by ingus.vilnis

4. How big is your environment? What do you monitor?

We're still in the very early stages of rolling this out at the company. At the moment, we've only hooked up a few dozen systems (49.82 values per second, according to the Dashboard). That will grow by several orders of magnitude once it's in a full production environment.

Originally posted by ingus.vilnis

5. Show Zabbix server performance graphs as described here. https://www.zabbix.com/forum/showthread.php?t=47781

The performance graphs are pretty dull (except for the point where we did LLD for an entire rack of systems, ran out of pollers, and had to increase that number drastically). Currently, the Internal Process % is basically 0 across the board (which would be expected, with this few systems connected). And the highest that any of the Data Gathering % items get is about 12%.

Originally posted by ingus.vilnis

6. Show zabbix_server.conf

Here's a list of everything that's deviated from the default:

Code:

StartPollers=100
StartPollersUnreachable=5
StartHTTPPollers=2
LogSlowQueries=3000

Originally posted by ingus.vilnis

7. Show MySQL config my.cnf

I don't have access to this personally, although I can have it fetched for me easily enough.

I know that we increased the wait_timeout threshold to 600.

Are there any other configuration options you'd like to see?

Originally posted by ingus.vilnis

8. "MySQL server has gone away" errors is a sign of misconfiguration. Here is a good article on MySQl tuning for Zabbix https://www.percona.com/blog/2014/11...-mysql-zabbix/

9. By the way, MySQL wait_timeout normally should be left at default 28800 seconds.

I'd already seen that article (but thank you for the link), and like I said, we're working with the database administrators to get things tuned. We're using a company-wide cluster (for performance reasons - we're going to be growing the number of hosts drastically and retaining data for quite some time - and we already have a cluster that's set up just for that kind of thing), and they're not comfortable with setting the wait_timeout value that high (because they could run out of available connections - and it's apparently a global setting, and not possible on a per-database level). We had bumped it up to 600, which seemed to be sufficient, but when we added more systems yesterday, those error messages showed up again.

And I'm sure we're going to have to do more tuning as far as poller numbers, database thresholds, etc., as we start adding more systems.

To be honest, my concern is more for the "deadlock" messages - because they appear in the front-end. This early in our deployment process, we have ... shall we say ... management-types ... who are very interested to see our progress. The last thing we want is for them to tab back into the Dashboard and see MySQL error messages.

With the upcoming holiday in the US, I may not be able to respond (or provide MySQL information) for a couple of days - but thank you once again for your assistance so far.

**ingus.vilnis** · 26-11-2015, 10:59

Originally posted by GeoffE

Thank you for your reply!
I'd already seen that article (but thank you for the link), and like I said, we're working with the database administrators to get things tuned. We're using a company-wide cluster (for performance reasons - we're going to be growing the number of hosts drastically and retaining data for quite some time - and we already have a cluster that's set up just for that kind of thing), and they're not comfortable with setting the wait_timeout value that high (because they could run out of available connections - and it's apparently a global setting, and not possible on a per-database level). We had bumped it up to 600, which seemed to be sufficient, but when we added more systems yesterday, those error messages showed up again.

And I'm sure we're going to have to do more tuning as far as poller numbers, database thresholds, etc., as we start adding more systems.

To be honest, my concern is more for the "deadlock" messages - because they appear in the front-end. This early in our deployment process, we have ... shall we say ... management-types ... who are very interested to see our progress. The last thing we want is for them to tab back into the Dashboard and see MySQL error messages.
.

Hi,

Thank you for the detailed answers.
I did not spot anything wrong in your current Zabbix config.

Additionally you can check Zabbix cache usage free graph to see if you don't run out of cache with your current 49 nvps. All caches should be no less than 80% free.

However I am still concerned about that MySQL part. Having it all on one shared cluster for performance reasons does not sound like a bad idea but the global settings there might not be tuned for such write intensive process as Zabbix.

Can't tell if wait_timeout=600 would be the reason for deadlocks (probably not) but please be aware that you might have lots of "MySQL has gone away..." errors because of so low setting. And you have actually decreased the timeout drastically from default 8 hours (28800 seconds) to 10 minutes (600 s). Those errors are nothing sinister, Zabbix will restart the transactions but still not the best thing you might want to see in the logs.

I fully understand your concerns about management seeing MySQL errors however I would be still looking for issues in MySQL itself because with Zabbix 2.4.6 and 50 nvps you should not be seeing anything like this by default.

By the way, you did show me one lock error having users_groups table in question. Do you see the locks for other tables as well?

Wish you nice holidays!

Best Regards,
Ingus

**GeoffE** · 27-11-2015, 15:23

Originally posted by ingus.vilnis

Additionally you can check Zabbix cache usage free graph to see if you don't run out of cache with your current 49 nvps. All caches should be no less than 80% free.

The configuration cache is sitting around 70%. Everything else is 95% or higher.

Originally posted by ingus.vilnis

However I am still concerned about that MySQL part. Having it all on one shared cluster for performance reasons does not sound like a bad idea but the global settings there might not be tuned for such write intensive process as Zabbix.

Can't tell if wait_timeout=600 would be the reason for deadlocks (probably not) but please be aware that you might have lots of "MySQL has gone away..." errors because of so low setting. And you have actually decreased the timeout drastically from default 8 hours (28800 seconds) to 10 minutes (600 s). Those errors are nothing sinister, Zabbix will restart the transactions but still not the best thing you might want to see in the logs.

I fully understand your concerns about management seeing MySQL errors however I would be still looking for issues in MySQL itself because with Zabbix 2.4.6 and 50 nvps you should not be seeing anything like this by default.

Right. We're definitely going to be doing a lot of fiddling with the database cluster settings - and if it gets too bad, we can always look into getting a dedicated database system (or other workarounds).

Thankfully, the folks in charge of that cluster are happy to work with us; we just won't be able to do much quite yet.

Originally posted by ingus.vilnis

By the way, you did show me one lock error having users_groups table in question. Do you see the locks for other tables as well?

I have seen others in the front end, but they're not showing up in the zabbix_server.log file - even with the increased threshold for slow query logging that you recommended.

I very much appreciate your help; it sounds like most of what we're looking at is related to database timings and thresholds, and we're going to need to do some work tuning those (or possibly looking into other options) - but now we have a good starting point.

I'll post back if we come up with some settings that help, in case anyone else runs into this sort of problem.

Ad Widget

MYSQL Deadlocks

MYSQL Deadlocks

Comment

Comment

Comment

Comment