TariqN (Junior Member) · Oct 2024 · #1

    hosts table and change to "lastaccess" field

    Hi folks,

    I was directed to the community forum by Zabbix Support for an issue I'm looking to solve. Hopefully this post isn't too lengthy.

    Our organization is planning to upgrade our Zabbix installs from 6.0.13 to 7.0.1 across 3 separate environments (Dev, Stage, and Prod).

    Our environment setup primarily consists of custom Docker images based on the Zabbix 7.0 Dockerfiles, as per the below, with each service running in a separate Docker container:
    • zabbix-proxy (proxy-sqlite3), single container instance
    • zabbix-server (server-mysql), single container instance/standalone, HA is not enabled
    • zabbix-frontend (web-nginx-mysql), single container instance
    • zabbix-agent (agent - not agent2), single container instance
    • AWS RDS Aurora/MySQL 8.0 DB backend (cluster: 1 writer in the Dev/Stage environments; 1 writer and 1 reader in the Prod environment)

    NOTE: The Prod environment has multiple proxies across multiple AWS regions.

    With our upgrade cycle, we are looking to leverage AWS blue/green deployments for the DB backend, which would allow us to accomplish the following:

    1) Have an active blue cluster that the environment Zabbix server instance points to

    2) Have a non-active green cluster with MySQL replication from the blue cluster, which would allow us to:

    a) Perform the DB schema changes required for the upgrade
    b) Stand up an additional, isolated Zabbix server instance pointed at this cluster solely for the purpose of performing the DB upgrade. This instance would have no proxy or frontend traffic directed at it.

    3) Upon completion of the DB upgrade, switch the green cluster over to become the active blue cluster, decommission the isolated Zabbix server instance, and upgrade the environment Zabbix server instance and all other components (frontend, proxy, and agent).
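
    For context, the kind of replication health check involved here, runnable on the green cluster before and after step 2 (MySQL 8.0.22+ syntax; older versions use SHOW SLAVE STATUS with the Slave_*/Seconds_Behind_Master field names):
    Code:
    -- Run on the green cluster's writer endpoint.
    SHOW REPLICA STATUS\G
    -- Healthy output should include:
    --   Replica_IO_Running:    Yes
    --   Replica_SQL_Running:   Yes
    --   Seconds_Behind_Source: 0 (or close to it)
    --   Last_SQL_Errno:        0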

    While preparing for this in our Stage environment, we performed step 2 above and noticed the following:

    The active blue cluster, which still has the Zabbix DB at 6.0.13, has the below columns defined for the hosts table.
    Code:
    +--------------------+-----------------+------+-----+---------+-------+
    | Field              | Type            | Null | Key | Default | Extra |
    +--------------------+-----------------+------+-----+---------+-------+
    | hostid             | bigint unsigned | NO   | PRI | NULL    |       |
    | proxy_hostid       | bigint unsigned | YES  | MUL | NULL    |       |
    | host               | varchar(128)    | NO   | MUL |         |       |
    | status             | int             | NO   | MUL | 0       |       |
    | lastaccess         | int             | NO   |     | 0       |       |
    | ipmi_authtype      | int             | NO   |     | -1      |       |
    | ipmi_privilege     | int             | NO   |     | 2       |       |
    | ipmi_username      | varchar(16)     | NO   |     |         |       |
    | ipmi_password      | varchar(20)     | NO   |     |         |       |
    | maintenanceid      | bigint unsigned | YES  | MUL | NULL    |       |
    | maintenance_status | int             | NO   |     | 0       |       |
    | maintenance_type   | int             | NO   |     | 0       |       |
    | maintenance_from   | int             | NO   |     | 0       |       |
    | name               | varchar(128)    | NO   | MUL |         |       |
    | flags              | int             | NO   |     | 0       |       |
    | templateid         | bigint unsigned | YES  | MUL | NULL    |       |
    | description        | text            | NO   |     | NULL    |       |
    | tls_connect        | int             | NO   |     | 1       |       |
    | tls_accept         | int             | NO   |     | 1       |       |
    | tls_issuer         | varchar(1024)   | NO   |     |         |       |
    | tls_subject        | varchar(1024)   | NO   |     |         |       |
    | tls_psk_identity   | varchar(128)    | NO   |     |         |       |
    | tls_psk            | varchar(512)    | NO   |     |         |       |
    | proxy_address      | varchar(255)    | NO   |     |         |       |
    | auto_compress      | int             | NO   |     | 1       |       |
    | discover           | int             | NO   |     | 0       |       |
    | custom_interfaces  | int             | NO   |     | 0       |       |
    | uuid               | varchar(32)     | NO   |     |         |       |
    | name_upper         | varchar(128)    | NO   | MUL |         |       |
    +--------------------+-----------------+------+-----+---------+-------+
    29 rows in set (0.002 sec)


    The non-active green cluster had the Zabbix DB upgraded to 7.0.1 and has the below columns defined for the hosts table.
    Code:
    +--------------------+-----------------+------+-----+---------+-------+
    | Field              | Type            | Null | Key | Default | Extra |
    +--------------------+-----------------+------+-----+---------+-------+
    | hostid             | bigint unsigned | NO   | PRI | NULL    |       |
    | proxyid            | bigint unsigned | YES  | MUL | NULL    |       |
    | host               | varchar(128)    | NO   | MUL |         |       |
    | status             | int             | NO   | MUL | 0       |       |
    | ipmi_authtype      | int             | NO   |     | -1      |       |
    | ipmi_privilege     | int             | NO   |     | 2       |       |
    | ipmi_username      | varchar(16)     | NO   |     |         |       |
    | ipmi_password      | varchar(20)     | NO   |     |         |       |
    | maintenanceid      | bigint unsigned | YES  | MUL | NULL    |       |
    | maintenance_status | int             | NO   |     | 0       |       |
    | maintenance_type   | int             | NO   |     | 0       |       |
    | maintenance_from   | int             | NO   |     | 0       |       |
    | name               | varchar(128)    | NO   | MUL |         |       |
    | flags              | int             | NO   |     | 0       |       |
    | templateid         | bigint unsigned | YES  | MUL | NULL    |       |
    | description        | text            | NO   |     | NULL    |       |
    | tls_connect        | int             | NO   |     | 1       |       |
    | tls_accept         | int             | NO   |     | 1       |       |
    | tls_issuer         | varchar(1024)   | NO   |     |         |       |
    | tls_subject        | varchar(1024)   | NO   |     |         |       |
    | tls_psk_identity   | varchar(128)    | NO   |     |         |       |
    | tls_psk            | varchar(512)    | NO   |     |         |       |
    | discover           | int             | NO   |     | 0       |       |
    | custom_interfaces  | int             | NO   |     | 0       |       |
    | uuid               | varchar(32)     | NO   |     |         |       |
    | name_upper         | varchar(128)    | NO   | MUL |         |       |
    | vendor_name        | varchar(64)     | NO   |     |         |       |
    | vendor_version     | varchar(32)     | NO   |     |         |       |
    | proxy_groupid      | bigint unsigned | YES  | MUL | NULL    |       |
    | monitored_by       | int             | NO   |     | 0       |       |
    +--------------------+-----------------+------+-----+---------+-------+
    30 rows in set (0.002 sec)
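
    A quick way to diff the two schemas is to pull the column list from information_schema on each cluster and compare the output (the schema name below is the same placeholder used further down):
    Code:
    -- Run against each cluster and diff the results.
    SELECT column_name, column_type, is_nullable, column_default
    FROM information_schema.columns
    WHERE table_schema = '<our zabbix db>'
      AND table_name = 'hosts'
    ORDER BY ordinal_position;
    In our case the notable differences are: proxy_hostid renamed to proxyid; lastaccess, proxy_address, and auto_compress dropped; and vendor_name, vendor_version, proxy_groupid, and monitored_by added.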


    With this change to the table structure as a result of the database upgrade, replication from the blue to the green cluster failed. A statement similar to the below shows up in the MySQL binary log every few minutes, with the lastaccess time value changing on each occurrence. The specific error is MySQL error 1054 (bad field error, unknown column), as reported in the MySQL replication status.
    Code:
    | mysql-bin-changelog.000479 |  83416470 | Query          | 413013016 |    83416612 | use `<our zabbix db>`; update hosts set lastaccess=1729143834 where hostid=10318
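
    (For reference, the event above can be listed with SHOW BINLOG EVENTS; the file name and position are taken from the output:)
    Code:
    SHOW BINLOG EVENTS IN 'mysql-bin-changelog.000479' FROM 83416470 LIMIT 5;
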
    To resolve the replication failure, we are skipping MySQL error 1054 on the replica, since it is expected to occur: the lastaccess field is no longer present in the hosts table on the green cluster where the Zabbix DB was upgraded.
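
    A sketch of what that skip looks like in practice, assuming the RDS-provided replication procedures are available on the green cluster (the persistent alternative would be setting replica_skip_errors=1054 in the parameter group):
    Code:
    -- On the green cluster: confirm the stall is the expected 1054.
    SHOW REPLICA STATUS\G
    --   Last_SQL_Errno: 1054
    --   Last_SQL_Error: ... Unknown column 'lastaccess' in 'field list' ...

    -- Skip the errored event and resume replication; this has to be
    -- repeated for each occurrence (RDS/Aurora stored procedure).
    CALL mysql.rds_skip_repl_error;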

    The question I have, though, is whether this error can be safely ignored until we switch the green cluster over to become the active blue cluster and then upgrade the Zabbix components (server, frontend, proxy, and agent) within the environment to version 7.0.1.

    Also, is there any information on the purpose/importance of the lastaccess field? There is no clear reference to it in the Zabbix 6.0 API reference guide for the host object/table. For instance, does this field track the last access time of a given configured host?
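
    From what we can tell so far, lastaccess appears to be tied to proxies, which in the 6.0 schema are stored as rows in the hosts table (it shows up under the proxy object in the API docs rather than the host object). A quick check against a 6.0 database, assuming the standard status codes (5 = active proxy, 6 = passive proxy):
    Code:
    -- Which rows actually carry a lastaccess value in the 6.0 schema?
    SELECT hostid, host, status, FROM_UNIXTIME(lastaccess) AS last_seen
    FROM hosts
    WHERE lastaccess > 0;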

    Thanks in advance!