How to update a major Zabbix version with less downtime?

  • ghoz
    Senior Member
    • May 2011
    • 204

    #1

    How to update a major Zabbix version with less downtime?

    Hi all.
    I'm posting here as it should be relevant to large deployments.
    I have a moderately small environment, and this week I tried on my test box to upgrade from 1.8 to 1.9.
    The update took 18 hours.

    Granted, the test box is slow, but still. So when 2.0 arrives, how do you guys with huge databases plan to upgrade?

    Ideally, I would upgrade a copy of the database, make the switch at the end, and then somehow 'replay' the data into the updated database...

    Using proxies, it should technically be doable, but how can we tell the proxies to send data to the 'old' Zabbix server and keep the data locally to allow the new server to catch up?

    Right now I think we only have the choice between:
    • keeping the old server up and keeping the alerts during the update, but losing the data sent in that timeframe
    • stopping the server, so no alerts and no data lost (at least with proxies) during the update
  • ke_sheng_jie
    Member
    • Aug 2011
    • 40

    #2
    Wondering the same thing

    I cloned our main Zabbix MySQL server and attempted to run both the upgrade script and patch.sql scripts. They both failed on the alter statement for the history table, which is 40 GB. I even quadrupled the memory and it ran for about 3 hours before quitting.

    At the moment, I am re-evaluating all my history settings, slowly moving them from 31 days to 3 days for most items. The database just seems to be too big.
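
    A bulk retention change like that could also be made straight in the DB rather than item by item. A hypothetical sketch, not something I've confirmed on a live system (test on a copy first; the items.history column is in days):
    Code:
    -- hypothetical sketch: shorten history retention for everything still at 31 days
    UPDATE items SET history = 3 WHERE history = 31;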


    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Oct 2005
      • 3112

      #3
      please note that 1.8 proxies are not supported with the 2.0 server.

      theoretically you could leave 1.8 proxies in place, have them collect data while the main db is being updated. then, without allowing them to connect to the new server, shut down proxies & upgrade their databases (this should be much faster than the main db upgrade).

      a bit messy, but should reduce the gap in the collected data
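
      as a rough sketch, the per-proxy part could look like this for a proxy with a MySQL backend (the init script name and the patch path are assumptions; the 2.0 patches ship in the source tarball):
      Code:
      # stop the 1.8 proxy so its local queue stops changing
      service zabbix_proxy stop
      # apply the 2.0 schema patch to the proxy's own database (path is an assumption)
      mysql zabbix_proxy < upgrades/dbpatches/2.0/mysql/patch.sql
      # install the 2.0 proxy binary, then start it so it pushes the queued data
      service zabbix_proxy start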
      Zabbix 3.0 Network Monitoring book


      • heaje
        Senior Member
        Zabbix Certified Specialist
        • Sep 2009
        • 325

        #4
        Upgrade idea

        Here's what I would try if I were using MySQL as my backend (this assumes no config changes to the old DB after the process is started):

        1. Take a mysqldump of your database WITHOUT any of the history/trends.
        2. Restore that dump to a new database (like zabbix2_0_0).
        3. Run the upgrade scripts on the zabbix2_0_0 DB.

        #### Downtime starts ####

        4. Stop Zabbix server/proxies.
        5. Upgrade Zabbix server/proxy binaries and point them to zabbix2_0_0.
        6. Start up Zabbix server/proxies.

        #### Downtime ends ####

        At this point you'll have a fully functional monitoring system. The only thing missing is the new frontend and the history/trend tables.

        7. Install the new frontend and point it to zabbix2_0_0.
        8. Take a mysqldump of just the data (no table drop/create) from the history tables and trend tables in the OLD database.
        9. Import that data into the new database (this assumes the history/trends schema is the same between the old and new databases).

        After #9 completes you'll have all your history back in the new database. In reality, you could take all the time you need for #8 and #9 since those steps don't affect the actual monitoring at all.

        In my opinion, this would be the best way to do it with a large database. The only downtime is to upgrade the Zabbix server/proxies to the newest version (less than 10 minutes). If the history/trends schema has changed between the old and new versions, you could always make the schema changes to the old database AFTER switching monitoring over to the new one. At that point the history/trend schemas will match and you can do the dump like I mentioned in #8.
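
        For concreteness, steps 1-2 and 8-9 might look like this with mysqldump (a sketch only; DB/table names are assumed, and the *_sync tables used by node setups are omitted for brevity):
        Code:
        # step 1: schema for all tables, data for none
        mysqldump --no-data zabbix > schema.sql
        # data for everything except history/trends
        mysqldump --single-transaction --no-create-info \
            --ignore-table=zabbix.history --ignore-table=zabbix.history_uint \
            --ignore-table=zabbix.history_str --ignore-table=zabbix.history_text \
            --ignore-table=zabbix.history_log --ignore-table=zabbix.trends \
            --ignore-table=zabbix.trends_uint \
            zabbix > config_data.sql
        # step 2: restore both into the new database
        mysql zabbix2_0_0 < schema.sql
        mysql zabbix2_0_0 < config_data.sql

        # steps 8-9: after the schemas match, move the history row data only
        mysqldump --single-transaction --no-create-info \
            zabbix history history_uint history_str history_text history_log \
            trends trends_uint | mysql zabbix2_0_0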


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          Originally posted by heaje
          8. Take a mysqldump of just the data (no table drop/create) from the history tables and trend tables in the OLD database.
          9. Import that data into the new database (this assumes the history/trends schema is the same between the old and new databases).
          history table schema is different in 2.0
          Zabbix 3.0 Network Monitoring book


          • heaje
            Senior Member
            Zabbix Certified Specialist
            • Sep 2009
            • 325

            #6
            Yeah, I pretty much figured that would be the case after I'd posted my reply.

            In that case I would just do what I said at the end of my last post.

            Originally posted by heaje
            If the history/trends schema has changed between the old and new versions, you could always make the schema changes to the old database AFTER switching monitoring over to the new one. At that point the history/trend schemas will match and you can do the dump like I mentioned in #8.
            Using that approach you still get the benefit of having a working system, it will just take a couple days longer to migrate the history (depending on the DB size). If people can live without the history for a few days I think this gives the least downtime (although I won't say that mine is the only idea or the best).


            • ke_sheng_jie
              Member
              • Aug 2011
              • 40

              #7
              Thanks

              Thanks for the suggestions. I'll wait until my database is a bit cleaner, then give them a try.


              • ghoz
                Senior Member
                • May 2011
                • 204

                #8
                Ok,
                I've been digging a bit deeper, and there is one thing you have to account for when upgrading in two steps:

                Zabbix working on the 'empty' config-only database will create new events, audit logs, alerts etc.,
                and will create IDs that may clash when reintegrating the full data...
                So you have to populate the tables with fake data so Zabbix can continue getting its IDs in increasing order.

                Here is what we're planning to do, integrating comments from heaje and richlv. There is no warranty whatsoever; you really should take a backup and test your own procedure on a test rig...

                The old database is 'zabbix',
                the new one will be 'zabbix2'.

                Executive Summary

                ==> Downtime
                1- Stop the Zabbix server
                2- Dump the configuration from db 'zabbix' to 'zabbix2'
                3- Create fake NextIDs in 'zabbix2'
                4- Upgrade DB 'zabbix2' and upgrade the server binaries, NOT STARTING THE SERVER
                5- Shut down, upgrade and restart ALL the proxies
                6- Start the new Zabbix server on the 'zabbix2' database
                ==> End of downtime. No configuration changes allowed from here
                7- Upgrade db 'zabbix' with the full data. This one can be very long
                8- Integrate the migrated data from 'zabbix' into 'zabbix2'
                9- Clean up the fake NextIDs in 'zabbix2'
                ==> Allow configuration changes; welcome to Zabbix 2.0

                As we have a lot of proxies, we plan to replace steps 5 and 6 with:
                - remove network access from all proxies (we use ssh tunnels)
                - start the main server
                - for each site:
                - shut down, upgrade, allow network access, restart the proxies

                The important point here is that the old proxies must not see the new server before their upgrade; a firewall-based variant is sketched below.
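
                If you don't manage access with ssh tunnels, a firewall rule on the server gives the same isolation. A hypothetical sketch (10051 is the default server trapper port; the proxy IP is an example):
                Code:
                # block an un-upgraded 1.8 proxy from reaching the new 2.0 server
                iptables -A INPUT -p tcp --dport 10051 -s 192.0.2.10 -j DROP
                # ... after that proxy's database and binary are upgraded, let it back in:
                iptables -D INPUT -p tcp --dport 10051 -s 192.0.2.10 -j DROP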

                Details
                Now for some scripts. These assume that mysql does not require a password, or that you have a ~/.my.cnf with your credentials:
                Code:
                [client]
                user=root
                password=XXXXX
                Step 1- Stop Zabbix Server
                no comment.

                Step 2 : Dump configuration from db 'zabbix' to 'zabbix2'
                Code:
                mysqladmin create zabbix2
                # import data structure
                mysqldump -B "zabbix" --no-data  | sed 's/\(^CREATE DATABASE \|^USE \)/-- &/' | mysql zabbix2
                
                # import config
                mysqldump -B "zabbix" --single-transaction --no-create-info --no-create-db \
                    --ignore-table="zabbix.acknowledges" \
                    --ignore-table="zabbix.alerts" \
                    --ignore-table="zabbix.auditlog" \
                    --ignore-table="zabbix.auditlog_details" \
                    --ignore-table="zabbix.escalations" \
                    --ignore-table="zabbix.events" \
                    --ignore-table="zabbix.history" \
                    --ignore-table="zabbix.history_log" \
                    --ignore-table="zabbix.history_str" \
                    --ignore-table="zabbix.history_str_sync" \
                    --ignore-table="zabbix.history_sync" \
                    --ignore-table="zabbix.history_text" \
                    --ignore-table="zabbix.history_uint" \
                    --ignore-table="zabbix.history_uint_sync" \
                    --ignore-table="zabbix.trends" \
                    --ignore-table="zabbix.trends_uint" \
                    | sed 's/\(^CREATE DATABASE \|^USE \)/-- &/' \
                    | mysql zabbix2
                Step 3 : create fake NextIDs in 'zabbix2'
                This creates fake data in order to get the right NextIDs. Only valid for single-node setups.
                Code:
                cat << 'EOF' | mysql zabbix2
                -- OLD DATABASE
                use zabbix
                
                --  temp table for nextids
                drop table if exists zabbix2.tmp_nextids;
                create table zabbix2.tmp_nextids ( `field_name` varchar(255) NOT NULL, `id` bigint(20) unsigned NOT NULL );
                
                
                -- find nextids
                -- only working for NON distributed setup (ids are shifted by nodeid * 100000000000000)
                
                insert into zabbix2.tmp_nextids select  'eventid',  max(eventid)+1  from `events` ;
                insert into zabbix2.tmp_nextids select  'acknowledgeid',  max(acknowledgeid)+1  from `acknowledges`;
                insert into zabbix2.tmp_nextids select  'alertid',  max(alertid)+1  from `alerts`;
                insert into zabbix2.tmp_nextids select  'auditid',  max(auditid)+1  from `auditlog`;
                insert into zabbix2.tmp_nextids select  'auditdetailid',  max(auditdetailid)+1  from `auditlog_details`;
                insert into zabbix2.tmp_nextids select  'escalationid',  max(escalationid)+1  from `escalations`;
                
                -- NEW DATABASE
                use zabbix2
                
                --
                -- creates fake data so DCget_nextid keeps working while stage 2 upgrade is munching
                -- should work for nodes provided tmp_nextids is populated
                
                -- get one eventid for constraints
                select id into @eventid
                        from tmp_nextids
                        where field_name='eventid'
                        limit 1;
                
                -- get first userid for constraints
                select userid into @userid
                        from users
                        order by userid asc
                        limit 1;
                
                -- get first media  for constraints
                select mediatypeid into @mediatypeid
                        from media_type
                        order by mediatypeid asc
                        limit 1;
                
                -- get first action  for constraints
                select actionid into @actionid
                        from actions
                        order by actionid asc
                        limit 1;
                -- get one auditlog for constraints
                select id into @auditid
                        from tmp_nextids
                        where field_name='auditid'
                        limit 1;
                
                -- Create Fake data
                
                insert into `events` (eventid) select id from tmp_nextids where field_name='eventid' ;
                
                insert into `acknowledges` (acknowledgeid,eventid,userid, message)
                        select id, @eventid, @userid, 'Fake ack for 1.8 => 2.0 Migration'
                        from tmp_nextids
                        where field_name='acknowledgeid' ;
                
                insert into `alerts` ( alertid, actionid, eventid, userid, mediatypeid, message)
                        select id, @actionid,  @eventid, @userid,  @mediatypeid, 'Fake alert for 1.8 => 2.0 Migration'
                        from tmp_nextids
                        where field_name='alertid' ;
                
                insert into `auditlog` ( auditid, userid, details)
                        select id,  @userid, 'Fake audit for 1.8 => 2.0 Migration'
                        from tmp_nextids
                        where field_name='auditid' ;
                
                insert into `auditlog_details` ( auditdetailid, auditid, oldvalue, newvalue)
                        select id,  @auditid, '', 'Fake audit for 1.8 => 2.0 Migration'
                        from tmp_nextids
                        where field_name='auditdetailid' ;
                
                insert into `escalations` (escalationid, actionid)
                        select id, @actionid
                        from tmp_nextids
                        where field_name='escalationid' ;
                
                EOF
                Step 4- upgrade DB 'zabbix2', upgrade the server binaries
                Please Read the Fine upgrade Manual.
                DO NOT start the server.

                Step 5- shut down, upgrade and restart ALL the proxies
                Please Read once again the Fine upgrade Manual. (Not yet tested.)

                Step 6- start the new Zabbix server on the 'zabbix2' database
                Don't forget to grant your zabbix user rights to the new database, and to change the frontend and server configuration.
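
                A minimal grant, assuming the server and frontend connect as 'zabbix'@'localhost' (adjust user and host to your setup):
                Code:
                GRANT ALL PRIVILEGES ON zabbix2.* TO 'zabbix'@'localhost';
                FLUSH PRIVILEGES;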

                Step 7- upgrade db 'zabbix' with the full data
                At the beginning of the thread, the upgrade took us 18 hours.
                Meanwhile our installation has grown... and so has the database.
                We are now at 41 hours for the full upgrade...

                Step 8- integrate the migrated data from 'zabbix' into 'zabbix2'
                This one may be tricky: it may leave blanks in the data while MySQL is copying it around.
                We add unique keys in order to avoid duplicates around the switchover time.
                You may skip the history* tables if the upgrade took too long and the data is going to be deleted by the housekeeper anyway.
                Code:
                cat <<'EOF' | mysql zabbix2
                --  ack,alerts -> events
                insert into events select * from zabbix.events;
                
                insert into acknowledges select * from zabbix.acknowledges;
                insert into alerts select * from zabbix.alerts;
                
                --  auditlog_detail -> auditlog
                insert into auditlog select * from zabbix.auditlog;
                insert into auditlog_details select * from zabbix.auditlog_details;
                
                insert into escalations select * from zabbix.escalations;
                
                alter table history add unique _u_history (itemid,clock);
                insert ignore into history select * from zabbix.history;
                alter table history drop index _u_history;
                
                alter table history_log add unique _u_history_log (itemid,clock);
                insert ignore into history_log select * from zabbix.history_log;
                alter table history_log drop index _u_history_log;
                
                alter table history_str add unique _u_history_str (itemid,clock);
                insert ignore into history_str select * from zabbix.history_str;
                alter table history_str drop index _u_history_str;
                
                alter table history_str_sync add unique _u_history_str_sync (itemid,clock);
                insert ignore into history_str_sync select * from zabbix.history_str_sync;
                alter table history_str_sync drop index _u_history_str_sync;
                
                alter table history_sync add unique _u_history_sync (itemid,clock);
                insert ignore into history_sync select * from zabbix.history_sync;
                alter table history_sync drop index _u_history_sync;
                
                alter table history_text add unique _u_history_text (itemid,clock);
                insert ignore into history_text select * from zabbix.history_text;
                alter table history_text drop index _u_history_text;
                
                alter table history_uint add unique _u_history_uint (itemid,clock);
                insert ignore into history_uint select * from zabbix.history_uint;
                alter table history_uint drop index _u_history_uint;
                
                alter table history_uint_sync add unique _u_history_uint_sync (itemid,clock);
                insert ignore into history_uint_sync select * from zabbix.history_uint_sync;
                alter table history_uint_sync drop index _u_history_uint_sync;
                
                alter table trends add unique _u_trends (itemid,clock);
                insert ignore into trends select * from zabbix.trends;
                alter table trends drop index _u_trends;
                
                alter table trends_uint add unique _u_trends_uint (itemid,clock);
                insert ignore into trends_uint select * from zabbix.trends_uint;
                alter table trends_uint drop index _u_trends_uint;
                
                EOF
                Step 9- clean up the fake NextIDs in 'zabbix2'
                Code:
                cat <<'EOF' | mysql zabbix2
                delete t from acknowledges t, tmp_nextids n
                        where t.acknowledgeid=n.id
                        and n.field_name='acknowledgeid';
                
                delete t from events t, tmp_nextids n
                        where t.eventid=n.id
                        and n.field_name='eventid';
                
                delete t from alerts t, tmp_nextids n
                        where t.alertid=n.id
                        and n.field_name='alertid';
                
                delete t from auditlog_details t, tmp_nextids n
                        where t.auditdetailid=n.id
                        and n.field_name='auditdetailid';
                
                delete t from auditlog t, tmp_nextids n
                        where t.auditid=n.id
                        and n.field_name='auditid';
                
                delete t from escalations t, tmp_nextids n
                        where t.escalationid=n.id
                        and n.field_name='escalationid';
                
                drop table `tmp_nextids` ;
                
                EOF
                Step 10 - ???

                Step 11 - Profit !
                Last edited by ghoz; 06-06-2012, 17:20.


                • rwilliamsit
                  Junior Member
                  • Mar 2013
                  • 1

                  #9
                  Large database

                  Great post!!

                  I plan to use this to update my zabbix DB from 1.8 to 2.0 to save time. (Based on my tests I expect 3-4 hours to run the DB upgrade script. My DB size is 80GB.)

                  In steps 8 and 9, would it be possible to copy the data from zabbix2 to the zabbix DB instead of the other way around? I would not have enough disk space to copy the data from zabbix to zabbix2 where it sits. (And also that would take a while.)

                  I do not mind the few minutes downtime that this would require, I'm just trying to avoid the multi-hour downtime.


                  • billjam
                    Junior Member
                    • Jan 2009
                    • 27

                    #10
                    nice post!

                    I know this is an old thread, but it's the best one I've seen on the Zabbix 2.0 upgrade process.
                    I don't know if anyone else experienced it, but in my dev testing of this process I ran into issues with the ids table: its nextid was 1 too low.
                    There are two possible workarounds that I'm aware of: delete the ids table contents, since Zabbix recreates them if missing (see the sketch below), or update the table, which is what I used:
                    Code:
                    update ids set nextid=(select max(auditid)+1 from `auditlog`) where table_name='auditlog' and field_name='auditid';
                    update ids set nextid=(select max(acknowledgeid)+1 from `acknowledges`) where table_name='acknowledges' and field_name='acknowledgeid';
                    update ids set nextid=(select max(auditdetailid)+1 from `auditlog_details`) where table_name='auditlog_details' and field_name='auditdetailid';
                    update ids set nextid=(select max(eventid)+1 from `events`) where table_name='events' and field_name='eventid';
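
                    On one reading of the first workaround, it would just be the following (a sketch, based on billjam's note that Zabbix recreates the counters if missing; back up first):
                    Code:
                    -- clear the cached ID counters so Zabbix recomputes them from max()+1
                    DELETE FROM ids;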


                    • melpheos
                      Member
                      • Dec 2008
                      • 64

                      #11
                      Originally posted by rwilliamsit
                      Great post!!

                      I plan to use this to update my zabbix DB from 1.8 to 2.0 to save time. (Based on my tests I expect 3-4 hours to run the DB upgrade script. My DB size is 80GB.)

                      In steps 8 and 9, would it be possible to copy the data from zabbix2 to the zabbix DB instead of the other way around? I would not have enough disk space to copy the data from zabbix to zabbix2 where it sits. (And also that would take a while.)

                      I do not mind the few minutes downtime that this would require, I'm just trying to avoid the multi-hour downtime.
                      There is no multi-hour downtime. The downtime is only a few minutes long. All the updating of the old DB and the data transfer can be done while Zabbix is running.
