Duplicate entry 'xx-xx' for key 'PRIMARY' - trends & history tables

  • mschlegel
    Member
    • Oct 2008
    • 40

    #1

    Duplicate entry 'xx-xx' for key 'PRIMARY' - trends & history tables

    I've been seeing these messages since ramping up our monitoring load here, and I've seen them with 1.8, 1.8.3, and 1.8.4 as well. I'm not seeing anything in the logs that appears to be a triggering scenario for this behavior.

    Example log entry:
    10907:20110107:123208.317 [Z3005] Query failed: [1062] Duplicate entry '100100000047086-1294419600' for key 'PRIMARY' [insert into trends (itemid,clock,num,value_min,value_avg,value_max) values (100100000047086,1294419600,1,96.292486,96.292486,96.292486),(100100000046969,1294419600,1,96.294073,96.294073,96.294073),(100100000046970,1294419600,1,96.294073,96.294073,96.294073);
    ]


    The extra ']' always appears on the line following the duplicate entry messages.
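
A small sketch for pulling the key apart, in case it helps when scanning the server log. The regular expression and the function name `parse_duplicate` are illustrative and not part of Zabbix. The only assumption is that the quoted key is `itemid-clock`, as the column list in the failed insert suggests.

```python
import re
from datetime import datetime, timezone

# Matches the MySQL 1062 message as it appears in the log entry above.
DUPLICATE_RE = re.compile(r"Duplicate entry '(\d+)-(\d+)' for key 'PRIMARY'")


def parse_duplicate(line):
    """Return itemid/clock details for a duplicate-key log line, or None."""
    match = DUPLICATE_RE.search(line)
    if match is None:
        return None
    itemid = int(match.group(1))
    clock = int(match.group(2))
    return {
        "itemid": itemid,
        "clock": clock,
        # Interpret the clock as a Unix timestamp in UTC.
        "utc": datetime.fromtimestamp(clock, tz=timezone.utc),
        # True when the timestamp sits exactly on an hour boundary.
        "on_hour": clock % 3600 == 0,
    }
```

For the entry above this gives itemid 100100000047086 and a clock that falls exactly on an hour boundary (17:00 UTC on 2011-01-07), which fits trends being stored as one row per item per hour.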

    All history and trends tables were truncated back to empty earlier this week.

    While I don't think it's the only cause of the notification storms I've seen, I can't help but think it's related in some way. I had one proxy's worth of systems start sending alerts shortly before the first of these messages showed up after upgrading to 1.8.4 yesterday.

    The database is exclusive to Zabbix and runs on a quad-core AMD Opteron with 6G RAM. Storage is 2 x 150G internal disks with software mirroring for the OS, plus DRBD on one partition from each disk to an identical partner system as part of an HA pair. Data is placed on one disk and the InnoDB logs on the other.

    5,632 active hosts
    41,618 active items
    39,468 active triggers
    110.76 required values per second

    Most hosts are behind proxies each serving 70 nodes, for a total of 78 proxies involved.

    Is anyone else running anywhere close to this scale successfully?
  • mschlegel
    Member
    • Oct 2008
    • 40

    #2
    In the latest case I've experienced, the same errors showed up shortly after restarting the zabbix_server process. However, after restarting one of the proxies, I have not seen any additional duplicate key messages.
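
The key in the error is an (itemid, clock) pair, so the message means a row for that item and hour already existed when the insert ran. Below is a minimal sketch of that failure mode, using Python's built-in sqlite3 rather than MySQL. The cut-down `trends` table is an assumption based on the column list in the logged statement, and the helper names are made up for illustration. It says nothing about why the server attempted the second insert.

```python
import sqlite3

COLUMNS = "(itemid,clock,num,value_min,value_avg,value_max)"


def insert_trends(conn, rows):
    """Insert all rows in one statement; return None or the error text."""
    placeholders = ",".join(["(?,?,?,?,?,?)"] * len(rows))
    flat = [value for row in rows for value in row]
    try:
        conn.execute(
            "INSERT INTO trends " + COLUMNS + " VALUES " + placeholders, flat
        )
        conn.commit()
        return None
    except sqlite3.IntegrityError as exc:
        # Discard whatever the failed statement attempted.
        conn.rollback()
        return str(exc)


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed schema: composite primary key on (itemid, clock).
    conn.execute(
        "CREATE TABLE trends (itemid INTEGER, clock INTEGER, num INTEGER, "
        "value_min REAL, value_avg REAL, value_max REAL, "
        "PRIMARY KEY (itemid, clock))"
    )
    # A row for this item and hour is already stored.
    first = [(100100000047086, 1294419600, 1, 96.292486, 96.292486, 96.292486)]
    err_first = insert_trends(conn, first)
    # The same three-row batch as in the logged statement.
    batch = [
        (100100000047086, 1294419600, 1, 96.292486, 96.292486, 96.292486),
        (100100000046969, 1294419600, 1, 96.294073, 96.294073, 96.294073),
        (100100000046970, 1294419600, 1, 96.294073, 96.294073, 96.294073),
    ]
    err_batch = insert_trends(conn, batch)
    count = conn.execute("SELECT COUNT(*) FROM trends").fetchone()[0]
    conn.close()
    return err_first, err_batch, count
```

In this sketch the whole batch is rejected, so the two rows that were not duplicates are not stored either. Whether MySQL keeps or discards the other rows depends on the storage engine and transaction handling, so treat this as an illustration only.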

    What additional information would be helpful in identifying the cause of this problem?

    Thank you

    Comment

    • untergeek
      Senior Member
      Zabbix Certified Specialist
      • Jun 2009
      • 512

      #3
      I haven't seen errors like that. Can you elaborate on your db setup?

      For instance:

      Is it MySQL (presumed)? Postgres? Oracle on linux?
      How many dbsyncers do you have configured in the zabbix_server.conf?

      Your setup is curious. You have an extremely high number of servers, items and triggers and a seemingly low number of values per second. Do you have a long delay between checks? That many triggers must result in a rather long queue of DB reads. That low number of required writes per second just seems so incongruous with the rest of the numbers. I wonder what the backlog is from 70+ proxies all trying to contact your zabbix server.

      Our setup is considerably different:

      446 active hosts
      24,145 active items
      7,013 active triggers
      408.6 required values per second (actual measured value from the Zabbix Internal of WriteCache is closer to 180 values per second).

      We currently have NO proxies, but we heavily monitor the hosts we have in a variety of ways. We're RHEL5 on HP 380DL servers with 6G of RAM and a monster Oracle backend (since we use it for our customers, we might as well use it for ourselves too).

      Comment

      • mschlegel
        Member
        • Oct 2008
        • 40

        #4
        Howdy,

        We are running MySQL 5.1.41. The Zabbix server is running on a quad-core AMD Opteron with 6G RAM. Disk is set up with DRBD partitions, one for data and one for InnoDB logs, mirrored to an identical host for HA purposes. In normal operation, the MySQL server runs on one host and the zabbix_server process runs on the other host.

        Possibly relevant zabbix_server config items:
        HousekeepingFrequency=4
        CacheSize=128M
        HistoryCacheSize=8M
        TrendCacheSize=32M

        The number of DB syncers is not defined in our config at this point, so it would be at the default of 4.



        The vast majority of the hosts we have in Zabbix are only monitoring 7 parameters, and most of those were recently changed from a 5 minute to a 10 minute update cycle to reduce database load. This change reduced the values per second from about 183/sec to the current 110/sec.



        Does it seem likely that the large number of proxies might be causing the zabbix server to have a harder time keeping up with item updates from these hosts than it might have with a smaller number of proxies?

        Is there a practical limit to how many hosts can be carried directly off a single server without proxies?


        Thank you,

        Comment

        • untergeek
          Senior Member
          Zabbix Certified Specialist
          • Jun 2009
          • 512

          #5
          Since we're running Oracle, and a monster one at that, this should be taken with the necessary salt…

          That said, I think you should experiment with increasing the number of dbsyncers. This may require tweaks to max connections on your db, but increasing syncers may help. Each proxy is trying to get a connection to write to the db, and you're trying to handle 70+ proxies with only 4 syncers. Bump that to 8, 12, 16, 24, 32 or something and see if the problem changes any.

          Comment

          • mschlegel
            Member
            • Oct 2008
            • 40

            #6
            I'm running the server with DB syncers set to 8 now. It doesn't look like it's made an impact on the system load of the database server, so at least it can be left for observation for a while.

            Is there any way to see what is in the zabbix server's database queue? It seems like that would be the best indicator overall of how far behind the database is running.


            Thank you for your assistance.

            Comment

            • untergeek
              Senior Member
              Zabbix Certified Specialist
              • Jun 2009
              • 512

              #7
              You can see a graphic representation of which items are behind in the UI:

              Administration->Queue

              The columns show how far behind the items are: 5 seconds, 10 seconds, 30 seconds, etc.

              If you select the dropdown on the right side, you can do an Overview by Proxy, or Details to see individual items and the delay.

              In your case, the Overview by Proxy could be invaluable.

              Comment

              • mschlegel
                Member
                • Oct 2008
                • 40

                #8
                I know the queue shows how far behind the system is as a whole, but I didn't think that it could distinguish between items that are in the server but not in the database and items that have not yet been sent to the server from the agents/proxies. While both can be useful, it seems like knowing what data has arrived at the server but not yet been pushed into the database would be the more important item to know in this particular case.

                Comment

                • untergeek
                  Senior Member
                  Zabbix Certified Specialist
                  • Jun 2009
                  • 512

                  #9
                  Ah, I see. I do not think that Zabbix has any way of informing you of what is in the write caches, but it does let you know how many values are in them. That's about it.

                  To see what hasn't been written yet, you'd probably have to look in your database. In our case, we found that Oracle was writing VERY fast in most cases, but that's also why we went for the full 64 DBSyncers. Oracle worked best that way as it gets the best bang-for-the-zabbix-buck (zabbix being single-threaded, oracle totally parallel minded). MySQL may behave differently.

                  What's the DataSenderFrequency from your proxies?
                  The heartbeat frequency?
                  Are they in active or passive mode?

                  We're going to be deploying lots of proxies soon too, I think, so these become very important for me to understand as well.

                  Comment

                  • mschlegel
                    Member
                    • Oct 2008
                    • 40

                    #10
                    We have HeartbeatFrequency set to 60 and DataSenderFrequency set to 30. All proxies are currently active proxies, though I'm wondering if switching them over to passive might help with the issues we are seeing. After all, the server can't be swamped by incoming connections if it has to go ask for the data itself.

                    Thank you

                    Comment

                    • untergeek
                      Senior Member
                      Zabbix Certified Specialist
                      • Jun 2009
                      • 512

                      #11
                      Hmmm. A sender frequency of 30 seconds. How frequent were your items polling?

                      I'm just trying to figure out how many messages are coming all at once from how many servers. It may just be that "batching" them like that sends more than Zabbix knows what to do with. How many Trappers do you have?

                      Comment

                      • mschlegel
                        Member
                        • Oct 2008
                        • 40

                        #12
                        Most items behind the proxies are 600s cycle now. 70 hosts behind each proxy, 7 items each host.
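
For a rough sense of scale, the figures quoted in this thread can be turned into per-proxy numbers. This is only back-of-the-envelope arithmetic: it assumes every item behind a proxy is on the 600s cycle (the post says "most"), uses the 30 second sender frequency and the 78 proxies mentioned earlier, and counts data-send connections only. The function name `proxy_load` is made up for the example.

```python
def proxy_load(hosts, items_per_host, item_interval_s, send_interval_s, proxies):
    """Estimate value and connection rates from the thread's figures."""
    items = hosts * items_per_host
    values_per_sec = items / item_interval_s
    return {
        "items_per_proxy": items,
        "values_per_sec_per_proxy": values_per_sec,
        # Values accumulated between two data sends from one proxy.
        "values_per_send": values_per_sec * send_interval_s,
        "values_per_sec_total": values_per_sec * proxies,
        # Average rate of data-send connections arriving at the server.
        "connections_per_sec": proxies / send_interval_s,
    }


load = proxy_load(hosts=70, items_per_host=7, item_interval_s=600,
                  send_interval_s=30, proxies=78)
for name, value in load.items():
    print(f"{name}: {value:.2f}")
```

With those inputs each proxy carries 490 items, or about 0.82 values per second, which is roughly 24.5 values per send. Across 78 proxies that is about 63.7 values per second, arriving over an average of 2.6 data-send connections per second.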

                        Proxy has the following start options:
                        Pollers 2
                        IPMIPollers 0
                        PollersUnreachable 1
                        Trappers 1
                        Pingers 1
                        Discoverers 0
                        HTTPPollers 0

                        Proxies are using SQLite and doing hourly housekeeping.

                        Server has the following start options:
                        Pollers 5
                        IPMIPollers 0
                        PollersUnreachable 1
                        Trappers 5
                        Pingers 3
                        Discoverers 1
                        HTTPPollers 1

                        Currently running housekeeping every 4 hours.

                        Comment

                        • untergeek
                          Senior Member
                          Zabbix Certified Specialist
                          • Jun 2009
                          • 512

                          #13
                          5 trappers may be too few. Consider that if you have 70 proxies, that's how they communicate back to the server (at least, that's how I understand it).

                          The items come back from the proxies to the trappers. If 5 come back simultaneously, your trappers are all busy, causing the others to wait, fail, or have to reschedule. It might be beneficial to try to boost that number a bunch. I have our setup (which you've seen the numbers for) with 100 trappers.
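
To put a rough number on "all trappers busy", here is a sketch under loudly stated assumptions: each proxy connects once per send interval at an independent random phase, and each transfer occupies a trapper for a fixed service time. The service time is purely hypothetical, since nobody in this thread measured it, and real proxies are unlikely to be this evenly spread. The function name `prob_trappers_exceeded` is made up for the example.

```python
from math import comb


def prob_trappers_exceeded(proxies, trappers, send_interval_s, service_time_s):
    """Probability that more proxies are mid-transfer than there are trappers.

    Assumes independent, uniformly random send phases (a simplification).
    """
    # Chance that one given proxy is mid-transfer at a random instant.
    p = min(service_time_s / send_interval_s, 1.0)
    limit = min(trappers, proxies)
    # Binomial probability that at most `limit` proxies are busy at once.
    within = sum(
        comb(proxies, k) * p**k * (1 - p) ** (proxies - k)
        for k in range(limit + 1)
    )
    return 1.0 - within


# Hypothetical service times, in seconds, for 78 proxies sending every 30s.
for service_time in (0.5, 1.0, 2.0):
    for trappers in (5, 20, 100):
        chance = prob_trappers_exceeded(78, trappers, 30, service_time)
        print(f"service={service_time}s trappers={trappers}: {chance:.4f}")
```

The absolute values mean little without a measured service time. The useful part is comparing how quickly the figure drops as the trapper count grows for a given assumed service time.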

                          Comment

                          • zaicnupagadi
                            Member
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Dec 2010
                            • 73

                            #14
                            I had a similar issue while adding a new user: SQL reported that it could not add another record with the ID field "8".

                            Deleting the database and recreating it didn't help. When I looked at that table, I saw that the last user added had an ID equal to "8". So I deleted that user and added him again, and after that there was no problem adding other users. I never found out what the underlying problem was, but deleting the last entry worked for me.

                            Comment
