Zabbix server crashes

  • sitnikov
    Junior Member
    • Jan 2010
    • 15

    #1

    Zabbix server crashes

    Over 2 days the zabbix server has crashed twice. The first time it was 1.8.0 with mysql, the second time 1.8.1 with pgsql.

    The logs show something like this:

    [root@area51 session]# grep "One child process died" /tmp/zabbix_server.log
    23145:20100129:233516.009 One child process died (PID:23147). Exiting ...
    4795:20100129:235511.496 One child process died (PID:4797). Exiting ...

    followed by
    23145:20100129:233518.018 Syncing history data...
    23145:20100129:233518.507 Insufficient space for trends. Flushing to disk.
    (the last line repeats several hundred times)
    23145:20100129:233518.734 Syncing history data...done.
    23145:20100129:233518.734 Syncing trends data...
    23145:20100129:233531.231 Syncing trends data...done.

    after which all zabbix-server processes hang in this state:

    4826 ? ZN 0:00 [zabbix_server] <defunct>
    4827 ? ZN 0:00 [zabbix_server] <defunct>
    4828 ? ZN 0:00 [zabbix_server] <defunct>
    4829 ? ZN 0:00 [zabbix_server] <defunct>

    I tried to start it again and got
    zabbix_server [4797]: ERROR: Configuration buffer is too small. Please increase CacheSize parameter.
    4795:20100129:235511.496 One child process died (PID:4797). Exiting ...

    and defunct processes again.

    CacheSize=300M

    P.S. The web GUI with mysql (innodb plugin) is very slow (hosts: ~3000, items: ~150K).
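The `<defunct>` entries in the `ps` listing above are zombie processes: they have already exited, so sending them a signal does nothing, and they disappear only once their parent collects them or itself exits. A minimal sketch for finding which parent is holding them, assuming a Linux `ps` that accepts `-eo pid,ppid,stat,comm` (the function names are hypothetical, chosen for illustration):

```python
import subprocess


def parse_zombies(ps_output):
    """Return (pid, ppid, command) for every process whose state starts with 'Z'."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)  # pid, ppid, stat, command (may contain spaces)
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the header and any malformed lines
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


def list_zombies():
    """Run ps and return the zombies it currently reports."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_zombies(result.stdout)


# Usage (prints one line per zombie with the parent that has not reaped it):
#     for pid, ppid, comm in list_zombies():
#         print(f"zombie {pid} ({comm}) has parent {ppid}")
```

If every zombie reports the same parent PID, that parent is the process to stop before starting the server again.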
  • sitnikov
    Junior Member
    • Jan 2010
    • 15

    #2
    as a follow-up

    5757:20100130:003855.989 One child process died (PID:5786). Exiting ...
    5757:20100130:003857.994 [Z3001] Connection to database 'zabbix' failed: [0] FATAL: the database system is shutting down

    5757:20100130:003857.994 [Z3005] Query failed: [0] Result is NULL [select oid from pg_type where typname = 'bytea']

    5757:20100130:003857.994 [Z3005] Query failed: [0] PGRES_FATAL_ERROR: [select oid from pg_type where typname = 'bytea']

    Why does the server crash if the database connection drops?


    • dotneft
      Senior Member
      • Nov 2008
      • 699

      #3
      To begin with, you need to increase CacheSize in the config:
      http://www.zabbix.com/documentation/...abbix_server?s[]=cachesize

      Next, please post the log in more detail:
      1) The last line written by the PID that is crashing. Take the PID from this line: "5757:20100130:003855.989 One child process died (PID:5786). Exiting ..."; here it is PID:5786. In the log, the leading digits before the colon are the PID of the process.
      2) Everything that comes after the line "5757:20100130:003855.989 One child process died (PID:5786). Exiting ...".

      P.S. If the CacheSize message appears after a restart, you first need to kill the zabbix_process <defunct> processes.

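The two log excerpts requested above can be pulled out with a short script. This is a sketch only: it assumes the line formats visible in this thread (`PID:date:time message`, plus the `zabbix_server [PID]: ...` form of the startup error), and `crash_context` is a hypothetical helper name.

```python
import re

# Matches e.g. "5757:20100130:003855.989 One child process died (PID:5786). Exiting ..."
DIED_RE = re.compile(
    r"^\s*\d+:\d{8}:\d{6}\.\d{3} One child process died \(PID:(\d+)\)"
)


def crash_context(lines, tail=5):
    """For each 'One child process died' line, collect the dead child's
    last `tail` log lines before it and every line that follows it."""
    lines = [ln.rstrip("\n") for ln in lines]
    reports = []
    for idx, line in enumerate(lines):
        match = DIED_RE.match(line)
        if not match:
            continue
        child = match.group(1)
        child_lines = [
            ln for ln in lines[:idx]
            if ln.lstrip().startswith(child + ":")
            or f"zabbix_server [{child}]:" in ln
        ]
        reports.append({
            "child_pid": int(child),
            "last_child_lines": child_lines[-tail:],
            "after": lines[idx + 1:],
        })
    return reports


# Usage, with the log path taken from the first post:
#     with open("/tmp/zabbix_server.log", errors="replace") as fh:
#         for report in crash_context(fh):
#             print(report["child_pid"], report["last_child_lines"])
```

PIDs get reused over time, so on a long log the child's lines may include output from an earlier, unrelated process with the same number.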

      • sitnikov
        Junior Member
        • Jan 2010
        • 15

        #4
        Originally posted by dotneft
        To begin with, you need to increase CacheSize in the config:
        http://www.zabbix.com/documentation/...abbix_server?s[]=cachesize

        Next, please post the log in more detail:
        1) The last line written by the PID that is crashing. Take the PID from this line: "5757:20100130:003855.989 One child process died (PID:5786). Exiting ..."; here it is PID:5786. In the log, the leading digits before the colon are the PID of the process.
        2) Everything that comes after the line "5757:20100130:003855.989 One child process died (PID:5786). Exiting ...".
        Those lines are no longer in the log.

        Now there's a new problem:

        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [insert into ids (nodeid,table_name,field_name,nextid) values (0,'events','eventid',0)]
        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']
        7847:20100130:104012.384 [Z3005] Query failed: [0] PGRES_FATAL_ERROR: [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']
        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [select max(eventid) from events where eventid between 0 and 99999999999999]
        7847:20100130:104012.384 [Z3005] Query failed: [0] PGRES_FATAL_ERROR: [select max(eventid) from events where eventid between 0 and 99999999999999]
        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [insert into ids (nodeid,table_name,field_name,nextid) values (0,'events','eventid',0)]
        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
        7847:20100130:104012.384 [Z3005] Query failed: [0] Result is NULL [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']
        7847:20100130:104012.384 [Z3005] Query failed: [0] PGRES_FATAL_ERROR: [select nextid from ids where nodeid=0 and table_name='events' and field_name='eventid']

        in huge numbers. It looks like the zabbix server can't reconnect to the database.

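For readers unfamiliar with the term, "reconnecting" here means dropping the dead connection and opening a new one before retrying the query, rather than reissuing queries on a handle that can no longer succeed. The sketch below illustrates that general pattern with exponential backoff. It is not Zabbix code and says nothing about how the server actually behaves; `connect`, `query`, and the use of `ConnectionError` are stand-ins for whatever a real database driver provides.

```python
import time


def run_with_reconnect(connect, query, retries=5, base_delay=1.0, sleep=time.sleep):
    """Run query(conn); on a connection failure, discard the connection,
    wait (1x, 2x, 4x ... base_delay) and try again with a fresh one."""
    conn = None
    for attempt in range(retries + 1):
        try:
            if conn is None:
                conn = connect()  # hypothetical: returns a live connection
            return query(conn)    # hypothetical: runs one statement
        except ConnectionError:
            conn = None           # never reuse a connection that has failed
            if attempt == retries:
                raise             # give up after the last allowed retry
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; a caller would normally leave it at the default.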

        • sitnikov
          Junior Member
          • Jan 2010
          • 15

          #5
          Originally posted by dotneft
          To begin with, you need to increase CacheSize in the config:
          http://www.zabbix.com/documentation/...abbix_server?s[]=cachesize
          At the time of the crash I had about 250K items. Does that mean 300M isn't enough for that many?
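To put the question in numbers, the snippet below works out how much cache each item would get if the whole 300M were divided evenly across 250K items. It assumes "300M" means 300 MiB, and it does not say how much memory the server actually needs per item, since the configuration cache presumably holds more than item records.

```python
def bytes_per_item(cache_size_mib, item_count):
    """Average cache budget per item, in bytes, for a cache of the given size."""
    return cache_size_mib * 1024 * 1024 / item_count


print(round(bytes_per_item(300, 250_000)))  # 1258
```

So the "too small" error would mean an average footprint above roughly 1.2 KiB per item, if the cache held only items.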
