[op5-users] Merlin crashed on me?
Frater, Greg J
GJFRATER at bechtel.com
Tue Jun 30 23:32:29 CEST 2009
>I have been occupied for a bit but now have some time to get back to
Merlin. I grabbed the latest Merlin snapshot from git. After compiling
it and having it recreate the Merlin database, I ran "ulimit -c
unlimited" and started it; no core dumps yet. I am seeing a SQL error,
though, and what appears to be a configuration problem: I'm only
getting data in a few tables.
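"ulimit -c unlimited" only raises the core-file size limit for the shell it runs in and for processes started from that shell afterwards. A daemon launched some other way, such as from an init script, may still run with a core size limit of 0. The sketch below assumes you want the daemon to raise its own limit at startup using the POSIX getrlimit()/setrlimit() calls. The helper name enable_core_dumps() is hypothetical and not part of Merlin. Even with the limit raised, a core file only appears if the process's working directory is writable and, on Linux, if settings such as kernel.core_pattern allow it.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical startup helper (not part of Merlin): raise this process's
 * soft core-file size limit to its hard limit, similar in effect to
 * "ulimit -c". An unprivileged process may not go above rlim_max.
 * Returns 0 on success, -1 on failure.
 */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* If the hard limit is RLIM_INFINITY, this matches "ulimit -c unlimited". */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling it early in main(), before the daemon forks or changes directory, means any later crash is covered.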
>SQL problem (from daemon.log):
>[1246379421] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlin.host SET scheduled_downtime_depth = scheduled_downtime_depth + 1
WHERE host_name = 'host52' AND service_description = '']: 1054: Unknown
column 'service_description' in 'where clause'
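MySQL error 1054 means the statement names a column the table does not have. Here, merlin.host has no service_description column, so a host downtime update should filter on host_name alone. The following is a minimal sketch of building the statement so that the service_description predicate is only added for services. The function name and the "merlin.service" table name are assumptions for illustration, not Merlin's actual code. The sketch does no escaping, so it rejects names containing a single quote. Real code should use the database library's own quoting routine instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Merlin's code): build the query that bumps
 * scheduled_downtime_depth for a host or a service.
 * merlin.host has no service_description column (hence error 1054), so that
 * predicate is only added for services. "merlin.service" is an assumed name.
 * Names are not escaped, so any name containing a single quote is rejected.
 * Returns the snprintf() result, or -1 if a name was rejected.
 */
static int build_downtime_update(char *buf, size_t len,
                                 const char *host, const char *service)
{
    assert(buf != NULL && host != NULL);

    if (strchr(host, '\'') != NULL ||
        (service != NULL && strchr(service, '\'') != NULL))
        return -1;

    if (service == NULL || service[0] == '\0') {
        /* Host downtime: filter on host_name only. */
        return snprintf(buf, len,
                        "UPDATE merlin.host SET scheduled_downtime_depth = "
                        "scheduled_downtime_depth + 1 WHERE host_name = '%s'",
                        host);
    }

    /* Service downtime: a service is identified by host and description. */
    return snprintf(buf, len,
                    "UPDATE merlin.service SET scheduled_downtime_depth = "
                    "scheduled_downtime_depth + 1 WHERE host_name = '%s' "
                    "AND service_description = '%s'",
                    host, service);
}
```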
>I'm also getting these errors (also from daemon.log). They don't seem
to indicate a problem, since the data is in the database, but they may
point to duplicate or unnecessary SQL calls:
>[1246379416] 6: dbi_conn_query_null(): Failed to run [INSERT INTO
merlin.comment(comment_type, host_name, service_description, entry_time,
author_name, comment_data, persistent, source, entry_type, expires,
expire_time, comment_id) VALUES(1, 'host0052', '', 1246379410, '(Nagios
Process)', 'This host has been scheduled for fixed downtime from
06-30-2009 09:29:56 to 06-30-2009 11:29:56. Notifications for the host
will not be sent out during that time period.', 0, 0, 2, 0, 0, 12093)]:
1062: Duplicate entry '12093' for key 2
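The two logged errors mean different things. MySQL error 1062 (ER_DUP_ENTRY) means a row with that unique key value, here comment_id 12093, already exists. That fits the observation that the data is in the database. Error 1054 (ER_BAD_FIELD_ERROR) means the query itself is wrong. The following hypothetical classifier, which is not part of Merlin, shows how a logger could tell the two apart. Treating a duplicate as "already stored" assumes the existing row holds the same data. Another option is to make the insert idempotent with MySQL's INSERT ... ON DUPLICATE KEY UPDATE.

```c
/* MySQL server error numbers that appear in daemon.log above. */
#define MYSQL_ER_BAD_FIELD_ERROR 1054 /* "Unknown column ... in 'where clause'" */
#define MYSQL_ER_DUP_ENTRY       1062 /* "Duplicate entry ... for key ..." */

enum query_outcome {
    QUERY_OK,             /* the statement succeeded */
    QUERY_ALREADY_STORED, /* INSERT hit an existing unique key */
    QUERY_FAILED          /* anything else, e.g. a malformed query */
};

/* Hypothetical: map the error number of a failed query to an outcome. */
static enum query_outcome classify_query_error(int db_errno)
{
    switch (db_errno) {
    case 0:
        return QUERY_OK;
    case MYSQL_ER_DUP_ENTRY:
        return QUERY_ALREADY_STORED;
    case MYSQL_ER_BAD_FIELD_ERROR:
    default:
        return QUERY_FAILED;
    }
}
```

A caller could log QUERY_ALREADY_STORED at debug level and keep the error level for QUERY_FAILED.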
>Also, Merlin does not appear to be writing any host or service data;
the only tables showing any data are comment, program_status, and
scheduled_downtime. Here is what I see in daemon.log. This pattern
repeats over and over, once for each host/service:
>...
>[1246380200] 6: inbound data available on ipc socket
>[1246380200] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event (555 bytes; 491 bytes body) from socket 7
>[1246380200] 3: Failed to get stored state for service 'CPU: Utilization' on host 'host0052'
>[1246380200] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6/usr/local/nagios/merlin/logs/
>[1246380200] 7: select() returned 1 (errno = 0: Success)
>[1246380200] 6: inbound data available on ipc socket
>...
I looked at my config files and realized that I needed to run the import
script manually; when I first read merlin.conf, I thought the import
would run as part of the daemon. Since running it, I'm no longer seeing
the SQL errors I reported above. The daemon is still crashing, sometimes
stopping without a core dump, but at least the SQL errors are gone.
Regards,
-greg