[op5-users] Merlin crashed on me?

Frater, Greg J GJFRATER at bechtel.com
Tue Jun 30 23:32:29 CEST 2009


>I have been occupied for a bit but now have some time to get back to
Merlin.  I grabbed the latest Merlin snapshot from git.  After compiling
it and having it recreate the Merlin database, I ran ulimit -c unlimited
and started it; no core dumps yet.  I am seeing a SQL error, though, and
what appears to be a configuration problem: I'm only getting data in a
few tables.

>SQL problem (from daemon.log):

>[1246379421] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlin.host SET scheduled_downtime_depth = scheduled_downtime_depth + 1
WHERE host_name = 'host52' AND service_description = '']: 1054: Unknown
column 'service_description' in 'where clause' 

>I'm also getting these.  They don't seem to indicate a problem (the data
is in the database), but they may be duplicate or unnecessary SQL calls
(also from daemon.log):

>[1246379416] 6: dbi_conn_query_null(): Failed to run [INSERT INTO
merlin.comment(comment_type, host_name, service_description, entry_time,
author_name, comment_data, persistent, source, entry_type, expires,
expire_time, comment_id) VALUES(1, 'host0052', '', 1246379410, '(Nagios
Process)', 'This host has been scheduled for fixed downtime from
06-30-2009 09:29:56 to 06-30-2009 11:29:56.  Notifications for the host
will not be sent out during that time period.', 0, 0, 2, 0, 0, 12093)]:
>1062: Duplicate entry '12093' for key 2

>Also, Merlin does not appear to be writing any host or service data.  The
only tables showing any data are the comment, program_status, and
scheduled_downtime tables.  Here is what I see in daemon.log; this
pattern repeats over and over, once for each host/service.

>...
>[1246380200] 6: inbound data available on ipc socket

>[1246380200] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA
event
>(555 bytes; 491 bytes body) from socket 7

>[1246380200] 3: Failed to get stored state for service 'CPU:
>Utilization' on host 'host0052'
>[1246380200] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock:
>6/usr/local/nagios/merlin/logs/
>[1246380200] 7: select() returned 1 (errno = 0: Success)

>[1246380200] 6: inbound data available on ipc socket 
>...
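
Because the excerpt above repeats for every host/service, a small script can make the log easier to scan. The following is a minimal sketch, assuming each entry starts with "[epoch] N: message" as in the lines quoted here; treating N as a verbosity level is an assumption, and summarize() is a hypothetical helper, not part of Merlin.

```python
import re
from collections import Counter

# Matches entries such as "[1246380200] 3: Failed to get stored state ..."
# A leading ">" (mail quoting) is tolerated.
LINE_RE = re.compile(r"^>?\[(\d+)\]\s+(\d+):\s*(.*)$")

def summarize(lines):
    """Count entries per numeric level and collect messages containing 'Failed to'."""
    levels = Counter()
    failures = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # wrapped continuation line or blank line
        ts, level, msg = m.groups()
        levels[int(level)] += 1
        if "Failed to" in msg:
            failures.append((int(ts), msg))
    return levels, failures

sample = [
    "[1246380200] 6: inbound data available on ipc socket",
    "[1246380200] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event",
    "[1246380200] 3: Failed to get stored state for service 'CPU: Utilization' on host 'host0052'",
    "[1246380200] 7: select() returned 1 (errno = 0: Success)",
]
levels, failures = summarize(sample)
for ts, msg in failures:
    print(ts, msg)
```

To run it against a real log, pass an open file instead of the sample list, for example summarize(open(path)) with path pointing at daemon.log.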


I looked at my config files and realized that I needed to run the import
script manually.  When I first looked at the merlin.conf file, I thought
it would run as part of the daemon or something.  Since running it, I
have not seen the SQL errors that I reported previously.  The daemon is
still crashing, though, sometimes stopping without a core dump.
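
One thing worth checking when a crash leaves no core file: ulimit -c unlimited only affects the shell it is run in and the processes started from that shell, so a daemon launched some other way may still have a core size limit of 0. The following is a minimal sketch that prints the limit seen by the current process; core_limit_description() is a hypothetical helper, and it says nothing about how Merlin itself is started.

```python
import resource

def core_limit_description():
    """Return the soft and hard core-file size limits of this process as text."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def fmt(value):
        # RLIM_INFINITY is what "ulimit -c unlimited" sets
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"soft={fmt(soft)} hard={fmt(hard)}"

print(core_limit_description())
```

Running this from the same shell that starts the daemon shows the limit the daemon would inherit from that shell.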

Regards, 

-greg 

