[op5-users] Merlin crashed on me?
Andreas Ericsson
ae at op5.se
Wed Jul 1 10:41:20 CEST 2009
Frater, Greg J wrote:
>> I have been occupied for a bit but now have some time to get back to
> Merlin. I grabbed the latest Merlin snapshot from git. After compiling
> it and having it recreate the Merlin database I ran the ulimit -c
> unlimited command and started it, no core dumps yet. I am seeing a SQL
> error though, and what appears to be a configuration problem, I'm only
> getting data in a few tables.
>
>> SQL problem (from daemon.log):
>
>> [1246379421] 6: dbi_conn_query_null(): Failed to run [UPDATE
> merlin.host SET scheduled_downtime_depth = scheduled_downtime_depth + 1
> WHERE host_name = 'host52' AND service_description = '']: 1054: Unknown
> column 'service_description' in 'where clause'
>
>> I'm also getting these, they don't seem to indicate a problem (the data
> is in the database) but may be duplicate/unnecessary SQL calls? (also
> from daemon.log):
>
>> [1246379416] 6: dbi_conn_query_null(): Failed to run [INSERT INTO
> merlin.comment(comment_type, host_name, service_description, entry_time,
> author_name, comment_data, persistent, source, entry_type, expires,
> expire_time, comment_id) VALUES(1, 'host0052', '', 1246379410, '(Nagios
> Process)', 'This host has been scheduled for fixed downtime from
> 06-30-2009 09:29:56 to 06-30-2009 11:29:56. Notifications for the host
> will not be sent out during that time period.', 0, 0, 2, 0, 0, 12093)]:
>> 1062: Duplicate entry '12093' for key 2
>
This, I think, is a bug in Nagios. When it starts up it's supposed to
send NEBCALLBACK_LOAD_DOWNTIME, but it appears to send
NEBCALLBACK_ADD_DOWNTIME instead, which makes the module think it's
a new downtime. I'll investigate this. The bug has zero actual impact
though, as the downtime is already in the database if you get this
error, so I won't be making this a priority.
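
A minimal sketch of why the duplicate-entry error is harmless. It uses
Python's sqlite3 as a stand-in for the MySQL database, and both the
simplified table layout and the store_comment helper are hypothetical;
only the uniqueness of comment_id mirrors the logged error.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for the real merlin database (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comment ("
        " comment_id INTEGER UNIQUE,"
        " host_name TEXT,"
        " comment_data TEXT)"
    )
    return db


def store_comment(db, comment_id, host_name, comment_data):
    """Hypothetical helper: insert a comment, report whether it was new."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO comment(comment_id, host_name, comment_data)"
                " VALUES(?, ?, ?)",
                (comment_id, host_name, comment_data),
            )
        return True
    except sqlite3.IntegrityError:
        # The same comment_id arrived again (for example, replayed at
        # startup). The row is already stored, so nothing is lost.
        return False


db = make_db()
first = store_comment(db, 12093, "host0052", "scheduled downtime")
second = store_comment(db, 12093, "host0052", "scheduled downtime")
rows = db.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```

The second call fails on the unique key, but the table still holds
exactly one copy of the comment.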
>> Plus Merlin does not appear to be writing any host or service data, the
> only tables that are showing any data are the comment, program_status,
> and scheduled_downtime tables. Here is what I see in the daemon.log,
> this pattern is repeated over and over again, I get this for each
> host/service.
>
>> ...
>> [1246380200] 6: inbound data available on ipc socket
>
>> [1246380200] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event (555 bytes; 491 bytes body) from socket 7
>
>> [1246380200] 3: Failed to get stored state for service 'CPU:
>> Utilization' on host 'host0052'
>> [1246380200] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock:
>> 6/usr/local/nagios/merlin/logs/
>> [1246380200] 7: select() returned 1 (errno = 0: Success)
>
>> [1246380200] 6: inbound data available on ipc socket
>> ...
>
>
> I looked at my config files and realized that I needed to run the import
> script manually.
That shouldn't be necessary with the merlin version you're using. The
daemon should run the import script when the module connects and sends
it the paths it needs (objects.cache and status.log), so this really
needs to be amended. Can you try and see what happens if you start the
daemon first and Nagios afterwards? It could be as simple as just an
ordering thing. If not, have you got the CLI-version of PHP installed?
Merlin requires that in order to be able to run the import script
properly.
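
A quick way to check for the PHP command-line binary, sketched in
Python. The php_cli_version helper is hypothetical and assumes the
binary is simply called "php" and is on the PATH; the CLI build
normally identifies itself with "(cli)" in the first line of `php -v`.

```python
import shutil
import subprocess


def php_cli_version(binary="php"):
    """Return the first line of `<binary> -v`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = php_cli_version()
if version is None:
    print("no php binary found on PATH")
else:
    print(version)
```

If the printed line does not mention "(cli)", the binary found is
probably a CGI build rather than the command-line one.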
> When I first looked at the merlin.conf file I thought
> it would run as part of the daemon or something. Since then, I'm not
> seeing the SQL errors that I reported previously.
I'm guessing you're still seeing the error when adding or deleting host
downtime (UPDATE merlin.host SET scheduled_downtime_depth =
scheduled_downtime_depth + 1 WHERE host_name = 'foo' AND
service_description = 'something'), as that was an actual bug. I've pushed
a fix for it now, so if you update in 5 minutes you should get the fixed
version.
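
To show the shape of the problem (this is not the actual fix that was
pushed), here is a sketch in Python that only adds the
service_description clause when the downtime is for a service. The
downtime_update helper is hypothetical, and the existence of a
"service" table with the same column is an assumption; the column
name scheduled_downtime_depth comes from the logged query.

```python
import sqlite3


def downtime_update(host_name, service_description=""):
    """Return (sql, params) that increments scheduled_downtime_depth."""
    if service_description:
        # Assumed: a "service" table keyed on host and description.
        table = "service"
        where = "host_name = ? AND service_description = ?"
        params = (host_name, service_description)
    else:
        # The host table has no service_description column, so the
        # clause must be left out for host downtime.
        table = "host"
        where = "host_name = ?"
        params = (host_name,)
    sql = (
        f"UPDATE {table} SET scheduled_downtime_depth ="
        f" scheduled_downtime_depth + 1 WHERE {where}"
    )
    return sql, params


# In-memory SQLite stands in for the merlin database (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE host ("
    " host_name TEXT,"
    " scheduled_downtime_depth INTEGER DEFAULT 0)"
)
db.execute("INSERT INTO host(host_name) VALUES('host52')")
sql, params = downtime_update("host52")
db.execute(sql, params)
depth = db.execute("SELECT scheduled_downtime_depth FROM host").fetchone()[0]
```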
> The daemon is still
> crashing, sometimes stopping without a coredump but I'm not seeing the
> SQL errors at least.
>
I hate this, as I can't reproduce it myself. As such it's extremely hard
to fix :-/
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.