[op5-users] Merlin crashed on me?

Patrik Båt Patrik.Bat at cypoint.se
Fri Jul 3 15:07:06 CEST 2009


1.
chown -R nagios:www /path/to/merlin
touch /var/run/merlin.pid
chown nagios:www /var/run/merlin.pid

2.
/etc/init.d/merlind restart

3.

ls -la /path/to/merlin/ipc.sock   <-- check whether the socket exists
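
Step 3 can also be scripted so that it tells a missing path apart from a path that exists but is not a socket. This is only a minimal sketch for a POSIX shell. The function name check_ipc_sock is made up for illustration and is not part of Merlin, and /path/to/merlin is still the placeholder used in the steps above.

```shell
#!/bin/sh
# check_ipc_sock PATH
# Prints one status line and returns 0 only if PATH is a Unix socket.
# (Hypothetical helper for illustration; not shipped with Merlin.)
check_ipc_sock() {
    sock=$1
    if [ -S "$sock" ]; then
        # -S is true only for socket files
        echo "ok: $sock is a socket"
        return 0
    elif [ -e "$sock" ]; then
        # Something is there, but it is a regular file, directory, etc.
        echo "wrong type: $sock exists but is not a socket"
        return 1
    else
        echo "missing: $sock"
        return 1
    fi
}
```

Typical use would be: check_ipc_sock /path/to/merlin/ipc.sock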

-----Original Message-----
From: op5-users-bounces at lists.op5.com [mailto:op5-users-bounces at lists.op5.com] On Behalf Of Frater, Greg J
Sent: Thursday, July 02, 2009 6:05 PM
To: Mailinglist for op5's products
Subject: Re: [op5-users] Merlin crashed on me?

>>>Ooh, I'd quite like to know what that host check looks like. It seems
>>>as if it crashes on the same host-check result both times (judging by
>>>the size only, which is quite a poor heuristic, but still).
>>>
>>>I'll re-enable the debugging machinery that dumps inbound messages to
>>>a binary logfile. When that's done, I'll need you to run Merlin until
>>>it crashes again so I get the sequence of events leading up to the
>>>actual crash in the format Merlin sees them. If I replay the same
>>>event-chain on our 64-bit machine, I *should* get the same crash
>>>you're getting. If that's the case, finding and fixing this bug
>>>should be fairly trivial.
>>>
>>>
>>Let me know when you've got that done, I'll run it again.

>I see it in git, I've grabbed it and am testing right now.  I'll send
>results when I get them.

When I started Merlin first and then Nagios it did not run the import
script.  I was getting a bunch of errors in the neb log and daemon log
that I think are related to the import script not running.

neb.log:
# tail logs/neb.log
[1246546087] 6: Active check result processed for service 'memory:
System Page Table Entries' on host 'host176'
[1246546087] 6: ipc socket isn't ready to accept data: Success
[1246546087] 6: Active check result processed for service 'memory:
physical' on host 'host64'
[1246546087] 6: Active check result processed for service 'battery
health' on host 'host-ups'
[1246546087] 6: ipc socket isn't ready to accept data: Success
[1246546087] 6: Active check result processed for service 'memory: non
paged' on host 'host362'
[1246546087] 6: Active check result processed for service 'service: all'
on host 'host124'
[1246546087] 6: ipc socket isn't ready to accept data: Success
[1246546087] 6: Active check result processed for service 'memory:
virtual' on host 'host347'
[1246546087] 6: Active check result processed for service 'service: all'
on host 'host227'

daemon.log:
# tail logs/daemon.log

[1246546082] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
(597 bytes; 533 bytes body) from socket 7

[1246546082] 7: Inserting check result for service 'memory: System Page
Table Entries' on host 'host165'
[1246546082] 3: Failed to get stored state for service 'memory: System
Page Table Entries' on host 'host165'
[1246546082] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246546082] 7: select() returned 1 (errno = 0: Success)

[1246546082] 6: inbound data available on ipc socket

The coredump file was only 992K.  I don't think this one is valid/useful
because the import script did not run, so data was not being written
to the database.  I'll send it along anyway in case it is useful.


The second time, I started merlind after Nagios was started, and this
time the import script ran.  I got some errors on the console from the
import script.  I think I already sent these once, but here they are
again in case they are useful.

# service merlind start
Logging to '/usr/local/nagios/merlin/logs/daemon.log'
# Importing objects to database merlin
importing objects from /usr/local/nagios/var/objects.cache
importing status from /usr/local/nagios/var/status.dat
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>REPLACE INTO hostdowntime(id) VALUES('1')</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>REPLACE INTO hostdowntime(id) VALUES('2')</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>REPLACE INTO hostdowntime(id) VALUES('3')</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>describe hostdowntime</b><br />
SQL query failed with the following error message;<br />
Table 'merlin.hostdowntime' doesn't exist<br />
Query was;<br />
<b>REPLACE INTO hostdowntime(id) VALUES('4')</b><br />
obj_array is not empty
Array
(
    [0] => hostdowntime
)
PHP Warning:  assert(): Assertion "empty($obj_array)" failed in
/usr/local/nagios/merlin/object_importer.inc.php on line 492

Here is the end of the neb.log file:
# tail -n30 logs/neb.log
[1246550286] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550286] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550286] 6: ipc socket isn't ready to accept data: Connection
refused
[1246550286] 6: Active check result processed for service 'memory:
System Page Table Entries' on host 'host368'
[1246550286] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550286] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550286] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550286] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550286] 6: ipc socket isn't ready to accept data: Connection
refused
[1246550286] 6: Active check result processed for service 'service: all'
on host 'host241'
[1246550286] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550286] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550286] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550286] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550286] 6: ipc socket isn't ready to accept data: Connection
refused
[1246550289] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550289] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550289] 7: reaping ipc events
[1246550289] 7: Asked to read from ipc socket with negative value
[1246550289] 7: **** SCHEDULING NEW REAPING AT 1246550294
[1246550289] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550289] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550289] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550289] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550289] 6: ipc socket isn't ready to accept data: Connection
refused
[1246550290] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550290] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550290] 6: Initializing IPC socket
'/usr/local/nagios/merlin/ipc.sock' for module
[1246550290] 3: Failed to connect to ipc socket (111): Connection
refused
[1246550290] 6: ipc socket isn't ready to accept data: Connection
refused

And the daemon.log file:
# tail -n50 logs/daemon.log
[1246550033] 7: Inserting check result for service 'service: all' on
host 'host177'
[1246550033] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246550033] 7: select() returned 1 (errno = 0: Success)

[1246550033] 6: inbound data available on ipc socket

[1246550033] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
(519 bytes; 455 bytes body) from socket 7

[1246550033] 7: Inserting check result for service 'service: all' on
host 'host360'
[1246550033] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246550033] 7: select() returned 1 (errno = 0: Success)

[1246550033] 6: inbound data available on ipc socket

[1246550033] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
(508 bytes; 444 bytes body) from socket 7

[1246550033] 7: Inserting check result for service 'service: all' on
host 'host205'
[1246550033] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246550033] 7: select() returned 1 (errno = 0: Success)

[1246550033] 6: inbound data available on ipc socket

[1246550033] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
(530 bytes; 466 bytes body) from socket 7

[1246550033] 7: Inserting check result for service 'remaining battery
time' on host 'host7-ups'
[1246550033] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246550033] 7: select() returned 1 (errno = 0: Success)

[1246550033] 6: inbound data available on ipc socket

[1246550033] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
(511 bytes; 447 bytes body) from socket 7

[1246550033] 7: Inserting check result for service 'service: all' on
host 'host227'
[1246550033] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246550033] 7: select() returned 1 (errno = 0: Success)

[1246550033] 6: inbound data available on ipc socket

[1246550033] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
(546 bytes; 482 bytes body) from socket 7

[1246550033] 7: Inserting check result for host 'host1-ups' to database
[1246550033] 6: dbi_conn_query_null(): Failed to run [UPDATE merlin.host
SET current_attempt = 1, check_type = 0, state_type = 1, current_state =
0, timeout = 30, start_time = 1246550023, end_time = 1246550026,
early_timeout = 0, execution_time = 0.020534, latency = '7.187',
last_check = 1246550026, return_code = 0, output = 'PING OK - Packet
loss = 0%, RTA = 0.97 ms', long_output = '', perf_data =
'rta=0.969000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0' WHERE
host_name = ''host1-ups'']: 1064: You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the
right syntax to use near 'host1-ups''' at line 1
[1246550033] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246550033] 7: select() returned 1 (errno = 0: Success)

[1246550033] 6: inbound data available on ipc socket

[1246550033] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
(486 bytes; 422 bytes body) from socket 7

[1246550033] 7: Inserting check result for host 'host-probe' to database
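
The bracketed number at the start of each log line above is a Unix epoch timestamp. When lining up neb.log and daemon.log against each other it can be easier to read them as dates. The following is a rough sketch only. It assumes GNU date (for the "-d @SECONDS" form), and the function name epoch_to_utc is invented for illustration.

```shell
#!/bin/sh
# epoch_to_utc
# Reads log lines on stdin and rewrites a leading "[<epoch>]" as a UTC date.
# Lines without such a prefix (for example wrapped continuation lines)
# are passed through unchanged.
# Assumes GNU date. (Hypothetical helper; not shipped with Merlin.)
epoch_to_utc() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Extract the text between the leading "[" and the first "]"
        ts=${line#\[}
        ts=${ts%%\]*}
        case $line in
            \[*\]*) ;;      # line starts with a bracketed field
            *) ts= ;;       # no bracketed prefix, so no timestamp
        esac
        case $ts in
            ''|*[!0-9]*)
                # Not a plain number: print the line as-is
                printf '%s\n' "$line"
                ;;
            *)
                rest=${line#*\]}
                printf '[%s]%s\n' \
                    "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
        esac
    done
}
```

For example: tail -n30 logs/neb.log | epoch_to_utc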

I've sent the coredump files separately.  I was expecting separate
binary log files but none were generated.  I hope this is what you needed.

Regards,

-greg