[op5-users] Merlin crashed on me?

Frater, Greg J GJFRATER at bechtel.com
Wed Jul 1 18:26:24 CEST 2009


 
Frater, Greg J wrote:
>> 
>> I never get neb.log file, should I?  When I start nagios I see a
>> console message that says 'Starting nagios:Logging to
>> '/usr/local/nagios/merlin/logs/neb.log' but the log file never
>> appears.

> This almost certainly has to do with directory permissions. You can
> try, as root, doing
>
>   # chmod 777 /usr/local/nagios/merlin/logs
>   # (restart nagios)
>
> and it should start working.

It did start working; the neb.log file is now being written. Previously
the permissions were set as follows:

drwxr-xr-x 2 root root    4096 Jun 16 09:07 logs
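
With that mode and owner, only root can create files in logs, which
would explain why neb.log never appeared. A narrower fix than 777 might
be to hand the directory to the account Nagios runs as. This is only a
sketch: own_log_dir is a made-up helper name, and "nagios" as both user
and group is an assumption, so check what your install actually uses.

```shell
# Give a log directory to one account instead of opening it to everyone.
# Usage (as root): own_log_dir DIR USER GROUP
own_log_dir() {
    dir="$1"; user="$2"; group="$3"
    chown "$user:$group" "$dir" &&
    chmod 755 "$dir"    # owner may write; everyone else may only read/list
}

# "nagios" for both user and group is an assumption; substitute your own.
# own_log_dir /usr/local/nagios/merlin/logs nagios nagios
```

Nagios would still need a restart afterwards, as with the chmod 777
approach.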


>> Ah, there's my crash, it dumped while I was writing this message.  

> Yes, we have a 64-bit system up and running now, but I still haven't
> seen any crashes on it so I'm guessing we're just not exercising it as
> heavily as you are. Does the crash by any chance always happen after
> receiving the same type of event? Inspecting the last 10 or so lines of
> daemon.log after a crash should tell you if this is so, since it logs
> the event type quite a long time before it starts messing around with
> free()ing any pointers.

Crash #1
daemon.log
[1246458609] 7: select() returned 1 (errno = 0: Success)
[1246458609] 6: inbound data available on ipc socket
[1246458609] 7: Successfully read 1 NEBCALLBACK_PROGRAM_STATUS_DATA event (352 bytes; 288 bytes body) from socket 7
[1246458609] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246458611] 7: select() returned 1 (errno = 0: Success)
[1246458611] 6: inbound data available on ipc socket
[1246458611] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event (546 bytes; 482 bytes body) from socket 7
[1246458611] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246458611] 7: select() returned 1 (errno = 0: Success)
[1246458611] 6: inbound data available on ipc socket
[1246458611] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event (486 bytes; 422 bytes body) from socket 7


Crash #2
[1246462221] 7: select() returned 1 (errno = 0: Success)
[1246462221] 6: inbound data available on ipc socket
[1246462221] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event (546 bytes; 482 bytes body) from socket 7
[1246462221] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246462221] 7: select() returned 1 (errno = 0: Success)
[1246462221] 6: inbound data available on ipc socket
[1246462221] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event (575 bytes; 511 bytes body) from socket 7
[1246462221] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1246462221] 7: select() returned 1 (errno = 0: Success)
[1246462221] 6: inbound data available on ipc socket
[1246462221] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event (486 bytes; 422 bytes body) from socket 7
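
For what it's worth, both excerpts end on a NEBCALLBACK_HOST_CHECK_DATA
event of the same size (486 bytes; 422 bytes body). To make that kind of
comparison easier across more crashes, something like the following
could pull just the event types out of the log. It is a rough sketch:
last_event_types is a made-up helper name, it assumes every event entry
carries a NEBCALLBACK_* name as in the lines above, and it relies on
grep's -o option (present in GNU and BSD grep).

```shell
# Print the event type named in each of the last N log entries that mention one.
# Usage: last_event_types /path/to/daemon.log [N]
last_event_types() {
    log_file="$1"
    count="${2:-10}"    # default to the last 10 events
    grep -o 'NEBCALLBACK_[A-Z_]*' "$log_file" | tail -n "$count"
}
```

Running it against daemon.log after each crash and comparing the final
line of output should show whether the same event type always comes
last.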

Success, or at least a successful crash! :-)  I found the core dump
files. I tarred one up and sent it to you directly, since I wasn't sure
what is in it and was not comfortable sending it to a public list. Let
me know if it is useful.

Regards, 

-greg



