[op5-users] Merlin crashed on me?

a.hanifi at ville.laval.qc.ca
Thu Jul 2 05:57:31 CEST 2009


I'm wondering if SELinux could be the problem. I saw something related to libselinux in the stack yesterday, so it may be worth making sure SELinux is set to disabled (not permissive and not enforcing).
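
As a quick way to check the configured mode, here is a minimal Python sketch. It assumes the standard /etc/selinux/config layout with a SELINUX= line; the function names are made up for illustration. Note that this file only describes the mode applied at boot, while the getenforce command reports the current runtime state.

```python
def selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config-style text, or None."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # SELINUXTYPE= is a different key and must not match here.
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None


def is_fully_disabled(config_text):
    """True only for 'disabled'; 'permissive' and 'enforcing' both return False."""
    return selinux_mode(config_text) == "disabled"
```

To use it on a real box, pass in the contents of /etc/selinux/config, e.g. is_fully_disabled(open("/etc/selinux/config").read()).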

________________________________________
From: op5-users-bounces at lists.op5.com [op5-users-bounces at lists.op5.com] On Behalf Of Andreas Ericsson [ae at op5.se]
Sent: Wednesday, July 01, 2009 5:52 PM
To: Mailinglist for op5's products
Subject: Re: [op5-users] Merlin crashed on me?

Frater, Greg J wrote:
>
> Frater, Greg J wrote:
>>> I never get neb.log file, should I?  When I start nagios I see a
>>> console message that says 'Starting nagios:Logging to
>>> '/usr/local/nagios/merlin/logs/neb.log' but the log file never
>>> appears.
>
>> This almost certainly has to do with directory permissions. You can
>> try, as root, doing
>
>>   # chmod 777 /usr/local/nagios/merlin/logs
>>   # (restart nagios)
>
>> and it should start working.
>
> It did start working, the neb.log file is now being written, previously
> the permissions were set as follows:
>
> drwxr-xr-x 2 root root    4096 Jun 16 09:07 logs
>
>
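
To illustrate why the original drwxr-xr-x root root mode blocked the log file, here is a rough Python sketch that reads an ls -l style permission string and decides whether a given account could create files in the directory. The account name "nagios" is an assumption (the thread does not say which user Nagios runs as), and the check ignores ACLs and SELinux.

```python
def can_create_files(perms, owner, group, user, user_groups):
    """Check write+search permission on a directory from an ls -l mode string.

    perms is a string such as 'drwxr-xr-x'; user_groups is the set of groups
    the user belongs to. Creating a file needs both 'w' and 'x' on the directory.
    """
    if user == "root":
        return True
    if user == owner:
        triad = perms[1:4]      # owner bits
    elif group in user_groups:
        triad = perms[4:7]      # group bits
    else:
        triad = perms[7:10]     # other bits
    # Lowercase 's' or 't' in the last slot still implies execute/search.
    return triad[1] == "w" and triad[2] in "xst"


# Hypothetical account name; the mode and ownership come from the listing above.
print(can_create_files("drwxr-xr-x", "root", "root", "nagios", {"nagios"}))
print(can_create_files("drwxrwxrwx", "root", "root", "nagios", {"nagios"}))
```

With the original mode only root has write access, so any other account falls through to the r-x "other" bits. chmod 777 opens the directory to everyone; handing ownership to the account Nagios runs as would probably be a narrower fix, though that is not something tested in this thread.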
>>> Ah, there's my crash, it dumped while I was writing this message.
>
>> Yes, we have a 64-bit system up and running now, but I still haven't
>> seen any crashes on it so I'm guessing we're just not exercising it as
>> heavily as you are. Does the crash by any chance always happen after
>> receiving the same type of event? Inspecting the last 10 or so lines of
>> daemon.log after a crash should tell you if this is so, since it logs
>> the event type quite a long time before it starts messing around with
>> free()ing any pointers.
>
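
The "last 10 or so lines" check can be scripted. This is a small Python sketch, assuming daemon.log lines look like the "Successfully read 1 NEBCALLBACK_..." entries quoted in this thread; the function name is made up for illustration.

```python
from collections import deque
import re

EVENT_TYPE = re.compile(r"Successfully read 1 (NEBCALLBACK_\w+)")


def last_event_types(lines, n=10):
    """Return the event types mentioned in the last n lines, oldest first.

    lines can be any iterable of strings, including an open file object.
    """
    # deque with maxlen keeps only the final n lines without loading the whole log.
    tail = deque(lines, maxlen=n)
    found = []
    for line in tail:
        found.extend(EVENT_TYPE.findall(line))
    return found
```

Usage would be along the lines of passing an open file handle for daemon.log (its path depends on the installation) and looking at the final entry of the returned list after each crash.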
> Crash #1
> daemon.log
> [1246458609] 7: select() returned 1 (errno = 0: Success)
> [1246458609] 6: inbound data available on ipc socket
> [1246458609] 7: Successfully read 1 NEBCALLBACK_PROGRAM_STATUS_DATA
> event (352 bytes; 288 bytes body) from socket 7
> [1246458609] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
> [1246458611] 7: select() returned 1 (errno = 0: Success)
> [1246458611] 6: inbound data available on ipc socket
> [1246458611] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (546 bytes; 482 bytes body) from socket 7
> [1246458611] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
> [1246458611] 7: select() returned 1 (errno = 0: Success)
> [1246458611] 6: inbound data available on ipc socket
> [1246458611] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (486 bytes; 422 bytes body) from socket 7
>
>
> Crash #2
> [1246462221] 7: select() returned 1 (errno = 0: Success)
> [1246462221] 6: inbound data available on ipc socket
> [1246462221] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (546 bytes; 482 bytes body) from socket 7
> [1246462221] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
> [1246462221] 7: select() returned 1 (errno = 0: Success)
> [1246462221] 6: inbound data available on ipc socket
> [1246462221] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
> (575 bytes; 511 bytes body) from socket 7
> [1246462221] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
> [1246462221] 7: select() returned 1 (errno = 0: Success)
> [1246462221] 6: inbound data available on ipc socket
> [1246462221] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (486 bytes; 422 bytes body) from socket 7
>

Ooh, I'd quite like to know what that host check looks like. It seems as
if it crashes on the same host-check result both times (judging by the
size only, which is quite a poor heuristic, but still).
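
The size comparison can be done mechanically. Below is a Python sketch that pulls the final event (type, total bytes, body bytes) out of each quoted excerpt, tolerating the mail quote markers and line wrapping, and checks whether both crashes end on the same signature. The helper names are illustrative, and the sample data is the tail of each excerpt as posted.

```python
import re

EVENT_RE = re.compile(
    r"Successfully read 1 (NEBCALLBACK_\w+)\s+event\s+"
    r"\((\d+) bytes; (\d+) bytes body\)"
)


def events(excerpt):
    """Return (type, total_bytes, body_bytes) for every event in the excerpt."""
    # Drop leading '>' quote markers, then undo the mail client's line wrapping.
    lines = [re.sub(r"^>\s?", "", line) for line in excerpt.splitlines()]
    text = " ".join(lines)
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in EVENT_RE.finditer(text)]


def last_event(excerpt):
    """Return the final event signature in the excerpt, or None if there is none."""
    found = events(excerpt)
    return found[-1] if found else None


CRASH_1 = """\
> [1246458611] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (546 bytes; 482 bytes body) from socket 7
> [1246458611] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (486 bytes; 422 bytes body) from socket 7
"""

CRASH_2 = """\
> [1246462221] 7: Successfully read 1 NEBCALLBACK_SERVICE_CHECK_DATA event
> (575 bytes; 511 bytes body) from socket 7
> [1246462221] 7: Successfully read 1 NEBCALLBACK_HOST_CHECK_DATA event
> (486 bytes; 422 bytes body) from socket 7
"""

print(last_event(CRASH_1), last_event(CRASH_2))
```

Both excerpts end on a NEBCALLBACK_HOST_CHECK_DATA event of 486 bytes with a 422-byte body. Every event in the excerpts also shows the same 64-byte gap between total and body size, so matching sizes only says the payloads have equal length, not that they are the same check result.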

I'll re-enable the debugging machinery that dumps inbound messages to a
binary logfile. When that's done, I'll need you to run Merlin until it
crashes again so I get the sequence of events leading up to the actual
crash in the format Merlin sees them. If I replay the same event-chain
on our 64-bit machine, I *should* get the same crash you're getting. If
that's the case, finding and fixing this bug should be fairly trivial.
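
For anyone curious what dumping inbound messages to a binary logfile and replaying them can look like in general, here is a generic Python sketch of the record-and-replay idea. It is not Merlin's actual code or on-disk format: the 4-byte length prefix and the function names are assumptions made purely for illustration.

```python
import io
import struct


def record(stream, payload):
    """Append one raw message to the log, prefixed with its length (hypothetical format)."""
    stream.write(struct.pack("!I", len(payload)))
    stream.write(payload)


def replay(stream):
    """Yield recorded messages in their original order."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of log
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("!I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated record")
        yield payload


# Round trip through an in-memory buffer standing in for the binary logfile.
buf = io.BytesIO()
record(buf, b"first event")
record(buf, b"second event")
buf.seek(0)
for message in replay(buf):
    print(len(message), message)
```

When the recording process is expected to crash, flushing the file after every record matters, otherwise the last few events before the crash may never reach disk.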

--
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
_______________________________________________
op5-users mailing list
op5-users at lists.op5.com
http://lists.op5.com/mailman/listinfo/op5-users

