[op5-users] merlin troubleshooting
Russell Jennings
russ at geekwhiz.com
Wed Oct 14 14:36:03 CEST 2009
Well, by "not working" I mean the NOC isn't correctly telling Nagios
that a host is up, which it should know from the poller. In my Nagios,
my test host still shows as DOWN even though it should be updated to
UP. That is what happens when I run the latest version on the poller;
when I roll back to 6.2b4, it works.
I AM running two different versions, but only out of need: the latest
on both NOC and poller doesn't work, and that is where I get that
error message. Downgrading the poller seems to be the only way I can
get data to flow again.
As it stands, everything seems to be in harmony, at least as far as
merlin goes. Neither the daemon logs nor the NEB logs (on either the
poller or the NOC) are showing anything serious. But that is only when
the poller runs the old version. When I run a later version (it
doesn't even need to be THAT much later), things just stop working,
and the logs on both NOC and poller show more errors. I'm not sure
what's a big deal in the logs and what's not; I imagine level 6 and 7
messages are normal, and things like level 4 and 3 are concerning
errors?
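One way to separate routine chatter from the concerning messages is to filter on the number after the timestamp. Below is a minimal Python sketch, assuming merlin's log levels follow syslog severity numbering (0 = emergency through 7 = debug, so a lower number is more severe; nothing in this thread confirms that) and that lines look like the NEB log excerpt quoted further down (`[1255452191] 4: Unknown control code: 0`). The names `parse_line`, `worrying` and `SYSLOG_LEVELS` are hypothetical helpers, not part of merlin.

```python
import re

# Assumption: merlin levels use syslog severity numbering (not confirmed here).
SYSLOG_LEVELS = {
    0: "emerg", 1: "alert", 2: "crit", 3: "err",
    4: "warning", 5: "notice", 6: "info", 7: "debug",
}

# Matches lines such as "[1255452191] 4: Unknown control code: 0"
LINE_RE = re.compile(r"^\[(\d+)\]\s+(\d):\s*(.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)


def worrying(lines, threshold=4):
    """Yield parsed entries at or below `threshold` (i.e. at least that severe)."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry[1] <= threshold:
            yield entry


if __name__ == "__main__":
    sample = [
        "[1255452191] 6: Received control packet code 0 for selection 'temp'",
        "[1255452191] 4: Unknown control code: 0",
    ]
    for ts, level, msg in worrying(sample):
        # prints "1255452191 warning: Unknown control code: 0"
        print(f"{ts} {SYSLOG_LEVELS[level]}: {msg}")
```

Running the same filter over the poller and NOC logs for a good and a bad version, then comparing the output, would show which warnings appear only after the upgrade.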
So what should I make of this? I have a poller that is fine on 6.2b4,
but anything later (I haven't pinpointed the exact version where it
breaks) does not work. Could this be an actual problem in merlin, or
is it more likely that the fault lies, one way or another, with that
particular server's config?
Aside from compiling different versions and seeing what works and
what doesn't, is there anything else I can do? Or should I just run
the latest and post the relevant errors from the logs?
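If it does come down to compiling different versions, a binary search over them (bisection) pinpoints the first broken one in about log2(N) builds instead of N. A minimal sketch, assuming the versions are listed in release order, the first is known good (here 6.2b4), the last is known bad, and once a version breaks every later one stays broken. `works` is a hypothetical stand-in for the manual test: install that version on the poller and check whether the NOC shows the test host as UP. The version labels other than "6.2b4" are placeholders, not real release names.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], works: Callable[[str], bool]) -> str:
    """Return the first version for which works() is False.

    Assumes versions[0] is known good, versions[-1] is known bad, and the
    breakage is monotonic (every version after the first bad one is bad).
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid   # regression was introduced after mid
        else:
            hi = mid   # regression is at mid or earlier
    return versions[hi]


if __name__ == "__main__":
    # Placeholder ordering; only "6.2b4" comes from the thread.
    candidates = ["6.2b4", "build-a", "build-b", "build-c", "latest"]
    broken_from = 2  # pretend build-b introduced the regression
    # prints "build-b"
    print(first_bad(candidates, lambda v: candidates.index(v) < broken_from))
```

Knowing the exact first bad version makes a bug report far easier to act on than "anything later than 6.2b4".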
I'm just trying to get a grasp of what I can and should do here. I
know being stuck on an older version is bad, but I'm not sure how I
can help correct this.
Thanks,
Russell
On Oct 14, 2009, at 3:09 AM, Andreas Ericsson wrote:
> On 10/14/2009 03:15 AM, Russell Jennings wrote:
>> I am trying to troubleshoot why my merlin isn't working.
>
> "isn't working" could mean a lot of different things.
> What is your setup?
> What is the expected behaviour?
> What is the observed behaviour?
>
> "I'm running merlin with 2 pollers and 1 noc. The pollers are
> getting updated, but the checkresults never seem to reach the
> noc server" would be one way to disambiguate that "it isn't
> working" in a way that would be meaningful to me.
>
>> The daemon
>> log looks OK (I think...), but I see this in the NOC's NEB log:
>>
>> [1255452191] 6: Received control packet code 0 for selection 'temp'
>> [1255452191] 4: Unknown control code: 0
>>
>> Could this be why the NOC is not updating Nagios?
>>
>
> Are you running different versions of merlin on the various
> instances? The protocol hasn't really changed, but later versions
> have some different capabilities. Since we're not at 1.0 yet, I've
> forgone backwards-compatibility as far as capabilities are
> concerned. I've still retained protocol compatibility, of course.
>
> --
> Andreas Ericsson andreas.ericsson at op5.se
> OP5 AB www.op5.se
> Tel: +46 8-230225 Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.
> _______________________________________________
> op5-users mailing list
> op5-users at lists.op5.com
> http://lists.op5.com/mailman/listinfo/op5-users