[op5-users] merlin troubleshooting

Russell Jennings russ at geekwhiz.com
Wed Oct 14 14:36:03 CEST 2009


Well, by "not working" I mean that the NOC isn't updating Nagios to
show that a host is up, which it should know from the poller. In my
Nagios, my test host still shows as DOWN even though it should have
been updated to UP. This is what happens when I run the latest version
on the poller. When I roll back to 6.2b4, it works.

I AM running two different versions, but only out of necessity:
running the latest on both the NOC and the poller doesn't work, and
that is when I get that error message. Downgrading the poller seems to
be the only way I can get data flowing again.

As it stands, everything seems to be in harmony, at least as far as
Merlin goes. Neither the daemon logs nor the NEB logs (on either the
poller or the NOC) are spitting out any heavy messages. But this is
only the case when the poller version is old. When I run a later
version, and it doesn't even need to be THAT much later, things just
stop working, and the logs on both the NOC and the poller seem to have
more errors. I'm not sure what's a big deal in the logs and what's
not. I imagine level 6 and 7 messages are all normal, and things like
4s and 3s are concerning errors?
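
To make the log-level question concrete, here is a rough sketch of how
the NEB log lines could be filtered by level. It assumes the number
after the timestamp follows the standard syslog severity numbering
(0 = emerg ... 7 = debug, lower meaning more severe); that mapping is an
assumption and not something confirmed for Merlin. The names
parse_line and concerning are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumption: the level numbers follow syslog severities (RFC 5424).
# This has not been checked against the Merlin source.
SYSLOG_LEVELS = {
    0: "emerg", 1: "alert", 2: "crit", 3: "err",
    4: "warning", 5: "notice", 6: "info", 7: "debug",
}

# Matches lines such as: [1255452191] 4: Unknown control code: 0
LINE_RE = re.compile(r"^\[(\d+)\]\s+(\d+):\s+(.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)


def concerning(lines, threshold=4):
    """Keep entries at or below the threshold (lower number = more severe)."""
    entries = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1] <= threshold:
            entries.append(parsed)
    return entries


# The two lines quoted from the NOC's NEB log.
sample = [
    "[1255452191] 6: Received control packet code 0 for selection 'temp'",
    "[1255452191] 4: Unknown control code: 0",
]

for ts, level, msg in concerning(sample):
    when = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    print(when, SYSLOG_LEVELS.get(level, "unknown"), msg)
```

With the default threshold of 4, only the "Unknown control code" line
is kept and the level 6 line is dropped.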

So what should I make of this? I have a poller that is fine on 6.2b4,
but anything later (I haven't pinpointed the exact version where it
breaks) does not work. Could this be an actual problem in Merlin, or
is it more likely that, one way or another, the fault is with that
particular server's config?

Aside from just compiling different versions and seeing what works and
what doesn't, is there anything else I can do? Or should I just try
running the latest and post the relevant errors from the logs?
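
If it does come down to compiling different versions, the search can
at least be narrowed by bisecting rather than trying each one in turn.
A minimal sketch, assuming the versions can be listed oldest to newest,
the oldest is known to work, the newest is known to fail, and that once
it breaks it stays broken. The labels "rev-a" through "rev-d" and the
works() callback are hypothetical placeholders; in practice works()
would mean "build this version on the poller and check whether the NOC
gets updated".

```python
def first_bad(versions, works):
    """Return the first version for which works(version) is False.

    Assumes versions[0] works, versions[-1] does not, and that there is
    a single point after which every version is broken.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid  # breakage is somewhere after mid
        else:
            hi = mid  # breakage is at mid or earlier
    return versions[hi]


# Hypothetical version list; only "6.2b4" and "latest" come from this thread.
versions = ["6.2b4", "rev-a", "rev-b", "rev-c", "rev-d", "latest"]


def fake_works(version):
    # Stand-in for a real build-and-test step: pretend it broke at "rev-c".
    return versions.index(version) < versions.index("rev-c")


print(first_bad(versions, fake_works))
```

Each step halves the remaining range, so the number of builds needed
grows with the logarithm of the number of versions, not linearly.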

I am just trying to get a grasp of what I can/should do here. I know
being stuck on an older version is bad, but I am not sure what I can
do to help correct this.

Thanks,
Russell




On Oct 14, 2009, at 3:09 AM, Andreas Ericsson wrote:

> On 10/14/2009 03:15 AM, Russell Jennings wrote:
>> I am trying to troubleshoot why my merlin isn't working.
>
> "isn't working" could mean a lot of different things.
> What is your setup?
> What is the expected behaviour?
> What is the observed behaviour?
>
> "I'm running merlin with 2 pollers and 1 noc. The pollers are
> getting updated, but the checkresults never seem to reach the
> noc server" would be one way to disambiguate that "it isn't
> working" in a way that would be meaningful to me.
>
>> The daemon
>> log looks OK (I think...), but I see this in the NOC's NEB log:
>>
>> [1255452191] 6: Received control packet code 0 for selection 'temp'
>> [1255452191] 4: Unknown control code: 0
>>
>> Could this be the cause of why the NOC is not updating Nagios?
>>
>
> Are you running different versions of merlin on the various
> instances?  The protocol hasn't really changed, but later versions
> have some different capabilities. Since we're not at 1.0 yet, I've
> foregone backwards-compatibility as far as capabilities are
> concerned. I've still retained protocol compatibility, of course.
>
> -- 
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.


