[op5-users] active checks & merlin

Russell Jennings russ at geekwhiz.com
Tue Sep 29 15:34:25 CEST 2009


DNX is purely for distributed monitoring and does not allow NOCs or
anything like that. I looked into it a lot, and was sadly disappointed.
But then I found merlin, and it was love at first sight :)

So with merlin, is it a goal to have it distribute the commands and
configs to the nodes in some manner? Every time I start thinking about
the logistics of how to distribute them in a useful way, my brain goes
all gooey.

> This depends. If the status of the check doesn't change from UNKNOWN
> to something else, it should stay at 3/3. Note that the status of the
> check has nothing whatsoever to do with what the plugin prints for
> output.
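
To illustrate the point about output versus status: Nagios plugins report
their status through the process exit code (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN), not through the text they print. Below is a minimal sketch of
that idea. The helper name status_from_plugin and the fake plugin command
are made up for illustration, and mapping out-of-range exit codes to
UNKNOWN is a choice made for this sketch, not necessarily what Nagios does.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def status_from_plugin(argv):
    """Run a plugin command; derive the status from its exit code only.

    Hypothetical helper for illustration, not part of Nagios or merlin.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    # Assumption for this sketch: exit codes outside 0-3 count as UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    return state, proc.stdout.strip()


# A fake plugin that prints a reassuring message but exits with code 3.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('OK - all good'); sys.exit(3)",
]

state, output = status_from_plugin(fake_plugin)
print(state, "|", output)
```

Here the printed text says "OK", but the derived status is UNKNOWN because
only the exit code is consulted.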

Well, for me, if the state starts out as OK (1/3), then goes UNKNOWN
twice, then OK again, the counter will still show 2/3. It doesn't reset
even though it recovered and all... so the next UNKNOWN it gets makes
it 3/3. This is at least the behavior I observed in the Nagios
interface.
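
For comparison, here is a minimal sketch of the behavior one would expect
if a recovery resets the attempt counter. It is a simplified model for
discussion only, not Nagios's actual logic: the class name AttemptCounter
is hypothetical, only OK and UNKNOWN are modelled, and counting the first
non-OK result after an OK as attempt 1 is an assumption.

```python
OK, UNKNOWN = 0, 3


class AttemptCounter:
    """Simplified model of an 'attempt/max' counter (hypothetical)."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.attempt = 1
        self.state = OK

    def record(self, state):
        """Record one check result and return the 'n/max' display string."""
        if state == OK:
            # A recovery resets the counter.
            self.attempt = 1
        elif self.state == OK:
            # Assumption: the first problem result after OK is attempt 1.
            self.attempt = 1
        else:
            # Repeated problem results count up, capped at the maximum.
            self.attempt = min(self.attempt + 1, self.max_attempts)
        self.state = state
        return f"{self.attempt}/{self.max_attempts}"


counter = AttemptCounter()
history = [counter.record(s) for s in (OK, UNKNOWN, UNKNOWN, OK, UNKNOWN)]
print(history)  # ['1/3', '1/3', '2/3', '1/3', '1/3']
```

In this model the OK result in the middle brings the counter back to 1/3,
so the following UNKNOWN starts counting from the beginning rather than
jumping to 3/3.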

Thanks,
Russell

On Sep 29, 2009, at 3:46 AM, Andreas Ericsson wrote:

> On 09/28/2009 11:24 PM, Russell Jennings wrote:
>> So it was a bit confusing/surprising, when I got merlin running,
>> that active checks weren't actually active. It took me a bit to figure
>> out that it's kind of a passive submittal. Is there a way to make the
>> active checks active? For example, when merlin sees that Nagios wants
>> to execute a check on a hostgroup it controls, it could actively tell
>> the responsible node(s) to execute that check and get back the data.
>> THAT would be sweet. It would give me back the "reschedule the next
>> check..." link in Nagios, which I use in troubleshooting. If the NOC
>> could control when the checks are executed like that, it would give
>> the NOC more control over when it gets data. Just thought I'd throw
>> this idea out... but you dudes are pretty smart in my book, so I
>> wouldn't be surprised if there's a why for this.
>>
>
> Why, thank you :-)
>
> There is, sort of, but we didn't write it. DNX is a check scheduler
> that uses slave nodes to do its actual work. I don't think you can
> decide which node does which checks with DNX, but perhaps I'm
> mistaken. I haven't looked at it for quite some time.
>
> The idea is that commands should be distributed to the proper nodes
> for handling, although that's not implemented yet. Also, since merlin
> is a two-way communication thing, we need to make sure we know *which*
> commands to send which way, and how to configure it.
>
>> Right now I have an active check that runs every 240 minutes and
>> generates an unknown state. I'll scale it down in production to maybe
>> 5 minutes or something (so long as it's bigger than the check_interval
>> on the node). So I figure that if it is able to stay in the unknown
>> state for that long, after the max check and all is filled in, there
>> actually IS a problem. I don't know if there's a better / smarter way
>> to handle this (so if you've got one, I'm all ears!)
>>
>> Also, if I let a service stay unknown and it exceeds the max_retry
>> (so the status becomes something like 3/3), then when it DOES get
>> data back, it stays at 3/3 and doesn't reset back to 1/3. I'm not
>> sure what to make of this, but it toyed with me, since it means that
>> even if I fix the issue, it still stays at 3/3, so the next unknown
>> that comes in results in an alert immediately.
>>
>
> This depends. If the status of the check doesn't change from UNKNOWN
> to something else, it should stay at 3/3. Note that the status of the
> check has nothing whatsoever to do with what the plugin prints for
> output.
>
> -- 
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.
> _______________________________________________
> op5-users mailing list
> op5-users at lists.op5.com
> http://lists.op5.com/mailman/listinfo/op5-users


