[op5-users] active checks & merlin

Russell Jennings russ at geekwhiz.com
Mon Sep 28 23:24:23 CEST 2009


So it was a bit confusing/surprising, when I got merlin running,
that active checks weren't actually active. Took me a bit to figure
out that it's kind of a passive submittal. Is there a way to make the
active checks active? For example, when merlin sees that nagios wants
to execute a check on a hostgroup it controls, it could actively tell
the responsible node(s) to execute that check and get back the data.
THAT would be sweet. It would give me back the "reschedule the next
check..." link in nagios, which I use in troubleshooting. If the NOC
could control when the checks are executed like that, it would have
more control over when it gets data. Just thought I'd throw this idea
out.. but you dudes are pretty smart in my book, so I wouldn't be
surprised if there's a reason it works the way it does.

Right now I have an active check that runs every 240 minutes and
generates an unknown state. I'll scale it down in production to maybe
5 minutes or something (so long as it's bigger than the check_interval
on the node). So I figure that if a service is able to stay in the
unknown state for that long, through all of its max check attempts,
then there actually IS a problem. I don't know if there's a better /
smarter way to handle this (so if you've got one, I'm all ears!)
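
For reference, a rough sketch of the kind of setup I mean on the NOC
side. The host name, service description, command name and the
"generic-service" template are made up for illustration, and I'm
assuming the stock check_dummy plugin is what produces the unknown
state (3 = UNKNOWN). The 240 and the 3 are the interval and max
attempts mentioned in this mail; the retry_interval is just a guess.

```
# Hypothetical command: always returns UNKNOWN (exit code 3).
define command {
    command_name    force_unknown
    command_line    $USER1$/check_dummy 3 "No result received from the node"
}

# Hypothetical service: results normally arrive passively from the
# node; the active check only fires every 240 minutes and marks the
# service unknown.
define service {
    use                     generic-service   ; assumed template
    host_name               example-host      ; made-up name
    service_description     Example Service   ; made-up name
    check_command           force_unknown
    check_interval          240               ; minutes, with the default interval_length of 60
    retry_interval          240               ; assumption, not from my real config
    max_check_attempts      3
    active_checks_enabled   1
    passive_checks_enabled  1
}
```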

Also, if I let a service stay unknown and it hits max_check_attempts
(so the attempt count becomes something like 3/3), then when it DOES
get data back, it stays at 3/3 and doesn't reset to 1/3. Not sure
what to make of this, but it tripped me up, since it means that even
after I fix the issue it stays at 3/3, so the next unknown that comes
in results in an alert immediately.

Thanks,
Russell


