[op5-users] active checks & merlin
Russell Jennings
russ at geekwhiz.com
Mon Sep 28 23:24:23 CEST 2009
So it was a bit confusing/surprising, when I got merlin running, that
active checks weren't actually active. It took me a bit to figure out
that it's kind of a passive submittal. Is there a way to make the
active checks active? For example, when merlin sees that nagios wants
to execute a check on a hostgroup it controls, it could actively tell
the responsible node(s) to execute that check and send back the data.
THAT would be sweet. It would give me back the "reschedule the next
check..." link in nagios, which I use in troubleshooting. If the NOC
could control when the checks are executed like that, the NOC would
have more control over when it gets data. Just thought I'd throw this
idea out. You dudes are pretty smart in my book, so I wouldn't be
surprised if there's a reason it works the way it does.

Right now I have an active check that runs every 240 minutes and
generates an unknown state. I'll scale it down in production to maybe
5 minutes or so (as long as it's bigger than the check_interval on the
node). So I figure that if a service is able to stay in the unknown
state for that long, after the max check attempts are used up, then
there actually IS a problem. I don't know if there's a better/smarter
way to handle this (so if you've got one, I'm all ears!).
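
To make the idea concrete, the staleness logic can be sketched roughly in
Python. This is only an illustration of the reasoning, not how nagios or
merlin implement it. The function names (is_stale, fallback_state) and the
threshold parameter are made up for the example; the only real convention
used is that plugin exit code 3 means UNKNOWN.

```python
import time

UNKNOWN = 3  # standard Nagios plugin exit code for UNKNOWN


def is_stale(last_result_ts, threshold_seconds, now=None):
    """Return True if no result has arrived within threshold_seconds.

    threshold_seconds is a hypothetical setting; per the reasoning above it
    should be larger than the check_interval used on the node.
    """
    if now is None:
        now = time.time()
    return (now - last_result_ts) > threshold_seconds


def fallback_state(last_state, last_result_ts, threshold_seconds, now=None):
    """Keep the last reported state while it is fresh, else report UNKNOWN."""
    if is_stale(last_result_ts, threshold_seconds, now):
        return UNKNOWN
    return last_state
```

With a 300 second threshold, a result that is 200 seconds old keeps its
state, while one that is 301 seconds old falls back to UNKNOWN.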

Also, if I let a service stay unknown and it exceeds max_retry (so
the status becomes something like 3/3), then when it DOES get data
back it stays at 3/3 and doesn't reset to 1/3. I'm not sure what to
make of this, but it tripped me up: even after I fix the issue, the
service stays at 3/3, so the next unknown that comes in results in an
alert immediately.
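
For comparison, here is a toy Python model of the behaviour I would have
expected: the counter goes back to the start as soon as an OK result
arrives, so a later unknown has to fail several times again before it
alerts. The class and method names are hypothetical and this is not taken
from the nagios or merlin source; it only pins down the expectation.

```python
OK = 0
UNKNOWN = 3


class AttemptCounter:
    """Counts consecutive non-OK results, capped at max_attempts."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.bad_results = 0

    def record(self, state):
        """Feed one check result; return True if an alert should fire."""
        if state == OK:
            # Expected behaviour: recovery resets the counter.
            self.bad_results = 0
            return False
        self.bad_results = min(self.bad_results + 1, self.max_attempts)
        return self.bad_results >= self.max_attempts
```

In this model three unknowns in a row are needed before the first alert,
and after one OK result the next unknown does not alert on its own.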
Thanks,
Russell