[op5-users] Merlin and Ninja roadmap - performance data fixes going in?
Benjamin Ritcey
op5 at lists.ritcey.com
Fri Feb 26 18:16:17 CET 2010
(sorry for the delay in replying; I've out-clevered myself with some
mail filtering)
Peter, thank you for the clarification in the roadmap.
Max,
The RabbitMQ stuff is really just proof-of-concept; I set up RabbitMQ,
shoved some messages into an exchange, then had a consumer script
(in Perl) read the messages back out.
My thought was to just replace npcd with a periodic script that parses
the perfdata files and shoves them into a fanout exchange in RMQ. A
NEB broker would likely be more efficient, but C isn't my forte.
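For illustration, the periodic script could look roughly like the sketch
below. It is only a sketch under assumptions: the sub names, the exchange
name "perfdata" and the file path in the usage note are made up for the
example, and it assumes the perfdata file has already been rotated out of
Nagios's way so nothing is still writing to it. Each non-empty line is
published as one message; the publisher is passed in as a callback so the
file handling can be tried without a broker.

```perl
use strict;
use warnings;

# Build a callback that publishes one message body to a fanout exchange.
# $mq is assumed to be a connected Net::RabbitMQ object with $channel open.
# Fanout exchanges ignore the routing key, so an empty string is passed.
sub make_publisher {
    my ($mq, $channel, $exchange) = @_;
    return sub {
        my ($body) = @_;
        $mq->publish($channel, '', $body, { exchange => $exchange });
    };
}

# Read a perfdata file and hand every non-empty line to $publish.
# Returns the number of lines that were handed over.
sub ship_perfdata_file {
    my ($path, $publish) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        $publish->($line);
        $count++;
    }
    close $fh;
    return $count;
}

# Usage (needs a running broker and the Net::RabbitMQ module; the path
# and exchange name are hypothetical):
#   use Net::RabbitMQ;
#   my $mq = Net::RabbitMQ->new();
#   $mq->connect("localhost", {});
#   $mq->channel_open(1);
#   ship_perfdata_file("/path/to/rotated/service-perfdata",
#                      make_publisher($mq, 1, "perfdata"));
#   $mq->disconnect();
```

Run from cron, this would stand in for npcd on the sending side; the
exchange itself has to exist before anything is published to it.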
I was looking to modify process_perfdata.pl to read from a queue vs.
command-line -- it'd be a fairly minor code change, just something
like:
    use Net::RabbitMQ;

    my $mq = Net::RabbitMQ->new();
    $mq->connect("localhost", {});    # empty options hash: library defaults
    $mq->channel_open(1);
    $mq->consume(1, "pnpserver1");    # this graph server's own queue
(and then just sit in a loop consuming perfdata)
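The loop itself might be as small as the sketch below. It assumes recv()
behaves as documented for Net::RabbitMQ, blocking until a message arrives
and returning a hash reference whose body key holds the payload;
consume_loop and the handler callback are hypothetical names, and the
handler stands in for whatever process_perfdata.pl does with one chunk of
perfdata.

```perl
use strict;
use warnings;

# Pass the body of every received message to $handle.
# $mq is assumed to be a connected Net::RabbitMQ object on which
# consume() has already been called. recv() blocks while the queue is
# empty, so against a real broker this runs until the process is
# stopped or the connection fails.
sub consume_loop {
    my ($mq, $handle) = @_;
    my $count = 0;
    while (my $msg = $mq->recv()) {
        $handle->($msg->{body});
        $count++;
    }
    return $count;
}

# Usage, continuing from the connect/consume calls:
#   consume_loop($mq, sub { my ($perfdata) = @_; print "$perfdata\n" });
```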
Each NOC/graphserver machine would just have a different queue in the
consume() call. With a fanout exchange, each queue gets a copy of the
message, so adding a new graph server (e.g., for testing) would just
involve binding another queue to the exchange. I hadn't yet decided how
I'd set up the RMQ infrastructure - a copy on each NOC machine or a copy
on every machine? Persistent queues?
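To make the fanout idea concrete, here is a sketch of what attaching one
graph server could look like with Net::RabbitMQ. The sub name and the
exchange name "perfdata" are assumptions, and durable => 1 is just one
possible answer to the persistence question, not a recommendation. Note
that a durable queue only survives a broker restart as a definition;
the messages in it are kept only if they were published as persistent.

```perl
use strict;
use warnings;

# Give one graph server its own queue on a shared fanout exchange.
# $mq is assumed to be a connected Net::RabbitMQ object with $channel open.
sub attach_graph_server {
    my ($mq, $channel, $exchange, $queue) = @_;

    # Re-declaring an exchange or queue that already exists is harmless
    # as long as the settings match the existing ones.
    $mq->exchange_declare($channel, $exchange,
        { exchange_type => 'fanout', durable => 1 });

    # auto_delete is switched off so the queue keeps collecting perfdata
    # while its consumer is down (an assumption about what is wanted).
    $mq->queue_declare($channel, $queue,
        { durable => 1, exclusive => 0, auto_delete => 0 });

    # Fanout exchanges ignore the routing key, so it is left empty.
    $mq->queue_bind($channel, $queue, $exchange, '');

    return $queue;
}

# Usage (needs a running broker and the Net::RabbitMQ module):
#   use Net::RabbitMQ;
#   my $mq = Net::RabbitMQ->new();
#   $mq->connect("localhost", {});
#   $mq->channel_open(1);
#   attach_graph_server($mq, 1, "perfdata", "pnpserver1");
#   $mq->consume(1, "pnpserver1");
```

A test graph server would call the same sub with a different queue name
and start receiving its own copy of every message published from then on.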
My POC code was slightly more complicated - it will time out after X
seconds without any messages in the queue - but the gist of it is as I
describe. I'm happy to send you anything I have.
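One way to get that kind of idle timeout, not necessarily how the POC
does it, is to poll instead of blocking and give up once nothing has
arrived for X seconds. In the sketch below the sub name and the polling
interval are assumptions, and the fetch step is a callback so the timing
logic can be tried without a broker. With Net::RabbitMQ the callback could
wrap get(), which as far as I know returns undef when the queue is empty.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Handle messages until none has arrived for $idle_seconds.
# $fetch returns the next message, or undef when the queue is empty.
# $handle is called with each message. Returns the number handled.
sub consume_until_idle {
    my ($fetch, $handle, $idle_seconds, $poll_interval) = @_;
    $poll_interval = 1 unless defined $poll_interval;

    my $handled   = 0;
    my $last_seen = time;
    while (time - $last_seen < $idle_seconds) {
        my $msg = $fetch->();
        if (defined $msg) {
            $handle->($msg);
            $handled++;
            $last_seen = time;       # reset the idle clock
        }
        else {
            sleep $poll_interval;    # queue empty: wait before polling again
        }
    }
    return $handled;
}

# Usage against a broker ($mq connected, channel 1 open):
#   consume_until_idle(
#       sub { $mq->get(1, "pnpserver1") },
#       sub { my ($msg) = @_; print $msg->{body}, "\n" },
#       30,    # give up after 30 idle seconds
#   );
```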
Thanks, all, for your time.
-b
--------------
Benjamin,
Hi, we are planning on using RabbitMQ as well. We have not started the
design phase of our implementation, but it would be much preferable to
work from code that you have already written and help enhance and
bug-fix it, i.e. collaborate.
We are a very open source friendly shop; anything with Nagios we have
done we have been allowed to contribute back.
Mind sharing how you accomplished your Nagios -> Rabbit bridge?
I have a nice, efficient pnp -> socket NEB module that I led and our
team contributed to at work, which I am about to release as open source;
happy to share that with you if it sounds like something you could use
(which would come out of a discussion about how you are integrated
with RabbitMQ, if you are willing to have that discussion with me).
I also have a blog that has a number of the performance tuning tips we
use for our current non-distributed model if you haven't seen it:
http://www.semintelligent.com/blog/?c=nagios
We currently are getting about 2500 hosts and 15000 active checks
(45000+ with passives included) out of a dual quad-core x86_64 host
with 8 GB RAM and SCSI disks; our polling cycle is 5 minutes for that
load.
Thanks,
Max