[op5-users] Merlin and Ninja roadmap - performance data fixes going in?

Benjamin Ritcey op5 at lists.ritcey.com
Fri Feb 26 18:16:17 CET 2010


(sorry for the delay in replying;  I've out-clevered myself with some
mail filtering)

Peter, thank you for the clarification in the roadmap.

Max,

The RabbitMQ stuff is really just proof-of-concept;  I set up RabbitMQ
& shoved some messages into an exchange, then had a consumer script
(in Perl) to read the messages out.

My thought was to just replace npcd with a periodic script that parses
the perfdata files and shoves them into a fanout exchange in RMQ.  A
NEB broker would likely be more efficient, but C isn't my forte.
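
A rough sketch of what such a periodic script might look like, in Perl to
match process_perfdata.pl.  It assumes the perfdata files use PNP's
bulk-mode layout (tab-separated KEY::VALUE pairs, one check result per
line).  The helper names are made up for illustration, and the publish
step is passed in as a code reference so the parsing can be tried
without a broker.

```perl
use strict;
use warnings;

# Split one bulk-mode perfdata line into a hash of KEY => VALUE.
# Assumes tab-separated KEY::VALUE pairs (an assumption about the file layout).
sub parse_perfdata_line {
    my ($line) = @_;
    chomp $line;
    my %field;
    for my $pair (split /\t/, $line) {
        my ($key, $value) = split /::/, $pair, 2;
        $field{$key} = $value if defined $value;
    }
    return \%field;
}

# Read every line from a filehandle and hand each usable one to $publish.
# Returns the number of lines shipped.
sub ship_file {
    my ($fh, $publish) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        my $field = parse_perfdata_line($line);
        next unless defined $field->{HOSTNAME};   # skip blank or malformed lines
        chomp $line;
        $publish->($line);
        $count++;
    }
    return $count;
}
```

With a connected Net::RabbitMQ handle in $mq, the callback could be
sub { $mq->publish(1, "", $_[0], { exchange => "perfdata" }) }, where
"perfdata" is a hypothetical exchange name.  A fanout exchange ignores
the routing key, so it is left empty.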

I was looking to modify process_perfdata.pl to read from a queue vs.
command-line -- it'd be a fairly minor code change, just something
like:

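use Net::RabbitMQ;

my $mq = Net::RabbitMQ->new();
$mq->connect("localhost", {});    # empty options: default user, vhost, port
$mq->channel_open(1);
$mq->consume(1, "pnpserver1");    # queue name differs per graph server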

(and then just sit in a loop consuming perfdata)
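
The loop itself could be as small as the sketch below.  It assumes $mq is
a Net::RabbitMQ object on which consume() has already been called, and it
relies on recv() blocking until a message arrives and returning a hash
reference with the payload under "body".  The consume_perfdata name and
the stop-on-false convention are illustrative only.

```perl
use strict;
use warnings;

# Pull messages until the handler returns false.
# $mq is expected to be a Net::RabbitMQ object already consuming a queue;
# recv() blocks until the next message arrives.
sub consume_perfdata {
    my ($mq, $handler) = @_;
    my $handled = 0;
    while (my $msg = $mq->recv()) {
        $handled++;
        # $msg->{body} holds the payload, here one perfdata line
        last unless $handler->($msg->{body});
    }
    return $handled;
}
```

After the consume() call it would be invoked as
consume_perfdata($mq, sub { process_line($_[0]); 1 }), where
process_line is a stand-in for whatever process_perfdata.pl does with
one line of perfdata.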

Each NOC/graph server machine would just have a different queue in the
consume() call.  With a fanout exchange, each queue gets a copy of the
message, so adding a new graph server (e.g., for testing) would just
involve connecting to the exchange.  I hadn't yet decided how I'd
set up the RMQ infrastructure - a copy on each NOC machine or a copy on
every machine?  Persistent queues?
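
For illustration, attaching one more graph server could look roughly like
this.  It assumes a connected Net::RabbitMQ object with channel 1 open;
the "perfdata" exchange name and the attach_graph_server helper are
hypothetical, and the durable flags are only one possible answer to the
persistence question.

```perl
use strict;
use warnings;

# Declare the shared fanout exchange and bind one queue to it.
# $mq is expected to be a connected Net::RabbitMQ object with channel 1 open.
sub attach_graph_server {
    my ($mq, $queue, $exchange) = @_;
    $exchange = "perfdata" unless defined $exchange;   # hypothetical name
    $mq->exchange_declare(1, $exchange,
        { exchange_type => "fanout", durable => 1, auto_delete => 0 });
    # durable => 1 keeps the queue definition across a broker restart
    $mq->queue_declare(1, $queue,
        { durable => 1, exclusive => 0, auto_delete => 0 });
    # fanout exchanges ignore the routing key, so an empty one is fine
    $mq->queue_bind(1, $queue, $exchange, "");
    return $queue;
}
```

A test graph server would call attach_graph_server($mq, "pnpserver2")
and then consume from "pnpserver2"; the existing queues keep receiving
their own copies unchanged.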

My POC code was slightly more complicated - it will time out after X
seconds w/o any messages in the queue - but the gist of it is as
described above.  I'm happy to send you anything I have.
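
One way such an idle timeout could be done (this is not the POC code,
just a sketch) is to poll with get(), which returns undef straight away
when the queue is empty, instead of blocking in recv().  The
drain_until_idle name and its parameters are made up for illustration.

```perl
use strict;
use warnings;

# Poll a queue with get() and give up once it has been empty for
# $idle_secs seconds. $mq is expected to be a connected Net::RabbitMQ
# object with channel 1 open.
sub drain_until_idle {
    my ($mq, $queue, $idle_secs, $handler, $poll_secs) = @_;
    $poll_secs = 1 unless defined $poll_secs;
    my $last_seen = time();
    my $handled   = 0;
    while (time() - $last_seen < $idle_secs) {
        my $msg = $mq->get(1, $queue);   # undef when the queue is empty
        if ($msg) {
            $handler->($msg->{body});
            $handled++;
            $last_seen = time();         # restart the idle clock
        }
        else {
            sleep $poll_secs;
        }
    }
    return $handled;
}
```

Here get() is used in place of the consume()/recv() pair, trading a
little latency (the poll interval) for a loop that can exit on its own.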

Thanks, all, for your time.

-b
--------------

Benjamin,

Hi, we are planning on using RabbitMQ as well.  We have not started the
design phase of our implementation, but it would be much preferable
to work on code that you have already written and help enhance and
bug-fix it, i.e. collaborate.

We are a very open source friendly shop; anything with Nagios we have
done we have been allowed to contribute back.

Mind sharing how you accomplished your Nagios -> Rabbit bridge?

I have a nice efficient pnp -> socket NEB module I led and our team
contributed to at work that I am about to release open source; happy
to share that with you if it sounds like something you could use
(which would come out of a discussion about how you are integrated
with RabbitMQ if you will have that discussion with me).

I also have a blog that has a number of the performance tuning tips we
use for our current non-distributed model if you haven't seen it:

http://www.semintelligent.com/blog/?c=nagios

We currently are getting about 2500 hosts and 15000 active checks
(45000+ with passives included) out of a dual quad-core x86_64 host
with 8 GB RAM and SCSI disks; our polling cycle is 5 minutes for that
load.

Thanks,
Max

