[op5-users] New to Op5 and Merlin
Andreas Ericsson
ae at op5.se
Fri Jun 12 12:33:38 CEST 2009
Frater, Greg J wrote:
> Greetings All,
>
Hi there.
> I just joined the mailing list, primarily due to interest in your Merlin
> and Ninja projects. But first I wanted to say thanks to Op5 for the
> Nordic Meet on Nagios. I am from the US and was able to view many of
> the video sessions online and learned a lot from the presenters. Thanks
> for putting the Meet together and for sharing the event with the rest of
> us non Nordic folk. ;-) I also like being able to put a face with the
> emails that I read on the mailing lists.
>
Welcome, and you're welcome. We had almost 3000 viewers in total on the
video feeds, so it was surely appreciated by more than a few people :-)
You know, non-Nordic people are also welcome at the conference. The name
just reflects where it's being held, so make sure you attend next year :-)
> I do have questions about Merlin. We are using NDO to support our
> NagVis implementation and having intermittent troubles with it. I think
> the problems are related to MySQL performance but am pretty much a hack
> when it comes to DBA work so I'm not sure. I've done some tweaking to
> the MySQL settings which made things better for a while. In any case,
> I'm thinking I need to find a better solution for NagVis and as I
> understand it I can use either ndo2fs or Merlin. My question is in
> regard to the stability of Merlin, would you say it's like alpha, beta
> or production stable?
Short answer:
It's very stable, but not feature complete.
Long answer:
That depends largely on what you intend to do with it. Insofar as stability
goes, we've had it running in our beta environments for the past two months
or so, inserting status data into the database. Cross-host event transport
has been working for over a year, but was broken sometime during the
changes to the socket polling made to increase performance a bit for what
we consider the "normal" case (single nagios server). I've just recently
gotten it to work again, but that part should not be considered stable.
There are three things left to do to make the status database part of
Merlin feature-complete.
1) Make it import object configuration automatically when the module
connects. This is partially implemented, so I just need to make the
import-command configurable so it works even if it's not located in
/home/exon/git/monitor/merlin. ETA 30 minutes (I'm working on it now).
2) Make it object-state aware so that the last_state_change field can be
populated in the database. This is necessary to display e.g. "Duration"
in the GUI. Estimated time required: 2 hours. It already primes the
status data tables from the database after an import has been done, but
doesn't take that information into account when updating the database
for each successive event later (quite easy to hack up really), and it
doesn't update the object status when a state change happens (also
extremely trivial to fix).
3) Make the merlin daemon write the state history table that the report
module writes today. This depends on the groundwork done for 2), and
the queries to make it happen already exist in the report-module, so
this shouldn't take long to finish. Estimated time required: 12 hours.
When those three items are completed, the database update parts of the
merlin daemon will be considered complete and ready for 1.0 release.
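
To illustrate item 2, here is a minimal sketch in Python of what "object-state aware" could look like. This is not Merlin's actual code (Merlin is not written in Python), and the names ObjectStatus, StateTracker, handle_event and duration are made up for the example. The idea is just to remember the last known state per object, primed from the database, and only touch last_state_change when the state really differs.

```python
from dataclasses import dataclass


@dataclass
class ObjectStatus:
    state: int              # Nagios service states: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    last_state_change: int  # Unix timestamp of the most recent state change


class StateTracker:
    """Hypothetical in-memory tracker; an illustration, not part of Merlin."""

    def __init__(self, primed=None):
        # 'primed' stands in for the status rows read back from the
        # database after an object import has been done.
        self.objects = dict(primed or {})

    def handle_event(self, name, state, timestamp):
        known = self.objects.get(name)
        if known is None or known.state != state:
            # New object or a real state change: record when it happened.
            self.objects[name] = ObjectStatus(state, timestamp)
        # Same state as before: leave last_state_change untouched.
        return self.objects[name]

    def duration(self, name, now):
        # What a GUI could show as "Duration" for the current state.
        return now - self.objects[name].last_state_change
```

With a tracker primed with one service in state 0 since timestamp 1000, a later event reporting state 0 leaves last_state_change at 1000, while an event reporting state 2 at timestamp 1600 moves it to 1600.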
> Also do you have any sense for server sizing to
> support Merlin? In our environment we have around 500 hosts and 5300
> service checks. Currently everything (Nagios, NDO database, NagVis,
> etc.) is running on a single box (2 CPU 2.6GHz, 4GB RAM). The web
> interfaces all respond fine but we have a high check latency
> (~90 sec for service checks). What I want to know is at what point
> would you recommend moving Merlin/MySQL to a dedicated server separate
> from the Nagios server?
Basically never. The number of hosts and services Nagios is handling is
not a very good number to measure on. Instead, a checks-per-second (CPS)
number would be good to have. I've got a test-installation running on my
laptop which does about 80 CPS (with a default check interval of 5 mins,
this is equal to services+hosts = 80*300 = 24000). Without loading Merlin,
Nagios doesn't really exert any load on the machine (about 0.26, fairly
stably). With merlin loaded and updating status database, the load rises
to between 0.29 and 0.31. With NDOUtils, the load very nearly quadrupled
for the exact same settings.
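
The CPS arithmetic above can be sketched in a few lines of Python. The function names are made up for the example, and the 5-minute interval applied to the 500 hosts and 5300 services from the question is an assumption, since the actual check intervals in that setup weren't stated.

```python
def checks_per_second(num_checks, interval_seconds):
    # Average check rate if every object is checked once per interval.
    return num_checks / interval_seconds


def objects_for_cps(cps, interval_seconds):
    # Number of hosts+services that a given rate corresponds to.
    return cps * interval_seconds


# The laptop test: 80 CPS at a 5-minute (300 s) interval.
print(objects_for_cps(80, 300))                       # 24000

# 500 hosts + 5300 services, ASSUMING a 5-minute interval for all of them.
print(round(checks_per_second(500 + 5300, 300), 1))   # 19.3
```

Under that assumption, the setup from the question would sit at roughly a quarter of the check rate of the laptop test.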
Note that this is with an extremely stupid plugin that is written in C
and returns a random exit-code immediately. With *real* checks that have
to actually do something, I expect the load to be much higher. The
interesting part is the increase in load though, which is expected to
stay constant unless adding Merlin to the mix makes the server run out
of physical RAM. Since, in my setup, each merlin daemon consumes about
1.3MB of non-shared memory, I'm not really worried that this will happen.
> Thanks again for the sharing the Nordic Meet videos and for your work in
> this arena,
You're welcome. Very glad you liked it :-)
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.