[op5-users] Réf. : Re: Nagios start time delay with Merlin

nicolas.raspail at bnpparibas.com nicolas.raspail at bnpparibas.com
Wed Aug 26 13:09:34 CEST 2009


op5-users-bounces at lists.op5.com wrote on 26/08/2009 10:54:16:

> nicolas.raspail at bnpparibas.com wrote:
> > Hi
> > 
> > I'm currently evaluating Merlin to replace NDO. But since I have 
enabled 
> > Merlin, when Nagios is reloaded (by the gui or external command), 
during 
> > several minutes, merlind fills its log with some information saying 
that 
> > the database is updated. And during this time, no checks are performed 
by 
> > Nagios. I have see something like that with NDO, but with lower delay.
> > 
> > Is it the expected behavior or I have missed something ?
> 
> It might be. "some information" is a pretty vague description. What's
> happening is most likely that it claims to update the database 
correctly,
> which just shows that it's working and doing something. Next time you
> report something you find odd, please include a cut-n-paste section of
> the log. That way I'll be able to give you a fairly exact description
> of what's happening.
> 
> 
> > Is there a 
> > specific way to reload Nagios when merlin is used ? We are using 
NagiosQL 
> > to manage the configuration and restart Nagios from its GUI (external 
> > command, like in the Nagios GUI).
> > 
> 
> That should work just fine.
> 
> > Also, is there a configuration item to tell merlind to stop logging 
> > something like that
> > 
> > [1251181261] 7: Updating status for service 'some service' on host 
> > 'some-host'
> > [1251181261] 7: sel_val: 9; ipc_listen_sock: 5; ipc_sock: 9; net_sock: 
6
> > [1251181261] 7: select() returned 1 (errno = 0: Success)
> > 
> > We have 1987 hosts, 14689 services, and the log size grows up quickly.
> > 
> 
> Not while Merlin is still in beta, no. The mechanic to set the desired
> log_levels is already there, but it's not honored by the actual logging
> mechanism. Later, you'll be able to set
> 
>   log_levels = info
> 
> and have merlin only log things of severity "info" or above.
> 

Hi

sorry, that will be a long mail this one. Here is the missing information 
from my previous report.

Versions used :
        * RHEL 5.3 x86_64
        * Nagios 3.2.0
        * merlin 0.6.2-beta2
        * MySQL 5.0.45 (on its own server)


First step :
        * merlind is running
        * nagios is running
        * I enable the broker module in nagios.cfg and restart it
        * this is what I get in the daemon.log

[1251280642] 7: select() returned 1 (errno = 2: No such file or directory)

[1251280642] 7: Accepting inbound connection on ipc socket
[1251280642] 7: sel_val: 9; ipc_listen_sock: 5; ipc_sock: 9; net_sock: 6
[1251280642] 7: select() returned 1 (errno = 2: No such file or directory)

[1251280642] 6: inbound data available on ipc socket

[1251280642] 7: sel_val: 9; ipc_listen_sock: 5; ipc_sock: 9; net_sock: 6
[1251280642] 7: select() returned 1 (errno = 0: Success)

[1251280642] 6: inbound data available on ipc socket

[1251280642] 7: Successfully read 1 (null) event (131 bytes; 67 bytes 
body) from socket 9

[1251280642] 7: nagios_paths[0]: /bnp/apps/nagios/etc/nagios.cfg
[1251280642] 7: nagios_paths[1]: /bnp/apps/nagios/var/objects.cache
[1251280642] 7: Executing import command 'php 
/bnp/apps/nagios/merlin/import.php 
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg 
--cache=/bnp/apps/nagios/var/objects.cache --db-name=m
erlindb --db-user=merlin --db-pass=Chae2yei --db-host=eqd-nagios-sql'
[1251280659] 6: dbi_conn_query_null(): Failed to run [SELECT host_name, 
current_state, state_type FROM merlindb.host ORDER BY host_name]: 2006: 
MySQL server has gone away

        * merlind crashed and I don't know why it says that the MySQL 
server has gone away because it is working fine.


Second step :
        * nagios is still running and doing some checks
        * I restart merlin
        * this is what I get in the daemon.log

[1251280908] 6: Initializing IPC socket '/bnp/apps/nagios/merlin/ipc.sock' 
for daemon
[1251280908] 6: Primed object states for 1967 hosts and 14654 services
[1251280908] 6: Merlin daemon  successfully initialized
[1251280908] 7: sel_val: 6; ipc_listen_sock: 5; ipc_sock: -1; net_sock: 6
[1251280909] 7: select() returned 1 (errno = 0: Success)

[1251280909] 7: Accepting inbound connection on ipc socket
[1251280909] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1251280909] 7: select() returned 1 (errno = 0: Success)

[1251280909] 6: inbound data available on ipc socket

[1251280909] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1251280909] 7: select() returned 1 (errno = 0: Success)

[1251280909] 6: inbound data available on ipc socket

[1251280909] 7: Successfully read 1 (null) event (131 bytes; 67 bytes 
body) from socket 7

[1251280909] 7: nagios_paths[0]: /bnp/apps/nagios/etc/nagios.cfg
[1251280909] 7: nagios_paths[1]: /bnp/apps/nagios/var/objects.cache
[1251280909] 7: Executing import command 'php 
/bnp/apps/nagios/merlin/import.php 
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg 
--cache=/bnp/apps/nagios/var/objects.cache --db-name=m
erlindb --db-user=merlin --db-pass=Chae2yei --db-host=eqd-nagios-sql'
[1251280926] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1251280926] 7: select() returned 1 (errno = 0: Success)

[1251280926] 6: inbound data available on ipc socket

[1251280926] 7: Successfully read 1 NEBCALLBACK_SERVICE_STATUS_DATA event 
(832 bytes; 768 bytes body) from socket 7

[1251280926] 7: Updating status for service 'bnp-check-gprime-swap-snmp' 
on host 'cg41-026'
[1251280926] 7: sel_val: 7; ipc_listen_sock: 5; ipc_sock: 7; net_sock: 6
[1251280926] 7: select() returned 1 (errno = 0: Success)

and so on

        * All seems to be okay


Third step
        * merlind is still running
        * nagios is restarted
        * this is what I get in the daemon.log

[1251281935] 7: nagios_paths[0]: /bnp/apps/nagios/etc/nagios.cfg
[1251281935] 7: nagios_paths[1]: /bnp/apps/nagios/var/objects.cache
[1251281935] 7: Executing import command 'php 
/bnp/apps/nagios/merlin/import.php 
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg 
--cache=/bnp/apps/nagios/var/objects.cache --db-name=m
erlindb --db-user=merlin --db-pass=Chae2yei --db-host=eqd-nagios-sql'
[1251281952] 7: sel_val: 8; ipc_listen_sock: 5; ipc_sock: 8; net_sock: 6
[1251281952] 7: select() returned 1 (errno = 0: Success)

[1251281952] 6: inbound data available on ipc socket

[1251281952] 7: Successfully read 1 NEBCALLBACK_HOST_STATUS_DATA event 
(802 bytes; 738 bytes body) from socket 8

[1251281952] 7: Updating status for host 'SGRS-rf0-LTE0'
[1251281952] 7: sel_val: 8; ipc_listen_sock: 5; ipc_sock: 8; net_sock: 6
[1251281952] 7: select() returned 1 (errno = 0: Success)

[1251281952] 6: inbound data available on ipc socket

[1251281952] 7: Successfully read 1 NEBCALLBACK_HOST_STATUS_DATA event 
(798 bytes; 734 bytes body) from socket 8

[1251281952] 7: Updating status for host 'anasty-bd2'
[1251281952] 7: sel_val: 8; ipc_listen_sock: 5; ipc_sock: 8; net_sock: 6
[1251281952] 7: select() returned 1 (errno = 0: Success)

and so on
        * nagios has been restarted at timestamp 1251281934 and since 
that, no check have been made until timestamp 1251282809 ! 15 mins of 
update, that is a lot of time. The load on the mysql server is mostly null 
during this time. On the Nagios server, merlind doesn't use a lot of cpu


Also, During these tests, I have see my check and hosts latencies grow up 
and now, with Merlin enabled, I have a large number of orphaned checks.

With Merlin I have the following latencies (and it is increasing as I 
write my email) :

Service Check Latency:  0.00 / 1299.86 / 160.156 sec
Host Check Execution Time:      2.54 / 3.19 / 2.564 sec
Host Check Latency:     0.00 / 723.49 / 304.415 sec

Before Merlin, i don't have the exact values, but I remember that the 
service latency was under 50s and the host latency under 1s

Regards

Nicolas




This message and any attachments (the "message") is
intended solely for the addressees and is confidential. 
If you receive this message in error, please delete it and 
immediately notify the sender. Any use not in accord with 
its purpose, any dissemination or disclosure, either whole 
or partial, is prohibited except formal approval. The internet
can not guarantee the integrity of this message. 
BNP PARIBAS (and its subsidiaries) shall (will) not 
therefore be liable for the message if modified. 
Do not print this message unless it is necessary,
consider the environment.

                ---------------------------------------------

Ce message et toutes les pieces jointes (ci-apres le 
"message") sont etablis a l'intention exclusive de ses 
destinataires et sont confidentiels. Si vous recevez ce 
message par erreur, merci de le detruire et d'en avertir 
immediatement l'expediteur. Toute utilisation de ce 
message non conforme a sa destination, toute diffusion 
ou toute publication, totale ou partielle, est interdite, sauf 
autorisation expresse. L'internet ne permettant pas 
d'assurer l'integrite de ce message, BNP PARIBAS (et ses
filiales) decline(nt) toute responsabilite au titre de ce 
message, dans l'hypothese ou il aurait ete modifie.
N'imprimez ce message que si necessaire,
pensez a l'environnement.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.op5.com/pipermail/op5-users/attachments/20090826/45f2daf3/attachment.html 


More information about the op5-users mailing list