[op5-users] Réf. : Re: Réf. : Re: merlind crash after loosing mysql connection
nicolas.raspail at bnpparibas.com
Tue Sep 1 10:04:42 CEST 2009
op5-users-bounces at lists.op5.com wrote on 31/08/2009 17:04:24:
> nicolas.raspail at bnpparibas.com wrote:
> > op5-users-bounces at lists.op5.com wrote on 31/08/2009 10:38:27:
> >
> >> nicolas.raspail at bnpparibas.com wrote:
<snip>
> >
> > I have just installed the 0.6.2-beta10 version of merlind. On that
> > subject, is there a place where we can find the running version?
>
> Well, yes and no. If you're building from git, you should be able to
> see the exact version in the logs when Merlin is loaded. I just noticed
> that there was a bug in the gen-version.sh script that caused it to not
> print the DEF_VER variable properly when building from tarballs.
>
I can't access the git repository because of firewalls and proxies, so
I grabbed the tarball from your web interface :)
> > I see nothing in the logs of nagios or merlind. And in the source, in
> > the file gen-version.sh, there is only DEF_VER=v0.6.1 and this script
> > seems to do nothing
> >
>
> You're not meant to run the script manually. It's run by invoking
> 'make', and it's supposed to create a file called version.c
>
> > [merlin at eqd-nagios01 merlin]$ ./gen-version.sh
> > #include "shared.h"
> > const char *merlin_version = "";
> >
> > But let's get to why I installed the new version: the loss of the
> > MySQL connection!
> >
>
> Right. I *think* I may have fixed this one, although a mismerge between
> the upstream Nagios core and some of our own patches forced me to
> redirect my efforts a while.
ok
>
> > ** After I switch over my MySQL server, I can see a lot of messages
> > like this in the log:
> >
> > [1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
> > merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
> > low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
> > check_freshness = 0, freshness_threshold = 0,
> > process_performance_data = 1, active_checks_enabled = 1,
> > passive_checks_enabled = 1, event_handler_enabled = 1,
> > obsess_over_service = 1251669600, problem_has_been_acknowledged = 0,
> > acknowledgement_type = 0, check_type = 0, current_state = 0,
> > last_state = 0, last_hard_state = 0, state_type = 1,
> > current_attempt = 1, current_event_id = 0, last_event_id = 0,
> > current_problem_id = 0, last_problem_id = 0, latency = 0.032000,
> > execution_time = 0.044196, notifications_enabled = 1,
> > last_notification = 0, next_check = 1251729152,
> > should_be_scheduled = 1, last_check = 1251728252,
> > last_state_change = 1248342907, last_hard_state_change = 1248342907,
> > has_been_checked = 1, current_notification_number = 0,
> > current_notification_id = 0, check_flapping_recovery_notification = 0,
> > scheduled_downtime_depth = 0, pending_flex_downtime = 0,
> > is_flapping = 0, flapping_comment_id = 0,
> > percent_state_change = 0.000000, output = 'SNMP OK - TODEFINE',
> > long_output = '', perf_data = '' WHERE host_name = 'xxxx' AND
> > service_description = 'bnp-check-snmpd']: 2006: MySQL server has gone away
>
> Does it ever say that it's managed to connect to the MySQL server in
> the first place?
Yes, I think so, because my database gets updated.
Here is a log extract from daemon.log from when I started merlind:
[1251727986] 6: Initializing IPC socket '/bnp/apps/nagios/merlin/ipc.sock' for daemon
[1251727986] 6: Primed object states for 0 hosts and 0 services
[1251727986] 6: Merlin daemon successfully initialized
[1251727997] 6: Accepting inbound connection on ipc socket
[1251727997] 6: inbound data available on ipc socket
[1251727997] 6: Executing import command 'php
/bnp/apps/nagios/merlin/import.php
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
--cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
--db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql
--status-log=/bnp/apps/nagios/var/status.dat'
[1251727997] 6: Handled 5 ipc events in 0.002 seconds
[1251727998] 6: inbound data available on ipc socket
>
> >
> > Up to that point, everything seems to be fine and the merlind process
> > is still running.
> >
> > ** But when the MySQL server is up again, I see a lot of messages like
> > this:
> >
> > [1251728254] 6: Handled 110 ipc events in 0.086 seconds
> > [1251728255] 6: inbound data available on ipc socket
> >
> > [1251728255] 6: dbi_conn_query_null(): Failed to run [UPDATE
> > merlindb.program_status SET is_running = 1, last_alive = 1251728255,
> > program_start = 1251727995, pid = 24400, daemon_mode = 1,
> > last_command_check = 1251728254, last_log_rotation = 0,
> > notifications_enabled = 1, active_service_checks_enabled = 1,
> > passive_service_checks_enabled = 1, active_host_checks_enabled = 1,
> > passive_host_checks_enabled = 1, event_handlers_enabled = 1,
> > flap_detection_enabled = 0, failure_prediction_enabled = 1,
> > process_performance_data = 0, obsess_over_hosts = 0,
> > obsess_over_services = 0, modified_host_attributes = 0,
> > modified_service_attributes = 0, global_host_event_handler = '',
> > global_service_event_handler = ''WHERE instance_id = 0]: 2006: MySQL
> > server has gone away
>
> Basically the same, then. Again, has it ever stated that it has
> successfully connected to the MySQL server?
>
> >
> > After that, the logs are filled with the same messages: handled ipc
> > events, inbound data and failed queries.
> >
> > I restarted the merlind process, it ran an import, and after that it
> > works fine again.
> >
>
> Ok. In that case it's not a configuration error. Can you try using the
> latest git snapshot (download it directly from git for simpler updates)
> and see if that solves this particular problem?
>
> The latest core code changes can be found in v0.6.2-beta11.
>
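For context on the symptom discussed above: error 2006 continuing after the server is back is what one would expect if a client keeps using its old, dead connection handle instead of opening a new one. Below is a minimal sketch of a reconnect-and-retry wrapper. It is not Merlin's code and does not use libdbi; `ServerGoneAway`, `FakeConnection` and `ReconnectingClient` are hypothetical names made up for this illustration, and the "database" is simulated in memory.

```python
class ServerGoneAway(Exception):
    """Stand-in for MySQL client error 2006 ("MySQL server has gone away")."""


class FakeConnection:
    """Hypothetical connection whose link to the server can be dropped."""

    def __init__(self):
        self.alive = True

    def execute(self, query):
        if not self.alive:
            raise ServerGoneAway(2006, "MySQL server has gone away")
        return "ok: " + query


class ReconnectingClient:
    """Runs queries and reopens the connection once if the server has gone away."""

    def __init__(self, connect):
        self._connect = connect  # callable returning a new connection
        self.conn = connect()
        self.reconnects = 0

    def query(self, sql):
        try:
            return self.conn.execute(sql)
        except ServerGoneAway:
            # The old handle stays dead even when the server is back,
            # so open a fresh connection and retry the query once.
            self.conn = self._connect()
            self.reconnects += 1
            return self.conn.execute(sql)  # a second failure propagates


client = ReconnectingClient(FakeConnection)
client.conn.alive = False  # simulate the MySQL switch-over
result = client.query("UPDATE program_status SET is_running = 1")
print(result, client.reconnects)
```

Retrying only once keeps the sketch from looping forever while the server is still down; a real daemon would probably also queue or drop the failed updates, which is a design decision this sketch does not cover.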
I will try later in the day. But there is one last weird thing. Yesterday,
after I restarted the merlind process, ninja stopped displaying any
information about hosts and services.
After some searching, I saw that the host table shows 0 rows. Other
tables get updated, but not host.
The logs show the process restart and the initial import:
[1251729329] 6: Handled 1 ipc events in 0.001 seconds
[1251729338] 6: Initializing IPC socket '/bnp/apps/nagios/merlin/ipc.sock' for daemon
[1251729339] 6: Primed object states for 2004 hosts and 14809 services
[1251729339] 6: Merlin daemon successfully initialized
[1251729342] 6: Accepting inbound connection on ipc socket
[1251729342] 6: inbound data available on ipc socket
[1251729342] 6: Executing import command 'php
/bnp/apps/nagios/merlin/import.php
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
--cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
--db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql
--status-log=/bnp/apps/nagios/var/status.dat'
[1251729342] 6: Executing import command 'php
/bnp/apps/nagios/merlin/import.php
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
--cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
--db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql
--status-log=/bnp/apps/nagios/var/status.dat'
[1251729342] 6: Executing import command 'php
/bnp/apps/nagios/merlin/import.php
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
--cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
--db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql
--status-log=/bnp/apps/nagios/var/status.dat'
[1251729342] 6: Handled 66 ipc events in 0.477 seconds
[1251729358] 6: inbound data available on ipc socket
[1251729358] 6: Executing import command 'php
/bnp/apps/nagios/merlin/import.php
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
--cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
--db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql
--status-log=/bnp/apps/nagios/var/status.dat'
[1251729358] 6: Handled 50 ipc events in 0.019 seconds
[1251729359] 6: inbound data available on ipc socket
But the next import was not run until timestamp 1251767006:
[1251767006] 6: Executing import command 'php
/bnp/apps/nagios/merlin/import.php
--nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
--cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
--db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql
--status-log=/bnp/apps/nagios/var/status.dat'
It seems that 37648 seconds (about 10.5 hours) passed before another
import was run. Maybe this is the correct behaviour, but I don't really
understand why the host table contains 0 rows for several minutes or hours.
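
The gap can be checked directly from the log timestamps. Here is a small sketch that pulls the epoch values from the "Executing import command" lines and computes the time between the last two imports. The `LOG` sample is abbreviated from the extract above, and `import_times` is just a helper name chosen for this example.

```python
import re

# Abbreviated sample of the daemon.log lines quoted in this message.
LOG = """\
[1251729358] 6: Executing import command 'php
[1251729358] 6: Handled 50 ipc events in 0.019 seconds
[1251767006] 6: Executing import command 'php
"""


def import_times(log_text):
    """Return the epoch timestamps of the lines that start an import."""
    pattern = re.compile(r"^\[(\d+)\] \d+: Executing import command", re.MULTILINE)
    return [int(match.group(1)) for match in pattern.finditer(log_text)]


times = import_times(LOG)
gap = times[-1] - times[-2]          # seconds between the last two imports
hours, rest = divmod(gap, 3600)
minutes, seconds = divmod(rest, 60)
print(f"{gap} s = {hours} h {minutes} min {seconds} s")
# prints "37648 s = 10 h 27 min 28 s"
```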
> Thanks for your reports. I really appreciate them :-)
You're welcome. We are looking for a good tool to give a team access to
the Nagios events. We are testing NDOutils and Merlin. With NDO, Nagios
takes too long to restart, which is not acceptable. With Merlin, Nagios
starts immediately, that's cool :)
We just want to be sure that there are no side effects with Merlin, so I
am testing it. When things seem to have settled down, I will look at the
database schema to see how the team will be able to get the events from
the merlin database.
Regards
Nicolas