[op5-users] Réf. : Re: merlind crash after loosing mysql connection
nicolas.raspail at bnpparibas.com
Mon Aug 31 16:37:33 CEST 2009
op5-users-bounces at lists.op5.com wrote on 31/08/2009 10:38:27:
> nicolas.raspail at bnpparibas.com wrote:
> > Hi
> >
> > Merlind (version 0.6.2-beta5) crashes if it loses the connection with
> > MySQL. My MySQL server is running inside a cluster, and for some reason
> > I have switched it to another node. To be sure that everything was okay,
> > I checked the merlin log. As expected, I see the following message:
> >
> > [1251471412] 6: dbi_conn_query_null(): Failed to run [SELECT host_name,
> > current_state, state_type FROM merlindb.host ORDER BY host_name]: 2006:
> > MySQL server has gone away
> >
> > What is not expected is that no merlind process is running any more.
>
> Do you mean that no merlind process is running, or that only one is?
>
> > I have seen the same behaviour with beta2, as reported in a
> > previous email, and Andreas answered:
> > "Hmm. It should try to reconnect at that point instead. Not sure if it
> > did that in beta2."
> >
>
> It should, although it log that it does. What version of libdbi are you
> using? There have been some changes to the error handling in recent
> versions of libdbi, so perhaps your version no longer returns
> DBI_ERROR_NOCONN when it notices the database connection has died. I'll
> run some tests and add some logging and make sure it at least tries to
> reconnect.
>
> > It seems that even with the beta5, merlind is not trying to reconnect
> >
>
> Well, the code in sql.c hasn't changed between beta5 and beta10, so it
> wouldn't help to upgrade for this particular problem.
>
> I'll get back to you in an hour when I've run those tests.
Hi
I have just installed version 0.6.2-beta10 of merlind. On that subject, is
there a place where we can find the running version? I see nothing in the
Nagios or merlind logs. In the source, the only version I can find is
DEF_VER=v0.6.1 in the file gen-version.sh, and that script seems to do
nothing useful, since it prints an empty version string:
[merlin at eqd-nagios01 merlin]$ ./gen-version.sh
#include "shared.h"
const char *merlin_version = "";
But let's get to why I installed the new version: the loss of the MySQL
connection!
** After I switch over my MySQL server, I see a lot of messages like this
in the log:
[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0, process_performance_data = 1,
active_checks_enabled = 1, passive_checks_enabled = 1,
event_handler_enabled = 1, obsess_over_service = 1251669600,
problem_has_been_acknowledged = 0, acknowledgement_type = 0, check_type = 0,
current_state = 0, last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.032000,
execution_time = 0.044196, notifications_enabled = 1, last_notification = 0,
next_check = 1251729152, should_be_scheduled = 1, last_check = 1251728252,
last_state_change = 1248342907, last_hard_state_change = 1248342907,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0, is_flapping = 0,
flapping_comment_id = 0, percent_state_change = 0.000000,
output = 'SNMP OK - TODEFINE', long_output = '', perf_data = ''
WHERE host_name = 'xxxx' AND service_description = 'bnp-check-snmpd']:
2006: MySQL server has gone away
[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0, process_performance_data = 1,
active_checks_enabled = 1, passive_checks_enabled = 1,
event_handler_enabled = 1, obsess_over_service = 0,
problem_has_been_acknowledged = 0, acknowledgement_type = 0, check_type = 0,
current_state = 0, last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.032000,
execution_time = 0.044196, notifications_enabled = 1, last_notification = 0,
next_check = 1251729152, should_be_scheduled = 1, last_check = 1251728252,
last_state_change = 1248342907, last_hard_state_change = 1248342907,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0, is_flapping = 0,
flapping_comment_id = 0, percent_state_change = 0.000000,
output = 'SNMP OK - TODEFINE', long_output = '', perf_data = ''
WHERE host_name = 'xxx' AND service_description = 'bnp-check-snmpd']:
2006: MySQL server has gone away
[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0, process_performance_data = 1,
active_checks_enabled = 1, passive_checks_enabled = 1,
event_handler_enabled = 1, obsess_over_service = 1251669600,
problem_has_been_acknowledged = 0, acknowledgement_type = 0, check_type = 0,
current_state = 0, last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.037000,
execution_time = 0.045664, notifications_enabled = 1, last_notification = 0,
next_check = 1251729152, should_be_scheduled = 1, last_check = 1251728252,
last_state_change = 1248343161, last_hard_state_change = 1248343161,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0, is_flapping = 0,
flapping_comment_id = 0, percent_state_change = 0.000000,
output = 'SNMP OK - TODEFINE', long_output = '', perf_data = ''
WHERE host_name = 'xxx' AND service_description = 'bnp-check-snmpd']:
2006: MySQL server has gone away
Up to that point, everything seems to be fine and the merlind process is
still running.
** But when the MySQL server is up again, I see a lot of messages like
this:
[1251728254] 6: Handled 110 ipc events in 0.086 seconds
[1251728255] 6: inbound data available on ipc socket
[1251728255] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.program_status SET is_running = 1, last_alive = 1251728255,
program_start = 1251727995, pid = 24400, daemon_mode = 1,
last_command_check = 1251728254, last_log_rotation = 0,
notifications_enabled = 1, active_service_checks_enabled = 1,
passive_service_checks_enabled = 1, active_host_checks_enabled = 1,
passive_host_checks_enabled = 1, event_handlers_enabled = 1,
flap_detection_enabled = 0, failure_prediction_enabled = 1,
process_performance_data = 0, obsess_over_hosts = 0,
obsess_over_services = 0, modified_host_attributes = 0,
modified_service_attributes = 0, global_host_event_handler = '',
global_service_event_handler = ''WHERE instance_id = 0]: 2006: MySQL server
has gone away
[1251728255] 6: Handled 1 ipc events in 0.001 seconds
[1251728256] 6: inbound data available on ipc socket
[1251728256] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0, process_performance_data = 1,
active_checks_enabled = 1, passive_checks_enabled = 1,
event_handler_enabled = 1, obsess_over_service = 1251669600,
problem_has_been_acknowledged = 0, acknowledgement_type = 0, check_type = 0,
current_state = 0, last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.062000,
execution_time = 0.044641, notifications_enabled = 1, last_notification = 0,
next_check = 1251729154, should_be_scheduled = 1, last_check = 1251728254,
last_state_change = 1248343539, last_hard_state_change = 1248343539,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0, is_flapping = 0,
flapping_comment_id = 0, percent_state_change = 0.000000,
output = 'SNMP OK - TODEFINE', long_output = '', perf_data = ''
WHERE host_name = 'xxx' AND service_description = 'bnp-check-snmpd']:
2006: MySQL server has gone away
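After that, the logs fill up with the same three messages: "Handled ... ipc
events", "inbound data available on ipc socket" and the failed query. So
merlind keeps running but never reconnects to MySQL.
I restarted the merlind process; it ran an import, and since then it has
been working fine again.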
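To make clear what I expected to happen, here is a minimal sketch in Python.
It is not merlin's or libdbi's code: FakeConn, DbError and run_query are
names I made up for illustration, and the fake connection only simulates a
server that was switched to another node. The idea is simply that a query
failing with error 2006 triggers one reconnect and a retry, instead of the
failure only being logged:

```python
# Sketch only: FakeConn, DbError and run_query are hypothetical names, not
# merlin or libdbi code. 2006 is the MySQL "server has gone away" error.
CR_SERVER_GONE_ERROR = 2006


class DbError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class FakeConn:
    """Stand-in for a database handle whose server moved to another node."""

    def __init__(self):
        self.alive = False   # the old connection is dead after the switch
        self.reconnects = 0

    def connect(self):
        self.alive = True    # the new node accepts connections
        self.reconnects += 1

    def query(self, sql):
        if not self.alive:
            raise DbError(CR_SERVER_GONE_ERROR, "MySQL server has gone away")
        return "ok"


def run_query(conn, sql, max_retries=1):
    """Run sql; on error 2006, reconnect and retry instead of only logging."""
    for attempt in range(max_retries + 1):
        try:
            return conn.query(sql)
        except DbError as err:
            # Give up on unrelated errors or once the retries are used up.
            if err.code != CR_SERVER_GONE_ERROR or attempt == max_retries:
                raise
            conn.connect()


conn = FakeConn()
result = run_query(conn, "UPDATE merlindb.program_status SET is_running = 1")
print(result, conn.reconnects)
```

With max_retries=0 the wrapper behaves like what I see in the logs today:
the error is raised and the connection is never re-established.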
Regards
Nicolas