[op5-users] Réf. : Re: merlind crash after loosing mysql connection

nicolas.raspail at bnpparibas.com nicolas.raspail at bnpparibas.com
Mon Aug 31 16:37:33 CEST 2009


op5-users-bounces at lists.op5.com wrote on 31/08/2009 10:38:27:

> nicolas.raspail at bnpparibas.com wrote:
> > Hi
> > 
> > Merlind (version 0.6.2-beta5) crashes if it loses the connection with
> > MySQL. My SQL server is running inside a cluster, and for some reason I
> > have switched it to another node. To be sure that everything was okay, I
> > checked the merlin log. As expected, I see the following message:
> > 
> > [1251471412] 6: dbi_conn_query_null(): Failed to run [SELECT host_name,
> > current_state, state_type FROM merlindb.host ORDER BY host_name]: 2006:
> > MySQL server has gone away
> > 
> > What is not expected is that no merlind process is running any
> > more.
> 
> Do you mean that no merlind process is running, or that only one is?
> 
> > I have seen the same behaviour with beta2, as reported in a
> > previous email, and Andreas answered:
> > "Hmm. It should try to reconnect at that point instead. Not sure if it did
> > that in beta2."
> > 
> 
> It should, although it log that it does. What version of libdbi are you
> using? There have been some changes to the error handling in recent versions
> of libdbi, so perhaps your version no longer returns DBI_ERROR_NOCONN when
> it notices the database connection has died. I'll run some tests and add
> some logging and make sure it at least tries to reconnect.
> 
> > It seems that even with the beta5, merlind is not trying to reconnect
> > 
> 
> Well, the code in sql.c hasn't changed between beta5 and beta10, so it
> wouldn't help to upgrade for this particular problem.
> 
> I'll get back to you in an hour when I've run those tests.

Hi

I have just installed version 0.6.2-beta10 of merlind. On that subject, is
there a place where we can find the running version? I see nothing in the
logs of nagios or merlind. In the source, the file gen-version.sh contains
only DEF_VER=v0.6.1, and the script seems to do nothing:

[merlin at eqd-nagios01 merlin]$ ./gen-version.sh 
#include "shared.h"
const char *merlin_version = "";

But let's get to why I installed the new version: the loss of the MySQL
connection!

** After I switch over my MySQL server, I can see a lot of messages like
this in the log:

[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0,
process_performance_data = 1, active_checks_enabled = 1,
passive_checks_enabled = 1, event_handler_enabled = 1,
obsess_over_service = 1251669600, problem_has_been_acknowledged = 0,
acknowledgement_type = 0, check_type = 0, current_state = 0,
last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.032000,
execution_time = 0.044196, notifications_enabled = 1,
last_notification = 0, next_check = 1251729152,
should_be_scheduled = 1, last_check = 1251728252,
last_state_change = 1248342907, last_hard_state_change = 1248342907,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0,
is_flapping = 0, flapping_comment_id = 0,
percent_state_change = 0.000000, output = 'SNMP OK - TODEFINE',
long_output = '', perf_data = '' WHERE host_name = 'xxxx' AND
service_description = 'bnp-check-snmpd']: 2006: MySQL server has gone away
[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0,
process_performance_data = 1, active_checks_enabled = 1,
passive_checks_enabled = 1, event_handler_enabled = 1,
obsess_over_service = 0, problem_has_been_acknowledged = 0,
acknowledgement_type = 0, check_type = 0, current_state = 0,
last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.032000,
execution_time = 0.044196, notifications_enabled = 1,
last_notification = 0, next_check = 1251729152,
should_be_scheduled = 1, last_check = 1251728252,
last_state_change = 1248342907, last_hard_state_change = 1248342907,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0,
is_flapping = 0, flapping_comment_id = 0,
percent_state_change = 0.000000, output = 'SNMP OK - TODEFINE',
long_output = '', perf_data = '' WHERE host_name = 'xxx' AND
service_description = 'bnp-check-snmpd']: 2006: MySQL server has gone away
[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0,
process_performance_data = 1, active_checks_enabled = 1,
passive_checks_enabled = 1, event_handler_enabled = 1,
obsess_over_service = 1251669600, problem_has_been_acknowledged = 0,
acknowledgement_type = 0, check_type = 0, current_state = 0,
last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.037000,
execution_time = 0.045664, notifications_enabled = 1,
last_notification = 0, next_check = 1251729152,
should_be_scheduled = 1, last_check = 1251728252,
last_state_change = 1248343161, last_hard_state_change = 1248343161,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0,
is_flapping = 0, flapping_comment_id = 0,
percent_state_change = 0.000000, output = 'SNMP OK - TODEFINE',
long_output = '', perf_data = '' WHERE host_name = 'xxx' AND
service_description = 'bnp-check-snmpd']: 2006: MySQL server has gone away

Up to that point, everything seems to be fine and the merlind process is
still running.

** But when the MySQL server is up again, I see a lot of messages like
this:

[1251728254] 6: Handled 110 ipc events in 0.086 seconds
[1251728255] 6: inbound data available on ipc socket

[1251728255] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.program_status SET is_running = 1, last_alive = 1251728255,
program_start = 1251727995, pid = 24400, daemon_mode = 1,
last_command_check = 1251728254, last_log_rotation = 0,
notifications_enabled = 1, active_service_checks_enabled = 1,
passive_service_checks_enabled = 1, active_host_checks_enabled = 1,
passive_host_checks_enabled = 1, event_handlers_enabled = 1,
flap_detection_enabled = 0, failure_prediction_enabled = 1,
process_performance_data = 0, obsess_over_hosts = 0,
obsess_over_services = 0, modified_host_attributes = 0,
modified_service_attributes = 0, global_host_event_handler = '',
global_service_event_handler = ''WHERE instance_id = 0]: 2006: MySQL
server has gone away
[1251728255] 6: Handled 1 ipc events in 0.001 seconds
[1251728256] 6: inbound data available on ipc socket

[1251728256] 6: dbi_conn_query_null(): Failed to run [UPDATE
merlindb.service SET initial_state = 0, flap_detection_enabled = 1,
low_flap_threshold = 0.000000, high_flap_threshold = 0.000000,
check_freshness = 0, freshness_threshold = 0,
process_performance_data = 1, active_checks_enabled = 1,
passive_checks_enabled = 1, event_handler_enabled = 1,
obsess_over_service = 1251669600, problem_has_been_acknowledged = 0,
acknowledgement_type = 0, check_type = 0, current_state = 0,
last_state = 0, last_hard_state = 0, state_type = 1,
current_attempt = 1, current_event_id = 0, last_event_id = 0,
current_problem_id = 0, last_problem_id = 0, latency = 0.062000,
execution_time = 0.044641, notifications_enabled = 1,
last_notification = 0, next_check = 1251729154,
should_be_scheduled = 1, last_check = 1251728254,
last_state_change = 1248343539, last_hard_state_change = 1248343539,
has_been_checked = 1, current_notification_number = 0,
current_notification_id = 0, check_flapping_recovery_notification = 0,
scheduled_downtime_depth = 0, pending_flex_downtime = 0,
is_flapping = 0, flapping_comment_id = 0,
percent_state_change = 0.000000, output = 'SNMP OK - TODEFINE',
long_output = '', perf_data = '' WHERE host_name = 'xxx' AND
service_description = 'bnp-check-snmpd']: 2006: MySQL server has gone away

After that, the logs are filled with the same messages: handled ipc
events, inbound data and failed queries.
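
To see at a glance how long the failures keep going, the entries can be
tallied per second. This is only a minimal sketch in Python, assuming
nothing more than the "[epoch] level: message" line shape visible in the
excerpts above. The name count_gone_away and the abbreviated sample lines
are made up for illustration and are not part of merlin.

```python
import re
from collections import Counter

# Matches the "[epoch] level: message" prefix seen in the merlind log excerpts.
LINE_RE = re.compile(r"^\[(\d+)\] (\d+): (.*)$")
GONE_AWAY = "2006: MySQL server has gone away"


def count_gone_away(lines):
    """Count failed-query entries per epoch second.

    A wrapped entry is attributed to the timestamp of the line that opened
    it, because the error text may sit on a continuation line.
    """
    counts = Counter()
    current_ts = None
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            current_ts = int(m.group(1))
        if current_ts is not None and GONE_AWAY in line:
            counts[current_ts] += 1
    return counts


# Hypothetical, heavily abbreviated sample in the same shape as the log.
sample = [
    "[1251728254] 6: dbi_conn_query_null(): Failed to run [UPDATE",
    "merlindb.service SET ... ]: 2006: MySQL server has gone away",
    "[1251728254] 6: Handled 110 ipc events in 0.086 seconds",
    "[1251728255] 6: inbound data available on ipc socket",
    "[1251728255] 6: dbi_conn_query_null(): Failed to run [UPDATE",
    "merlindb.program_status SET ... ]: 2006: MySQL server has gone away",
]
summary = count_gone_away(sample)
```

If the count stays above zero well after the MySQL failover has finished,
that suggests the daemon is still using the dead connection.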

I restarted the merlind process; it ran an import, and after that it
worked fine again.
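
For illustration, the reconnect behaviour discussed in this thread can be
sketched as follows: when a query fails because the connection is gone,
reconnect and retry a bounded number of times instead of reusing the dead
handle. This is a language-neutral sketch in Python, not merlin or libdbi
code. FlakyConnection, ConnectionLost and run_with_reconnect are
hypothetical stand-ins.

```python
class ConnectionLost(Exception):
    """Stand-in for a 'server has gone away' error (MySQL error 2006)."""


class FlakyConnection:
    """Fake database handle whose first `failures` queries fail, to mimic a failover."""

    def __init__(self, failures):
        self.failures = failures
        self.connects = 0

    def connect(self):
        self.connects += 1

    def query(self, sql):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionLost(sql)
        return "ok"


def run_with_reconnect(conn, sql, max_retries=3):
    """Run a query; on a lost connection, reconnect and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return conn.query(sql)
        except ConnectionLost:
            if attempt == max_retries:
                raise  # give up and let the caller log the failure
            conn.connect()  # a real daemon would likely also back off here


conn = FlakyConnection(failures=2)
result = run_with_reconnect(conn, "SELECT 1")
```

In this simulation the query succeeds on the third attempt after two
reconnects. Bounding the retries keeps the daemon from spinning forever
if the database never comes back.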

Regards

Nicolas



