[op5-users] Réf. : Re: Réf. : Re: Réf. : Re: merlind crash after loosing mysql connection
Andreas Ericsson
ae at op5.se
Tue Sep 1 14:59:01 CEST 2009
nicolas.raspail at bnpparibas.com wrote:
> op5-users-bounces at lists.op5.com wrote on 01/09/2009 13:46:07:
>
>> nicolas.raspail at bnpparibas.com wrote:
>>> op5-users-bounces at lists.op5.com wrote on 31/08/2009 17:04:24:
>>>
>>>> nicolas.raspail at bnpparibas.com wrote:
>>>>> op5-users-bounces at lists.op5.com wrote on 31/08/2009 10:38:27:
>>>>>
>>> <snip>
>>>
>>>> Ok. In that case it's not a configuration error. Can you try using the
>>>> latest git snapshot (download it directly from git for simpler updates)
>>>> and see if that solves this particular problem?
>>>>
>>>> The latest core code changes can be found in v0.6.2-beta11.
>>>>
>>>> Thanks for your reports. I really appreciate them :-)
>>>>
>>> Hi,
>>>
>>> I have just compiled and installed the tarball from the git commit
>>> b0703a40b91d39b57d84d52ddb81a8e34933c362. I have modified the
>>> gen-version.sh script to add
>>> DEF_VER=v0.6.2-b0703a40b91d39b57d84d52ddb81a8e34933c362
>>>
>>> * When I start merlind, I get the following messages
>>>
>>> [1251803549] 6: Initializing IPC socket '/bnp/apps/nagios/merlin/ipc.sock' for daemon
>>> [1251803549] 6: dbi_conn_query_null(): Failed to run [SELECT host_name, current_state, state_type FROM merlindb.host ORDER BY host_name]: no database connection. Error-code is 7
>>> [1251803549] 3: Attempting to reconnect to database
>>> [1251803549] 6: Successfully ran the previously failed query
>>> [1251803550] 6: Primed object states for 0 hosts and 14639 services
>>> [1251803550] 6: Merlin daemon v0.6.2-b0703a40b91d39b57d84d52ddb81a8e34933c362 successfully initialized
>>> [1251803550] 6: Accepting inbound connection on ipc socket
>>>
>>> * After that, a large number (41) of php importer processes are launched
>>>
>>> [1251803550] 6: Executing import command 'php
>>> /bnp/apps/nagios/merlin/import.php
>>> --nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
>>> --cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
>>> --db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql'
>>> <40 times>
>>> [1251803554] 6: Handled 86 ipc events in 4.326 seconds
>>> [1251803566] 6: Executing import command 'php
>>> /bnp/apps/nagios/merlin/import.php
>>> --nagios-cfg=/bnp/apps/nagios/etc/nagios.cfg
>>> --cache=/bnp/apps/nagios/var/objects.cache --db-name=merlindb
>>> --db-user=merlin --db-pass=xxx --db-host=eqd-nagios-sql'
>>> [1251803566] 6: Handled 6 ipc events in 0.006 seconds
>>> [1251803566] 6: Handled 2 ipc events in 0.003 seconds
>>> [1251803566] 6: Handled 128 ipc events in 0.083 seconds
>>> [1251803566] 6: Handled 1 ipc events in 0.000 seconds
>>>
>>> I stopped merlind because the load on the server was very high.
>>> For several minutes after I stopped the merlind process, I could still
>>> see php processes running, but they eventually disappeared.
>>>
>>> I could not test the reconnection after a failover of my mysql server
>>> because of this new merlind behaviour :)
>>>
>> Can you please send me the log from your eventbroker module as well
>> (the one called neb.log in the example config file)?
>>
>> What I think is happening is that your config is simply so large that
>> the import takes far too much time. I'll add a check to make sure we
>> aren't running one import while another one's working.
>>
>
> Hi
>
> Unfortunately, wrong permissions on the logs directory prevented nagios
> from writing neb.log. Maybe a warning in nagios.log from the module would
> be a nice feature :)
>
> I have now corrected the permissions, re-enabled merlind for a few
> minutes, and then stopped it. The resulting file is attached to this email.
>
Thanks. I'll dig in to this tomorrow.
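
The check mentioned in the quoted text above (not running one import while
another one is still working) could look roughly like the following. This is
only a sketch under stated assumptions, not Merlin's actual code: Merlin
itself is written in C, and the names ImportRunner, request_import and reap
are hypothetical. The idea is to remember the running importer process,
refuse to start a second one while it is alive, and note that a further
import was requested so it can be run once afterwards.

```python
import subprocess
import sys


class ImportRunner:
    """Runs at most one import command at a time (hypothetical helper)."""

    def __init__(self, command):
        self.command = command   # argv list for the importer
        self.child = None        # currently running import, if any
        self.pending = False     # an import was requested while one was running

    def request_import(self):
        """Start an import unless one is still running.

        Returns True if a new process was started, False if the request
        was only recorded because an import is already in progress.
        """
        if self.child is not None and self.child.poll() is None:
            self.pending = True
            return False
        self.child = subprocess.Popen(self.command)
        self.pending = False
        return True

    def reap(self):
        """Call periodically from the main loop.

        Once the running import has exited, forget it and, if another
        import was requested in the meantime, start exactly one more.
        Returns True if a queued import was started.
        """
        if self.child is not None and self.child.poll() is not None:
            self.child = None
            if self.pending:
                return self.request_import()
        return False
```

With a guard like this, 41 import requests arriving within a few seconds
would result in one running importer plus at most one queued rerun, instead
of 41 concurrent php processes.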
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.