[op5-users] merlin noc-poller setup

Daniel Tuecks dtuecks at googlemail.com
Thu Dec 3 17:52:12 CET 2009


Hello List!

I am trying to set up Nagios/Merlin in a NOC/poller scenario. To that end
I have installed two Nagios 3.2.0 servers (nagios-a and nagios-b) with the
sample configs (make install-config). For testing purposes I created two
hosts (server-a and server-b) and put each of them in a separate
hostgroup (server-group-a and server-group-b).

Now I want
- nagios-b to check server-a and server-b,
- nagios-a to actively check only server-a, and
- nagios-a to only receive passive check results for server-b from
  nagios-b:


------NOC----------       send pasv results    ------POLLER-------
| nagios-a        |  <<---------------------   | nagios-b        |
-------------------       server-b             -------------------
  |                                               |
  |-- check & display server-a                    |-- check & display server-a
  |-- display server-b                            |-- check & display server-b

(see below for nagios / merlin config files)

With merlin-0.6.6.tar.gz I can't get this done. I see very strange
behaviour in the neb.log:

## 'server-b' is a member of 'server-group-b', so can't add to poller for 'server-group-b'
Yes, it is a member of server-group-b.

## 'server-group-b' is a selection without hosts. Are you sure you want this?
No, it is not a selection without hosts (as the previous log line itself states).

Checks for hostgroup 'server-group-b' were not disabled on nagios-a. Both
Nagios servers kept checking both hostgroups (and therefore both servers).
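
To double-check the membership independently of Merlin, the hostgroup
definitions can be read straight from the Nagios object config. The
following is only a minimal sketch: the function name hostgroup_members is
made up for illustration, and it only looks at the explicit "members"
directive of "define hostgroup" blocks (it ignores "hostgroups" directives
on hosts, "hostgroup_members", templates and wildcards).

```python
import re

# Matches 'define <type>{ ... }' blocks as written in the sample configs.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def hostgroup_members(config_text):
    """Return {hostgroup_name: [member, ...]} from 'define hostgroup' blocks.

    Hypothetical helper: only the explicit 'members' directive is read.
    """
    groups = {}
    for obj_type, body in DEFINE_RE.findall(config_text):
        if obj_type != "hostgroup":
            continue
        attrs = {}
        for raw in body.splitlines():
            # Nagios treats ';' as the start of an inline comment.
            line = raw.split(";", 1)[0].strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                attrs[parts[0]] = parts[1].strip()
        name = attrs.get("hostgroup_name")
        if name:
            members = attrs.get("members", "")
            groups[name] = [m.strip() for m in members.split(",") if m.strip()]
    return groups
```

Run against the hostgroup definitions at the end of this mail, this should
list server-b under server-group-b, which is what the first log line says
and the second one contradicts.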



With Merlin checked out from git (branch 'next'), things were also a
little strange.

The neb.log on nagios-a looked like this:

## [1259848873] 6: Merlin Module Loaded
## [1259848873] 6: setting connect and disconnect handlers
## [1259848873] 6: Coredumps in /root
## [1259848873] 6: Merlin module v0.6.6p18-g26db1b7e41ae initialized successfully
## [1259848913] 6: Object configuration parsed.
## [1259848913] 6: Creating hash tables
## [1259848913] 6: Creating host object tree
## [1259848913] 6: Creating service object tree
## [1259848913] 6: Attempting ipc connect
## [1259848913] 6: Shoutcasting active status through IPC socket
## [1259848913] 6: Running on_connect hook for module
## [1259848914] 6: ipc successfully connected
## [1259848914] 6: Reaping ipc events
## [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848914] 6: Disabling active checks for hosts in hostgroup 'server-group-b'
## [1259848914] 6: Disabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848914] 6: Enabling active checks for hosts in hostgroup 'server-group-b'
## [1259848914] 6: Enabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848914] 6: Disabling active checks for hosts in hostgroup 'server-group-b'
## [1259848914] 6: Disabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848914] 6: Enabling active checks for hosts in hostgroup 'server-group-b'
## [1259848914] 6: Enabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848914] 6: Disabling active checks for hosts in hostgroup 'server-group-b'
## [1259848914] 6: Disabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848914] 4: Handled 5 'ipc' events in 0.252 seconds in: 5, out: 0
## [1259848914] 6: Scheduling next ipc reaping at 1259848919
## [1259848914] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848914] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848914] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848919] 6: Reaping ipc events
## [1259848919] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848919] 6: Enabling active checks for hosts in hostgroup 'server-group-b'
## [1259848919] 6: Enabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848919] 6: Disabling active checks for hosts in hostgroup 'server-group-b'
## [1259848919] 6: Disabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848919] 6: Enabling active checks for hosts in hostgroup 'server-group-b'
## [1259848919] 6: Enabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848919] 6: Disabling active checks for hosts in hostgroup 'server-group-b'
## [1259848919] 6: Disabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848919] 6: Enabling active checks for hosts in hostgroup 'server-group-b'
## [1259848919] 6: Enabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848919] 6: Disabling active checks for hosts in hostgroup 'server-group-b'
## [1259848919] 6: Disabling active checks for services of hosts in hostgroup 'server-group-b'
## [1259848919] 6: Inbound IPC event, callback 18, len 100, type -80
## [1259848919] 4: Ignoring unrecognized/unhandled callback type: 18 (NEBCALLBACK_PROGRAM_STATUS_DATA)
## [1259848919] 6: Inbound IPC event, callback 19, len 334, type 0

Now 'server-group-b' was disabled on nagios-a. This looked promising, but
the Nagios daemon died after a few seconds:

## Nagios Core 3.2.0
## Copyright (c) 2009 Nagios Core Development Team and Community Contributors
## Copyright (c) 1999-2009 Ethan Galstad
## Last Modified: 08-12-2009
## License: GPL
##
## Website: http://www.nagios.org
## Nagios 3.2.0 starting... (PID=12781)
## Local time is Thu Dec 03 15:01:13 CET 2009
## Logging to '/opt/nagios/var/log/merlin-neb.log'
## Caught SIGSEGV, shutting down...

Furthermore I'd say that the alternating reception of "code 3" and "code 2"
packets looks fishy.
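
The alternation can be quantified with a short script. The following is a
minimal sketch, assuming the log line format shown in the excerpt above.
The function name summarize_control_packets is made up for illustration,
and reading code 3 as "disable" and code 2 as "enable" is inferred only
from the log lines that follow each packet, not from Merlin's
documentation.

```python
import re
from collections import Counter

# Pattern assumed from the neb.log excerpt in this mail.
PACKET_RE = re.compile(
    r"\[(\d+)\] \d+: Received control packet code (\d+) for selection '([^']+)'"
)


def summarize_control_packets(lines):
    """Count control packets per (selection, code) and code changes per selection."""
    counts = Counter()
    flips = Counter()
    last_code = {}
    for line in lines:
        match = PACKET_RE.search(line)
        if not match:
            continue
        _timestamp, code, selection = match.groups()
        counts[(selection, code)] += 1
        # A "flip" is a packet whose code differs from the previous one
        # seen for the same selection.
        if selection in last_code and last_code[selection] != code:
            flips[selection] += 1
        last_code[selection] = code
    return counts, flips
```

Passing the lines of merlin-neb.log to summarize_control_packets() returns
both counters; a flip count close to the packet count for 'server-group-b'
would confirm that the selection is being toggled on nearly every packet.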

The neb.log on nagios-b:

## [1259848862] 6: Merlin Module Loaded
## [1259848862] 6: setting connect and disconnect handlers
## [1259848862] 6: Coredumps in /root
## [1259848862] 6: Merlin module v0.6.6p18-g26db1b7e41ae initialized successfully
## [1259848902] 6: Object configuration parsed.
## [1259848902] 6: Creating hash tables
## [1259848902] 6: Attempting ipc connect
## [1259848902] 6: Shoutcasting active status through IPC socket
## [1259848902] 6: Running on_connect hook for module
## [1259848902] 6: ipc successfully connected
## [1259848902] 6: Reaping ipc events
## [1259848902] 4: Handled 0 'ipc' events in 0.043 seconds in: 0, out: 0
## [1259848902] 6: Scheduling next ipc reaping at 1259848907
## [1259848902] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848902] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848907] 6: Reaping ipc events
## [1259848907] 4: Handled 0 'ipc' events in 4.335 seconds in: 0, out: 0
## [1259848907] 6: Scheduling next ipc reaping at 1259848912
## [1259848908] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848912] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848912] 4: key 'server-b' doesn't match any possible selection
## [1259848912] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848912] 4: key 'server-b' doesn't match any possible selection
## [1259848912] 6: Reaping ipc events
## [1259848912] 4: Handled 0 'ipc' events in 9.404 seconds in: 0, out: 0
## [1259848912] 6: Scheduling next ipc reaping at 1259848917
## [1259848912] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848914] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848917] 6: Reaping ipc events
## [1259848917] 4: Handled 0 'ipc' events in 14.519 seconds in: 0, out: 0
## [1259848917] 6: Scheduling next ipc reaping at 1259848922
## [1259848920] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848922] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848922] 6: Reaping ipc events
## [1259848922] 4: Handled 0 'ipc' events in 19.319 seconds in: 0, out: 0
## [1259848922] 6: Scheduling next ipc reaping at 1259848927
## [1259848926] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848927] 6: Reaping ipc events
## [1259848927] 4: Handled 0 'ipc' events in 24.400 seconds in: 0, out: 0
## [1259848927] 6: Scheduling next ipc reaping at 1259848932
## [1259848932] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848932] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848932] 4: key 'server-b' doesn't match any possible selection
## [1259848932] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848932] 4: key 'server-b' doesn't match any possible selection
## [1259848932] 6: Reaping ipc events
## [1259848932] 4: Handled 0 'ipc' events in 29.657 seconds in: 0, out: 0
## [1259848932] 6: Scheduling next ipc reaping at 1259848937
## [1259848932] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848937] 6: Reaping ipc events
## [1259848937] 4: Handled 0 'ipc' events in 34.452 seconds in: 0, out: 0
## [1259848937] 6: Scheduling next ipc reaping at 1259848942
## [1259848938] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848942] 6: Reaping ipc events
## [1259848942] 4: Handled 0 'ipc' events in 39.496 seconds in: 0, out: 0
## [1259848942] 6: Scheduling next ipc reaping at 1259848947
## [1259848942] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848944] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848947] 6: Reaping ipc events
## [1259848947] 4: Handled 0 'ipc' events in 44.334 seconds in: 0, out: 0
## [1259848947] 6: Scheduling next ipc reaping at 1259848952
## [1259848950] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848952] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848952] 4: key 'localhost' doesn't match any possible selection
## [1259848952] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848952] 4: key 'localhost' doesn't match any possible selection
## [1259848952] 6: Reaping ipc events
## [1259848952] 4: Handled 0 'ipc' events in 49.431 seconds in: 0, out: 0
## [1259848952] 6: Scheduling next ipc reaping at 1259848957
## [1259848952] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848956] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848957] 6: Reaping ipc events
## [1259848957] 4: Handled 0 'ipc' events in 54.492 seconds in: 0, out: 0
## [1259848957] 6: Scheduling next ipc reaping at 1259848962
## [1259848962] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848962] 6: Reaping ipc events
## [1259848962] 4: Handled 0 'ipc' events in 59.349 seconds in: 0, out: 0
## [1259848962] 6: Scheduling next ipc reaping at 1259848967
## [1259848962] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848967] 6: Reaping ipc events
## [1259848967] 4: Handled 0 'ipc' events in 64.410 seconds in: 0, out: 0
## [1259848967] 6: Scheduling next ipc reaping at 1259848972
## [1259848968] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848972] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 6: Reaping ipc events

The suspicious line here was:
## key 'server-b' doesn't match any possible selection

What does that mean?
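
To see which keys trigger this warning and on which callback, the log can
be tallied. This is a rough sketch under one assumption: that each warning
belongs to the "Processing callback" line directly before it, which is
inferred only from the ordering in the excerpt above. The function name
unmatched_keys is hypothetical.

```python
import re
from collections import Counter

CALLBACK_RE = re.compile(r"Processing callback (NEBCALLBACK_\w+)")
NOMATCH_RE = re.compile(r"key '([^']+)' doesn't match any possible selection")


def unmatched_keys(lines):
    """Count 'doesn't match' warnings per (key, preceding callback name)."""
    counts = Counter()
    last_callback = None
    for line in lines:
        callback = CALLBACK_RE.search(line)
        if callback:
            last_callback = callback.group(1)
            continue
        nomatch = NOMATCH_RE.search(line)
        if nomatch:
            counts[(nomatch.group(1), last_callback)] += 1
    return counts
```

In the nagios-b excerpt the warning shows up for both 'server-b' and
'localhost', on host as well as service status callbacks, so it does not
seem to be specific to one host.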

Another thing that puzzles me is the option to create binary logs
(ipc_debug_read/write). Is this supposed to work? I can't find those
logs...
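
To rule out a simple path or permission problem on my side, the configured
locations can be checked like this. It is only a sketch: it pulls the
uncommented ipc_debug_read/ipc_debug_write settings out of the merlin.conf
text with a regular expression (not Merlin's own parser) and reports
whether each file exists and whether its directory is writable for the
user running the script. The function names are made up for illustration.

```python
import os
import re

# Matches uncommented 'ipc_debug_read = /path;' style lines as they appear
# in the merlin.conf quoted in this mail.
SETTING_RE = re.compile(
    r"^[ \t]*(ipc_debug_(?:read|write))\s*=\s*([^;#\s]+)", re.MULTILINE
)


def debug_log_paths(conf_text):
    """Return [(setting, path), ...] for uncommented ipc_debug_* settings."""
    return SETTING_RE.findall(conf_text)


def check_paths(pairs):
    """Report existence of each path and writability of its parent directory."""
    report = []
    for key, path in pairs:
        parent = os.path.dirname(path) or "."
        report.append({
            "key": key,
            "path": path,
            "exists": os.path.exists(path),
            "parent_writable": os.access(parent, os.W_OK),
        })
    return report
```

The check has to run as the same user as Nagios and the Merlin daemon,
otherwise the "parent_writable" result says nothing about them.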

I have no clue what I'm doing so terribly wrong here. I really hope
you guys can point me in the right direction :)

Daniel



---------------------------------------------
----NAGIOS CONFIG----------------------------
---------------------------------------------

### server-b ###############

define host{
        use                     linux-server
        host_name               server-b
        alias                   server-b
        address                 10.100.5.10
        }

define service{
        use                     local-service
        host_name               server-b
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

define service{
        use                     local-service
        host_name               server-b
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
        }

### server-a ######################

define host{
        use                     linux-server
        host_name               server-a
        alias                   server-a
        address                 10.100.5.11
        }

define service{
        use                     local-service
        host_name               server-a
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

define service{
        use                     local-service
        host_name               server-a
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
        }

### server-group-b ###############

define hostgroup{
        hostgroup_name  server-group-b
        members         server-b
        }

### server-group-a ###############

define hostgroup{
        hostgroup_name  server-group-a
        members         server-a
        }





---------------------------------------------
----MERLIN CONFIG FILES----------------------
---------------------------------------------

#################################################
### nagios-server-a #############################
#################################################





poller nagios-b {
 address = 10.100.5.2;
 port = 15551;
 hostgroup = server-group-b;
}


#
# Sample configuration file for merlin
#
# Default options have been commented out
#
ipc_socket = /opt/nagios/merlin/current/ipc.sock;

# address to listen to. 0.0.0.0 is default
#address = 0.0.0.0;

# module-specific configuration options.
module {
	# textual log of normal hum-drum events
	# log_file = /opt/nagios/merlin/current/logs/neb.log;
	log_file = /opt/nagios/var/log/merlin-neb.log;

	# binary log of everything read/written from/to the ipc socket
	ipc_debug_read = /opt/nagios/var/log/neb.ipc.read.bin;
	ipc_debug_write = /opt/nagios/var/log/neb.ipc.write.bin;
}

# daemon-specific config options
daemon {
	pidfile = /var/run/merlin.pid

	# same as the "module" section above
	log_file = /opt/nagios/merlin/current/logs/daemon.log;
#	ipc_debug_read = /opt/nagios/merlin/current/logs/daemon.ipc.read.bin;
#	ipc_debug_write = /opt/nagios/merlin/current/logs/daemon.ipc.write.bin;
	ipc_debug_read = /opt/nagios/var/log/daemon.ipc.read.bin;
	ipc_debug_write = /opt/nagios/var/log/daemon.ipc.write.bin;




	# The import_program is responsible for priming the merlin database
	# with configuration information and an initial import of status data.
	# It's invoked with the following arguments:
	# --cache=/path/to/objects.cache
	# --status-log=/path/to/status.log
	# --db-name=database_name
	# --db-user=database_user_name
	# --db-pass=database_password
	# --db-host=database_host
	# The database parameters are taken from "database" section if such
	# a section exists.
	import_program = php /opt/nagios/merlin/current/import.php

	# port to listen to. 15551 is default. This is a daemon
	# specific config setting, as the module never listens to
	# the network
	port = 15551;
	database {
		name = merlin
		user = merlin
		pass = merlin
		host = localhost
		type = mysql
	}
}




#################################################
### nagios-server-b #############################
#################################################







noc nagios-a {
 address = 10.100.5.1;
 port = 15551;
}

#
# Sample configuration file for merlin
#
# Default options have been commented out
#
ipc_socket = /opt/nagios/merlin/current/ipc.sock;

# address to listen to. 0.0.0.0 is default
#address = 0.0.0.0;

# module-specific configuration options.
module {
	# textual log of normal hum-drum events
	# log_file = /opt/nagios/merlin/current/logs/neb.log;
	log_file = /opt/nagios/var/log/merlin-neb.log;

	# binary log of everything read/written from/to the ipc socket
	ipc_debug_read = /opt/nagios/var/log/neb.ipc.read.bin;
	ipc_debug_write = /opt/nagios/var/log/neb.ipc.write.bin;
}

# daemon-specific config options
daemon {
	pidfile = /var/run/merlin.pid

	# same as the "module" section above
	log_file = /opt/nagios/merlin/current/logs/daemon.log;
	ipc_debug_read = /opt/nagios/var/log/daemon.ipc.read.bin;
	ipc_debug_write = /opt/nagios/var/log/daemon.ipc.write.bin;

	# The import_program is responsible for priming the merlin database
	# with configuration information and an initial import of status data.
	# It's invoked with the following arguments:
	# --cache=/path/to/objects.cache
	# --status-log=/path/to/status.log
	# --db-name=database_name
	# --db-user=database_user_name
	# --db-pass=database_password
	# --db-host=database_host
	# The database parameters are taken from "database" section if such
	# a section exists.
	import_program = php /opt/nagios/merlin/current/import.php

	# port to listen to. 15551 is default. This is a daemon
	# specific config setting, as the module never listens to
	# the network
	port = 15551;
	database {
		name = merlin
		user = merlin
		pass = merlin
		host = localhost
		type = mysql
	}
}

