[op5-users] merlin noc-poller setup
Daniel Tuecks
dtuecks at googlemail.com
Thu Dec 3 17:52:12 CET 2009
Hello List!
I am trying to set up Nagios/Merlin in a NOC/poller scenario. To that end I
have installed two Nagios 3.2.0 servers (nagios-a and nagios-b) with the
sample configs (make install-config). For testing purposes I created two
hosts (server-a and server-b) and put each of them in a separate
hostgroup (server-group-a and server-group-b).
Now I want:
- nagios-b to actively check server-a and server-b.
- nagios-a to actively check only server-a.
- nagios-a to receive only passive check results for server-b from
nagios-b:
------NOC----------   send pasv results   ------POLLER-------
|    nagios-a     | <<------------------- |    nagios-b     |
-------------------       server-b        -------------------
  |                                         |
  |-- check & display server-a              |-- check & display server-a
  |-- display server-b                      |-- check & display server-b
(see below for nagios / merlin config files)
With merlin-0.6.6.tar.gz I can't get this done. I see some very strange
behaviour in the neb.log:
## 'server-b' is a member of 'server-group-b', so can't add to poller
for 'server-group-b'
Yes, it is a member of server-group-b.
## 'server-group-b' is a selection without hosts. Are you sure you want this?
No, it is not a selection without hosts (as stated one line before).
Checks for hostgroup 'server-group-b' were not disabled on nagios-a. Both
Nagios servers were still checking both hostgroups (and therefore both servers).
With merlin checked out from git (and switched to the 'next' branch) things
were also a little strange.
The neb.log on nagios-a looked like this:
## [1259848873] 6: Merlin Module Loaded
## [1259848873] 6: setting connect and disconnect handlers
## [1259848873] 6: Coredumps in /root
## [1259848873] 6: Merlin module v0.6.6p18-g26db1b7e41ae initialized
successfully
## [1259848913] 6: Object configuration parsed.
## [1259848913] 6: Creating hash tables
## [1259848913] 6: Creating host object tree
## [1259848913] 6: Creating service object tree
## [1259848913] 6: Attempting ipc connect
## [1259848913] 6: Shoutcasting active status through IPC socket
## [1259848913] 6: Running on_connect hook for module
## [1259848914] 6: ipc successfully connected
## [1259848914] 6: Reaping ipc events
## [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848914] 6: Disabling active checks for hosts in hostgroup
'server-group-b'
## [1259848914] 6: Disabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848914] 6: Enabling active checks for hosts in hostgroup
'server-group-b'
## [1259848914] 6: Enabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848914] 6: Disabling active checks for hosts in hostgroup
'server-group-b'
## [1259848914] 6: Disabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848914] 6: Enabling active checks for hosts in hostgroup
'server-group-b'
## [1259848914] 6: Enabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848914] 6: Disabling active checks for hosts in hostgroup
'server-group-b'
## [1259848914] 6: Disabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848914] 4: Handled 5 'ipc' events in 0.252 seconds in: 5, out: 0
## [1259848914] 6: Scheduling next ipc reaping at 1259848919
## [1259848914] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848914] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848914] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848919] 6: Reaping ipc events
## [1259848919] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848919] 6: Enabling active checks for hosts in hostgroup
'server-group-b'
## [1259848919] 6: Enabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848919] 6: Disabling active checks for hosts in hostgroup
'server-group-b'
## [1259848919] 6: Disabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848919] 6: Enabling active checks for hosts in hostgroup
'server-group-b'
## [1259848919] 6: Enabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848919] 6: Disabling active checks for hosts in hostgroup
'server-group-b'
## [1259848919] 6: Disabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 2 for selection 'server-group-b'
## [1259848919] 6: Enabling active checks for hosts in hostgroup
'server-group-b'
## [1259848919] 6: Enabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848919] 6: Received control packet code 3 for selection 'server-group-b'
## [1259848919] 6: Disabling active checks for hosts in hostgroup
'server-group-b'
## [1259848919] 6: Disabling active checks for services of hosts in
hostgroup 'server-group-b'
## [1259848919] 6: Inbound IPC event, callback 18, len 100, type -80
## [1259848919] 4: Ignoring unrecognized/unhandled callback type: 18
(NEBCALLBACK_PROGRAM_STATUS_DATA)
## [1259848919] 6: Inbound IPC event, callback 19, len 334, type 0
Now 'server-group-b' was disabled on nagios-a. This looked promising, but
the Nagios daemon died after a few seconds:
## Nagios Core 3.2.0
## Copyright (c) 2009 Nagios Core Development Team and Community Contributors
## Copyright (c) 1999-2009 Ethan Galstad
## Last Modified: 08-12-2009
## License: GPL
##
## Website: http://www.nagios.org
## Nagios 3.2.0 starting... (PID=12781)
## Local time is Thu Dec 03 15:01:13 CET 2009
## Logging to '/opt/nagios/var/log/merlin-neb.log'
## Caught SIGSEGV, shutting down...
Furthermore, I'd say that the alternating reception of "code 3" and "code 2"
packets looks fishy.
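To put a number on that alternation, here is a minimal sketch that counts how
often the control packet code changes per selection. It assumes only the log
line format shown above (in that log, code 3 is followed by "Disabling" and
code 2 by "Enabling"); the function name count_toggles and the sample list
are made up for illustration and are not part of merlin.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [1259848914] 6: Received control packet code 3 for selection 'server-group-b'
CTRL_RE = re.compile(
    r"\[(\d+)\] \d+: Received control packet code (\d+) for selection '([^']+)'"
)


def count_toggles(lines):
    """Return {selection: number of times the control code changed}."""
    last_code = {}
    toggles = defaultdict(int)
    for line in lines:
        match = CTRL_RE.search(line)
        if not match:
            continue  # not a control packet line
        _timestamp, code, selection = match.groups()
        if selection in last_code and last_code[selection] != code:
            toggles[selection] += 1
        last_code[selection] = code
    return dict(toggles)


# Hypothetical excerpt in the same format as the neb.log on nagios-a.
sample = [
    "[1259848914] 6: Received control packet code 3 for selection 'server-group-b'",
    "[1259848914] 6: Disabling active checks for hosts in hostgroup",
    "[1259848914] 6: Received control packet code 2 for selection 'server-group-b'",
    "[1259848914] 6: Received control packet code 3 for selection 'server-group-b'",
]
print(count_toggles(sample))
```

Running it over the whole file (for example with open() on
/opt/nagios/var/log/merlin-neb.log) should show whether the flip-flopping
continues for as long as the daemon stays up.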
The neb.log on nagios-server 'b':
## [1259848862] 6: Merlin Module Loaded
## [1259848862] 6: setting connect and disconnect handlers
## [1259848862] 6: Coredumps in /root
## [1259848862] 6: Merlin module v0.6.6p18-g26db1b7e41ae initialized
successfully
## [1259848902] 6: Object configuration parsed.
## [1259848902] 6: Creating hash tables
## [1259848902] 6: Attempting ipc connect
## [1259848902] 6: Shoutcasting active status through IPC socket
## [1259848902] 6: Running on_connect hook for module
## [1259848902] 6: ipc successfully connected
## [1259848902] 6: Reaping ipc events
## [1259848902] 4: Handled 0 'ipc' events in 0.043 seconds in: 0, out: 0
## [1259848902] 6: Scheduling next ipc reaping at 1259848907
## [1259848902] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848902] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848907] 6: Reaping ipc events
## [1259848907] 4: Handled 0 'ipc' events in 4.335 seconds in: 0, out: 0
## [1259848907] 6: Scheduling next ipc reaping at 1259848912
## [1259848908] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848912] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848912] 4: key 'server-b' doesn't match any possible selection
## [1259848912] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848912] 4: key 'server-b' doesn't match any possible selection
## [1259848912] 6: Reaping ipc events
## [1259848912] 4: Handled 0 'ipc' events in 9.404 seconds in: 0, out: 0
## [1259848912] 6: Scheduling next ipc reaping at 1259848917
## [1259848912] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848914] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848917] 6: Reaping ipc events
## [1259848917] 4: Handled 0 'ipc' events in 14.519 seconds in: 0, out: 0
## [1259848917] 6: Scheduling next ipc reaping at 1259848922
## [1259848920] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848922] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848922] 6: Reaping ipc events
## [1259848922] 4: Handled 0 'ipc' events in 19.319 seconds in: 0, out: 0
## [1259848922] 6: Scheduling next ipc reaping at 1259848927
## [1259848926] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848927] 6: Reaping ipc events
## [1259848927] 4: Handled 0 'ipc' events in 24.400 seconds in: 0, out: 0
## [1259848927] 6: Scheduling next ipc reaping at 1259848932
## [1259848932] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848932] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848932] 4: key 'server-b' doesn't match any possible selection
## [1259848932] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848932] 4: key 'server-b' doesn't match any possible selection
## [1259848932] 6: Reaping ipc events
## [1259848932] 4: Handled 0 'ipc' events in 29.657 seconds in: 0, out: 0
## [1259848932] 6: Scheduling next ipc reaping at 1259848937
## [1259848932] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848937] 6: Reaping ipc events
## [1259848937] 4: Handled 0 'ipc' events in 34.452 seconds in: 0, out: 0
## [1259848937] 6: Scheduling next ipc reaping at 1259848942
## [1259848938] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848942] 6: Reaping ipc events
## [1259848942] 4: Handled 0 'ipc' events in 39.496 seconds in: 0, out: 0
## [1259848942] 6: Scheduling next ipc reaping at 1259848947
## [1259848942] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848944] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848947] 6: Reaping ipc events
## [1259848947] 4: Handled 0 'ipc' events in 44.334 seconds in: 0, out: 0
## [1259848947] 6: Scheduling next ipc reaping at 1259848952
## [1259848950] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848952] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848952] 4: key 'localhost' doesn't match any possible selection
## [1259848952] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848952] 4: key 'localhost' doesn't match any possible selection
## [1259848952] 6: Reaping ipc events
## [1259848952] 4: Handled 0 'ipc' events in 49.431 seconds in: 0, out: 0
## [1259848952] 6: Scheduling next ipc reaping at 1259848957
## [1259848952] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848956] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848957] 6: Reaping ipc events
## [1259848957] 4: Handled 0 'ipc' events in 54.492 seconds in: 0, out: 0
## [1259848957] 6: Scheduling next ipc reaping at 1259848962
## [1259848962] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848962] 6: Reaping ipc events
## [1259848962] 4: Handled 0 'ipc' events in 59.349 seconds in: 0, out: 0
## [1259848962] 6: Scheduling next ipc reaping at 1259848967
## [1259848962] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848967] 6: Reaping ipc events
## [1259848967] 4: Handled 0 'ipc' events in 64.410 seconds in: 0, out: 0
## [1259848967] 6: Scheduling next ipc reaping at 1259848972
## [1259848968] 7: Processing callback NEBCALLBACK_PROGRAM_STATUS_DATA
## [1259848972] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 7: Processing callback NEBCALLBACK_HOST_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 7: Processing callback NEBCALLBACK_SERVICE_STATUS_DATA
## [1259848972] 4: key 'localhost' doesn't match any possible selection
## [1259848972] 6: Reaping ipc events
The suspicious line here was:
## key 'server-b' doesn't match any possible selection
What does that mean?
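To see which keys are affected and how often, the warning can be tallied per
key. This is only a sketch that relies on the message text shown above; the
name unmatched_keys and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches: key 'server-b' doesn't match any possible selection
KEY_RE = re.compile(r"key '([^']+)' doesn't match any possible selection")


def unmatched_keys(lines):
    """Count how often each key is reported as matching no selection."""
    counts = Counter()
    for line in lines:
        match = KEY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical excerpt in the same format as the neb.log on nagios-b.
sample = [
    "[1259848912] 4: key 'server-b' doesn't match any possible selection",
    "[1259848912] 6: Reaping ipc events",
    "[1259848952] 4: key 'localhost' doesn't match any possible selection",
    "[1259848932] 4: key 'server-b' doesn't match any possible selection",
]
for key, count in sorted(unmatched_keys(sample).items()):
    print(key, count)
```

In the log above both 'server-b' and 'localhost' show up, so on nagios-b the
warning does not seem to be limited to the host in server-group-b.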
Another thing that puzzles me is the option to create binary logs
(ipc_debug_read/write). Is this supposed to work? I can't find those
logs...
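For completeness, a small sketch of how to check for those files. The paths
are copied from the ipc_debug_read/ipc_debug_write settings in the merlin
config below; the helper name report is made up for illustration.

```python
import os

# Paths as configured in the module and daemon sections of the merlin config.
DEBUG_LOGS = [
    "/opt/nagios/var/log/neb.ipc.read.bin",
    "/opt/nagios/var/log/neb.ipc.write.bin",
    "/opt/nagios/var/log/daemon.ipc.read.bin",
    "/opt/nagios/var/log/daemon.ipc.write.bin",
]


def report(paths):
    """Map each path to its size in bytes, or None if it is not a file."""
    return {
        path: (os.path.getsize(path) if os.path.isfile(path) else None)
        for path in paths
    }


for path, size in report(DEBUG_LOGS).items():
    print(path, "missing" if size is None else "%d bytes" % size)
```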
I have no clue what I'm doing so terribly wrong here. I really hope
you guys can point me in the right direction :)
Daniel
---------------------------------------------
----NAGIOS CONFIG----------------------------
---------------------------------------------
### server-b ###############
define host{
use linux-server
host_name server-b
alias server-b
address 10.100.5.10
}
define service{
use local-service
host_name server-b
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service
host_name server-b
service_description HTTP
check_command check_http
notifications_enabled 0
}
### server-a ######################
define host{
use linux-server
host_name server-a
alias server-a
address 10.100.5.11
}
define service{
use local-service
host_name server-a
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service
host_name server-a
service_description HTTP
check_command check_http
notifications_enabled 0
}
### server-group-b ###############
define hostgroup{
hostgroup_name server-group-b
members server-b
}
### server-group-a ###############
define hostgroup{
hostgroup_name server-group-a
members server-a
}
---------------------------------------------
----MERLIN CONFIG FILES----------------------
---------------------------------------------
#################################################
### nagios-server-a #############################
#################################################
poller nagios-b {
address = 10.100.5.2;
port = 15551;
hostgroup = server-group-b;
}
#
# Sample configuration file for merlin
#
# Default options have been commented out
#
ipc_socket = /opt/nagios/merlin/current/ipc.sock;
# address to listen to. 0.0.0.0 is default
#address = 0.0.0.0;
# module-specific configuration options.
module {
# textual log of normal hum-drum events
# log_file = /opt/nagios/merlin/current/logs/neb.log;
log_file = /opt/nagios/var/log/merlin-neb.log;
# binary log of everything read/written from/to the ipc socket
ipc_debug_read = /opt/nagios/var/log/neb.ipc.read.bin;
ipc_debug_write = /opt/nagios/var/log/neb.ipc.write.bin;
}
# daemon-specific config options
daemon {
pidfile = /var/run/merlin.pid
# same as the "module" section above
log_file = /opt/nagios/merlin/current/logs/daemon.log;
# ipc_debug_read = /opt/nagios/merlin/current/logs/daemon.ipc.read.bin;
# ipc_debug_write = /opt/nagios/merlin/current/logs/daemon.ipc.write.bin;
ipc_debug_read = /opt/nagios/var/log/daemon.ipc.read.bin;
ipc_debug_write = /opt/nagios/var/log/daemon.ipc.write.bin;
# The import_program is responsible for priming the merlin database
# with configuration information and an initial import of status data.
# It's invoked with the following arguments:
# --cache=/path/to/objects.cache
# --status-log=/path/to/status.log
# --db-name=database_name
# --db-user=database_user_name
# --db-pass=database_password
# --db-host=database_host
# The database parameters are taken from "database" section if such
# a section exists.
import_program = php /opt/nagios/merlin/current/import.php
# port to listen to. 15551 is default. This is a daemon
# specific config setting, as the module never listens to
# the network
port = 15551;
database {
name = merlin
user = merlin
pass = merlin
host = localhost
type = mysql
}
}
#################################################
### nagios-server-b #############################
#################################################
noc nagios-a {
address = 10.100.5.1;
port = 15551;
}
#
# Sample configuration file for merlin
#
# Default options have been commented out
#
ipc_socket = /opt/nagios/merlin/current/ipc.sock;
# address to listen to. 0.0.0.0 is default
#address = 0.0.0.0;
# module-specific configuration options.
module {
# textual log of normal hum-drum events
# log_file = /opt/nagios/merlin/current/logs/neb.log;
log_file = /opt/nagios/var/log/merlin-neb.log;
# binary log of everything read/written from/to the ipc socket
ipc_debug_read = /opt/nagios/var/log/neb.ipc.read.bin;
ipc_debug_write = /opt/nagios/var/log/neb.ipc.write.bin;
}
# daemon-specific config options
daemon {
pidfile = /var/run/merlin.pid
# same as the "module" section above
log_file = /opt/nagios/merlin/current/logs/daemon.log;
ipc_debug_read = /opt/nagios/var/log/daemon.ipc.read.bin;
ipc_debug_write = /opt/nagios/var/log/daemon.ipc.write.bin;
# The import_program is responsible for priming the merlin database
# with configuration information and an initial import of status data.
# It's invoked with the following arguments:
# --cache=/path/to/objects.cache
# --status-log=/path/to/status.log
# --db-name=database_name
# --db-user=database_user_name
# --db-pass=database_password
# --db-host=database_host
# The database parameters are taken from "database" section if such
# a section exists.
import_program = php /opt/nagios/merlin/current/import.php
# port to listen to. 15551 is default. This is a daemon
# specific config setting, as the module never listens to
# the network
port = 15551;
database {
name = merlin
user = merlin
pass = merlin
host = localhost
type = mysql
}
}