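Installing Nagios on Solaris 10

1. Preface

Nagios is a system and network monitoring package. It checks the hosts and services you specify, and it sends an alert when a problem occurs and again when the problem is resolved. Nagios was originally designed to run on Linux, but it also works on most Unix operating systems. It is open-source software, so the source code is freely available and free to use. Nagios is a capable monitoring tool and is widely used.

This article describes how to install Nagios on Solaris 10. It covers compiling and installing from source, installing Apache and configuring the Nagios CGIs, and configuring what Nagios monitors.

It draws on the official Nagios documentation, articles from the Nagios community, and other people's posts on the Internet.

2. Environment and resources

To install Nagios you first need an operating system that can run it. I used Solaris 10 (x86). You also need the Nagios source code. nagios-plugins is essential as well: without it, Nagios cannot collect any information about the resources you want to monitor.

Installing Nagios on Solaris 10 also requires a C build environment (usually gcc and make) and a few other packages.

Required packages:

| gcc-3.4.6-sol10-x86-local.gz |
| libiconv-1.11-sol10-x86-local.gz |
| libintl-3.4.0-sol10-x86-local.gz |
| make-3.81-sol10-x86-local.gz |
| openssl-0.9.8h-sol10-x86-local.gz |
| gd-2.0.35-sol10-x86-local.gz |
| httpd-2.2.4.tar.gz |

Source packages for Nagios, nagios-plugins and NRPE:

| nagios-3.0.3.tar.gz |
| nagios-plugins-1.4.11.tar.gz |
| nrpe-2.12.tar.gz |

The Nagios version is 3.0.3 and the plugins version is 1.4.11.

2.1. Installing gcc and make (C build environment)

2.1.1. Installing gcc

gcc requires libiconv and libintl, so install those first.

|
# gunzip ./libiconv-1.11-sol10-x86-local.gz
# pkgadd -d ./libiconv-1.11-sol10-x86-local

# gunzip ./libintl-3.4.0-sol10-x86-local.gz
# pkgadd -d ./libintl-3.4.0-sol10-x86-local

# gunzip ./gcc-3.4.6-sol10-x86-local.gz
# pkgadd -d ./gcc-3.4.6-sol10-x86-local
|

Add /usr/local/bin and /usr/ccs/bin to PATH. Export it as well: root's default shell on Solaris 10 (/sbin/sh, a Bourne shell) does not pass a modified variable to child processes such as configure and make unless it is exported.

| # PATH=/usr/local/bin:/usr/ccs/bin:$PATH; export PATH |

Add /usr/local/lib to LD_LIBRARY_PATH and export it:

| # LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH |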
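The PATH and LD_LIBRARY_PATH assignments above are easy to repeat by accident, and prepending to an empty LD_LIBRARY_PATH leaves a trailing ':' (an empty entry in the search list). The following is a minimal sketch, assuming a POSIX shell such as /usr/xpg4/bin/sh or bash; path_prepend is a hypothetical helper name, not something provided by Solaris. It adds a directory only once, avoids the empty entry, and exports the variable.

```shell
# Sketch (hypothetical helper, not part of Solaris): prepend a directory to
# a colon-separated search-path variable such as PATH or LD_LIBRARY_PATH.
#   path_prepend VAR DIR
# DIR is added only if it is not already in the list, no empty entry is
# left behind when VAR was unset or empty, and VAR is exported so child
# processes (configure, make, gcc) inherit it.
path_prepend() {
    _var=$1
    _dir=$2
    eval "_cur=\${$_var:-}"          # current value of the named variable
    case ":$_cur:" in
        *":$_dir:"*)                 # already present: leave the list alone
            ;;
        *)
            if [ -z "$_cur" ]; then
                _cur=$_dir
            else
                _cur=$_dir:$_cur
            fi
            ;;
    esac
    eval "$_var=\$_cur"              # assign the new value back by name
    export "$_var"
}
```

Calling path_prepend PATH /usr/ccs/bin and then path_prepend PATH /usr/local/bin gives the same order as above (/usr/local/bin first, so the GNU make installed under /usr/local/bin is found before /usr/ccs/bin/make). The library paths can be handled the same way with path_prepend LD_LIBRARY_PATH /usr/local/lib and, after openssl is installed, path_prepend LD_LIBRARY_PATH /usr/local/ssl/lib.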
2.1.2. Installing make and openssl

Install make:

|
# gunzip ./make-3.81-sol10-x86-local.gz
# pkgadd -d ./make-3.81-sol10-x86-local
|

Install openssl and add its library directory to LD_LIBRARY_PATH:

|
# gunzip ./openssl-0.9.8h-sol10-x86-local.gz
# pkgadd -d ./openssl-0.9.8h-sol10-x86-local
# LD_LIBRARY_PATH=/usr/local/ssl/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
|
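Before running any configure script, it can help to confirm that the shell actually finds the tools just installed. This is a minimal sketch; check_tools is a hypothetical name, and because it relies on the POSIX command -v, it assumes a POSIX shell such as /usr/xpg4/bin/sh or bash rather than the legacy /bin/sh.

```shell
# Sketch (hypothetical helper): report which of the named tools are
# reachable through the current PATH.
#   check_tools TOOL...
# Prints "ok:" lines on stdout and "missing:" lines on stderr; returns 0
# only if every tool was found.
check_tools() {
    _missing=0
    for _t in "$@"; do
        if _p=$(command -v "$_t" 2>/dev/null); then
            echo "ok:      $_t -> $_p"
        else
            echo "missing: $_t" >&2
            _missing=1
        fi
    done
    return $_missing
}
```

For example, check_tools gcc make ar should print an ok line for each tool once PATH is set as described in 2.1.1 (ar lives in /usr/ccs/bin on Solaris 10), and make should resolve to the copy under /usr/local/bin.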
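3. Installing Nagios

Once the C build environment is ready, you can install Nagios.

A typical Nagios installation involves several packages: Nagios itself; Nagios Plugins, which contains the check scripts and programs; and, if you need to monitor remote hosts, NRPE. NRPE is for Unix hosts; Windows hosts are usually monitored through a Windows agent such as NSClient++ instead.

In addition, Nagios ships a CGI-based web application that can be deployed under Apache to provide a graphical view of the monitoring status.

The installation and configuration steps are described one by one below.

3.1. Installing Nagios

Before installing Nagios, create the user and group it runs as (nagios and nagios by default).
Make sure /usr/ccs/bin is in PATH.
Create the Nagios installation directory /usr/local/nagios.

|
# groupadd nagios
# useradd -g nagios -d /usr/local/nagios nagios
# mkdir -p /usr/local/nagios
# chown nagios:nagios /usr/local/nagios
|

Install Nagios:

|
# gunzip ./nagios-3.0.3.tar.gz
# tar xvf ./nagios-3.0.3.tar
# cd ./nagios-3.0.3
# ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios \
  --with-nagios-group=nagios --with-gd-lib=/usr/sfw/lib \
  --with-gd-inc=/usr/sfw/include
# make all
# make fullinstall
# make install-config
|

Install Nagios Plugins:

|
# gunzip ./nagios-plugins-1.4.11.tar.gz
# tar xvf ./nagios-plugins-1.4.11.tar
# cd nagios-plugins-1.4.11
# ./configure --prefix=/usr/local/nagios --with-openssl=/usr/local/ssl
# make
# make install

# chown -R nagios:nagios /usr/local/nagios/libexec
|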
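groupadd and useradd fail when the group or user already exists, which gets in the way when the steps are repeated (for example on the remote host in 3.4). Here is a minimal sketch of a re-runnable version. It assumes a POSIX shell and local accounts kept in /etc/group and /etc/passwd (not NIS or LDAP); has_entry and ensure_nagios_account are hypothetical names. The grep output is discarded rather than using -q because /usr/bin/grep on Solaris 10 does not accept -q.

```shell
# Sketch (hypothetical helpers): create the nagios group, user and install
# directory only if they do not exist yet, so the step can be re-run.
# Assumes local accounts in /etc/group and /etc/passwd; run as root.

# has_entry FILE NAME: true if FILE has a line whose first ':'-separated
# field is exactly NAME.
has_entry() {
    grep "^$2:" "$1" >/dev/null 2>&1
}

ensure_nagios_account() {
    if ! has_entry /etc/group nagios; then
        groupadd nagios || return 1
    fi
    if ! has_entry /etc/passwd nagios; then
        useradd -g nagios -d /usr/local/nagios nagios || return 1
    fi
    mkdir -p /usr/local/nagios || return 1
    chown nagios:nagios /usr/local/nagios
}
```

ensure_nagios_account performs the same three steps as the commands above, so running it a second time is harmless. has_entry can also be used on its own, e.g. has_entry /etc/passwd nagios && echo "nagios user exists".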
3.2. Installing and configuring Apache

Install Apache:

|
# gunzip ./httpd-2.2.4.tar.gz
# tar xvf ./httpd-2.2.4.tar
# cd ./httpd-2.2.4
# ./configure --prefix=/usr/local/apache2 --enable-mods-shared=all \
    --enable-ssl=shared \
    --enable-ssl --with-ssl=/usr/local/ssl
# make
# make install
|

Edit /usr/local/apache2/conf/httpd.conf: set the user and group Apache runs as to nagios and nagios, then configure the Nagios web application.

|
<IfModule !mpm_netware_module>
#
# If you wish httpd to run as a different user or group, you must run
# httpd as root initially and it will switch.
#
# User/Group: The name (or #number) of the user/group to run httpd as.
# It is usually good practice to create a dedicated user and group for
# running httpd, as with most system services.
#
User nagios
Group nagios
</IfModule>
|

Append the following to /usr/local/apache2/conf/httpd.conf. AuthUserFile must point to the same password file that htpasswd creates in the next step. Apache does not allow comments at the end of a directive line, so the comments go on separate lines.

|
# Settings for Nagios
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
# Directory containing the CGI programs
<Directory "/usr/local/nagios/sbin">
    AuthType Basic
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    # Path of the password file
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>

Alias /nagios /usr/local/nagios/share
# Directory containing the Nagios web pages
<Directory "/usr/local/nagios/share">
    AuthType Basic
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    # Path of the password file
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>
|

Create the login user and password:

| # /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd <user_name> |

Here user_name is the name you will enter to log in to the Nagios web application; I used sky. Enter the password when prompted.

Edit /usr/local/nagios/etc/cgi.cfg and add the user sky:

|
.................
.................
# SYSTEM/PROCESS INFORMATION ACCESS
# This option is a comma-delimited list of all usernames that
# have access to viewing the Nagios process information as
# provided by the Extended Information CGI (extinfo.cgi).  By
# default, *no one* has access to this unless you choose to
# not use authorization.
# You may use an asterisk (*) to
# authorize any user who has authenticated to the web server.

authorized_for_system_information=nagiosadmin,sky

# CONFIGURATION INFORMATION ACCESS
# This option is a comma-delimited list of all usernames that
# can view ALL configuration information (hosts, commands, etc).
# By default, users can only view configuration information
# for the hosts and services they are contacts for. You may use
# an asterisk (*) to authorize any user who has authenticated
# to the web server.

authorized_for_configuration_information=nagiosadmin,sky

# SYSTEM/PROCESS COMMAND ACCESS
# This option is a comma-delimited list of all usernames that
# can issue shutdown and restart commands to Nagios via the
# command CGI (cmd.cgi).  Users in this list can also change
# the program mode to active or standby. By default, *no one*
# has access to this unless you choose to not use authorization.
# You may use an asterisk (*) to authorize any user who has
# authenticated to the web server.

authorized_for_system_commands=nagiosadmin,sky

# GLOBAL HOST/SERVICE VIEW ACCESS
# These two options are comma-delimited lists of all usernames that
# can view information for all hosts and services that are being
# monitored.  By default, users can only view information
# for hosts or services that they are contacts for (unless you
# choose to not use authorization). You may use an asterisk (*)
# to authorize any user who has authenticated to the web server.

authorized_for_all_services=nagiosadmin,sky
authorized_for_all_hosts=nagiosadmin,sky

# GLOBAL HOST/SERVICE COMMAND ACCESS
# These two options are comma-delimited lists of all usernames that
# can issue host or service related commands via the command
# CGI (cmd.cgi) for all hosts and services that are being monitored.
# By default, users can only issue commands for hosts or services
# that they are contacts for (unless you choose to not use
# authorization).
# You may use an asterisk (*) to authorize any
# user who has authenticated to the web server.

authorized_for_all_service_commands=nagiosadmin,sky
authorized_for_all_host_commands=nagiosadmin,sky
.................
.................
|

Start Apache (for example with /usr/local/apache2/bin/apachectl start) and check the configuration by opening http://<IP>/nagios in a web browser, where <IP> is the host's IP address.

If the Nagios web interface appears after you log in, the configuration has succeeded.

3.3. Configuring and starting Nagios

Nagios keeps its configuration files in its etc directory. Nagios reads its configuration from nagios.cfg, which determines what is monitored. nagios.cfg is only the entry point: it contains many cfg_file=... directives that give the paths of the other configuration files, including the template file (templates.cfg), the command definitions (commands.cfg), the time periods (timeperiods.cfg), and so on.

3.3.1. Configuring what to monitor

Edit /usr/local/nagios/etc/objects/localhost.cfg to monitor the local machine.

|
# Define a host template
define host{
        name                    linux-box       ; Name of this template
        use                     generic-host    ; Inherit default values
        check_period            24x7
        check_interval          5
        retry_interval          1
        max_check_attempts      10
        check_command           check-host-alive
        notification_period     24x7
        notification_interval   30
        notification_options    d,r
        contact_groups          admins
        register                0               ; DONT REGISTER THIS - ITS A TEMPLATE
        }

# Define the host
define host{
        use                     linux-server    ; Name of host template to use
                                                ; This host definition will inherit all variables that are defined
                                                ; in (or inherited by) the linux-server host template definition.
        host_name               localhost
        alias                   localhost
        address                 127.0.0.1
        }

# Define a host group and add localhost to it
define hostgroup{
        hostgroup_name          linux-servers   ; The name of the hostgroup
        alias                   Linux Servers   ; Long name of the group
        members                 localhost       ; Comma separated list of hosts that belong to this group
        }

# Services to monitor
# PING
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

# Disk usage of /
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        }

# Number of logged-in users
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Current Users
        check_command           check_local_users!20!50
        }

# Number of processes
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
        }

# CPU load
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }

# Swap usage
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Swap Usage
        check_command           check_local_swap!20!10
        }

# SSH
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     SSH
        check_command           check_ssh
        notifications_enabled   0
        }
|

Edit /usr/local/nagios/etc/nagios.cfg as follows:

|
...............
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
...............
|
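Registering an object file amounts to one cfg_file= line in nagios.cfg, but adding it by hand is easy to get wrong when it is repeated (for instance when more hosts are added in 3.4.3). Here is a minimal sketch of an idempotent version. It assumes a POSIX shell and POSIX utilities (on Solaris 10, put /usr/xpg4/bin ahead of /usr/bin in PATH so that tail -c is available); add_cfg_file is a hypothetical name. It refuses to register a file that does not exist, because Nagios cannot read a missing object file.

```shell
# Sketch (hypothetical helper): register an object configuration file in
# nagios.cfg.
#   add_cfg_file MAIN_CFG OBJ_CFG
# Appends "cfg_file=OBJ_CFG" to MAIN_CFG unless that exact line is
# already present; fails if OBJ_CFG does not exist.
add_cfg_file() {
    _main=$1
    _obj=$2
    # "|| [ -n ... ]" also checks a last line that has no trailing newline.
    while IFS= read -r _line || [ -n "$_line" ]; do
        if [ "$_line" = "cfg_file=$_obj" ]; then
            echo "already registered: $_obj"
            return 0
        fi
    done < "$_main"
    if [ ! -f "$_obj" ]; then
        echo "no such file: $_obj" >&2
        return 1
    fi
    # If the file does not end with a newline, add one first so the new
    # entry is not glued onto the last existing line.
    if [ -s "$_main" ] && [ -n "$(tail -c 1 "$_main")" ]; then
        echo >> "$_main"
    fi
    echo "cfg_file=$_obj" >> "$_main"
    echo "registered: $_obj"
}
```

For example, add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/localhost.cfg reports "already registered" if the line shown above is present, and the same call can later register the remote host file from 3.4.3. Run nagios -v afterwards, as described next, to confirm the result.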
3.3.2. Starting Nagios

The Nagios executable is /usr/local/nagios/bin/nagios.

|
# ./nagios --help

Nagios 3.0.3
Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 06-25-2008
License: GPL

Usage: ./nagios [options] <main_config_file>

Options:

  -v, --verify-config          Verify all configuration data
  -s, --test-scheduling        Shows projected/recommended check scheduling and other
                               diagnostic info based on the current configuration files.
  -x, --dont-verify-paths      Don't check for circular object paths - USE WITH CAUTION!
  -p, --precache-objects       Precache object configuration - use with -v or -s options
  -u, --use-precached-objects  Use precached object config file
  -d, --daemon                 Starts Nagios in daemon mode, instead of as a foreground process

Visit the Nagios website at http://www.nagios.org/ for bug fixes, new releases,
online documentation, FAQs, information on subscribing to
the mailing lists, and commercial support options for Nagios.
|

First use the -v option to check that the configuration files are correct.

|
# cd /usr/local/nagios/bin
# ./nagios -v ../etc/nagios.cfg

Nagios 3.0.3
Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 06-25-2008
License: GPL

Reading configuration data...

Running pre-flight check on configuration data...

Checking services...
.........................................................

.............................................
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
|

If there are no errors, you can start Nagios.

|
# cd /usr/local/nagios/bin
# ./nagios -d /usr/local/nagios/etc/nagios.cfg
|

Check the log file /usr/local/nagios/var/nagios.log to confirm that Nagios started normally.
Then view the monitoring status in a web browser: click Host Detail in the left navigation bar, and click localhost to see its details.

3.4. Installing NRPE

With Nagios installed and configured, we can see the status of the local machine (localhost). But we need to monitor more than the local machine: we also want to monitor the other servers on the network, and that is the problem NRPE solves. Put simply, NRPE is a process that runs on the remote (monitored) host. It communicates with the main Nagios process and passes the monitoring results back to the monitoring machine (the host running Nagios).

The relationship between Nagios and NRPE is shown in the figure below.

The blue parts of the figure are NRPE. NRPE has two components: the NRPE daemon (the blue NRPE box inside the Remote Linux/Unix Host) and the NRPE plugin (the check_nrpe program). Nagios runs check_nrpe, which talks to the NRPE daemon on the remote host. The daemon calls the Nagios plugins (Nagios Plugins) to obtain the monitoring results and sends them back to the Monitoring Host.

3.4.1. Installing NRPE

The NRPE architecture diagram shows that installing NRPE involves three parts. First, install the NRPE plugin (check_nrpe) on the Monitoring Host, i.e. the host running Nagios. Second, install the NRPE daemon (nrpe) on the Remote Linux/Unix Host, i.e. the monitored host. Finally, the NRPE daemon on its own cannot report anything about the remote host, so the Nagios plugins (Nagios Plugins) must also be installed there.

The steps below cover NRPE and the NRPE plugin. Installing the Nagios plugins is described in the Nagios installation section above and is not repeated here.

First prepare the C build environment on the remote host (see the earlier sections), then create the nagios user, the nagios group, and the installation directory /usr/local/nagios.

3.4.1.1. Configuration

Unpack the package and run configure:

|
# gunzip ./nrpe-2.12.tar.gz
# tar xvf ./nrpe-2.12.tar
# cd ./nrpe-2.12
# ./configure --prefix=/usr/local/nagios --enable-ssl --with-ssl=/usr/local/ssl \
     --with-ssl-lib=/usr/local/ssl/lib
|

If configure finishes without errors, you can run make.

3.4.1.2. Make

Before running make, edit ./src/nrpe.c; otherwise the build fails.

|
# vi ./src/nrpe.c

       /* Commented out because Solaris does not support these syslog facilities.
       else if(!strcmp(varvalue,"authpriv"))
       log_facility=LOG_AUTHPRIV;
       else if(!strcmp(varvalue,"ftp"))
       log_facility=LOG_FTP;
       */
|

Compile:

| # make all |

If there are no errors, the build has succeeded and the next step is installation. The installation differs between the Monitoring Host and the Remote Host, as described below.

3.4.1.3. Installing the NRPE plugin on the Monitoring Host

On the monitoring host, install the NRPE plugin:

| # make install-plugin |

This step simply copies the compiled check_nrpe to /usr/local/nagios/libexec.

3.4.1.4. Installing the NRPE daemon and configuration file template on the Remote Host

On the remote host, install NRPE and the sample configuration file:

|
# make install-daemon
# make install-daemon-config
|

The nrpe program is copied to /usr/local/nagios/bin, and the configuration file nrpe.cfg is placed in /usr/local/nagios/etc.

3.4.2. Configuring and starting NRPE (remote host)

Edit /usr/local/nagios/etc/nrpe.cfg on the remote host.

|
# vi /usr/local/nagios/etc/nrpe.cfg

... ... ... ... ... ... ... ...
# <Monitoring Host IP> is the IP address of the monitoring host
allowed_hosts=<Monitoring Host IP>
... ... ... ... ... ... ... ...
# The following examples use hardcoded command arguments...

# Command definitions
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
# Replace the partition after -p with the device path of a real partition on your host.
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 5% -p /dev/dsk/c0d0s0
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
... ... ... ... ... ... ... ...
|

Set allowed_hosts to the IP address of the monitoring host.

Start NRPE (remote host):

|
# LD_LIBRARY_PATH=/usr/local/ssl/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
# cd /usr/local/nagios/bin
# ./nrpe -d -c /usr/local/nagios/etc/nrpe.cfg
# ps -ef | grep nrpe
|

Check the daemon's log output (NRPE logs through syslog) to confirm that it started correctly.

Then run check_nrpe on the Monitoring Host to check that it can reach the remote daemon:

|
# /usr/local/nagios/libexec/check_nrpe -H <Remote Host IP>
NRPE v2.12
|
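When several remote hosts run NRPE, the check_nrpe test above has to be repeated for each one. Here is a minimal sketch, assuming a POSIX shell; nrpe_ok is a hypothetical name. Run without -c, check_nrpe only asks the daemon for its version, so a reply that starts with "NRPE v" (like the NRPE v2.12 output above) counts as success, and any other reply, such as a connection or SSL error, counts as failure.

```shell
# Sketch (hypothetical helper): check that the NRPE daemon on each HOST
# answers the monitoring host.
#   nrpe_ok CHECK_NRPE HOST...
# CHECK_NRPE is the path to the check_nrpe plugin. Returns 0 only if
# every host replied with a version string starting with "NRPE v".
nrpe_ok() {
    _plugin=$1
    shift
    _rc=0
    for _host in "$@"; do
        _reply=$("$_plugin" -H "$_host" 2>&1)
        case $_reply in
            "NRPE v"*)
                echo "ok:     $_host ($_reply)"
                ;;
            *)
                echo "failed: $_host: $_reply" >&2
                _rc=1
                ;;
        esac
    done
    return $_rc
}
```

For example, nrpe_ok /usr/local/nagios/libexec/check_nrpe 172.17.101.150 checks the remote host used in the next section. If a host fails, check allowed_hosts in that host's nrpe.cfg first.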
3.4.3. Configuring the Monitoring Host to monitor the Remote Host

First, edit /usr/local/nagios/etc/objects/commands.cfg and add a check_nrpe command definition.

|
# vi /usr/local/nagios/etc/objects/commands.cfg
... ... ... ... ... ... ... ...
# Add:
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
... ... ... ... ... ... ... ...
|

Create a new host configuration file, /usr/local/nagios/etc/objects/unixhost_<Remote Host IP>.cfg.

|
# vi /usr/local/nagios/etc/objects/unixhost_172.17.101.150.cfg

#################################################################
# 172.17.101.150
# HOST DEFINITION
#################################################################

# Define a host for the remote machine

define host{
        use                     linux-box       ; Name of host template to use
                                                ; This host definition will inherit all variables that are defined
                                                ; in (or inherited by) the linux-box host template definition.
        host_name               solaris10_150
        alias                   solaris10_150
        address                 172.17.101.150
        }

#################################################################
# 172.17.101.150
# SERVICE DEFINITIONS
#################################################################

# CPU load
define service{
        use                     generic-service
        host_name               solaris10_150
        service_description     CPU Load
        check_command           check_nrpe!check_load
        }

# Number of currently logged-in users
define service{
        use                     generic-service
        host_name               solaris10_150
        service_description     Current Users
        check_command           check_nrpe!check_users
        }

# Free space on the partition checked by check_hda1 (the -p device set in the remote nrpe.cfg)
define service{
        use                     generic-service
        host_name               solaris10_150
        service_description     / Free Space
        check_command           check_nrpe!check_hda1
        }

# Total number of processes on the remote host
define service{
        use                     generic-service
        host_name               solaris10_150
        service_description     Total Processes
        check_command           check_nrpe!check_total_procs
        }

# Number of zombie processes on the remote host
define service{
        use                     generic-service
        host_name               solaris10_150
        service_description     Zombie Processes
        check_command           check_nrpe!check_zombie_procs
        }
|

Add unixhost_172.17.101.150.cfg to nagios.cfg.

|
# vi /usr/local/nagios/etc/nagios.cfg

... ... ... ... ... ... ... ...
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/unixhost_172.17.101.150.cfg
... ... ... ... ... ... ... ...
|

Verify the configuration.

|
# cd /usr/local/nagios/bin
# ./nagios -v /usr/local/nagios/etc/nagios.cfg
|

Restart Nagios (for example, send SIGHUP to the running nagios process so that it rereads its configuration) and check that the remote host has been added.

(Screenshot: host list)

(Screenshot: service status)

4. Conclusion

The above is only a brief introduction to installing and configuring Nagios on Solaris 10. It mainly covers installing Nagios, Nagios Plugins and NRPE, and configuring Nagios and NRPE. Nagios is a fairly powerful open-source package with good extensibility: newer releases of Nagios Plugins provide more checks, and you can also write your own checks, following the plugin API, to meet your own needs.

This article is from the "sky" blog; please keep this attribution: http://skymax.blog.51cto.com/365901/98351
From the 51CTO.COM technical blog