
Manage NetVigil


The Manage menu in the NetVigil web interface lets you create and configure containers, devices, tests, actions, and other objects.

16.1 Adding Devices For Monitoring

The Manage Devices page displays all the department's devices and links to perform various administrative functions on the devices. Each row contains the device name and address, type of device, whether monitoring is currently active or suspended, a link for suspending or resuming monitoring, and the physical device location. Additionally, there are links for updating or deleting the device, and for managing the tests for the device.

Figure 16.1 Manage Devices Page

user : authentication_password : encryption_password

  1. Click the Create Device button to begin the test discovery process.
Note: Test discovery may take up to 1 minute, depending on the number of test types you chose. Follow the on-screen instructions while the device is queried.

Figure 16.2 Create Device Page

The suspend/resume feature lets you temporarily turn off all the tests for a device and later turn them on again. This is useful when you are performing maintenance on a device and do not want to receive alerts while the device is offline. While a device is suspended, polling and data collection for all of its tests stop, so the actions associated with those tests do not generate notifications. Suspend/resume is available at both the device level and the individual test level. In addition, the time during which a device is suspended (e.g. for maintenance) is not included in total downtime reports, because it is considered a planned outage.
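To illustrate why suspending a device keeps maintenance out of downtime reports, the sketch below subtracts suspension windows from outage windows. It is a hypothetical model, not NetVigil's report logic: intervals are assumed to be (start, end) pairs in minutes, the function names are invented, and suspension windows are assumed not to overlap one another.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def unplanned_downtime(outages, suspensions):
    """Outage minutes that do not fall inside a suspension window.

    Suspension windows are assumed not to overlap each other, so the
    planned portion of each outage can simply be summed.
    """
    total = 0
    for o_start, o_end in outages:
        planned = sum(overlap(o_start, o_end, s_start, s_end)
                      for s_start, s_end in suspensions)
        total += (o_end - o_start) - planned
    return total


# A 60-minute outage, plus a 30-minute outage during a maintenance window.
print(unplanned_downtime([(0, 60), (120, 150)], [(100, 200)]))  # 60
```

Only the first outage counts; the second lies entirely inside the suspension window and is treated as planned.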

WARNING: Deleting a device will remove all information about that device from the database, including all historical records. Deletions are not reversible. Suspending a device may be preferable because there is no loss of data.

To delete a device:

  1. Click MANAGE | devices.
  2. On the Manage Devices page, find the device that you want to delete, and then click Update.
  3. On the Update Device page, select Delete This Device (and associated tests).
  4. Click Submit.
  5. If you are sure that you want to delete the device, click Delete on the Delete Device confirmation page.

Auto-Update for Device Capacity Change

NetVigil provides a mechanism for refreshing the maximum values or SNMP object identifiers (SNMP OIDs) used by SNMP tests when the monitored device changes. For example, when memory or disk capacity changes, tests that return percentage-based values are incorrect unless the maximum value (the value that represents 100%) is refreshed. In some cases, even replacing a device with similar hardware can change the SNMP OIDs, creating a mismatch between the current OIDs and the ones NetVigil discovered during initial provisioning.
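The effect of a stale maximum value can be shown with simple arithmetic. This sketch assumes a percentage test computes used / max × 100; the function name and the disk sizes are illustrative only.

```python
def percent_used(used, max_value):
    """Convert a raw reading into a percentage of capacity."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return 100.0 * used / max_value


# A disk that held 100 GB is upgraded to 200 GB; 150 GB is now in use.
stale = percent_used(150, 100)  # max not refreshed: 150.0 (impossible)
fresh = percent_used(150, 200)  # max refreshed: 75.0
print(stale, fresh)
```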

If either situation occurs, simply repeat the test provisioning process in the web application for the changed device. NetVigil discovers whether any material changes have occurred on the device and highlights them on the Configure Tests page, giving you the option to also change the thresholds and/or actions that apply to each test.

If you see a non-OK test, click the non-OK icon itself (at the test level, not the device level) to see the returned error message. However, if an OID is marked as "invalid" because the monitored object no longer exists (e.g. a port module or disk partition has been removed), delete the corresponding tests manually; NetVigil does not delete them automatically.

16.2 Managing Standard Tests

16.2.1 Before You Provision Tests

Your User Group privileges determine whether or not you can create your own actions. Assigning actions to tests can be done in several ways, but all require that an action has already been created either by you or by your User Group administrator. Options include:

16.2.2 Test Autodiscovery

For some monitors (test types), NetVigil can automatically discover which tests a given device supports. For example, if you add a new router to your network, NetVigil can discover which SNMP tests the router supports. You can then select which of the supported tests you want to run. Alternatively, if you know exactly which tests you want, you can skip the auto-discovery process and provision those tests manually.

16.2.3 Grouping Tests by Subtype

One test configuration option, Group all SNMP tests with same type and sub-type together, only appears when you choose to auto-discover SNMP tests. The option gives the following advantages:

If the grouping option is not selected, every discovered SNMP test is listed individually, as shown in Figure 16.3. You can set a separate test interval, warning threshold, critical threshold, and action profile for each test.

Figure 16.3 Discovered SNMP tests, listed individually

If the grouping option is selected, discovered tests with the same subtype are grouped together. Figure 16.4 shows the results of auto-discovery of SNMP tests for the same device as the one shown in Figure 16.3. However, in this image, discovered tests are grouped by subtype. For example, the eth0 Util In and eth0 Util Out tests are grouped under the snmp/bandwidth (Interface Utilization) test subtype.

Figure 16.4 Discovered SNMP tests, grouped by subtype

You can select/clear the checkbox near a subtype name (item A in Figure 16.4) to provision/not provision all tests within that subtype. To provision some, but not all, tests within a subtype, make sure that the subtype checkbox is selected. Then, from the list of tests within the subtype (item B in Figure 16.4), select only those that you want to provision.

The configuration parameters that you set (Interval, Thresholds, Action Profile) for a subtype are applied to all selected tests within the subtype. You can change the configuration for an individual test after it is provisioned.

This grouping feature is useful when you have many tests of the same subtype for a single device. For example, assume that you have a large switch with 100 ports, each of which supports Util In and Util Out interface utilization tests. If the grouping option is not selected, the list of discovered tests has 200 entries for these tests. If the grouping option is selected, the list of discovered tests is more compact, and instead of configuring and provisioning 200 tests, you can configure and provision a single subtype, snmp/bandwidth (Interface Utilization). The Interval, Thresholds, and Action Profile selected for the subtype are applied to all tests in the group. (You can change the configuration for individual tests after the tests are provisioned.)
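The grouping idea in the 100-port example can be sketched as follows. The data layout (dicts with "type", "subtype" and "name" keys), the port naming and the "default" action profile are hypothetical; the sketch only shows how 200 discovered tests collapse into one subtype whose settings are copied to every member.

```python
from collections import defaultdict


def group_by_subtype(tests):
    """Group discovered tests by their (type, subtype) pair."""
    groups = defaultdict(list)
    for test in tests:
        groups[(test["type"], test["subtype"])].append(test["name"])
    return dict(groups)


def apply_group_config(groups, key, interval, warning, critical, action_profile):
    """Apply one set of parameters to every test in a group."""
    return {name: {"interval": interval, "warning": warning,
                   "critical": critical, "action_profile": action_profile}
            for name in groups[key]}


# Hypothetical 100-port switch: Util In and Util Out for each port.
discovered = [{"type": "snmp", "subtype": "bandwidth",
               "name": f"port{p} Util {d}"}
              for p in range(1, 101) for d in ("In", "Out")]
groups = group_by_subtype(discovered)
configs = apply_group_config(groups, ("snmp", "bandwidth"),
                             interval=5, warning=70, critical=90,
                             action_profile="default")
print(len(discovered), len(groups), len(configs))  # 200 1 200
```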

NOTE: Internal settings in the TestType.xml file may override the Group all SNMP tests... option, so some test subtypes may always be grouped even if you do not select the grouping option.

If this option is cleared and NetVigil discovers a provisioned test of this subtype for this device (e.g., a Packet Loss test is already configured for this device), the test subtype does not appear in the list of tests that you can choose to provision.

If this option is selected and NetVigil discovers a provisioned test of this subtype for this device, the test subtype is listed and you can provision another test of the same subtype for the device.

If this option is not selected but some of the configured parameters for the existing test do not match the rediscovered parameters (such as the max value or OID), the test is displayed so that you have the option to update the values.
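The three rules above can be read as one decision. In this sketch, `option_selected` is a hypothetical stand-in for the option being described (its on-screen label is not reproduced here), and the case of a subtype with no provisioned test yet is assumed to always be listed.

```python
def show_in_discovery_list(option_selected, already_provisioned, params_changed):
    """Decide whether a discovered test subtype is offered for provisioning."""
    if not already_provisioned:
        return True            # assumed: nothing configured yet, always offer it
    if option_selected:
        return True            # another test of the same subtype may be added
    return params_changed      # offered only so changed max/OID values can be updated


print(show_in_discovery_list(False, True, False))  # False
```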

This is only available if you have chosen to autodiscover SNMP tests. See "Grouping Tests by Subtype" (section 16.2.3) for a detailed explanation of this option.

  1. Click Continue.
  2. If you have chosen to autodiscover tests, please wait for the discovery process to complete. This may take a short time.
  3. In the Create New Tests: Step 3 window, select those tests that you want to provision. (If you've chosen to group SNMP tests by subtype, select subtypes and, optionally, individual tests within subtypes.) For each test, enter the following:
    Field              Used For
    Test Name          A unique identifier for this test.
    Interval           The frequency, in minutes, at which the test runs.
    Thresholds/Units   If the test result crosses the value specified by the Warning or Critical threshold, the test goes into the Warning or Critical state, respectively.
    Action Profile     The action (or series of actions) to be taken when the test enters specific states.

If you've chosen to group SNMP tests by subtype, these parameters are applied at the subtype level -- that is, to all selected tests within the subtype.

  4. Click Provision Tests. The newly provisioned test(s) appear in the Manage Tests window.

To update an existing test:

  1. Go to the Manage Tests page for the device being tested (see Figure 16.5).
  2. Click the Update link for the test you want to modify. The Update Test page opens.
  3. Make the desired changes.
  4. Click the Update button to complete the changes.
Figure 16.5 Manage Tests Page

Note: When you resume a suspended test, the test is rescheduled to run on the monitor. If you visit the Test Summary page for the device that the test is on, you may see an unknown (question mark) icon in the status column. This indicates that the test has been rescheduled, but that its status is not yet known because it hasn't yet run. After the test runs, the unknown icon is replaced with the appropriate status icon.

To suspend or resume a test:

  1. Click MANAGE | devices.
  2. On the Manage Devices page, find the device whose test(s) you want to suspend or resume, and then click Tests.
  3. On the Manage Tests page, select the test(s) you want to suspend or resume in the Select column.
  4. In the Apply the following updates to the tests selected above: area, select Suspend or Resume, as appropriate, from the Modify Test: list.
  5. Click Submit to suspend or resume the test(s).

To delete a test:

  1. Click MANAGE | devices.
  2. On the Manage Devices page, find the device whose test(s) you want to delete, and then click Tests.
  3. On the Manage Tests page, select the test(s) you want to delete in the Select column.
  4. In the Apply the following updates to the tests selected above: area, select Delete from the Modify Test: list.
  5. Click Submit to delete the test(s).

16.3 Managing Advanced Tests

16.3.1 Monitoring Databases Using SQL Query

You can issue a SQL query against supported databases in NetVigil. On the test provisioning page, select the driver from the drop-down list and enter a properly formatted SQL query in the text box.

As an example:

	Port         : 7663
	Driver Class : MySQL
	Database     : aggregateddatadb
	Query        : select id from Validation Table
	Username     : emerald
	Password     : xxxxx

To check the status of a MySQL database, provision a test with values such as:

	test name      = MySQL : Database: Status
	driver class   = MySQL
	query          = show tables;
	database       = mysql
	username       = user allowed to log in to the database (e.g. "root")
	password/again = password for the above user
	port           = TCP port for MySQL (e.g. 3306)

Make sure the checkbox next to the test name is selected, and then submit the form.

If the database is not running, the test returns FAIL status. Otherwise, the test shows the time taken to perform the "show tables;" query. Note that the username used for this test must be allowed to access the specified database from a remote host. See the following documents regarding access control requirements:

http://dev.mysql.com/doc/mysql/en/Remote_connection.html
http://dev.mysql.com/doc/mysql/en/Connection_access.html
http://dev.mysql.com/doc/mysql/en/Access_denied.html
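The behavior described above (report the query time, or FAIL if the database cannot be reached or the query fails) can be sketched with any DB-API 2.0 driver. This is not NetVigil's implementation: the sketch uses Python's built-in `sqlite3` as a stand-in for a MySQL driver, queries `sqlite_master` in place of MySQL's `show tables;`, and times only the query, not the connection.

```python
import sqlite3
import time


def run_sql_test(connect, query):
    """Run query on a fresh connection; return ("OK", ms) or ("FAIL", reason).

    connect is any zero-argument callable returning a DB-API 2.0 connection.
    """
    try:
        conn = connect()
    except Exception as exc:  # server down, bad credentials, no remote access
        return ("FAIL", str(exc))
    try:
        cursor = conn.cursor()
        start = time.perf_counter()
        cursor.execute(query)
        cursor.fetchall()
        return ("OK", (time.perf_counter() - start) * 1000.0)
    except Exception as exc:  # malformed query, missing table, etc.
        return ("FAIL", str(exc))
    finally:
        conn.close()


status, detail = run_sql_test(lambda: sqlite3.connect(":memory:"),
                              "select name from sqlite_master")
print(status)  # OK
```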

16.3.2 URL Transaction Tests

You can create a URL transaction test in NetVigil that connects to a web site, fills in a form, clicks various hyperlinks, and so on, to simulate a real user. This powerful feature lets you measure the response time of, and detect errors in, most web-enabled applications.

The system is fairly intuitive, with context-sensitive help and a mini-browser that displays the various stages of the URL transaction. You can save the transaction script and export/import it for use with other sites.

The steps to create a transaction script are:

  1. Click the modify icon for any device, and then click "create new custom tests".
  2. Scroll down to "web transaction test" and click "manage web transaction test scripts".
  3. Click "create web transaction script".
  4. Select "no" if you are not behind a proxy (typically the case).
  5. Enter the URL you wish to monitor. This is the same URL you would use to access the site in a browser. For Tomcat monitoring, this would be http://your_web_app_host/logon.jsp. If you wish to use the same script for multiple web servers, select the "replace this url hostname..." option. Click "next".
  6. The URL you entered is loaded and presented in a small window. This window shows your progress through the web transaction. Do not click any links in this window.
  7. Subsequent pages display the elements found on the page. Select an element (e.g. form, link) and an item from the selected element. For example, to log in to the NetVigil web application, select the "form" element "logonForm" and click "next".
  8. Depending on the element and item you choose, you are presented with corresponding options. As you progress through the transaction, the small window shows which page you are on; consult it to decide which element and item to pick next.
  9. When you have completed the session, close out the transaction script by clicking "finished". The small window closes automatically.
  10. Provide a unique name for the script. If you want to search for specific text during the session, enter it as well.
  11. Go back to the device summary and click the modify icon for a device that runs a web server serving the content for which the script was created.
  12. Click "create new custom tests" and scroll down to "web transaction test".
  13. Check the "provision?" box, provide a test name (e.g. NetVigil WebApp), and select the newly created script from the "test script" drop-down list.
  14. Click "provision tests".
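A much-simplified outline of what a transaction test measures is sketched below. Unlike NetVigil's recorded scripts, it only fetches a fixed sequence of URLs with Python's standard `urllib`, without filling in forms or following links, times the whole sequence, and optionally checks that some text appears on at least one page. The function name and return shape are hypothetical.

```python
import time
import urllib.request


def run_transaction(urls, expect_text=None, timeout=10.0):
    """Fetch each URL in order and time the whole sequence.

    Returns ("OK" or "FAIL", elapsed_ms). The result is FAIL if any request
    fails (connection error, timeout, HTTP 4xx/5xx) or if expect_text is
    given but appears on none of the pages.
    """
    start = time.perf_counter()
    found = expect_text is None
    try:
        for url in urls:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                page = resp.read().decode("utf-8", errors="replace")
            if expect_text is not None and expect_text in page:
                found = True
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return ("FAIL", (time.perf_counter() - start) * 1000.0)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ("OK" if found else "FAIL", elapsed_ms)


# Example (host name from the steps above):
# run_transaction(["http://your_web_app_host/logon.jsp"], expect_text="logonForm")
```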

16.3.3 Advanced SNMP Tests

NetVigil automatically detects standard MIBs and their tests. To run a test that is part of a vendor-specific MIB, you can create an Advanced SNMP Test containing the OID of the vendor-specific test.

16.3.4 Advanced Port Tests

Advanced Port Tests let you send a text string to a TCP port and then check the response against an expected string (the response does not have to match exactly; a substring match is sufficient).

The DGE connects to the target port specified, transmits the "send" string if one is specified and then performs a case-insensitive sub-string match for the "expect" string if one is specified. As an example, to monitor if the sshd TCP port is alive and responding:

	test name     : sshd service
	send string   : (blank)
	expect string : SSH
	port          : 22

If you just want to test connectivity to a TCP port, leave the "expect" string blank.

You can also send a multi-line string with this type of test by separating the lines with \r\n (carriage return + line feed).

To check only that a device accepts connections on a TCP port, create an advanced port test and do not specify any send or expect strings. For example, to monitor port 7000 on device "my_device", click manage -> devices -> tests (next to my_device) -> create new advanced tests, and provide the following parameters:

	test name     : (as you see fit)
	send string   : (blank)
	expect string : (blank)
	port          : 7000

Now the DGE will test to make sure that my_device is accepting incoming connections on port 7000 at the specified interval.
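The send/expect logic the DGE applies can be sketched with Python's standard `socket` module. This is an illustration, not the DGE's code: the function name is hypothetical, and the sketch reads only the first chunk of the reply (up to 4096 bytes), whereas a real check may read more.

```python
import socket


def port_test(host, port, send=None, expect=None, timeout=5.0):
    """Return "OK" or "FAIL" for a TCP port check.

    send: optional text written after connecting (join lines with CR LF).
    expect: optional text that must appear in the reply, matched as a
    case-insensitive substring. Only the first chunk of the reply
    (up to 4096 bytes) is examined.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if send:
                sock.sendall(send.encode("utf-8"))
            if not expect:
                return "OK"  # connectivity-only test
            reply = sock.recv(4096).decode("utf-8", errors="replace")
    except OSError:  # refused, unreachable, or timed out
        return "FAIL"
    return "OK" if expect.lower() in reply.lower() else "FAIL"


# The two examples above, as calls:
# port_test("my_device", 22, expect="SSH")   # sshd service
# port_test("my_device", 7000)               # connectivity only
```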

16.3.5 External Tests

An External Test is one that is run outside of NetVigil (by a standalone script, for example). The test result is inserted into NetVigil via the External Data Feed (EDF) and aggregated as though NetVigil had collected it. Although the test itself is not run by NetVigil, by creating an External Test, you determine how test results will be processed after they are received via EDF.

16.4 Suppressing Tests

When you suppress a test, it continues to run at the specified interval and trigger events, notifications, etc., but its status does not affect the overall status of any associated device, Service Container, or Department. When the status of the test changes (e.g., from WARNING to CRITICAL or from CRITICAL to OK), the test is automatically unsuppressed and NetVigil again takes the test's status into account for determining device, Service Container, and Department status.

In the default sort order of the Device Details and All Tests Summary pages, suppressed tests appear at the bottom of the list with a single arrow to the left of the status icon. (For additional information, see "To see which tests are suppressed" later in this section.)

IMPORTANT: A suppressed test continues to run, but its status does not affect the overall status of related objects. A suspended test stops running and does not trigger events, notifications, etc. until it is resumed.

For example, assume that a device has two network tests configured. When both tests have status OK, the overall status of the device in the Network column of the Device Summary Page is OK. If one of these tests goes into WARNING state, the overall status of the device in the Network column of the Device Summary Page changes to WARNING. However, if you suppress the test that is in WARNING state, the status of the remaining tests determines device status. In this case, there is only one other test, with status OK, so the overall device Network status is OK.

If the suppressed test returns to status OK, it is no longer suppressed. The next time its status becomes WARNING, overall device status will also become WARNING, unless you suppress the test once again.
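The two-test example above can be modeled as follows. The sketch assumes device status is the worst status among unsuppressed tests (which matches the example) and that a device whose tests are all suppressed shows OK; only OK, WARNING and CRITICAL are handled, and the class and function names are hypothetical.

```python
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


class MonitoredTest:
    def __init__(self, name, status="OK"):
        self.name = name
        self.status = status
        self.suppressed = False

    def suppress(self):
        self.suppressed = True

    def update(self, new_status):
        """Record a new result; any status change lifts suppression."""
        if new_status != self.status:
            self.suppressed = False
        self.status = new_status


def device_status(tests):
    """Worst status among the tests that are not suppressed."""
    active = [t.status for t in tests if not t.suppressed]
    return max(active, key=SEVERITY.__getitem__, default="OK")


ping, loss = MonitoredTest("ping"), MonitoredTest("packet loss")
loss.update("WARNING")
loss.suppress()
print(device_status([ping, loss]))  # OK
```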

To see which tests are suppressed, look at the bottom of the list on the All Tests Summary and Device Test Summary pages. Suppressed tests are marked by a single arrow to the left of the Status icon.

16.5 Smart Thresholds Using Baselines

Baselining is a process by which NetVigil can automatically set the Warning and Critical thresholds for each test based on the test's historical data. This lets you set customized thresholds automatically, based on each test's individual behavior.

As an example, the response time for a local device is normally much smaller than the response time for a device in a remote datacenter because of network latency. Rather than setting the response time Warning threshold for all devices to be the same, you can use the baseline feature to calculate the 95th percentile of the response time reported for each device over a three-month period, and then set the Warning threshold to be 10% higher than this 95th percentile value.

The Baseline Data Set

The baseline value is calculated for each test based on its own historical data. You select the devices and tests for which you want to run baselining by specifying a combination of device name, test name and test type.

Each time NetVigil aggregates a test result, it stores three values: the minimum, maximum, and mean values of the tested variable over the aggregation period. For example, if NetVigil is configured to store data for 1 day at 10-minute samples, and a test is set up to run every 10 minutes, the test generates 144 test results in the course of a day (24 hours × 6 results per hour). Each test result includes the maximum, minimum, and mean values of the tested quantity for the 10-minute period. You can generate a baseline from the maximum, minimum, or mean samples within the specified date range.
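How a series of raw readings rolls up into (minimum, maximum, mean) per aggregation period can be sketched as below. The (timestamp, value) input format, with timestamps in minutes, is an assumption made for illustration; it is not NetVigil's storage format.

```python
from statistics import mean


def aggregate(samples, period):
    """Collapse (timestamp, value) samples into per-period (min, max, mean).

    timestamp and period share a unit (minutes here); the result is keyed by
    the start of each period, in time order.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % period
        buckets.setdefault(start, []).append(value)
    return {start: (min(v), max(v), mean(v)) for start, v in sorted(buckets.items())}


# One reading every 10 minutes for a day gives 144 aggregated results.
day = aggregate([(t, 1.0) for t in range(0, 24 * 60, 10)], period=10)
print(len(day))  # 144
```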

Managing Baselines

The table that follows explains the items on the Baseline Management page:

Table 16.4 Baseline Management fields

Field                  Purpose
Device Name/RegExp     The name of a device whose tests are to be baselined, or a regular expression containing '*' wildcards to match multiple device names.
TestName/RegExp        The name of an individual test to be baselined, or a regular expression containing '*' wildcards to match multiple test names.
Test Type/Subtype      The Monitor and Subtype of the test(s) to be baselined, e.g. port/http, snmp/chassis_temp.
Start Date, End Date   The start and end dates of the test results used to calculate the baseline. Note: Each selected test must have test results available for the full date range.
Taking values of       The value from each test result (maximum, minimum, or mean) used to calculate the baseline. See "The Baseline Data Set" in section 16.5 for more information.
And using the          The method (average or 95th percentile) used to calculate the baseline from the maximum, minimum, or mean test results. Average is the mean of the test results (sum of test results / number of test results).
Warning Threshold      A percentage above or below the calculated baseline. Select above if the test result gets worse as it gets higher; select below if it gets worse as it gets lower. When the test result crosses this threshold, the test status is set to Warning.
Critical Threshold     A percentage above or below the calculated baseline. Select above if the test result gets worse as it gets higher; select below if it gets worse as it gets lower. When the test result crosses this threshold, the test status is set to Critical.
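The calculations behind these fields can be sketched as follows. Several details are assumptions: the 95th percentile uses the nearest-rank convention (NetVigil's exact convention is not stated here), and the '*' wildcard is matched with Python's `fnmatch`, which additionally treats '?' and '[...]' as special characters. The example reproduces the earlier scenario of a Warning threshold 10% above the 95th percentile.

```python
import fnmatch
from statistics import mean


def percentile_95(values):
    """95th percentile by the nearest-rank method (one of several conventions)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no test results in the date range")
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def baseline(values, method="95th percentile"):
    """Baseline from the chosen maximum, minimum, or mean samples."""
    return percentile_95(values) if method == "95th percentile" else mean(values)


def thresholds(base, warning_pct, critical_pct, direction="above"):
    """(warning, critical) thresholds a percentage above or below the baseline."""
    sign = 1 if direction == "above" else -1
    return (base * (1 + sign * warning_pct / 100),
            base * (1 + sign * critical_pct / 100))


def matching(names, pattern):
    """Names matching a pattern in which '*' matches any run of characters."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


# Hypothetical response times 1..100 ms: Warning 10% and Critical 25% above.
base = baseline(range(1, 101))
print(base, thresholds(base, 10, 25))
```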

Note: If you open the Test Baseline Management page from either the Manage Tests page or the Update Test page, some of the Baseline Management fields are already filled in.

16.6 Configuring Test Schedules

You can configure a time schedule (hour and day of week) for running a test, and assign this schedule to a test. By default, the test schedule is 24x7 (all the time). These schedules are stored in your local timezone specified in Manage->Prefs.
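A schedule of this kind (hours per day of the week) can be represented and checked as sketched below. The dictionary layout and the business-hours example are hypothetical, and the sketch assumes the time passed in has already been converted to the local time zone set in your preferences.

```python
from datetime import datetime

# Hypothetical schedules: weekday (Monday=0) -> hours of the day the test runs.
ALWAYS = {day: set(range(24)) for day in range(7)}             # default 24x7
BUSINESS_HOURS = {day: set(range(8, 18)) for day in range(5)}  # Mon-Fri 08:00-17:59


def should_run(schedule, local_time):
    """True if local_time (already in the preferred time zone) is scheduled."""
    return local_time.hour in schedule.get(local_time.weekday(), set())


print(should_run(BUSINESS_HOURS, datetime(2024, 1, 6, 9, 0)))  # False (Saturday)
```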

16.7 Device Dependency

In network environments, switches, routers, etc. are often the physical gateways that provide access to other network devices. If critical "parent devices" are unavailable, monitoring may be impeded for devices that are accessed via the parents. To distinguish between devices that are genuinely in a CRITICAL state and those that are UNREACHABLE because of a problem with one or more parent devices, you can create device dependencies.

A device dependency is a parent-child relationship between monitored devices. A single parent can have multiple children, and a single child can have multiple parents. Device dependencies are cascading. If A is a child of B, and B is a child of C, it is only necessary to configure A as a child of B and B as a child of C. NetVigil automatically recognizes the dependency between A and C.

When a device test returns CRITICAL (for all thresholds), UNKNOWN, or FAILED, NetVigil performs additional processing to determine whether the device is reachable:

  1. NetVigil examines the device's current packet loss test. If such a test exists and packet loss is not 100%, the device is considered reachable.
  2. If no packet loss test exists, all immediate parent devices are examined. If the device has no parents, it is considered reachable and the result of the test is the measured value. If all parents have a current packet loss test which was measured at 100%, the device is considered unreachable.
  3. If no packet loss test exists for the parent, or no recent test result is found for an existing packet loss test, the child device is considered reachable and the result of the test is the measured value.
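The three reachability steps can be sketched as a function. The data structures are hypothetical, and one case is an assumption because the steps above do not spell it out: when the device's own packet loss test reports 100%, the sketch goes on to examine the parents, just as it does when no packet loss test exists.

```python
BAD_RESULTS = {"CRITICAL", "UNKNOWN", "FAILED"}


def is_reachable(device, loss, parents):
    """Apply the reachability steps above.

    loss maps a device to its latest current packet loss percentage; a device
    is absent if it has no packet loss test or no recent result.
    parents maps a device to its immediate parent devices.
    """
    own = loss.get(device)
    if own is not None and own < 100:
        return True   # step 1: the device itself still answers
    direct = parents.get(device, [])
    if not direct:
        return True   # step 2: no parents, keep the measured value
    # Steps 2-3: unreachable only if every parent has a current 100% result.
    return not all(loss.get(p) == 100 for p in direct)


def effective_status(device, measured, loss, parents):
    """Report UNREACHABLE instead of a bad result when parents are down."""
    if measured in BAD_RESULTS and not is_reachable(device, loss, parents):
        return "UNREACHABLE"
    return measured


print(effective_status("server", "CRITICAL", {"router": 100}, {"server": ["router"]}))
```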

Dependency Restrictions

Device dependencies must conform to these rules:

  1. Circular dependency is not allowed. For example, if you set up the following dependencies:

Device A depends on Device B, which depends on Device C

you cannot configure Device C to depend on Device A.

  2. Parent and child devices must belong to the same DGE Location.

To configure device dependency:

  1. Create the parent device.
  2. Create the child device.
  3. Click MANAGE | devices.
  4. On the Manage Devices page, find the device that will be the child device in the dependency and click Update.
  5. On the Update Device page, click Update Device Dependency.
  6. On the Update Device Dependency page, select the device or devices on which this child depends from the Does Not Depend On list, and then click Done. (If you return to the Update Device Dependency page, the parent device(s) appear in the Depends On list.)


The next time the parent device has a CRITICAL ping/pl test result, the child device will have UNREACHABLE status.
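The cascading behavior and the two dependency restrictions can be sketched as a graph check. The dictionaries (`parents` for direct dependencies, `location` for each device's DGE Location) and the function names are hypothetical; the point is that NetVigil only needs the direct links, because indirect parents can be derived from them, and that a new link is rejected if the proposed parent already depends on the child.

```python
def ancestors(device, parents):
    """All direct and indirect parents of device (dependencies cascade)."""
    seen = set()
    stack = list(parents.get(device, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def can_add_dependency(child, parent, parents, location):
    """Check both restrictions before making child depend on parent."""
    if location[child] != location[parent]:
        return False  # parent and child must share a DGE Location
    # A cycle would appear if parent already depends, directly or not, on child.
    return child != parent and child not in ancestors(parent, parents)


deps = {"A": ["B"], "B": ["C"]}
print(sorted(ancestors("A", deps)))  # ['B', 'C']
```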

16.8 Managing Account Preferences

Changes you make to your account preferences become part of your user profile and serve as defaults each time you log in to NetVigil.


Fidelia Technology, Inc.
NetVigil v4.0
www.fidelia.com