Monitoring
==========

SANET aims to provide the best possible support for network
management and troubleshooting, together with flexible and
customizable configuration mechanisms.
To this end, SANET offers advanced features that make it a valuable tool in the
world of network management:

+ **Maximum flexibility in check management**. Every check
  can be customized through many parameters. In particular:

    + **Frequency**: usually the most important checks are executed at
      intervals of tens of seconds, while deeper (and more invasive)
      checks can be executed less frequently.
    + **Tolerance**: whether a failing check raises an alarm immediately,
      or is only logged at first and raises an alarm
      only if the failure persists for longer than a given time interval.
    + **Notifications**: each check failure can be notified via email and/or SMS.
      The subject and body of the emails, as well as the SMS body, can be customized separately for each check.
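
The tolerance parameter can be pictured with a minimal Python sketch. `ToleranceTracker` and its fields are hypothetical names chosen for illustration and are not part of SANET; the sketch assumes that a tolerance of 0 means "alarm immediately".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToleranceTracker:
    """Hypothetical helper: alarm only once a failure has lasted `tolerance` seconds."""
    tolerance: float                        # assumed convention: 0 means "alarm immediately"
    failing_since: Optional[float] = None   # time of the first failed run, if any

    def update(self, ok: bool, now: float) -> bool:
        """Record one check result; return True when an alarm should be raised."""
        if ok:
            self.failing_since = None       # recovery resets the timer
            return False
        if self.failing_since is None:
            self.failing_since = now        # first failure: start the timer
        return now - self.failing_since >= self.tolerance


tracker = ToleranceTracker(tolerance=120)
first = tracker.update(ok=False, now=0)     # False: the failure is only logged
later = tracker.update(ok=False, now=120)   # True: it has persisted for 120 seconds
```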

+ **IPv4 and IPv6 reachability checks.** SANET can check the
  reachability of a host over both IPv4 and IPv6, with a customizable packet size
  (usually set so as to produce a 1500-byte IP datagram).
  It can also measure RTT statistics (minimum, maximum, average)
  and packet loss, and plot these data in dedicated charts.
  If several IP addresses are bound to the same name, SANET checks the
  reachability of all of them by automatically reshuffling the address set.

+ **Dependencies among checks.** The execution of a check
  can be made conditional on the successful state of the checks it depends on.
  In this case the check is not performed unless the checks it depends on
  are successful.
  For example, if the router of a site is unreachable, it makes no sense
  to check the switches and other appliances at that site, since they will clearly be unreachable too.
  Mechanisms like this improve performance, reduce
  the number of redundant notifications (especially useful for SMS alerts),
  and show immediately where the problem originates.

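
The dependency rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SANET's scheduler: `run_checks`, the state names, and the probe callables are hypothetical, and the checks are assumed to be listed so that each one comes after the checks it depends on.

```python
from typing import Callable, Dict, List


def run_checks(checks: Dict[str, Callable[[], bool]],
               depends_on: Dict[str, List[str]]) -> Dict[str, str]:
    """Run checks in the given order; skip those whose dependencies are not 'up'."""
    state: Dict[str, str] = {}
    for name, probe in checks.items():
        deps = depends_on.get(name, [])
        if any(state.get(dep) != "up" for dep in deps):
            state[name] = "skipped"   # never executed, so it cannot notify
            continue
        state[name] = "up" if probe() else "down"
    return state


# The router is down, so the switch and the printer behind it are not probed.
checks = {"router": lambda: False, "switch": lambda: True, "printer": lambda: True}
deps = {"switch": ["router"], "printer": ["switch"]}
result = run_checks(checks, deps)
# result == {"router": "down", "switch": "skipped", "printer": "skipped"}
```

Only the router failure is reported here, which is what points directly at the origin of the problem.
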
+ **Flexible interface detection.** Anyone dealing with network management
  knows how many problems can arise when identifying a node's interfaces through
  the so-called `ifIndex` (the instance number in the MIB-2 interfaces table, 1.3.6.1.2.1.2.2.1),
  since this number is not strictly bound to a physical interface and may
  change on reboot, when hardware is changed
  (insertion or removal of modules), or depending on the firmware version.

  Similar problems arise with other MIB branches: for example with the interfaces
  in the bridge MIB (1.3.6.1.2.1.17.2.15.1), with servers' filesystems (`hrStorage`,
  1.3.6.1.2.1.25.2.3.1), with the RAM on Cisco IOS appliances (1.3.6.1.4.1.9.9.48.1.1.1),
  with the running processes on a server (1.3.6.1.2.1.25.4.2.1), and in many other cases.
  
  SANET defines a flexible mechanism to find instances in generic tables
  according to many different criteria, and to use the resulting instance numbers
  in checks and in quantitative measurements.
  It is therefore possible to monitor an interface according to its name, its
  IP or MAC address, a substring of the IOS description, etc.

  The poller process performs the walk automatically to determine the
  correct instance and saves the walk results in a cache. This means that
  under stationary conditions the instance is updated automatically and
  immediately, without any significant increase in SNMP traffic.
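
As a rough sketch of the lookup-plus-cache idea, the Python snippet below maps a column value to its table instance. `InstanceResolver` and the `walk` callable are hypothetical stand-ins (no real SNMP library is used) and the substring matching rule is an assumption; only the `ifDescr` OID is a standard MIB-2 value.

```python
from typing import Callable, Dict, Tuple

IF_DESCR = "1.3.6.1.2.1.2.2.1.2"  # ifDescr column of the MIB-2 interfaces table


class InstanceResolver:
    """Hypothetical resolver: find the table instance whose value matches a string."""

    def __init__(self, walk: Callable[[str], Dict[int, str]]) -> None:
        self._walk = walk                          # walk(oid) -> {instance: value}
        self._cache: Dict[Tuple[str, str], int] = {}

    def resolve(self, column_oid: str, wanted: str) -> int:
        key = (column_oid, wanted)
        if key not in self._cache:                 # walk only on a cache miss
            for instance, value in self._walk(column_oid).items():
                if wanted in value:                # assumed rule: substring match
                    self._cache[key] = instance
                    break
            else:
                raise LookupError(f"no instance matches {wanted!r}")
        return self._cache[key]


def fake_walk(oid: str) -> Dict[int, str]:
    """Stand-in for an SNMP walk, returning canned ifDescr values."""
    return {1: "lo", 7: "GigabitEthernet0/1", 9: "GigabitEthernet0/2"}


resolver = InstanceResolver(fake_walk)
if_index = resolver.resolve(IF_DESCR, "GigabitEthernet0/2")   # 9
```

A real poller would also need to drop a cached entry once the instance no longer matches (for example after a reboot). That step is omitted here.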

+ **Ping flap dampening.** Some checks may oscillate continuously
  between states. This can be caused by "almost working"
  links, partial hardware damage, etc.

  In such cases traditional monitoring systems produce a long sequence
  of alerts and notifications, which is particularly annoying
  with SMS.

  SANET can suppress such notifications using an
  algorithm inspired by `BGP route flap dampening`: each check is given
  a penalty score that increases at every state change and decays exponentially
  over time (with a customizable half-life) while no state change happens.
  Two penalty thresholds can then be defined: a higher one above which
  notifications are turned off, and a lower one below which they are turned back on.
  While a check's notifications are turned off, the check is still
  performed and logged periodically; only email and SMS notifications are suspended.
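
The penalty mechanism can be sketched as follows. With half-life ``h``, a penalty ``p`` left untouched for ``dt`` seconds decays to ``p * 0.5 ** (dt / h)``. The class below is an illustration only: `FlapDampener`, its field names, and the penalty added per state change are assumptions, not SANET's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class FlapDampener:
    """Hypothetical per-check dampener modelled on BGP route flap dampening."""
    half_life: float                    # seconds for the penalty to halve
    suppress_at: float                  # higher threshold: notifications off
    reuse_at: float                     # lower threshold: notifications back on
    penalty_per_change: float = 1000.0  # assumed increment per state change
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def update(self, now: float, state_changed: bool) -> bool:
        """Update the penalty; return True if notifications may be sent."""
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)   # exponential decay
        self.last_update = now
        if state_changed:
            self.penalty += self.penalty_per_change
        if self.penalty >= self.suppress_at:
            self.suppressed = True
        elif self.penalty <= self.reuse_at:
            self.suppressed = False
        # Between the two thresholds the previous decision is kept (hysteresis).
        return not self.suppressed


damp = FlapDampener(half_life=600, suppress_at=2500, reuse_at=800)
for t in (0, 1, 2):                       # three state changes in quick succession
    allowed = damp.update(now=t, state_changed=True)
# `allowed` is now False: the penalty (about 3000) exceeds suppress_at
allowed_again = damp.update(now=1202, state_changed=False)
# two quiet half-lives later the penalty (about 750) is under reuse_at
```

Using two thresholds rather than one prevents the notification state itself from flapping when the penalty hovers around a single limit.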

+ **Utility functions for checks.** Through the poller process, SANET provides
  many general-purpose utilities that can be combined with existing checks and can
  use present or past values of SNMP variables.
  For example, one function checks that an Ethernet interface is in full-duplex
  mode by trying the various standard and proprietary MIBs where this information might
  be found. Another function checks whether NTP servers are actually synchronized.

  Other functions can perform operations and compute aggregates on SNMP tables
  (e.g. to check the average CPU load without specifying how many CPUs there are),
  check that a given TCP port is open, check that a given URL is available and matches (or does not match)
  a given pattern, build logical conditions from the results of other checks, etc.

+ **Timetable management for checks and notifications.** It is possible
  to define time slices for the checks, according to the local time, day, and weekday.
  This makes it possible to run checks that need to be performed only at specific times,
  for example to monitor offices that switch off their power supply when closing.
  It is also possible to customize the notification recipients according to
  timetables, for example to notify the on-site staff when present, or the on-call
  technician otherwise.
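
Recipient selection by timetable can be illustrated with a short Python sketch. The slot layout, the `recipients_at` function, and the addresses are hypothetical and do not reflect SANET's configuration syntax; slots crossing midnight are not handled.

```python
from datetime import datetime, time
from typing import List, Set, Tuple

# (weekdays, start, end, recipient); Monday is 0, as in datetime.weekday()
Slot = Tuple[Set[int], time, time, str]


def recipients_at(moment: datetime, slots: List[Slot], fallback: str) -> List[str]:
    """Return the recipients whose time slice contains `moment`, else the fallback."""
    active = [who for days, start, end, who in slots
              if moment.weekday() in days and start <= moment.time() < end]
    return active or [fallback]


# On-site staff on weekdays from 08:00 to 18:00, on-call technician otherwise.
slots = [({0, 1, 2, 3, 4}, time(8, 0), time(18, 0), "onsite@example.org")]
office_hours = recipients_at(datetime(2024, 1, 8, 10, 0), slots, "oncall@example.org")
weekend = recipients_at(datetime(2024, 1, 13, 10, 0), slots, "oncall@example.org")
# office_hours == ["onsite@example.org"]; weekend == ["oncall@example.org"]
```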