The Internal NICBypass feature helps prevent a network outage when the IDP enters a hang state or experiences high CPU utilization while the IDP device is in inline mode (transparent mode only). It uses watchdog timers to achieve this.
The nicBypass script, located in /usr/idp/device/bin/, resets the watchdog timer while the IDP is working normally, which prevents the interfaces from going into bypass mode. It also restores the interfaces to normal mode if they have gone into bypass mode. The script performs this check, and takes action if required, every "loopInterval" seconds; between checks it sleeps for "loopInterval" seconds.
The watchdog timer value (watchdogInterval) and loopInterval can both be configured in the idp.cfg file on the IDP Sensor as shown below:
nicBypass.watchdogInterval 10 (secs)
nicBypass.loopInterval 3 (secs)
The value of watchdogInterval controls how long the IDP can stay in the hang state before the interface pairs are forced into bypass mode. In practice, watchdogInterval is the value from which the timer counts down to zero.
The "bypassStatus" command shows the status of the interface pairs, as shown below:
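The relationship between the two settings can be checked with a short script. The following is a minimal sketch, not part of the IDP software: the function names are hypothetical, and it assumes idp.cfg holds whitespace-separated "key value" lines as in the excerpt above. It flags the case where loopInterval is not smaller than watchdogInterval, because the timer would then reach zero before the script wakes up to reset it.

```python
import re

# Key names are taken from the article; the exact idp.cfg syntax
# (whitespace-separated "key value" lines) is an assumption.
KEY_PATTERN = re.compile(r"^\s*nicBypass\.(watchdogInterval|loopInterval)\s+(\d+)")


def read_nicbypass_settings(cfg_text):
    """Collect the two nicBypass interval settings (in seconds) from config text."""
    settings = {}
    for line in cfg_text.splitlines():
        match = KEY_PATTERN.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def check_settings(settings):
    """Return a list of warning strings; an empty list means no problem was found."""
    warnings = []
    watchdog = settings.get("watchdogInterval")
    loop = settings.get("loopInterval")
    if watchdog is None or loop is None:
        warnings.append("watchdogInterval or loopInterval is missing")
    elif loop >= watchdog:
        # The watchdog would expire before the script's next wakeup.
        warnings.append("loopInterval should be smaller than watchdogInterval")
    return warnings
```

With the values shown above (10 and 3), check_settings returns an empty list.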
BYPASS STATUS: Tue Aug 28 11:16:34 PST 2012
Status for nicBypass daemon : on
Watchdog timer setting(sec) : 10
Watchdog loop reset interval(sec): 3
NIC         Setting    Current State   WD Time Left(ms)
---------------------------------------------------------------
eth2,eth3   ENABLED    Normal          9800
eth4,eth5   disabled   Normal          (wdt inactive)
eth6,eth7   disabled   Normal          (wdt inactive)
eth8,eth9   disabled   Normal          (wdt inactive)
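For monitoring, the per-pair rows of this output can be turned into structured data. The sketch below is illustrative only: parse_bypass_status is a hypothetical helper, and it assumes each row has the shape shown above (an interface pair, a setting, a one-word state, then either a number of milliseconds or a note in parentheses). Output from a real device may differ.

```python
import re

# One row per interface pair, e.g. "eth2,eth3 ENABLED Normal 9800".
# The row layout is inferred from the sample output and is an assumption.
ROW = re.compile(r"^(eth\d+),(eth\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$")


def parse_bypass_status(text):
    """Return one dict per interface pair found in bypassStatus output."""
    pairs = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, or summary line
        nic_a, nic_b, setting, state, wd_left = match.groups()
        pairs.append({
            "pair": (nic_a, nic_b),
            "bypass_enabled": setting.lower() == "enabled",
            "state": state,
            # None when the watchdog is inactive for this pair
            "wd_time_left_ms": int(wd_left) if wd_left.isdigit() else None,
        })
    return pairs
```

Applied to the sample above, only the eth2,eth3 pair has bypass enabled, with 9800 ms left on its watchdog.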
The watchdog time is decremented every few ms, as can be seen in the "WD Time Left(ms)" column. The moment the watchdog time left reaches 0, the interface pair is forced into bypass mode. Under normal working conditions, the nicBypass script resets the watchdog to the configured value every "loopInterval" seconds. This is how it prevents the interface pairs from going into bypass mode. However, when the IDP hangs, which can happen because of high CPU utilization, the nicBypass script cannot run. The watchdog time then keeps decrementing without being reset and eventually reaches 0, which activates bypass on the interface pairs configured for bypass mode.
Below is an example to clarify the above explanation:
We start at t=0 with the loopInterval and watchdogInterval values specified above.
Time   nicBypass script                          Watchdog time left
t=0    sleeping                                  10
t=1    sleeping                                  9
t=2    sleeping                                  8
t=3    wakeup, check interfaces and reset        7 ---> 10
       watchdogInterval's value to 10 & sleep
t=4    sleeping                                  9
t=5    sleeping                                  8
t=6    wakeup, check interfaces and reset        7 ---> 10
       watchdogInterval's value to 10 & sleep
t=7    sleeping                                  9
t=8    sleeping                                  8
t=9    wakeup, check interfaces and reset        7 ---> 10
       watchdogInterval's value to 10 & sleep
The next example shows a scenario in which bypass is activated. For this example, the watchdogInterval and loopInterval values have been changed to:
nicBypass.watchdogInterval 6 (secs)
nicBypass.loopInterval 2 (secs)
In this scenario, some processes hog the IDP's CPU, so the nicBypass script gets no CPU time to run for 6 seconds or more, starting at t=4. At t=11 the CPU is available again and the nicBypass script can run.
Time   nicBypass script                              Watchdog time left   Notes
t=0    sleeping                                      6
t=1    sleeping                                      5
t=2    wakeup, check interfaces and reset            4 ---> 6
       watchdogInterval's value to 6
t=3    sleeping                                      5
t=4    cannot wakeup due to unavailability           4                    IDP hangs / CPU utilization shoots
       of CPU time                                                        above 90%
t=5    cannot wakeup due to unavailability           3
       of CPU time
t=6    cannot wakeup due to unavailability           2
       of CPU time
t=7    cannot wakeup due to unavailability           1
       of CPU time
t=8    cannot wakeup due to unavailability           0                    Bypass activated on interfaces on
       of CPU time                                                        which bypass mode was enabled
t=9    cannot wakeup due to unavailability           0                    interfaces in bypass state
       of CPU time
t=10   cannot wakeup due to unavailability           0                    interfaces in bypass state
       of CPU time
t=11   wakeup, check interfaces and bring them to    0 ---> 6             interfaces back in normal state
       normal state and reset watchdogInterval's
       value to 6
t=12   sleeping                                      5
t=13   wakeup, check interfaces and reset            4 ---> 6
       watchdogInterval's value to 6
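The timeline above can be reproduced with a small simulation. This is an idealized sketch with one-second steps, not the actual nicBypass script: the function name, the tuple layout, and the "stalled" parameter (the seconds during which the script gets no CPU time) are illustrative assumptions.

```python
def simulate(watchdog_interval, loop_interval, duration, stalled=()):
    """Simulate the watchdog countdown in 1-second steps.

    Returns one tuple per second:
    (t, action, time_left_before, time_left_after, in_bypass).
    """
    rows = []
    time_left = watchdog_interval
    in_bypass = False
    next_wakeup = loop_interval
    for t in range(duration + 1):
        if t > 0 and time_left > 0:
            time_left -= 1          # the watchdog counts down every second
        if time_left == 0:
            in_bypass = True        # timer expired: pair is forced into bypass
        before = time_left
        if t >= next_wakeup and t not in stalled:
            # The script runs: restore normal mode and reset the timer.
            action = "wakeup and reset"
            time_left = watchdog_interval
            in_bypass = False
            next_wakeup = t + loop_interval
        elif t >= next_wakeup:
            action = "cannot wake up"  # wakeup is due but there is no CPU time
        else:
            action = "sleeping"
        rows.append((t, action, before, time_left, in_bypass))
    return rows


if __name__ == "__main__":
    # Second example: watchdogInterval 6, loopInterval 2, no CPU time from t=4 to t=10.
    for row in simulate(6, 2, 13, stalled=range(4, 11)):
        print(row)
```

Calling simulate(10, 3, 9) with no stalled seconds corresponds to the first example, in which the timer never reaches 0.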
Increasing watchdogInterval delays the activation of bypass on the interface pairs. Decreasing loopInterval increases how often the watchdog time is reset to the configured value. The advantage is that short spikes in CPU utilization do not trigger bypass unnecessarily.
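The effect of the two settings on tolerated stall length can be estimated with simple arithmetic. The sketch below is an idealized model that ignores the script's own run time and the timer's millisecond granularity; stall_tolerance is a hypothetical helper. If the script stops running right after a reset, bypass is activated watchdogInterval seconds later. If it stops just as the next reset is due, loopInterval seconds have already elapsed, so only watchdogInterval minus loopInterval seconds remain.

```python
def stall_tolerance(watchdog_interval, loop_interval):
    """Return (shortest, longest) stall in seconds that ends in bypass.

    shortest: the stall begins just as the next reset is due.
    longest:  the stall begins immediately after a reset.
    """
    shortest = max(0, watchdog_interval - loop_interval)
    longest = watchdog_interval
    return shortest, longest


# Second example in the article: the stall begins at t=4 (a reset was due)
# and bypass is activated at t=8, which is 4 seconds later.
print(stall_tolerance(6, 2))
```

Under this model, any stall shorter than the first value never triggers bypass, which is why a larger watchdogInterval or a smaller loopInterval makes short CPU spikes harmless.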