[IDP Series] NIC Bypass


The internal NICBypass feature helps prevent a network outage when an IDP device in inline mode (transparent mode only) enters a hang state or experiences high CPU utilization. It uses watchdog timers to achieve this.

The nicBypass script, located in /usr/idp/device/bin/, resets the watchdog timer while the IDP is working normally, which prevents the interfaces from going into bypass mode. It also restores the interfaces to normal mode if they have gone into bypass mode. The script performs this check, and takes action if required, every loopInterval seconds; between checks, it sleeps for loopInterval seconds.
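
The check-restore-reset-sleep cycle described above can be sketched in Python as follows. This is a hypothetical illustration, not the contents of the real nicBypass script: the callbacks in_bypass, restore_normal and reset_watchdog, and the sleep and max_iterations parameters, are assumptions introduced so that the loop can run without IDP hardware.

```python
import time
from typing import Callable, Iterable, Optional


def nic_bypass_loop(
    pairs: Iterable[str],                   # assumed: pairs with bypass enabled
    in_bypass: Callable[[str], bool],       # hypothetical: is this pair in bypass mode?
    restore_normal: Callable[[str], None],  # hypothetical: return the pair to normal mode
    reset_watchdog: Callable[[str], None],  # hypothetical: reload timer to watchdogInterval
    loop_interval: float,                   # nicBypass.loopInterval, in seconds
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,   # None runs forever, like a daemon
) -> None:
    """Sketch of the check/restore/reset/sleep cycle; not the real nicBypass script."""
    pair_list = list(pairs)
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        for pair in pair_list:
            # Bring back any pair that went into bypass while the script could not run.
            if in_bypass(pair):
                restore_normal(pair)
            # Reset the watchdog timer so that it never counts down to zero.
            reset_watchdog(pair)
        iteration += 1
        # Sleep for loopInterval seconds before the next check.
        sleep(loop_interval)


# Example with fake interface state instead of real hardware.
state = {"eth2,eth3": True}  # pretend this pair is currently in bypass
resets = []
nic_bypass_loop(
    ["eth2,eth3"],
    in_bypass=lambda p: state[p],
    restore_normal=lambda p: state.update({p: False}),
    reset_watchdog=resets.append,
    loop_interval=3,
    sleep=lambda seconds: None,  # skip real sleeping in the example
    max_iterations=2,
)
print(state, resets)
```

Injecting the hardware operations and the sleep function keeps the control flow visible and lets the loop be exercised without an IDP sensor.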

These two parameters, the watchdog timer interval (watchdogInterval) and the script's loop interval (loopInterval), can be configured in the idp.cfg file on the IDP sensor, as shown below:

nicBypass.watchdogInterval    10 (secs)
nicBypass.loopInterval         3 (secs)
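
As a sketch, the two settings could be read and sanity-checked as below. The parser assumes each setting appears as a whitespace-separated key and value on its own line, as in the excerpt above; the function names are hypothetical. The check follows from the behavior described in this article: if loopInterval were not shorter than watchdogInterval, the timer could reach zero between two resets even while the IDP is healthy.

```python
import re

# Hypothetical parser: assumes "key value" pairs on separate lines, as in the
# idp.cfg excerpt above. Anything after the number (e.g. "(secs)") is ignored.
SETTING_LINE = re.compile(r"^\s*(nicBypass\.\w+)\s+(\d+)")


def read_nic_bypass_settings(text: str) -> dict[str, int]:
    """Return every nicBypass.* setting found in the text, as integers (seconds)."""
    settings: dict[str, int] = {}
    for line in text.splitlines():
        match = SETTING_LINE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def check_settings(settings: dict[str, int]) -> None:
    """Raise ValueError if the script could not reset the timer before it expires."""
    watchdog = settings["nicBypass.watchdogInterval"]
    loop = settings["nicBypass.loopInterval"]
    # The timer is only reset every loopInterval seconds, so loopInterval must be
    # shorter than watchdogInterval or the timer can expire on a healthy IDP.
    if loop >= watchdog:
        raise ValueError(
            f"loopInterval ({loop}s) must be shorter than watchdogInterval ({watchdog}s)"
        )


sample = """\
nicBypass.watchdogInterval             10 (secs)
nicBypass.loopInterval                       3 (secs)
"""
settings = read_nic_bypass_settings(sample)
check_settings(settings)  # passes: 3 s is shorter than 10 s
print(settings)
```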

The value of watchdogInterval controls how long the IDP can stay in a hang state before bypass is activated on the interface pairs, that is, before the interfaces are forced into bypass mode. In practice, watchdogInterval is the value from which the watchdog timer counts down to zero.
The bypassStatus command displays the status of the interface pairs:
       BYPASS STATUS: Tue Aug  28 11:16:34 PST 2012
      Status for nicBypass daemon      : on
      Watchdog timer setting(sec)      : 10
      Watchdog loop reset interval(sec): 3

      NIC             Setting         Current State   WD Time Left(ms)
      ---------------------------------------------------------------
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
      eth6,eth7       disabled        Normal          (wdt inactive)
      eth8,eth9       disabled        Normal          (wdt inactive)
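
The table can also be read programmatically. The following hypothetical helper assumes the exact layout of the sample output above; the regular expressions and field names are assumptions, not a documented format. It also derives how long ago each timer was last reset, assuming the timer restarts from the configured setting: for eth2,eth3, 10 s is 10000 ms, so 9800 ms left means about 200 ms have passed since the last reset.

```python
import re

# Hypothetical helper: the patterns below are inferred from the single sample of
# bypassStatus output shown above, not from a documented output format.
SETTING_RE = re.compile(r"Watchdog timer setting\(sec\)\s*:\s*(\d+)")
ROW_RE = re.compile(
    r"^\s*(eth\d+,eth\d+)\s+(ENABLED|disabled)\s+(\S+)\s+(\d+|\(wdt inactive\))\s*$"
)


def parse_bypass_status(output: str) -> list[dict]:
    setting = SETTING_RE.search(output)
    watchdog_ms = int(setting.group(1)) * 1000 if setting else None
    rows = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue  # skip headers, separators and daemon summary lines
        pair, bypass_setting, state, left = match.groups()
        left_ms = int(left) if left.isdigit() else None  # None means "wdt inactive"
        rows.append({
            "pair": pair,
            "bypass_enabled": bypass_setting == "ENABLED",
            "state": state,
            "wd_time_left_ms": left_ms,
            # Assumes the timer restarts from the configured setting on each reset.
            "ms_since_reset": (watchdog_ms - left_ms
                               if left_ms is not None and watchdog_ms is not None
                               else None),
        })
    return rows


sample = """\
      Status for nicBypass daemon      : on
      Watchdog timer setting(sec)      : 10
      Watchdog loop reset interval(sec): 3

      NIC             Setting         Current State   WD Time Left(ms)
      ---------------------------------------------------------------
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
      eth6,eth7       disabled        Normal          (wdt inactive)
      eth8,eth9       disabled        Normal          (wdt inactive)
"""
rows = parse_bypass_status(sample)
for row in rows:
    print(row["pair"], row["bypass_enabled"], row["state"], row["ms_since_reset"])
```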

The watchdog timer is decremented every few milliseconds, as can be seen in the "WD Time Left(ms)" column. The moment the remaining watchdog time reaches 0, the interface pair is forced into bypass mode. Under normal working conditions, the nicBypass script resets the watchdog timer to the configured value every loopInterval seconds, which is how it keeps the interface pairs out of bypass mode. However, when the IDP hangs due to high CPU utilization, the nicBypass script cannot run, so the watchdog timer keeps counting down without being reset. It eventually reaches 0, and bypass is activated on the interface pairs that are configured to go into bypass mode.

Below is an example to clarify the above explanation:

We start at t=0 with watchdogInterval and loopInterval set to the values shown above (10 seconds and 3 seconds).


      Time    nicBypass script                              Watchdog time left (s)

      t=0     sleeping                                      10
      t=1     sleeping                                      9
      t=2     sleeping                                      8
      t=3     wake up, check interfaces, reset the          7 ---> 10
              watchdog timer to 10 and sleep
      t=4     sleeping                                      9
      t=5     sleeping                                      8
      t=6     wake up, check interfaces, reset the          7 ---> 10
              watchdog timer to 10 and sleep
      t=7     sleeping                                      9
      t=8     sleeping                                      8
      t=9     wake up, check interfaces, reset the          7 ---> 10
              watchdog timer to 10 and sleep

The following scenario shows how bypass gets activated. For this scenario, the watchdogInterval and loopInterval values have been changed as follows:

      nicBypass.watchdogInterval    6 (secs)
      nicBypass.loopInterval        2 (secs)

In this scenario, other processes hog the IDP's CPU, so that from t=4 the nicBypass script does not get CPU time to run for 6 seconds or more. At t=11, CPU time becomes available again and the nicBypass script can run.

      Time    nicBypass script                              Watchdog time left (s)

      t=0     sleeping                                      6
      t=1     sleeping                                      5
      t=2     wake up, check interfaces, reset the          4 ---> 6
              watchdog timer to 6 and sleep
      t=3     sleeping                                      5
      t=4     cannot wake up (no CPU time available)        4   ----> IDP hangs / CPU utilization shoots above 90%
      t=5     cannot wake up (no CPU time available)        3
      t=6     cannot wake up (no CPU time available)        2
      t=7     cannot wake up (no CPU time available)        1
      t=8     cannot wake up (no CPU time available)        0   ----> bypass activated on interface pairs with bypass mode enabled
      t=9     cannot wake up (no CPU time available)        0         interfaces in bypass state
      t=10    cannot wake up (no CPU time available)        0         interfaces in bypass state
      t=11    wake up, check interfaces, bring them back    0 ---> 6   ----> interfaces back in normal state
              to normal state and reset the watchdog timer to 6
      t=12    sleeping                                      5
      t=13    wake up, check interfaces, reset the          4 ---> 6
              watchdog timer to 6 and sleep
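
Both timelines can be reproduced with a small discrete-time simulation. This is a simplified model under stated assumptions: the timer is decremented once per second (the real timer counts down in milliseconds), bypass is activated on the tick at which the timer reaches zero, and a script that misses its wakeup runs at the first tick it gets CPU time again. The names simulate and Tick are hypothetical.

```python
from dataclasses import dataclass
from typing import Collection, Optional


@dataclass
class Tick:
    t: int                   # time in seconds
    time_left: int           # watchdog time left, before any reset at this tick
    reset_to: Optional[int]  # value the script reset the timer to, if it ran
    bypass: bool             # interfaces in bypass at the end of this tick


def simulate(watchdog_interval: int, loop_interval: int, end: int,
             stalled: Collection[int] = ()) -> list[Tick]:
    """Simplified 1-second model of the watchdog timer and the nicBypass script."""
    time_left = watchdog_interval
    next_wake = loop_interval
    bypass = False
    ticks = []
    for t in range(end + 1):
        if t > 0:
            time_left = max(time_left - 1, 0)  # the timer keeps counting down
        if time_left == 0:
            bypass = True                      # an expired timer forces bypass
        before = time_left
        reset_to = None
        # The script runs once it is due, but only if it gets CPU time.
        if t >= next_wake and t not in stalled:
            bypass = False                     # restore interfaces to normal
            time_left = reset_to = watchdog_interval
            next_wake = t + loop_interval
        ticks.append(Tick(t, before, reset_to, bypass))
    return ticks


normal = simulate(watchdog_interval=10, loop_interval=3, end=9)
hang = simulate(watchdog_interval=6, loop_interval=2, end=13, stalled=range(4, 11))
for tick in hang:
    print(tick)
```

In the model, bypass is activated watchdogInterval seconds after the last successful reset (t=2 plus 6 seconds gives t=8 in the second scenario), regardless of when the stall itself began.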



Increasing watchdogInterval delays the activation of bypass on the interface pairs. Decreasing loopInterval makes the nicBypass script reset the watchdog timer to the configured value more frequently. The advantage of these adjustments is that short spikes in CPU utilization do not trigger bypass unnecessarily.
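
The effect of both settings can be summarized as a pair of bounds, derived from the countdown-and-reset behavior described above rather than from Juniper documentation. After a reset, the script next runs loopInterval seconds later, when the timer still has watchdogInterval - loopInterval seconds left; a stall shorter than that can therefore never let the timer reach zero, while a stall of watchdogInterval seconds or more always can. The function name stall_tolerance is hypothetical.

```python
def stall_tolerance(watchdog_interval: float, loop_interval: float) -> tuple[float, float]:
    """Return (always_safe_below, always_bypass_from), both in seconds.

    Derived from the countdown/reset behavior described in this article:
    - when the script wakes loop_interval seconds after a reset, the timer still
      has watchdog_interval - loop_interval seconds left, so a stall shorter than
      that can only delay the reset, never let the timer expire;
    - a stall lasting watchdog_interval seconds or more always lets it expire.
    """
    if loop_interval >= watchdog_interval:
        raise ValueError("loopInterval must be shorter than watchdogInterval")
    return watchdog_interval - loop_interval, watchdog_interval


for wd, loop in [(10, 3), (6, 2), (10, 1)]:
    safe, certain = stall_tolerance(wd, loop)
    print(f"watchdogInterval={wd}s, loopInterval={loop}s: "
          f"stalls under {safe}s are always safe, stalls of {certain}s or more trigger bypass")
```

In the second scenario above, the stall begins at t=4, exactly when the script was due to wake up. That is the worst case, so bypass is activated 6 - 2 = 4 seconds later, at t=8. The (10, 1) case shows how decreasing loopInterval widens the safe margin.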
