
[IDP Series] NIC Bypass


The internal NIC bypass feature (NICBypass) helps prevent a network outage when an IDP device deployed inline (transparent mode only) hangs or suffers high CPU utilization. It relies on watchdog timers to do this.

The nicBypass script, located in /usr/idp/device/bin/, keeps the interfaces out of bypass mode while the IDP is working normally by resetting the watchdog timer. It also returns the interfaces to normal mode if they have gone into bypass mode. The script performs this check, and acts if required, every loopInterval seconds: after each run it sleeps for loopInterval seconds.

The watchdog timer interval (watchdogInterval) and the check interval (loopInterval) are configured in the idp.cfg file on the IDP sensor:

nicBypass.watchdogInterval    10 (secs)
nicBypass.loopInterval         3 (secs)

The watchdogInterval value controls how long the IDP can stay in a hung state before bypass is activated on the interface pairs, that is, before the interfaces are forced into bypass mode. In practice, watchdogInterval is the value from which the watchdog timer counts down to zero.
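A quick way to sanity-check these two settings is sketched below in Python. It assumes idp.cfg entries look like the two lines above (key, whitespace, integer value; a trailing "(secs)" note, if present, is ignored), and the helper names read_nic_bypass_settings and check_settings are hypothetical. The check encodes a rule that follows from the description above: if loopInterval were not shorter than watchdogInterval, the timer could reach zero before the script resets it, even on a healthy sensor. Requiring it to be strictly shorter is a conservative assumption, since the script itself needs some time to run.

```python
import re

# Assumed idp.cfg line format: "<key> <whitespace> <integer value>"; anything
# after the value (such as a "(secs)" note) is ignored.
LINE_RE = re.compile(r"^\s*(nicBypass\.\w+)\s+(\d+)")


def read_nic_bypass_settings(text):
    """Collect nicBypass.* integer settings from idp.cfg-style text."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def check_settings(settings):
    """Raise if the daemon could not reset the timer before it expires."""
    watchdog = settings["nicBypass.watchdogInterval"]
    loop = settings["nicBypass.loopInterval"]
    if loop >= watchdog:
        raise ValueError(
            f"loopInterval ({loop}s) must be shorter than "
            f"watchdogInterval ({watchdog}s)"
        )


sample = """
nicBypass.watchdogInterval    10
nicBypass.loopInterval         3
"""
settings = read_nic_bypass_settings(sample)
check_settings(settings)
print(settings)  # {'nicBypass.watchdogInterval': 10, 'nicBypass.loopInterval': 3}
```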

The bypassStatus command shows the current status of the interface pairs:
       BYPASS STATUS: Tue Aug  28 11:16:34 PST 2012
      Status for nicBypass daemon      : on
      Watchdog timer setting(sec)      : 10
      Watchdog loop reset interval(sec): 3

      NIC             Setting         Current State   WD Time Left(ms)
      ---------------------------------------------------------------
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
      eth6,eth7       disabled        Normal          (wdt inactive)
      eth8,eth9       disabled        Normal          (wdt inactive)

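To keep an eye on the "WD Time Left(ms)" column from a script, the table can be parsed as in the Python sketch below. It assumes you already have the bypassStatus text in hand (how you collect it is left out); parse_bypass_rows is a hypothetical helper, and its regular expression assumes exactly the column layout shown above, with one-word values in the Setting and Current State columns. Other IDP releases may format the output differently.

```python
import re

# Copy of the table part of the bypassStatus output shown above.
SAMPLE = """\
      NIC             Setting         Current State   WD Time Left(ms)
      ---------------------------------------------------------------
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
      eth6,eth7       disabled        Normal          (wdt inactive)
      eth8,eth9       disabled        Normal          (wdt inactive)
"""

# Assumed layout: interface pair, Setting, one-word Current State, then the
# time left in ms or "(wdt inactive)".
ROW_RE = re.compile(r"^\s*(eth\d+,eth\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$")


def parse_bypass_rows(text):
    """Turn bypassStatus table rows into dicts; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue  # header, separator or blank line
        pair, setting, state, left = match.groups()
        rows.append({
            "pair": pair,
            "bypass_enabled": setting.upper() == "ENABLED",
            "state": state,
            # None when the watchdog timer is inactive for this pair
            "wd_left_ms": int(left) if left.isdigit() else None,
        })
    return rows


for row in parse_bypass_rows(SAMPLE):
    if row["bypass_enabled"]:
        print(f'{row["pair"]}: {row["state"]}, {row["wd_left_ms"]} ms left')
```

In the idealized model used in this post, a healthy sensor with watchdogInterval 10 and loopInterval 3 should report values between roughly 7000 and 10000 ms for enabled pairs, since the timer is reset every 3 seconds; a reading well below that would suggest the daemon has missed a reset.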
The watchdog timer is decremented every few milliseconds, as the "WD Time Left(ms)" column shows (9800 ms left of the configured 10 seconds). When the time left reaches 0, the interface pair is forced into bypass mode. Under normal conditions the nicBypass script resets the timer to the configured value every loopInterval seconds, and this is what keeps the interface pairs out of bypass mode. When the IDP hangs, for example because of high CPU utilization, the nicBypass script cannot run, so the timer keeps counting down without being reset. Once it reaches 0, bypass is activated on every interface pair configured for bypass (ENABLED in the Setting column).

The following example illustrates this. It starts at t=0 with watchdogInterval = 10 seconds and loopInterval = 3 seconds, as configured above.


      Time    nicBypass script                                      Watchdog time left (s)

      t=0     sleeping                                              10
      t=1     sleeping                                               9
      t=2     sleeping                                               8
      t=3     wake up, check interfaces, reset timer to 10, sleep    7 ---> 10
      t=4     sleeping                                               9
      t=5     sleeping                                               8
      t=6     wake up, check interfaces, reset timer to 10, sleep    7 ---> 10
      t=7     sleeping                                               9
      t=8     sleeping                                               8
      t=9     wake up, check interfaces, reset timer to 10, sleep    7 ---> 10
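The timeline above can be reproduced, and replayed with other settings, using the small Python model below. It is a simplified sketch, not Juniper code: it assumes the timer loses exactly one second per tick, that the daemon is due loopInterval seconds after its last wake-up, and that during "stalled" seconds it gets no CPU time at all. The function name simulate and its stalled parameter are hypothetical. Called with watchdog=10 and loop=3 it prints the same timer values as the table; passing watchdog=6, loop=2 and stalled=range(4, 11) models the CPU-starvation scenario that follows.

```python
def simulate(watchdog, loop, end, stalled=()):
    """Second-by-second model of the nicBypass daemon and the watchdog timer.

    Simplifying assumptions (not Juniper code): the timer loses exactly 1 s per
    tick, the daemon is due `loop` seconds after its last wake-up, and in the
    seconds listed in `stalled` it gets no CPU time at all.
    Returns rows of (t, daemon action, timer column, bypass active?).
    """
    stalled = set(stalled)
    timer, last_wake, bypass = watchdog, 0, False
    rows = []
    for t in range(end + 1):
        if t > 0:
            timer = max(timer - 1, 0)  # count down, never below zero
        due = t - last_wake >= loop
        if due and t not in stalled:
            # The daemon runs: restore the pairs if needed and reset the timer.
            action = ("restore interfaces, reset timer" if bypass
                      else "check interfaces, reset timer")
            rows.append((t, action, f"{timer} -> {watchdog}", False))
            timer, last_wake, bypass = watchdog, t, False
            continue
        if timer == 0:
            bypass = True  # timer expired: enabled pairs are forced into bypass
        action = "cannot wake up (no CPU time)" if due else "sleeping"
        rows.append((t, action, str(timer), bypass))
    return rows


for t, action, timer, bypass in simulate(watchdog=10, loop=3, end=9):
    print(f"t={t:<3} {action:<32} {timer:<8} {'BYPASS' if bypass else 'normal'}")
```

In this model a reset that happens in the same second the timer would expire wins the race; real hardware timing at that boundary is not described in this post.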




Next is a scenario in which bypass gets activated. For this example, the watchdogInterval and loopInterval values have been changed to:

      nicBypass.watchdogInterval    6 (secs)
      nicBypass.loopInterval        2 (secs)

Starting at t=4, other processes hog the IDP's CPU, so the nicBypass daemon gets no CPU time to run for 6 seconds or more. The CPU becomes available to the daemon again at t=11.

      Time    nicBypass script                              Watchdog time left (s)

      t=0     sleeping                                      6
      t=1     sleeping                                      5
      t=2     wake up, check interfaces, reset timer to 6   4 ---> 6
      t=3     sleeping                                      5
      t=4     cannot wake up (no CPU time)                  4          ---> IDP hangs / CPU utilization shoots above 90%
      t=5     cannot wake up (no CPU time)                  3
      t=6     cannot wake up (no CPU time)                  2
      t=7     cannot wake up (no CPU time)                  1
      t=8     cannot wake up (no CPU time)                  0          ---> bypass activated on interface pairs with bypass enabled
      t=9     cannot wake up (no CPU time)                  0               interfaces in bypass state
      t=10    cannot wake up (no CPU time)                  0               interfaces in bypass state
      t=11    wake up, restore interfaces to normal,        0 ---> 6   ---> interfaces back in normal state
              and reset timer to 6
      t=12    sleeping                                      5
      t=13    wake up, check interfaces, reset timer to 6   4 ---> 6
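The bypass window in this scenario can also be predicted without stepping through it second by second. The Python function below is a hypothetical helper (bypass_window and its parameters are illustrative names) for the same idealized model: integer seconds, a stall during which the daemon gets no CPU at all, and the daemon running again the moment the stall ends. For the scenario above it returns bypass activation at t=8 and recovery at t=11, matching the table.

```python
def bypass_window(watchdog, loop, stall_start, stall_end):
    """Return (bypass_on, bypass_off) for a CPU stall, or None if no bypass.

    Idealized model (an assumption, not Juniper documentation): the timer is
    set at t=0, the daemon resets it at t = loop, 2*loop, ... and gets no CPU
    time for stall_start <= t < stall_end, running again at t = stall_end.
    """
    # Last reset strictly before the stall (t=0 counts as the initial set).
    last_reset = max(0, (stall_start - 1) // loop * loop)
    if last_reset + loop >= stall_end:
        return None  # the daemon's next scheduled run is after the stall ends
    expiry = last_reset + watchdog  # the timer reaches zero here
    if expiry >= stall_end:
        return None  # the daemon runs again and resets the timer in time
    return expiry, stall_end  # bypass on at expiry, interfaces restored at stall_end


print(bypass_window(watchdog=6, loop=2, stall_start=4, stall_end=11))  # (8, 11)
```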



Increasing watchdogInterval delays bypass activation on the interface pairs, while decreasing loopInterval makes the nicBypass script reset the watchdog timer more often. Both changes let short spikes in CPU utilization pass without triggering bypass unnecessarily. The cost of a larger watchdogInterval is that, when the IDP genuinely hangs, traffic is interrupted for longer before bypass takes over.
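This trade-off can be made concrete. Under the same idealized model used in the timelines (the timer is reset exactly every loopInterval seconds and a CPU stall stops the daemon completely), the timer always has at least watchdogInterval - loopInterval seconds left when a stall begins, and never more than watchdogInterval. The hypothetical Python helper stall_tolerance below turns that into two numbers: stalls no longer than the first can never trigger bypass, and stalls longer than the second always do. These are bounds derived from the model, not figures published by Juniper.

```python
def stall_tolerance(watchdog, loop):
    """Return (safe, unsafe) stall lengths in seconds for the idealized model.

    A stall of at most `safe` seconds never triggers bypass: the timer was
    reset at most `loop` seconds before the stall, so at least
    watchdog - loop seconds remain. A stall longer than `unsafe` seconds
    always triggers bypass, since at most `watchdog` seconds can remain.
    """
    if loop >= watchdog:
        raise ValueError("loopInterval must be shorter than watchdogInterval")
    return watchdog - loop, watchdog


for w, l in [(10, 3), (6, 2)]:
    safe, unsafe = stall_tolerance(w, l)
    print(f"watchdogInterval={w}s loopInterval={l}s: "
          f"stalls <= {safe}s never bypass, stalls > {unsafe}s always bypass")
```

For the second scenario (watchdogInterval 6, loopInterval 2), the stall from t=4 to t=11 lasts 7 seconds, which is longer than 6, so bypass could not have been avoided with those settings.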
