
[IDP Series] NIC Bypass

The internal NICBypass feature helps prevent a network outage when the IDP device hangs or experiences high CPU utilization while deployed inline (transparent mode only). It uses watchdog timers to achieve this.

The nicBypass script, located in /usr/idp/device/bin/, keeps the interfaces out of bypass mode while the IDP is working normally by periodically resetting the watchdog timer. It also returns the interfaces to normal mode if they have gone into bypass mode. The script performs this check, and acts if required, every loopInterval seconds; between checks it sleeps for loopInterval seconds.

Both the watchdog interval and the loop interval can be configured in the idp.cfg file on the IDP sensor:

nicBypass.watchdogInterval    10 (secs)
nicBypass.loopInterval         3 (secs)

watchdogInterval determines how long the IDP can stay hung before bypass is activated on the interface pairs, that is, before the interfaces are forced into bypass mode. In practice, watchdogInterval is the value from which the timer counts down to zero.
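If you want to check these two values from a script, the sketch below extracts the nicBypass settings from idp.cfg-style text and rejects a loopInterval that is not smaller than watchdogInterval: based on the mechanism described in this post, the timer would otherwise reach zero between two resets even on a healthy sensor. It assumes each setting appears as a key followed by whitespace and an integer, exactly as in the excerpt above; parse_nic_bypass_settings and check_intervals are hypothetical helpers, not a Juniper tool.

```python
import re

# Matches '<nicBypass.key> <integer>'; trailing text such as '(secs)' is ignored.
# Assumption: idp.cfg uses this whitespace-separated layout, as in the excerpt above.
SETTING = re.compile(r"^\s*(nicBypass\.\w+)\s+(\d+)")


def parse_nic_bypass_settings(cfg_text):
    """Return {key: int} for every nicBypass.* line (hypothetical helper)."""
    settings = {}
    for line in cfg_text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def check_intervals(settings):
    """Return (watchdogInterval, loopInterval), rejecting unsafe combinations."""
    watchdog = settings["nicBypass.watchdogInterval"]
    loop = settings["nicBypass.loopInterval"]
    if loop >= watchdog:
        # The timer would reach zero before (or exactly as) the script resets it.
        raise ValueError("loopInterval must be smaller than watchdogInterval")
    return watchdog, loop


cfg = """\
nicBypass.watchdogInterval             10 (secs)
nicBypass.loopInterval                       3 (secs)
"""
settings = parse_nic_bypass_settings(cfg)
print(settings, check_intervals(settings))
```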
The bypassStatus command shows the status of each interface pair:
       BYPASS STATUS: Tue Aug  28 11:16:34 PST 2012
      Status for nicBypass daemon      : on
      Watchdog timer setting(sec)      : 10
      Watchdog loop reset interval(sec): 3

      NIC             Setting         Current State   WD Time Left(ms)
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
      eth6,eth7       disabled        Normal          (wdt inactive)
      eth8,eth9       disabled        Normal          (wdt inactive)
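To monitor these values from a script, the NIC table can be parsed line by line. This is a minimal sketch assuming the layout shown above (pair, setting, current state, and time left in ms or "(wdt inactive)"); other IDP OS versions may format the output differently, and parse_bypass_status is a hypothetical helper.

```python
import re

# One row of the NIC table: pair, setting, current state, WD time left.
# Assumes the layout shown above; the state may contain spaces.
ROW = re.compile(r"^\s*(\S+,\S+)\s+(\S+)\s+(.+?)\s+(\d+|\(wdt inactive\))\s*$")


def parse_bypass_status(output):
    """Return one dict per interface pair found in bypassStatus output."""
    pairs = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, daemon status and blank lines have no 'a,b' pair
        nics, setting, state, left = match.groups()
        pairs.append({
            "nics": tuple(nics.split(",")),
            # ENABLED appears to mark the pairs configured to go into bypass
            "bypass_enabled": setting.upper() == "ENABLED",
            "state": state,
            # None when the watchdog timer is inactive for this pair
            "wd_left_ms": int(left) if left.isdigit() else None,
        })
    return pairs


SAMPLE = """\
      NIC             Setting         Current State   WD Time Left(ms)
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
"""

for pair in parse_bypass_status(SAMPLE):
    print(pair)
```

A monitoring job could, for example, alert when a pair with bypass enabled reports a time left well below the configured watchdog value.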

The watchdog timer counts down continuously, as the "WD Time Left(ms)" column shows. When the time left reaches 0, the interface pair is forced into bypass mode. Under normal conditions the nicBypass script resets the watchdog to the configured value every loopInterval seconds, which keeps the interface pairs out of bypass mode. When the IDP hangs, for example because of high CPU utilization, the nicBypass script cannot run, so the watchdog keeps counting down without being reset. Once it reaches 0, bypass is activated on the interface pairs configured for bypass mode.
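The reset loop can be modelled in a few lines. This is an idealized sketch, not the actual nicBypass script: time advances in one-second ticks, the script's own run time is ignored, and `stalled` (a hypothetical parameter) lists the seconds in which the script gets no CPU time.

```python
def simulate(watchdog_interval, loop_interval, end, stalled=()):
    """Idealized, one-second-tick model of the watchdog and the nicBypass loop.

    stalled: seconds at which the script gets no CPU time (hypothetical parameter).
    Returns (t, time_left_before_reset, time_left_after_reset, in_bypass) per tick.
    """
    stalled = set(stalled)
    timer = watchdog_interval
    next_wake = loop_interval
    in_bypass = False
    rows = []
    for t in range(end + 1):
        if t > 0:
            timer = max(timer - 1, 0)       # the timer keeps counting down on its own
        if timer == 0:
            in_bypass = True                # expired: bypass-enabled pairs go to bypass
        before = timer
        if t >= next_wake and t not in stalled:
            in_bypass = False               # script restores the interfaces to normal,
            timer = watchdog_interval       # resets the watchdog,
            next_wake = t + loop_interval   # and sleeps for loopInterval seconds
        rows.append((t, before, timer, in_bypass))
    return rows


for row in simulate(6, 2, 13, stalled=range(4, 11)):
    print(row)
```

Under these assumptions, `simulate(10, 3, 9)` and `simulate(6, 2, 13, stalled=range(4, 11))` reproduce the two timelines in this post, including bypass being active from t=8 to t=10 in the second one.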

The following example illustrates this, starting at t=0 with watchdogInterval = 10 seconds and loopInterval = 3 seconds, as configured above.

      Time    nicBypass script                                         WD time left (s)
      t=0     sleeping                                                 10
      t=1     sleeping                                                 9
      t=2     sleeping                                                 8
      t=3     wake up, check interfaces, reset watchdog to 10, sleep   7 ---> 10
      t=4     sleeping                                                 9
      t=5     sleeping                                                 8
      t=6     wake up, check interfaces, reset watchdog to 10, sleep   7 ---> 10
      t=7     sleeping                                                 9
      t=8     sleeping                                                 8
      t=9     wake up, check interfaces, reset watchdog to 10, sleep   7 ---> 10
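The timeline follows a simple sawtooth. Writing W for watchdogInterval, L for loopInterval and t_r for the time of the most recent reset (symbols introduced here for illustration), and ignoring the script's own run time, the time left between two resets is shown below; the value at t = t_r + L is the one read just before the next reset.

```latex
\begin{aligned}
T_{\text{left}}(t) &= W - (t - t_r), \qquad t_r \le t \le t_r + L \\
T_{\text{left}}(t_r + L) &= W - L = 10 - 3 = 7
\end{aligned}
```

So in normal operation the timer never drops below W - L, which is the 7 seen in the rows at t=3, t=6 and t=9.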

The next scenario shows bypass being activated. For this example, the watchdogInterval and loopInterval values have been changed to:

      nicBypass.watchdogInterval    6 (secs)
      nicBypass.loopInterval        2 (secs)

Here other processes hog the IDP's CPU, so the nicBypass script gets no CPU time from t=4 until t=11, that is, for 7 seconds, longer than the 6-second watchdogInterval. At t=11 the CPU is available again and the nicBypass script can run.

      Time    nicBypass script                                  WD time left (s)  Notes
      t=0     sleeping                                          6
      t=1     sleeping                                          5
      t=2     wake up, check interfaces, reset watchdog to 6    4 ---> 6
      t=3     sleeping                                          5
      t=4     cannot wake up (no CPU time)                      4                 IDP hangs / CPU utilization shoots above 90%
      t=5     cannot wake up (no CPU time)                      3
      t=6     cannot wake up (no CPU time)                      2
      t=7     cannot wake up (no CPU time)                      1
      t=8     cannot wake up (no CPU time)                      0                 bypass activated on interfaces with bypass mode enabled
      t=9     cannot wake up (no CPU time)                      0                 interfaces in bypass state
      t=10    cannot wake up (no CPU time)                      0                 interfaces in bypass state
      t=11    wake up, restore interfaces to normal state,      0 ---> 6          interfaces back in normal state
              reset watchdog to 6
      t=12    sleeping                                          5
      t=13    wake up, check interfaces, reset watchdog to 6    4 ---> 6
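The same arithmetic gives the bypass window in this scenario. With W = 6, the last successful reset at t_r = 2 and the script regaining the CPU at t_s = 11 (t_r and t_s are labels introduced here), and ignoring the script's own run time:

```latex
\begin{aligned}
t_{\text{bypass}} &= t_r + W = 2 + 6 = 8 \\
t_{\text{normal}} &= t_s = 11 \\
t_{\text{normal}} - t_{\text{bypass}} &= 11 - 8 = 3\ \text{s in bypass}
\end{aligned}
```

Bypass is activated only because the CPU comes back after t_r + W; had the stall ended before t=8, the script would have reset the timer and the interfaces would never have left normal mode.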

Increasing watchdogInterval delays the activation of bypass on the interface pairs. Decreasing loopInterval makes the script reset the watchdog more often. Either change means that short CPU spikes do not trigger bypass unnecessarily. Keep in mind that a larger watchdogInterval also prolongs the outage during a genuine hang, because traffic is not bypassed until the timer expires.
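The trade-off can be quantified under the same idealized model (script runs instantly once it gets the CPU, no scheduling jitter). A stall only matters if it covers one of the script's wake-ups; in the worst case it starts exactly when the script is due, with W - L seconds left on the timer. So a stall shorter than W - L seconds never triggers bypass, one longer than W always does, and anything in between depends on where the stall falls. stall_outcome is a hypothetical helper written for this illustration.

```python
def stall_outcome(watchdog_interval, loop_interval, stall_seconds):
    """Classify a CPU stall under an idealized model (hypothetical helper).

    Returns "never", "always" or "depends" for whether the stall triggers bypass.
    """
    if loop_interval >= watchdog_interval:
        raise ValueError("loopInterval must be smaller than watchdogInterval")
    worst_case_left = watchdog_interval - loop_interval  # time left when the script is due
    if stall_seconds < worst_case_left:
        return "never"    # even a stall starting exactly at a wake-up ends in time
    if stall_seconds > watchdog_interval:
        return "always"   # the timer expires before the stall ends, wherever it starts
    return "depends"      # depends on where the stall falls relative to the wake-ups


for w, l, stall in [(10, 3, 8), (20, 3, 8), (10, 1, 8), (6, 2, 7)]:
    print(f"watchdog={w} loop={l} stall={stall}s -> {stall_outcome(w, l, stall)}")
```

With the 10/3 settings an 8-second stall may or may not trigger bypass; raising watchdogInterval to 20 or lowering loopInterval to 1 makes it harmless in this model. The 7-second stall in the 6/2 scenario always triggers bypass.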

