[IDP Series] NIC Bypass


The internal NICBypass feature helps prevent network outages caused by the IDP entering a hang state or experiencing high CPU utilization while the device is in inline mode (transparent mode only). It relies on watchdog timers to do this.

The nicBypass script, located in /usr/idp/device/bin/, resets the watchdog timer while the IDP is working normally, which prevents the interfaces from going into bypass mode. It also restores the interfaces to normal mode if they have gone into bypass mode. The script performs this check, and takes action if required, every "loopInterval" seconds (it sleeps for "loopInterval" seconds between checks).

Two parameters, watchdogInterval and loopInterval, can be configured in the idp.cfg file on the IDP Sensor as shown below:

nicBypass.watchdogInterval    10 (secs)
nicBypass.loopInterval        3 (secs)

The watchdogInterval value controls how long the IDP can stay in the hang state before the interface pairs are forced into bypass mode. In practice, watchdogInterval is the value from which the timer counts down to zero.
The "bypassStatus" command reports the status of the interface pairs, as shown below:
       BYPASS STATUS: Tue Aug  28 11:16:34 PST 2012
      Status for nicBypass daemon      : on
      Watchdog timer setting(sec)      : 10
      Watchdog loop reset interval(sec): 3

      NIC             Setting         Current State   WD Time Left(ms)
      ---------------------------------------------------------------
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
      eth6,eth7       disabled        Normal          (wdt inactive)
      eth8,eth9       disabled        Normal          (wdt inactive)
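
The status table above is regular enough to be checked by a script. The following is a minimal sketch, assuming the output always has the four columns shown in the sample and that "Current State" is a single word; the function name parse_bypass_status and the dictionary keys are hypothetical and are not part of the IDP software.

```python
import re

# Sample rows copied from the bypassStatus output shown above.
SAMPLE = """\
      NIC             Setting         Current State   WD Time Left(ms)
      ---------------------------------------------------------------
      eth2,eth3       ENABLED         Normal          9800
      eth4,eth5       disabled        Normal          (wdt inactive)
"""

# Assumption: each data row is "<ethX,ethY> <setting> <state> <time left or note>".
ROW = re.compile(r"^\s*(eth\d+,eth\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$")


def parse_bypass_status(text):
    """Return one dict per interface pair found in bypassStatus-style output."""
    pairs = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or summary line
        nic, setting, state, left = match.groups()
        pairs.append({
            "nic": nic,
            "enabled": setting.lower() == "enabled",
            "state": state,
            # None when the watchdog is inactive for this pair
            "wd_time_left_ms": int(left) if left.isdigit() else None,
        })
    return pairs


if __name__ == "__main__":
    for pair in parse_bypass_status(SAMPLE):
        print(pair)
```

A monitoring job could, for instance, alert on any pair whose "state" is not "Normal", although the exact wording the device prints for the bypass state is not shown in this note and would need to be confirmed on a sensor.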

The watchdog time is decremented every few milliseconds, as can be seen in the "WD Time Left(ms)" column. The moment the watchdog time left reaches 0, the interface pair is forced into bypass mode. Under normal working conditions the nicBypass script resets the watchdog to the configured value every "loopInterval" seconds, which prevents the interface pairs from going into bypass mode. However, when the IDP hangs because of high CPU utilization, the nicBypass script cannot run; the watchdog time keeps decrementing without being reset and eventually reaches 0, which activates Bypass on the interface pairs configured to go into bypass mode.




The following example illustrates the explanation above:

We start at t=0 with the loopInterval (3) and watchdogInterval (10) values specified above.


      Time    nicBypass script                              Watchdog time left (s)

      t=0     sleeping                                      10
      t=1     sleeping                                      9
      t=2     sleeping                                      8
      t=3     wake up, check interfaces, reset watchdog
              timer to watchdogInterval (10), and sleep     7 ---> 10
      t=4     sleeping                                      9
      t=5     sleeping                                      8
      t=6     wake up, check interfaces, reset watchdog
              timer to watchdogInterval (10), and sleep     7 ---> 10
      t=7     sleeping                                      9
      t=8     sleeping                                      8
      t=9     wake up, check interfaces, reset watchdog
              timer to watchdogInterval (10), and sleep     7 ---> 10




The following scenario shows Bypass being activated. For this example the watchdogInterval and loopInterval values have been changed:

      nicBypass.watchdogInterval    6 (secs)
      nicBypass.loopInterval        2 (secs)

In this scenario some processes hog the IDP's CPU, so from t=4 the nicBypass script gets no CPU time for 6 seconds or more. At t=11 the CPU is available again and the nicBypass script can run.

      Time    nicBypass script                              Watchdog time left (s)    Notes

      t=0     sleeping                                      6
      t=1     sleeping                                      5
      t=2     wake up, check interfaces, reset watchdog
              timer to watchdogInterval (6)                 4 ---> 6
      t=3     sleeping                                      5
      t=4     cannot wake up (no CPU time available)        4                         IDP hangs / CPU utilization shoots above 90%
      t=5     cannot wake up (no CPU time available)        3
      t=6     cannot wake up (no CPU time available)        2
      t=7     cannot wake up (no CPU time available)        1
      t=8     cannot wake up (no CPU time available)        0                         Bypass activated on interfaces on which bypass mode was enabled
      t=9     cannot wake up (no CPU time available)        0                         interfaces in bypass state
      t=10    cannot wake up (no CPU time available)        0                         interfaces in bypass state
      t=11    wake up, check interfaces, bring them back
              to normal state, reset watchdog timer to
              watchdogInterval (6)                          0 ---> 6                  interfaces back in normal state
      t=12    sleeping                                      5
      t=13    wake up, check interfaces, reset watchdog
              timer to watchdogInterval (6)                 4 ---> 6



Increasing watchdogInterval delays the activation of Bypass on the interface pairs. Decreasing loopInterval increases the frequency at which the watchdog time is reset to the configured value. The advantage is that short spikes in CPU utilization do not trigger Bypass unnecessarily.
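
The two timelines above can be reproduced with a small simulation, which is handy for trying other watchdogInterval and loopInterval combinations before changing idp.cfg. This is only a sketch of the behaviour described in this note, using one-second steps; the function simulate and its arguments are hypothetical and do not correspond to anything in the nicBypass script itself.

```python
def simulate(watchdog_interval, loop_interval, stalled, duration):
    """Step through t=0..duration in one-second ticks.

    stalled is the set of seconds at which the script gets no CPU time.
    Returns a list of (t, action, watchdog_time_left, interface_state).
    """
    rows = []
    time_left = watchdog_interval
    bypass = False
    next_wake = loop_interval
    for t in range(duration + 1):
        # The watchdog counts down once per second and stops at 0.
        if t > 0 and time_left > 0:
            time_left -= 1
        if time_left == 0:
            bypass = True  # timer expired: interface pair forced into bypass
        action = "sleeping"
        if t >= next_wake:
            if t in stalled:
                action = "stalled"  # wake-up is due but no CPU time is available
            else:
                action = "reset"  # restore normal mode and reload the timer
                time_left = watchdog_interval
                bypass = False
                next_wake = t + loop_interval
        rows.append((t, action, time_left, "bypass" if bypass else "normal"))
    return rows


if __name__ == "__main__":
    # Second scenario above: watchdogInterval=6, loopInterval=2, no CPU from t=4 to t=10.
    for row in simulate(6, 2, set(range(4, 11)), 13):
        print("t=%-3d %-9s left=%-3d %s" % row)
```

For a "reset" row the reported time left is the value after the reset, so t=2 shows 6 where the table above shows 4 ---> 6. Rerunning the second scenario with a larger watchdog_interval is a quick way to see how long a stall the setting would tolerate.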

[ScreenOS] Snoop and debug flow


Debug flow basic:
Understanding debug flow filters: https://kb.juniper.net/InfoCenter/index?page=content&id=KB6709&actp=METADATA
Running "debug flow basic": https://kb.juniper.net/InfoCenter/index?page=content&id=KB12208
How do I capture debugging (debug flow) information?: https://kb.juniper.net/InfoCenter/index?page=content&id=KB5536&actp=METADATA
When to use 'snoop' and 'debug flow': https://kb.juniper.net/InfoCenter/index?page=content&id=KB5967&actp=METADATA

Snoop:
How do you use Snoop for troubleshooting?: https://kb.juniper.net/InfoCenter/index?page=content&id=KB5411&actp=METADATA
What options are available when configuring snoop?: https://kb.juniper.net/InfoCenter/index?page=content&id=KB6586&actp=METADATA
How to apply the logical 'AND' or 'OR' snoop filters: https://kb.juniper.net/InfoCenter/index?page=content&id=KB6707&actp=METADATA
How do I interpret the snoop output?: https://kb.juniper.net/InfoCenter/index?page=content&id=KB6708&actp=METADATA
How to follow a packet by using Snoop: https://kb.juniper.net/InfoCenter/index?page=content&id=KB5413&actp=METADATA
How do I view snoop output in Wireshark?: https://kb.juniper.net/InfoCenter/index?page=content&id=KB20562&actp=METADATA&act=login

[vSRX] Installing on KVM

Two main ways:
virt-manager (GUI)
virt-install (CLI)
Other ways (QEMU)

On Server:
uname -a
lscpu (architecture, virtualization support (VT-x), NUMA)
lspci / lspci -vvv |grep Ether
dmidecode
lsmod | grep kvm
virsh -c qemu:///system list
virsh dumpxml <instance ID> (shows the configuration file for the VM, similar to a .vmx file in VMware)
virsh net-list --all
virsh domiflist <vm-name>
brctl show
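
The virtualization check from the list above can also be scripted. The following is a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo has a "flags" line; "vmx" is the CPU flag for Intel VT-x and "svm" is the flag for AMD-V. The function name virtualization_flags is hypothetical.

```python
import os


def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags ('vmx', 'svm') found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as handle:
            flags = virtualization_flags(handle.read())
        print("hardware virtualization flags:", sorted(flags) or "none found")
    else:
        print(path, "not available on this system")
```

An empty result usually means the feature is either unsupported or disabled in the BIOS; the KVM modules listed by lsmod will not load in that case.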

[ScreenOS] Firmware upgrade

ScreenOS upgrade:


 The process below can be used to upgrade the remaining firewalls (if the boot loader and image key are already correct, there is no need to update them):

 1.       Upgrade the image key >> GUI access or Console + TFTP access is required.
 2.       Upgrade the OS >> CLI + TFTP access is required.
 3.       Upgrade the boot loader >> Console + TFTP access is required.
 The firewalls are in a cluster. To upgrade the backup unit first, you will need a manage-ip configured on it.

 Points to check before upgrading the firewall:
 ++ Check the NSRP status with ‘get nsrp’. There should be a master and a primary backup (PB) available.
 ++ Check the sessions on the master and the backup with ‘get session info’; this confirms that session synchronization is working properly.
 ++ Check the routes on both firewalls; they should be identical.
 ++ Check whether both firewalls are in sync with ‘exec nsrp sync global-config check-sum’.
 ++ The checks above confirm that the backup firewall is in sync with the master and that there is no issue. Repeat all of them after you have upgraded the backup firewall and are ready to upgrade the master.
 ++ After the backup firewall is upgraded and checked, use the command ‘exec nsrp vsd-group id 0 mode backup’. This command forces a failover from the master to the backup.


 The Juniper KB below has the complete upgrade procedure:
 http://kb.juniper.net/InfoCenter/index?page=content&id=TSB16495&actp=search

 The upgrade guide below (page 24) describes the process of upgrading firewalls in a cluster:
 http://www.juniper.net/techpubs/software/screenos/screenos6.3.0/630_upgrade.pdf
