				       README Notes
				Broadcom bnxtnet VMware Driver

	                             Broadcom Inc.
				15101 Alton Parkway,
				   Irvine, CA 92618

			  Copyright (c) 2024 Broadcom Limited
				   All rights reserved


Table of Contents
=================
  Introduction
  Driver Dependencies
  Driver Installation
  Unloading and Loading Driver
  Driver Settings
  Multi RSS
  Port Speeds
  Interface Private Statistics
  Enabling SR-IOV
  NPAR feature
  FEC Feature
  EEE feature
  Interrupt Coalescing Configuration
  How to change Rx/Tx Ring Size
  DCB support
  VF rate limit
  Changing Bridge mode of NIC
  MAC Anti-Spoof Feature
  Firmware Dump Feature
  Key-Value commands feature
  Setting Auto Speed mask value
  Enabling RoCE support
  Error Recovery support
  200G PAM4 support
  Module Parameters/Defaults
  Known issues
  Extra Notes
  ENS LRO/RSS feature

Introduction
============
This file describes the bnxtnet VMware driver for the Broadcom NetXtreme-C
and NetXtreme-E BCM573xx/BCM574xx/BCM575xx 10/20/25/40/50/100/200 Gbps
Ethernet Network Controllers.


Driver Dependencies
===================
The driver has no dependencies on user-space packages. The necessary
firmware must be programmed in NVRAM.


Driver Installation
===================
1. Copy the <bnxtnet>-<driver version>.vib file to /var/log/vmware
2. $ cd /var/log/vmware
3. $ esxcli software vib install --no-sig-check -v  /var/log/vmware/<bnxtnet>-<driver version>.vib
4. Reboot the machine
5. Verify that the driver is correctly installed:
$ esxcli software vib list | grep bnxtnet

Unloading and Loading Driver
=============================
1. To unload:

	$ vmkload_mod -u bnxtnet

2. To Load:

	$ vmkload_mod bnxtnet

		OR

	$ vmkload_mod bnxtnet <parameter=value>

	Example: To disable TPA

	$ vmkload_mod bnxtnet disable_tpa=1,1,1,1

	$ kill -SIGHUP $(pidof vmkdevmgr) (Required after "vmkload_mod bnxtnet")


Driver Settings
===============
bnxtnet driver settings can be viewed and modified using two utilities:

Get the interface name (e.g. vmnicX) from "$ esxcfg-nics -l"

1. Changes using VMWare's "esxcli network" utility

	i. Show current speed, duplex, Driver version, Firmware version and link status

		$ esxcli network nic get -n <iface>

	ii. Set speed

	Example: Set speed to 10Gbps

		$ esxcli network nic set -S 10000 -D full -n <iface>

	Note: For 10Base-T cards, setting a speed with this command puts the vmnic
	      into autoneg mode with a single autoneg speed mask of the speed given in
	      the command. For e.g.,

	      "$ esxcli network nic set -S 1000 -D full -n vmnic4" sets an autoneg mask of
	      1G speed on 10Base-T cards.

	      This is due to a limitation of the ESXi OS, which does not provide explicit
	      commands to set the autoneg mask.

	Limitations:
	1. For 10GBase-T, only link speed auto-negotiation is supported.
           Using forced speed selection results in esxcli command failure.

	iii. Show ring size

		$ esxcli network nic ring current get -n <iface>

	iv. Set ring size

	Example: Set Rx ring size = 256 and Tx ring size = 256

		$ esxcli network nic ring current set -r 256 -t 256 -n <iface>

	v. Check the status of TSO

		$ esxcli network nic tso get

	vi. Set the TSO on/off

		$ esxcli network nic tso set --enable=1 -n <iface> /*TSO=on*/
		$ esxcli network nic tso set --enable=0 -n <iface> /*TSO=off*/

	vii. Up/down the iface

		$ esxcli network nic up -n <iface>
		$ esxcli network nic down -n <iface>

	viii. Check offloads

		$ esxcli network nic software list

	ix. Set offloads

		$ esxcli network nic software set --<option> on -n <iface>

	x. Set and Get Pause Parameters

		1. To get Pause Parameters
		$ esxcli network nic pauseParams list

		2. To set Pause Parameters
		$ esxcli network nic pauseParams set --auto <1/0> --rx <1/0> --tx <1/0> -n <iface>

	   Note: Auto flow control/pause autonegotiation can be set only when the interface is configured in
	   autonegotiate speed mode.

2. Broadcom provides the BNXTNETCLI (esxcli bnxtnet) utility to set/view
miscellaneous driver parameters

	i. Install this utility

	-> Copy BCM-ESX-bnxtnetcli-<version>.vib to /var/log/vmware.
	-> $ cd /var/log/vmware
	-> $ esxcli software vib install --no-sig-check -v /var/log/vmware/BCM-ESX-bnxtnetcli-<version>.vib
	-> Reboot the system.
	-> Verify whether vib is installed correctly:

		$ esxcli software vib list | grep bcm-esx-bnxtnetcli

	ii. Set speed 10/20/25/40/50G

		$ esxcli bnxtnet link set -S <speed> -D <full> -n <iface>

	This returns an "OK" message if the speed is set correctly.

	Example:

		$ esxcli bnxtnet link set -S 25000 -D full -n vmnic5

	iii. Show the link stats

		$ esxcli bnxtnet link get -n vmnic6

	iv. Show the driver/firmware/chip information

		$ esxcli bnxtnet drvinfo get -n vmnic4

	v. Show the NIC information (e.g. BDF, NPAR, SR-IOV configuration)

		$ esxcli bnxtnet nic get -n vmnic4


Multi RSS
=========
Multi RSS in the ESXi driver is controlled by the enable_multi_rss module
parameter. By default, Multi RSS is enabled. When enabled, the driver tries to
enable 4 RSS engines.
In ENS mode, each possible RSS engine is created with 8 RSS queues by default.
If enough resources are not available, the driver enables RSS with fewer
engines and/or fewer RSS queues, in both ENS and non-ENS mode.
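
For example, to disable Multi RSS (a sketch following the module-parameter
convention used elsewhere in this README; a reboot is required, as with other
module parameters):

    esxcfg-module -s 'enable_multi_rss=0' bnxtnet (reboot required)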


Limitations:
1) The ESXi driver may disable the Multi-RSS capability if enough resources
   are not available. Possible scenarios are NPAR, NPAR + SR-IOV, and/or
   servers with fewer system resources (e.g. fewer CPU cores). The driver logs
   an appropriate message in the vmkernel log.
2) The ESXi driver enables the Multi-RSS capability with fewer RSS engines and
   fewer RSS queues on Wh+/Thor NPAR and NPAR + SR-IOV, as the hardware
   resources are limited in these configurations. The driver logs an
   appropriate message in the vmkernel log.

Port Speeds
===========
On dual-port devices, the port speed of each port must be compatible with
the port speed of the other port.  10Gbps and 40Gbps are one set of
compatible speeds.  25Gbps and 50Gbps are the other set of compatible
speeds.  The two sets are not compatible with each other, and link loss
will result if one port uses a speed that is incompatible with that of the
other port.
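
For example, on a dual-port device the following pair of commands keeps both
ports within the compatible 25Gbps/50Gbps set (reusing the esxcli bnxtnet link
set command described in the Driver Settings section; vmnic4/vmnic5 are
illustrative names):

	$ esxcli bnxtnet link set -S 25000 -D full -n vmnic4
	$ esxcli bnxtnet link set -S 50000 -D full -n vmnic5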


Interface Private Statistics
============================
$ localcli --plugin-dir /usr/lib/vmware/esxcli/int networkinternal nic privstats get -n <iface>


Enabling SR-IOV
===============
The Broadcom NetXtreme-C and NetXtreme-E devices support Single Root I/O
Virtualization (SR-IOV) with Physical Functions (PFs) and Virtual Functions
(VFs) sharing the Ethernet port.

Only the PFs are automatically enabled. If a PF supports SR-IOV, the
PF (vmnicX) will appear in the output of the command below.

    esxcli network sriovnic list

To enable one or more VFs, the driver uses the module parameter "max_vfs" to
enable the desired number of VFs for PFs.
For example, to enable 4 VFs on PF:

    esxcfg-module -s 'max_vfs=4' bnxtnet (reboot required)

To enable VFs on a set of PFs, use the below command format.
For example, to enable 4 VFs on PF 0 and 2 VFs on PF 2:

    esxcfg-module -s 'max_vfs=4,2' bnxtnet (reboot required)

The required VFs of each supported PF will be enabled in order during the PF
bring up.

Refer to other VMware documentation on how to map a VF to a VM.
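
Once VFs are enabled, they can be listed per PF with the standard ESXi command
below (vmnic4 is an illustrative name):

    esxcli network sriovnic vf list -n vmnic4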


NPAR Feature
============
NPAR 1.0/1.1 is NIC Partitioning and is switch-independent. NPAR can be
configured using Ctrl-S in the BIOS of the Broadcom NIC via the CCM menu.

In the bnxtnet L2 driver, NPAR ports (MFs) mostly behave the same as regular
Single Functions (SFs), except for the following:

1) The operational state of the link is always UP (or Down) for all the NPAR ports together on the same physical port.
2) NIC teaming is supported for NPAR, but ONLY between partitions on separate physical ports.
3) For two-port NICs, 'even' BDF values are for port 0 and 'odd' BDF values are for port 1.
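
To check whether a given vmnic is an NPAR partition and to inspect its BDF, the
NIC information command from the Driver Settings section can be reused:

	$ esxcli bnxtnet nic get -n vmnic4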


FEC Feature
===========
Please use the latest version of the bnxtnetcli utility to enable and disable
the FEC feature and display its status.

1. To display current FEC setting:-

	$ esxcli bnxtnet link get -n vmnic5
	OR
	$ /opt/broadcom/bin/bnxtnetcli -g -n vmnic5 -L

2. To Enable/Disable FEC:-

	$ esxcli bnxtnet fec set -n vmnic5 -E <0/1> -M <mask>
	OR
	$ /opt/broadcom/bin/bnxtnetcli -s -n vmnic5 -B 1/0 -F <mask>

	mask:
	-----
	0x1 = For FEC AN
	0x2 = For FEC CLAUSE74
	0x4 = For FEC CLAUSE91

	For e.g.
		To set FEC AN:
				$ esxcli bnxtnet fec set -n vmnic5 -E 1 -M 1
		To set FEC AN and FEC CL74
				$ esxcli bnxtnet fec set -n vmnic5 -E 1 -M 3
		To disable FEC AN, CL74 and CL91
				$ esxcli bnxtnet fec set -n vmnic5 -E 0 -M 7


EEE feature
===========
Please use the latest version of the bnxtnetcli utility to enable and disable
the EEE feature on the Cumulus NIC1 device.

1. To display current setting:-
Cmd options:
  -n|--nic-name=<str=vmnic5>   vmnic name

e.g.
$ /opt/broadcom/bin/bnxtnetcli -g -n vmnic5 -E
OR
$ esxcli bnxtnet eee  get -n vmnic5

2. To Enable EEE feature:-
Cmd options:
  -e|--enable=<str=1>     eee ON setting
  -n|--nic-name=<str=vmnic5>   vmnic name

e.g.
$  /opt/broadcom/bin/bnxtnetcli -s -n vmnic5 -e 1
OR
$ esxcli bnxtnet eee  set -n vmnic5 -e 1

3. To disable EEE feature:-
Cmd options:
  -e|--enable=<str=0>     eee OFF setting
  -n|--nic-name=<str=vmnic5>   vmnic name

e.g.
$  /opt/broadcom/bin/bnxtnetcli -s -n vmnic5 -e 0
OR
$ esxcli bnxtnet eee  set -n vmnic5 -e 0

4. To enable/disable TX LPI Timer:-
Cmd options:
  -e|--enable=<str>     eee on/off setting (required)
  -n|--nic-name=<str>   vmnic name (required)
  -t|--tx-lpi-enable=<str>
                        tx lpi timer on/off setting

value of --tx-lpi-enable (or -t): 0 is for disable and 1 is for enable.

e.g.
$ /opt/broadcom/bin/bnxtnetcli -s -n vmnic5 -e 1 -t <0/1>
OR
$ esxcli bnxtnet eee set -n vmnic5 -e 1 -t <0/1>

5. To set TX LPI Timer value:
Cmd options:
  -e|--enable=<str>     eee on/off setting (required)
  -n|--nic-name=<str>   vmnic name (required)
  -T|--tx-lpi-timer=<str>
                        tx lpi timer value in uS (int)

e.g. to set TX LPI Timer = 10 uS
$/opt/broadcom/bin/bnxtnetcli -s -n vmnic5 -e 1 -t 1 -T 10
OR
$ esxcli bnxtnet eee set -n vmnic5 -e 1 -t 1 -T 10


Interrupt Coalescing Configuration
==================================
Interrupt coalescing is a feature implemented in hardware under driver control
on high-performance Cumulus NICs, allowing the reception of a group of network
frames to be notified to the operating system kernel via a single hardware
interrupt.

1. To display current coalescing setting:-
$  esxcli network nic coalesce get

2. To set the maximum number of (Rx/Tx) frames to wait for before interrupting:
$ esxcli network nic coalesce set -T=<tx_val> -R=<rx_val> -n vmnic4

3. To set the number of microseconds to wait for completed (Rx/Tx) frames
   before interrupting:
$ esxcli network nic coalesce set -t=<val1> -r=<val2> -n vmnic4

Note:-
====
Currently our implementation uses one completion ring for each pair of Tx and
Rx rings. Therefore, the driver cannot set TX/RX coalescing values
separately. Because VMware NetIOC requires an accurate TX coalescing setting,
the following policies are currently enforced in the driver:

  1. If both TX and RX coalescing params are changing, the TX setting takes
     precedence.
  2. If only RX is changing, both TX and RX take the RX setting.
  3. If only TX is changing, both TX and RX take the TX setting.

A warning message is also printed to vmkernel.log to show that the RX
coalescing parameters may have changed due to the TX coalescing parameter
change.
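
For example (illustrative values), the command below changes both the TX and RX
time-based parameters in a single call; per policy 1 above, the TX setting
takes precedence, so both directions end up using 100 microseconds:

$ esxcli network nic coalesce set -t=100 -r=200 -n vmnic4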


How to change Rx/Tx Ring Size
=============================
1. To display current ring size setting on a given vmnicX:-
$ esxcli network nic ring current get -n vmnic4

2. To set Rx/TX ring size:-
$ esxcli network nic ring current set -n vmnic4 -r=512 -t=512

Note:-
====
1. For the Rx ring, the maximum ring size is 4095.
2. For the Tx ring, the maximum ring size is 4095 and the minimum ring size is 25.


DCB support
===========
By default, DCB is enabled. The current ESXi driver treats RoCE traffic as
lossless and all other traffic as lossy. DCB can be enabled with the firmware
DCB-X agent or the host DCB-X agent. The current driver supports firmware DCB.
Please refer to the respective sections below for more details on DCB support.

Supported DCB features:
1) Negotiation with switch and converging on IEEE DCB parameters.
2) Notifying the RoCE priority and CoS queue to RoCE driver.
3) Priority tagging for RoCE traffic.
4) Configuring ETS and PFC on device.

Switch configuration requirements:
1) Configure the switch DCB-X mode in IEEE or CEE mode (not in Auto mode)
2) Configure the switch in Non-willing mode
3) Make sure that switch multicasts the ETS recommendation, PFC and APP TLVs
4) Configure switch to support RoCE v2 App TLV
Note: Configuring the switch for LLDP DCB-X support is outside the scope of this README

Firmware DCB support:
--------------------
By default this mode is enabled in the driver.
Please enable the firmware DCB-X agent through the NVM options below, using the
lcdiag or lfc tools.

Option - Description
155    - DCBX Mode
269    - LLDP Nearest Bridge
270    - LLDP non-TPMR

Please refer to the firmware DCB-X design document for more details.
https://docs.google.com/document/d/1vmLEn7lk9dKmqEDY3Rw5hQD2_xn1xmaIxuG0Y0iGocs/edit#

Current firmware-based DCB support limitations:
1) The firmware DCB-X agent always works in willing mode. Hence, DCB
   negotiation in back-to-back mode is not supported.

Native DCB support (with host DCB-X agent):
-------------------------------------------
By default this mode is disabled. Please set the enable_host_dcbd module
parameter to enable it, and disable the firmware DCB-X agent through HII or the
bnxtnvm tool. The current driver does not support this mode; it will be enabled
in subsequent releases and this section will be updated accordingly. It is
recommended not to enable native DCB support in the driver for now.

Host configuration requirements:
1) Create a vSwitch in the host and assign bnxtnet adapter as an uplink:
        $ esxcfg-vswitch -a vswitch_DCB (This will create vSwitch named as "vswitch_DCB")
        $ esxcfg-vswitch -L vmnic4 vswitch_DCB (This will assign vmnic4 to vswitch_DCB)

VMware dcbd:
"dcbd" is active once ESXi driver registers DCB capabilities with VMkernel.
No need to enable it separately.

Some useful commands on dcbd:
1) Stop an already running dcbd daemon:
        $ /etc/init.d/dcbd stop

2) Start the dcbd daemon service
        $ /etc/init.d/dcbd start

3) Check /var/log/syslog to confirm that dcbd is running properly
        $ cat /var/log/syslog | grep dcbd

Native DCB support limitations:
1) The ESXi dcbd always works in willing mode. Hence, DCB negotiation in
   back-to-back mode is not supported.
2) The native DCB agent (dcbd) does not provide any interface to configure it.
3) The current VMware DCB/QoS implementation assumes that the services of the
   FW DCBx agent and any VM's DCBx agent are stopped. Only the ESXi DCBx agent
   service is running per port.

Overall current limitations in the VMware DCB implementation:
1) The current ESXi driver treats RoCE traffic as lossless and all other
   traffic as lossy due to an ESXi L2 stack limitation.

Get DCBx ETS, PFC, APP and queue information:
============================================

  Please use bnxtnetcli version 1.13 onwards to get DCBx information.

  The latest bnxtnetcli utility supports displaying the following DCB-related
  information to the user:

       1) Maximum configurable total, lossless and lossy queues.

       2) ETS information (Number of ETS enabled TCs, Priority to Traffic Class(TC)
       map, Per TC Bandwidth, Per TC scheduling algorithm)

       3) PFC information (Maximum PFC enabled TCs, PFC priority bitmap,
       MACSec bypass capability)

       4) Application information (protocol_id, protocol selector, priority)

  Command to use:
  --------------
       $ esxcli bnxtnet dcb get -n <vmnic-name>

                       OR

       $ /opt/broadcom/bin/bnxtnetcli -g -n <vmnic-name> -Q

       For e.g.

       $ esxcli bnxtnet dcb get -n vmnic4

                       OR

       $ /opt/broadcom/bin/bnxtnetcli -g -n vmnic4 -Q

  Sample output:
  -------------

	Queue Information:
	-----------------
		MAX Configurable Queues: 3
		MAX Configurable Loseless Queues: 2
		MAX Configurable Lossy Queues: 1
	ETS Information:
	---------------
		Willing: Yes    Max no. of TCs: 3

		Traffic Class to BW and Scheduling Algo map
		-------------------------------------------
		TCs     BW      Sched.Algo
		---     --      ----------
		0       100     2 (ETS Priority)

		Priorities to TC map
		--------------------
		Pri. 0 is mapped to TC0
		Pri. 1 is mapped to TC1

	PFC Information:
	---------------
		Max PFC TCs: 0
		MAC security Bypass Capicity: 0

		Priority bitmap: ( 0 0 0 0 0 0 1 0 )
		----------------
		PFC is Enabled on: Priority 1

	APP Information:
	---------------
		Protocol ID: 0x8915, Protocol Selector: 0, Priority: 5

From ESXi 7.0 onwards, VMware provides a standard CLI command to display DCB
information; a sample command is shown below. When CEE is enabled, the current
standard CLI command does not display PG bandwidth and PFC information, so it
is recommended to use bnxtnetcli to verify CEE data.
Example: esxcli network nic dcb status get -n vmnic5
   Nic Name: vmnic5
   Mode: 3 - IEEE Mode
   Enabled: true
   Capabilities:
         Priority Group: true
         Priority Flow Control: true
         PG Traffic Classes: 3
         PFC Traffic Classes: 1
   PFC Enabled: true
   PFC Configuration: 0 0 0 0 0 1 0 0
   IEEE ETS Configuration:
         Willing Bit In ETS Config TLV: 1
         Supported Capacity: 3
         Credit Based Shaper ETS Algorithm Supported: 0x0
         TX Bandwidth Per TC: 50 50 0 0 0 0 0 0
         RX Bandwidth Per TC: 0 0 0 0 0 0 0 0
         TSA Assignment Table Per TC: 2 2 0 0 0 0 0 0
         Priority Assignment Per TC: 0 0 0 0 0 1 0 0
         Recommended TC Bandwidth Per TC: 0 0 0 0 0 0 0 0
         Recommended TSA Assignment Per TC: 0 0 0 0 0 0 0 0
         Recommended Priority Assignment Per TC: 0 0 0 0 0 0 0 0
   IEEE PFC Configuration:
         Number Of Traffic Classes: 1
         PFC Configuration: 0 0 0 0 0 1 0 0
         Macsec Bypass Capability Is Enabled: 0
         Round Trip Propagation Delay Of Link: 0
         Sent PFC Frames: 0 0 0 0 0 0 0 0
         Received PFC Frames: 0 0 0 0 0 0 0 0
   DCB Apps:
         App Type: UDP port number
         Protocol ID: 0x12b7
         User Priority: 0x5


VF rate limit
=============
VF rate limiting is one of the QoS features used by OEMs to restrict a VF's
bandwidth usage. An administrator can set the rate limit on a VF through the
command-line interface. The user is free to allocate the available bandwidth;
the driver simply adopts a FCFS (first come, first served) policy during
allocation.

Vsish command line interface to set and verify VF rate limit
------------------------------------------------------------
1) Go to the particular VF directory.
   This step is mandatory before executing the following commands.
   vsish /> cd /net/sriov/vmnic4/vfs/<vf num>
2) View the current rate limit setting with the command below:
   "get /net/sriov/vmnic4/vfs/<vf num>/egressRateLimit"
    Sample Output:
    RateLimit {
        Rate limit enabled: 0
        Bandwidth limit in Mbps:0
    }
3) Set the rate limit on a particular VF with the command below:
   "set /net/sriov/vmnic4/vfs/<vf num>/egressRateLimit enable <BW Mbps>"
4) To disable the rate limit, use the command below:
   "set /net/sriov/vmnic4/vfs/<vf num>/egressRateLimit disable"

Current Limitations of VF rate limit feature
--------------------------------------------
1) VMware does not have a public esxcli (native ESXi command-line interface)
   command to set the rate limit. Hence, vsish commands must be used to set
   and test the VF rate limit.
2) VMware does not provide a way to set a minimum rate on a VF. The current
   vsish interface has only one parameter, which is the maximum rate limit.
3) VMware specifies the rate limit in multiples of Mbps.
4) The VF rate limit setting is not persistent across reboots.


Changing Bridge mode of NIC
===========================
bnxtnetcli 1.17 and bnxtnet-20.6.300.0 onwards support changing the
EVB mode of the NIC to VEB or VEPA.

Command to change the EVB mode:
------------------------------

	$ esxcli bnxtnet evb set -B <veb/vepa> -n vmnicX

				or

	$ /opt/vmware/broadcom/bin/bnxtnetcli -s -n vmnicX -G veb/vepa

For e.g. $ esxcli bnxtnet evb set -B vepa -n vmnic4

To get the current EVB mode in the NIC
--------------------------------------

	$ esxcli bnxtnet nic get -n vmnicX


MAC Anti-Spoof Feature
======================
The MAC anti-spoofing filter is placed on the Tx side of an L2 NIC interface of
a user subnet and only allows through packets that are within the MAC address
range owned by the given VF. The intent is to exclude packets that have invalid
source MAC addresses.

$ esxcli network vswitch standard portgroup policy security get -p <port-group name>
$ esxcli network vswitch standard portgroup policy security set -f false -p <port-group name>

You will notice "Allow Forged Transmits: false" after the above command, i.e.
the L2 anti-spoof check is now in place for all the VMs sharing this port
group.

The MAC anti-spoof configuration setting is persistent across reboots and gets
applied every time the VM reboots.
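
For example, with a port group named "VM Network" (an illustrative name; use
your own port group name):

$ esxcli network vswitch standard portgroup policy security set -f false -p "VM Network"
$ esxcli network vswitch standard portgroup policy security get -p "VM Network"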


Firmware Dump Feature
=====================
The ESXi VMkernel provides support for dumping important firmware and/or driver
information into a file when the system crashes (PSOD) or during a live dump.

The bnxtnet driver uses this feature to capture firmware information such as
the firmware trace buffer, register dumps, queue information, etc.

When the host PSODs, it takes a coredump. As part of the coredump, the host
dumps this firmware information as well.

After a cold boot/reboot from the PSOD, extract the "bnxtnet_fwdmp" file from
vmkernel-zdump using the command below:

 $ vmkdump_extract -F "bnxtnet_fwdmp" <zdump file>

For e.g.

 $ vmkdump_extract -F "bnxtnet_fwdmp" vmkernel-zdump.1

To take a live dump of the system use following commands:

 1. $ localcli --plugin-dir /usr/lib/vmware/esxcli/int/ debug livedump perform
 2. Find currently active partition using $ esxcli system coredump partition list
 3. Convert into zdump using:
    $ esxcfg-dumppart -C -D "<activated dump partition path>" --zdumpname <filename.zdump>
 4. $ vmkdump_extract -F bnxtnet_fwdmp vmkernel-zdump


Key-Value commands feature
==========================
The bnxtnet driver provides a miscellaneous key-value command feature that can
be used to debug and get important information from the driver, such as the
currently configured module parameters, firmware traces, ring details, etc.

"vmkmgmt_keyval" is the userspace utility used to execute the commands; the
driver information is logged in vmkernel.log.

Currently, bnxtnet provides three keys.
 1. "sw_hw_ring_info_dump"
 2. "driver_static_dbg_Info_dump"
 3. "firmware_debug_trace_dump"

Usage:
------
 $ /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -i <bnxtnet-specific-instance> -k <key> -g

For e.g.
 $ /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -i "key_val_vmnic4/broadcom" -k "driver_static_dbg_Info_dump" -g

Note:- After issuing vm-support or the 'firmware_debug_trace_dump' key-val
command from the CLI, please ignore the bnxtnet warning messages "Dumping FW
Trace:", "== START OF TRACE ==" and "== END OF TRACE ==" in dmesg or
vmkernel.log.
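
The remaining keys follow the same pattern. For example, to dump the firmware
trace buffer:

 $ /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -i "key_val_vmnic4/broadcom" -k "firmware_debug_trace_dump" -g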


Setting Auto Speed mask value
=============================
The user can set the autoneg speed mask value in VMware using the bnxtnetcli utility.

Command Usage: esxcli bnxtnet link advertise [cmd options]
Description:
  advertise             Change the advertising link of the NIC via management API.
Cmd options:
  -n|--nic-name=<str>   vmnic name (required)
  -A|--speed-mask=<str> link speed mask %x 1G=0x8, 10G=0x40, 25G=0x100, 40G=0x200
			and 50G=0x10000 (required)

Command Examples:
$ esxcli bnxtnet link advertise -A 0x8 -n vmnic7  (To set 1G only autoneg speed mask)
$ esxcli bnxtnet link advertise -A 0x40 -n vmnic7  (To set 10G only mask)
$ esxcli bnxtnet link advertise -A 0x48 -n vmnic7  (To set 1G and 10G  mask)


Enabling RoCE support:
=====================
To enable RoCE support in the L2 driver, the bnxtnet driver needs to be loaded with the disable_roce=0 module parameter.
From ESXi 7.0 onwards, disable_roce=0 is set by default and there is no need to provide this parameter explicitly.
If a customer wants to use RoCE support, it is recommended to upgrade to compatible L2 and RoCE drivers. Otherwise,
RoCE functionality may not work as expected.
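
On releases where disable_roce defaults to 1, RoCE support can be enabled
explicitly using the module-parameter convention shown in the Module Parameters
section (reboot required):

    esxcfg-module -s 'disable_roce=0' bnxtnet (reboot required)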


Error Recovery support:
======================
This feature enables auto-recovery of Broadcom NICs from hardware fatal/non-fatal
errors without user intervention or a server reboot.
The current bnxtnet driver supports "firmware-initiated error recovery"
and "driver-initiated error recovery". By default, error recovery is disabled in the firmware.

What is "firmware-initiated error recovery"?
The firmware detects non-recoverable hardware errors and triggers recovery by resetting the chip.
During this process, it notifies the driver with an async event. The driver receives this event
and re-initializes itself.

What is "driver-initiated recovery"?
In some extreme scenarios, the firmware might also crash. The driver checks the firmware's health
periodically and auto-recovers the Broadcom NIC.

Error recovery is not supported by the RoCE driver; the NIC driver cannot recover as the dependent
RoCE driver does not support recovery.

200G PAM4 support:
=================
The ESXi driver supports both autoneg and forced speed for 200G PAM4.
However, users need to be aware of the notes below on the PAM4 implementation and VMkernel behavior.
- VMkernel remembers the previous NIC link settings. If the earlier setting was auto,
  VMkernel tries to set autoneg. So, please check dmesg for the requested mode.
  Example: Uplink: 17299: Setting speed/duplex to (0 AUTO) on vmnic10
- For autoneg to work, both the NIC and the switch should be in autoneg mode.
- If the NIC is in autoneg + auto-detect, the link can come up even when the peer is at 200G PAM4
  forced speed.
- For 200G auto-detect to work, the user needs to hide the 2nd port in the UEFI device settings if
  dual ports are enabled by default.
- If dual ports are enabled, the auto-detect logic will not try 200G autoneg or forced speeds.
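
As a sketch, assuming the bnxtnetcli build in use accepts a 200G speed value,
a forced 200G speed would be set with the same link set command shown in the
Driver Settings section (the 200000 value is an assumption; consult the
utility help for the exact speed values supported):

	$ esxcli bnxtnet link set -S 200000 -D full -n vmnic10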

Module Parameters
=================
This driver supports a number of parameters that control features such as RSS, SR-IOV, etc.

1. See the available parameters and their description

$ esxcfg-module -i bnxtnet

2. See the current parameters of the driver

$ esxcfg-module -g bnxtnet

3. Set the parameter

$ esxcfg-module -s 'parameter=value' bnxtnet

Example: To disable TPA

$ esxcfg-module -s "disable_tpa=1,1,1,1" bnxtnet


Driver Defaults
===============
Flow control :			None

MTU :				1500

Tx_Push:			Disabled

Debug Level:			Disabled

VFS:				0 (Disabled)

RSS:				-1 (Max. 16)

DRSS:				4 (Max. 16)

RSS pools:			4

multi_rx_filters:		-1(use default number of Rx filters)

int_mode:			0 (MSIX enabled)

disable_tpa			0 (Disabled)

force_hwq			-1 (auto-configured)

enable_multi_rss		1

enable_default_queue_filters	-1 (Enabled for NPAR mode and/or when VFs are enabled, Disabled otherwise)

enable_vxlan_ofld		1

disable_shared_rings		0

disable_dcb			0

enable_host_dcbd		0

roce_pri			0x3

enable_geneve_ofld		1

disable_roce			0(ESXi 7.0 onwards), 1(other OS versions)

rsvd_netq			-1 (auto-configured)

Known issues:
============
        1. (CTRL-21078) The RSS feature does not work when the VXLAN filter is enabled.
           This is due to the OS not giving RSS attributes along with the VXLAN filter.
           We have already obtained a waiver from VMware for this issue.
        2. (CTRL-21967) The SR-IOV Windows VF guest OS 'disallow MTU change' option does
           not work. This is because the Windows VF driver reloads itself with the new
           MTU value. If the VF driver reloads with a new MTU, the PF driver
           cannot track the actual MTU change and forward the request to the hypervisor
           for validation.
	3. SR-IOV: "VF rate limit" (PF-driver-controlled bandwidth limit settings for VFs)
	   is not supported in the VMware L2 driver.
	4. (CTRL-24519) ESXi L2 driver: stale VF rate limit values are observed on a driver
	   Load->Unload->Load sequence. The DCPN below was raised with VMware to track this.
	   https://dcpn.force.com/apex/TechnicalRequestCaseRedesignPartner?Id=500i000000sW4rQAAS
	5. All loadable kernel modules, including the bnxtnet driver, have one limitation: if driver
	   code changes user-supplied module parameter values during validation, the change is not
	   reflected in the 'esxcfg-module -g' output. There is no back channel provided by the OS to
	   report the exact values used inside the driver; however, the revised module parameter
	   value is reflected in the driver log messages in 'dmesg'.
	6. Geneve OAM traffic is routed to the default queue instead of a separate queue. Hence, OAM
	    queue support is disabled by default in the driver. CTRL-26041 tracks the firmware fix.
	7. By default, VMkernel supports only 1024 interrupt cookies. If the system has multiple NICs with
	    NPAR or NPAR+SRIOV, the requirement for maxIntrCookies will be higher. So, the workaround
	    below must be used to increase this system resource. Please use the configuration below when the
	    maximum supported VFs are in use.
	    Increase the interrupt cookies to the desired value (4096 is the max) to support more Broadcom NIC PFs/Partitions.
	    Command: esxcli system settings kernel set -s=maxIntrCookies -v=4096
	    In case of insufficient cookies, the driver/VMkernel fails to claim PFs/Partitions with the
	    warning messages below in the logs.
	    Sample warning messages:
	    -----------------------
	    WARNING: IntrCookie: 1424: Unable to find a free interrupt number; see https://kb.vmware.com/s/article/78182
	    WARNING: VMK_PCI: 597: device 0000:af:01.1 failed to allocate 15 MSIX interrupts
	    WARNING: bnxtnet: bnxtnet_alloc_intr_resources:1176: [0000:af:01.1 : 0x45211c68c000] Failed to allocate
		     interrupt cookies (Out of resources)msix:15
	    WARNING: bnxtnet: bnxtnet_attach_device:337: [0000:af:01.1 : 0x45211c68c000] failed to find cumulus
		     device phase 2(status: Failure)
	    -----------------------
	8. (CTRL-47027, CTRL-49725) alloc_rx_buffers failures on a server with many NICs or NPARs.
	    These failures are due to vmk_PktAllocPage failures. This API can fail if the default page pool
	    memory is not sufficient, so this pool must be increased with the commands below:
	    a. esxcli system settings kernel set -s netPagePoolLimitCap -v <new_value>
	    b. esxcli system settings kernel set -s netPagePoolLimitPerGB -v <new_value>
	    The commands below show the current values:
	    c. esxcli system settings kernel list | grep netPagePoolLimitCap
	    d. esxcli system settings kernel list | grep netPagePoolLimitPerGB
	9. (CTRL-48410) The driver may disable DRSS in some resource-constrained environments,
	    i.e. when requested rings > available rings.
	10. PSOD or QUERY_FUNC HWRM command timeout observed on "AMD Rome" platforms with IOMMU IR enabled.
	    This issue can be observed on AMD platforms with IOMMU IR enabled when the platform requires >=512 interrupts.
	    Symptoms:
		1. bnxtnet driver: No warning messages are seen during driver load, but a few PFs may fail to
		   run I/O due to interrupts not getting routed to the driver. As a side effect of this,
		   RING_FREE HWRM timeouts can be seen during bnxtnet driver unload, leading to a PSOD.
		2. bnxtroce driver: QUERY_FUNC HWRM timeout messages can be seen in the vmkernel logs during
		   driver load for a few PFs.
	    Workaround suggested by VMware:
		Disable IR (interrupt remapper) using the boot option 'iovDisableIR=TRUE' or the esxcli command below.
		$ esxcli system settings kernel set -s iovDisableIR -v true
		Please contact VMware if any other side effects are observed because of this workaround.
	11. (CTRL-50544) The "esxcli network nic tso get" command shows TSO as "N/A" for ENS interfaces.
	    As per the VMware inbox team, "esxcli network nic" commands throw an exception when an
	    interface is in ENS mode. Users need to access vsish entries to verify the enabled HW capabilities.
	12. (CTRL-51706) A PSOD can be observed with continuous error recovery on AMD platforms with ESXi 7.0 U1
	    and iovDisableIR=true set. Customers need to upgrade to ESXi 7.0 U2 to avoid the PSOD. As per VMware,
	    ESXi 7.0 U2 has all the applicable AMD-specific fixes.
	13. (CTRL-52044) Issues like interrupt misses and/or VF creation failures can be avoided by reducing
	    'Number of MSI-X Vectors per VF' from the default to a lower value in the 'preboot config menu'.
            For example:
		A. Reduce it to 3 to support 128 VFs on Thor in NPAR mode.
		B. Reduce it to 6 or 7 to support 32 VFs per PF on a 4-port Thor.
	14. (CTRL-53640) The driver can theoretically support up to the num_of_vfs_per_pf VFs configured in the
	    'preboot config menu', but the driver may reduce the number of VFs (or disable SR-IOV) if the firmware
	    cannot accommodate the required resources (such as num_filters, num_txq, num_rxq, etc.) for that many
	    requested VFs; the workaround is to try with a lower number of VFs.

		-----------------------------------------------------------------------------------------
		|          NIC		| With default settings		| With "msix_vectors_per_vf=3"	|
		|			|				| to achieve max possible VFs 	|
		-----------------------------------------------------------------------------------------
		| 2-port-Thor (SF)	| 64 VFs/pf (128 VFs total)	| 64 VFs/pf (128 VFs total)	|
		-----------------------------------------------------------------------------------------
		| 4-port-Thor (SF)	| 24 VFs/pf (96 VFs total) 	| 32 VFs/pf (128 VFs total)	|
		-----------------------------------------------------------------------------------------
		| 2-port-Thor (NPAR1.0)	| 	0 VFs/pf  		| 8 VFs/pf (128 VFs total)	|
		-----------------------------------------------------------------------------------------
		| 4-port-Thor (NPAR1.0)	| 	0 VFs/pf 		| 8 VFs/pf (128 VFs total)	|
		-----------------------------------------------------------------------------------------
	   The above limits are applicable for all Thor variants up to B1. The limit with default settings
	   will change from Thor B2 onwards, and this table will be updated in later releases.
	15. (DCSG01358204) The ENS capability fails to enable on NPAR with the bnxtroce driver loaded. This is
	mainly because VMkernel supports a maximum of 256 helper queues. When loaded, the bnxtroce
	driver consumes 6 helper queues per partition. The side effect of this is that some of the
	ENS modules are unable to load and cannot enable the ENS capability on NPARs. The workaround is
	to load the bnxtnet driver with the disable_roce=1 module parameter.
	16. (DCSG01288744/DCSG01375514) A performance drop is seen on BCM5741x NICs with Geneve traffic in
	NSX-T environments. The BCM5741x adapters do not support LRO functionality for Geneve traffic.
	This leads to overhead in the software stack that degrades performance.
	Workaround: Enable software LRO by disabling hardware LRO, using the disable_tpa driver parameter.
	Below is an example of disabling hardware LRO with 4 NICs:
	esxcfg-module -s 'disable_tpa=1,1,1,1' bnxtnet
	17: (DCSG01531965) System crash when NPAR is enabled on multiple Thor NIC ports with RoCE enabled.
	VMkernel supports a maximum of 256 helper queues. When loaded, the bnxtroce driver
	consumes 6 helper queues per partition. The side effect of this is that some of the VMkernel
	components won't get helper queues, causing a system crash. The workaround is to disable
	RoCE on a few partitions so that the VMkernel components get the required helper queues.

Extra Notes:
===========
These are the steps that make kernel command-line parameters persistent
across every reboot:
1) Open a browser, enter the host IP address, and go to the vSphere Web UI.
2) On the left tab click Manage, go to System and select Advanced settings.
3) Search for the keyword 'pcibaralloc'; it will show VMkernel.Boot.pciBarAllocPolicy.
   Right-click it and edit the value to 0x1 (0x1 is the smallest-fit algorithm).
4) Reboot the host.
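
The same value can likely also be set from the command line, following the
pattern of the iovDisableIR example in the Known issues section (this is an
assumption; verify the exact setting name with 'esxcli system settings kernel
list' first):

$ esxcli system settings kernel set -s pciBarAllocPolicy -v 0x1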

ENS LRO feature:
===============
Users need to set the advanced setting below for LRO to function in ENS mode.
A reboot is not required for the command below to take effect.

Set command - esxcli system settings advanced set -o /Net/EnsHwLROSupport -i 1

ENS RSS feature:
===============
ESXi 8.0 has support for this feature. The current ENS stack is not yet
populating information about RSS queues through vsish nodes, so the driver
logs the number of enabled RSS engines in the vmkernel log.
To know more about each RSS engine, use the command below.

Get the RSS engines configured on ENS uplink:
nsxdp-cli ens uplink rss list -n <vmnic>

Example output:
nsxdp-cli ens uplink rss list -n vmnic7
ID   Type     PriQ NumQ h-Func h-Type keySz  IndSz  SecQs
----------------------------------------------------------
0    DFTRSS   0    4    2      63     40     128    1 2 3
1    SHARSS   5    4    2      63     40     128    6 7 8
ID: RSS engine Index
Type: DFTRSS = DefaultQ RSS,
      SHARSS = Shared RSS engines(shared by multiple clients),
      OFLRSS = Offloaded RSS/Dedicated RSS(dedicated to single VNIC)-Not supported
PriQ: Primary queue of RSS engine
NumQ: Number of queues in RSS engine
h-Func: Hash function
h-Type: Hash Type
KeySz: Key Size
IndSz: Indirection Table Size
SecQs: The secondary queues of the RSS engine

Users need to set some kernel parameters to enable ENS RSS and/or LRO.
As per discussion with VMware, the options below need to be set to enable
certain features; use the latest ESXi 8.0 GA builds.

For vmknic RSS+LRO enable:
--------------------------
esxcli system settings advanced set -o /Net/TcpipEnsMultipleRxContexts -i 1
esxcli system settings advanced set -o /Net/EnsHwLROSupport -i 1
esxcli system settings advanced set -o /Net/TcpipEnsDedicatedNetQRSS -i 1

For VM RSS+LRO enable:
----------------------
esxcli system settings advanced set -o /Net/EnsHwLROSupport -i 1
Modify ethernet%d.rssOffload to 1 (Dedicated RSS) and set ethernet%d.pnicFeatures=5
in the VM's .vmx file

For vmknic RSS only:
--------------------
No extra parameters need to be set; it is enabled by default.

For VM RSS only:
----------------
No extra parameters are needed, as by default ethernet%d.pnicFeatures is 4 (RSS only)
in the VM's .vmx file

For vmknic LRO only:
--------------------
esxcli system settings advanced set -o /Net/TcpipEnsNetQRSS -i 0
esxcli system settings advanced set -o /Net/EnsHwLROSupport -i 1

For VM LRO only:
----------------
esxcli system settings advanced set -o /Net/EnsHwLROSupport -i 1
Set ethernet%d.pnicFeatures=1 in VM's .vmx file.

- Make sure the above options are set before creating the dvSwitch.
- Note that any time a vmknic or vnic port in a DVS requests shared RSS, the whole DVS is
considered as using shared RSS, so LRO will not take effect. So, if multiple vmknic and/or
vnic ports in the DVS want to enable LRO (either by itself or LRO+RSS), make sure all
ports are set for either LRO only or LRO+(dedicated)RSS.
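
To confirm that any of the advanced options above took effect, the current
value can be read back, for example:

esxcli system settings advanced list -o /Net/EnsHwLROSupport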
