				       README Notes
			      Broadcom bnxt_re FreeBSD Driver

	                             Broadcom Inc.
				15101 Alton Parkway,
				   Irvine, CA 92618

			  Copyright (c) 2024 Broadcom Limited
				   All rights reserved


Table of Contents
=================
  Introduction
  Driver Dependencies
  Driver Download and Compilation
  Unloading and Loading Driver
  L2 Settings
  Commands to query RoCE
  RoCE driver stats
  Query GID
  krping
  Configuration Tips
  Limitations
  Known Issues

Introduction
============
This file describes the bnxt_re FreeBSD driver for the Broadcom NetXtreme-C
and NetXtreme-E BCM573xx/BCM574xx/BCM575xx 10/20/25/40/50/100/200 Gbps
Ethernet Network Controllers.

Driver Dependencies
===================
   1. L2 driver (if_bnxt)
   2. linuxkpi module (load with: kldload linuxkpi)
   3. ibcore module (load with: kldload ibcore)
   4. Necessary firmware must be programmed in NVRAM.

Driver Download and Compilation
===============================
1. Download and unpack the driver source files.
2. Navigate to the bnxt_re directory.
   ## cd freebsd-bnxt-227.0.24.X/bnxt_re
3. Compile the driver.
   ## make clean
   ## make
      Note: The Makefile's .PATH variable points to
            /usr/src/sys/dev/bnxt/bnxt_re/ for the source files. Any source
            files present in the current working directory (pwd) take
            precedence over the ones found through .PATH.
4. The driver binary will be generated in the current directory.
   ## ls ./bnxt_re.ko

Unloading and Loading Driver
============================
1. To unload:
   ## kldunload bnxt_re

2. To load:
   A. Navigate to the directory containing the driver binary
      ## cd /usr/obj/usr/src/amd64.amd64/sys/modules/bnxt_re/
      (or)
      ## cd freebsd-bnxt-227.0.24.X/bnxt_re
   B. Load the driver
      ## kldload ./bnxt_re.ko
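
The load order implied by the Driver Dependencies section can be wrapped in a
small helper. This is only a sketch: the function name load_roce_stack is
hypothetical, and it assumes that if_bnxt, linuxkpi and ibcore are either
built into the kernel or available as loadable modules. It relies on
"kldload -n", which does not try to load a module that is already loaded.

```shell
# load_roce_stack [PATH_TO_KO]: load the prerequisites, then bnxt_re.
# Hypothetical helper; the default path assumes the current directory
# contains the freshly built bnxt_re.ko.
load_roce_stack() {
    ko_path=${1:-./bnxt_re.ko}
    # Prerequisites first, in the order listed under Driver Dependencies.
    for mod in if_bnxt linuxkpi ibcore; do
        kldload -n "$mod" || return 1
    done
    # Finally the RoCE driver itself.
    kldload -n "$ko_path"
}
```

Run as root, for example "load_roce_stack ./bnxt_re.ko".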

L2 Settings
============
Commands to list interfaces and to set the IP address, MTU, link state
(down/up), media speed, etc.:
   ## ifconfig
   ## ifconfig -a
   ## ifconfig bnxt0 12.0.0.10/24
   ## ifconfig bnxt0 mtu 9000
   ## ifconfig -m
   ## ifconfig -m bnxt0
   ## ifconfig bnxt0 media 25GBase-CR

Commands to query RoCE
========================
   ## ibstat
   ## sysctl sys.class.infiniband.bnxt_re0
   ## sysctl sys.class.infiniband_cm.bnxt_re0
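   ## sysctl sys.class.infiniband_verbs.uverbs0.ibdev
   sys.class.infiniband_verbs.uverbs0.ibdev: bnxt_re0
   ## sysctl sys.class.infiniband_mad.issm0.ibdev
   sys.class.infiniband_mad.issm0.ibdev: bnxt_re0
   ## sysctl sys.class.infiniband_mad.umad0.ibdev
   sys.class.infiniband_mad.umad0.ibdev: bnxt_re0
   ## sysctl sys.class.infiniband_cm.ucm0.ibdev
   sys.class.infiniband_cm.ucm0.ibdev: bnxt_re0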

RoCE driver stats
==================
   ## sysctl sys.class.infiniband.bnxt_re0.ports.1.hw_counters

Query GID
===========
   GID values:
   -----------
   ## sysctl sys.class.infiniband.bnxt_re0.ports.1 | grep gids
   sys.class.infiniband.bnxt_re0.ports.1.gids.3: 0000:0000:0000:0000:0000:ffff:0c00:0014
   sys.class.infiniband.bnxt_re0.ports.1.gids.2: 0000:0000:0000:0000:0000:ffff:0c00:0014
   sys.class.infiniband.bnxt_re0.ports.1.gids.1: fe80:0000:0000:0000:b226:28ff:fed0:2c51
   sys.class.infiniband.bnxt_re0.ports.1.gids.0: fe80:0000:0000:0000:b226:28ff:fed0:2c51
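
   GIDs of the form 0000:...:ffff:XXXX:XXXX are IPv4-mapped: the last two
   groups hold the interface's IPv4 address in hex. The sketch below decodes
   such a GID; the function name gid_to_ipv4 is made up for illustration and
   the code assumes the fully expanded, lowercase format shown above.

```shell
# gid_to_ipv4 GID: decode an IPv4-mapped GID to dotted-quad form.
gid_to_ipv4() {
    gid=$1
    case "$gid" in
        0000:0000:0000:0000:0000:ffff:*) ;;
        *) echo "not an IPv4-mapped GID: $gid" >&2; return 1 ;;
    esac
    # Keep only the last two 16-bit groups, e.g. "0c00:0014".
    tail=${gid#0000:0000:0000:0000:0000:ffff:}
    hi=${tail%%:*}
    lo=${tail##*:}
    # Each group carries two address bytes; print them as decimals.
    printf '%d.%d.%d.%d\n' \
        "0x$(printf '%s' "$hi" | cut -c1-2)" \
        "0x$(printf '%s' "$hi" | cut -c3-4)" \
        "0x$(printf '%s' "$lo" | cut -c1-2)" \
        "0x$(printf '%s' "$lo" | cut -c3-4)"
}
```

   For GID entries 2 and 3 above (…:ffff:0c00:0014) this yields 12.0.0.20.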

   GID types:
   -----------
   ## sysctl sys.class.infiniband.bnxt_re0.ports.1 | grep types
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.types.3: RoCE v2
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.types.2: IB/RoCE v1
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.types.1: RoCE v2
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.types.0: IB/RoCE v1
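
   Tools that take a GID index need the index of an entry with the wanted
   RoCE version. A minimal sketch, assuming the output format shown above
   (rocev2_gid_indexes is a hypothetical helper name), that filters the
   "types" listing for RoCE v2 entries:

```shell
# rocev2_gid_indexes: read "sysctl ... | grep types" output on stdin and
# print the GID index of every entry whose type is exactly "RoCE v2".
rocev2_gid_indexes() {
    sed -n 's/^.*\.gid_attrs\.types\.\([0-9][0-9]*\): RoCE v2$/\1/p' | sort -n
}

# Example (on a system with bnxt_re0 present):
#   sysctl sys.class.infiniband.bnxt_re0.ports.1 | grep types | rocev2_gid_indexes
```

   With the sample output above, the helper prints indexes 1 and 3.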

   GID net_device:
   ---------------
   ## sysctl sys.class.infiniband.bnxt_re0.ports.1 | grep ndev
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.ndevs.3: bnxt3
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.ndevs.2: bnxt3
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.ndevs.1: bnxt3
   sys.class.infiniband.bnxt_re0.ports.1.gid_attrs.ndevs.0: bnxt3

   Default RoCE mode:
   ------------------
   ## sysctl -a | grep default_roce_mode_port
   sys.class.infiniband.bnxt_re0.default_roce_mode_port1: RoCE v2

krping
=======
   Krping is a tool that can be used to test the connectivity and performance
   of RoCE-enabled network interfaces on FreeBSD and other operating systems.
   It is a kernel module that uses the kernel's RDMA (Remote Direct Memory
   Access) subsystem to perform the tests; it is controlled from user space
   by writing option strings to /dev/krping.

   Krping works by exchanging ping-pong messages between two nodes over RoCE,
   where each node sends a message to the other node and waits for a response.

   Load krping module:
   -------------------
   ## kldload krping

   Start krping server:
   ---------------------
   ## echo "server,addr=12.0.0.20,port=9991,verbose" > /dev/krping
      krping: write string = |server,addr=12.0.0.20,port=9991,verbose|


   Start krping client:
   ---------------------
   ## echo "client,addr=12.0.0.20,port=9991,verbose" > /dev/krping
      krping: write string = |client,addr=12.0.0.20,port=9991,verbose|
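
   The server and client strings differ only in the role keyword, so they
   can be generated by one helper. This is a sketch only: krping_opts is a
   hypothetical name, and any extra arguments are passed through unchanged
   as option keywords (for example "verbose" or "count=100", using the
   option names listed under "Krping options").

```shell
# krping_opts ROLE ADDR PORT [EXTRA...]: print a krping option string.
krping_opts() {
    role=$1
    addr=$2
    port=$3
    shift 3
    opts="$role,addr=$addr,port=$port"
    # Append any further options, comma-separated, in the given order.
    for extra in "$@"; do
        opts="$opts,$extra"
    done
    printf '%s\n' "$opts"
}

# Example (requires the krping module to be loaded):
#   krping_opts server 12.0.0.20 9991 verbose > /dev/krping
```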

   krping stats (appear only while I/O is in progress):
   -----------------------------------------------------
   ## cat /dev/krping
   krping: num   device  snd bytes snd msgs  rcv bytes rcv msgs wr bytes wr msgs rd bytes rd msgs
   krping:   1 bnxt_re0      13872      867      13856      866        0       0        0       0
   krping:   1 bnxt_re0      19552     1222      19552     1222        0       0        0       0
   krping:   1 bnxt_re0      25392     1587      25376     1586        0       0        0       0
   krping:   1 bnxt_re0      31072     1942      31072     1942        0       0        0       0
   krping:   1 bnxt_re0      38352     2397      38336     2396        0       0        0       0


   ## dmesg | grep "ping data:"
   krping: ping data: rdma-ping-30423: `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS
   krping: ping data: rdma-ping-30424: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRST
   krping: ping data: rdma-ping-30425: bcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTU
   krping: ping data: rdma-ping-30426: cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV
   krping: ping data: rdma-ping-30427: defghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVW
   krping: ping data: rdma-ping-30428: efghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWX
   krping: ping data: rdma-ping-30429: fghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY
   krping: ping data: rdma-ping-30430: ghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
   krping: ping data: rdma-ping-30431: hijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ[

   Krping options
   ---------------
           {"count", OPT_INT, 'C'},
           {"size", OPT_INT, 'S'},
           {"addr", OPT_STRING, 'a'},
           {"addr6", OPT_STRING, 'A'},
           {"port", OPT_INT, 'p'},
           {"verbose", OPT_NOPARAM, 'v'},
           {"validate", OPT_NOPARAM, 'V'},
           {"server", OPT_NOPARAM, 's'},
           {"client", OPT_NOPARAM, 'c'},
           {"server_inv", OPT_NOPARAM, 'I'},
           {"duplex", OPT_NOPARAM, 'd'},
           {"tos", OPT_INT, 't'},
           {"txdepth", OPT_INT, 'T'},
           {"poll", OPT_NOPARAM, 'P'},
           {"local_dma_lkey", OPT_NOPARAM, 'Z'},
           {"read_inv", OPT_NOPARAM, 'R'},
           {"wlat", OPT_NOPARAM, 'l'}, 		[Currently unsupported]
           {"rlat", OPT_NOPARAM, 'L'},		[Currently unsupported]
           {"bw", OPT_NOPARAM, 'B'},		[Currently unsupported]
           {"fr", OPT_NOPARAM, 'f'},		[Currently unsupported]

Configuration Tips
==================
- For heavy RDMA-READ workloads with a large number of active QPs,
  a higher ack-timeout value is recommended.
  For example:
   ib_read_bw with -q 4096 requires an ack timeout of 18. The ack timeout
   is set with the "-u" option:
   ib_read_bw --report_gbits -F -m 4096 -q 4096 -d bnxt_re2 -x 3 -u 18 -D 60 -s 65536

- For use cases where the adapter QP limit is exercised or the number of
  active QPs is close to the adapter limit, the ack timeout needs to be
  increased to 24 to avoid retransmissions and loss of performance.
  For example:
   For multiple instances of ib_send_bw/ib_read_bw/ib_write_bw that create
   a total of 64K QPs, specify the higher ack timeout in each application
   instance using -u 24.
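
The ack-timeout values above are exponents, not durations. Under the
InfiniBand convention the timeout is 4.096 usec * 2^value, so each step
doubles the wait. The sketch below converts a "-u" value to milliseconds;
ack_timeout_ms is a hypothetical helper name, and the actual timeout applied
by the adapter may be rounded or clamped by firmware.

```shell
# ack_timeout_ms VALUE: convert an ack-timeout exponent to milliseconds
# using timeout = 4.096 usec * 2^VALUE.
ack_timeout_ms() {
    awk -v t="$1" 'BEGIN { printf "%.3f\n", 4.096 * 2 ^ t / 1000 }'
}
```

By this formula, -u 18 corresponds to roughly 1.07 seconds and -u 24 to
roughly 68.7 seconds per retry.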

Limitations
===========
1. Doorbell drops (DB Drops):
   A doorbell (DB) is the mechanism host software uses to notify the NIC of
   the following:
     A. Producer index of SQ, RQ & SRQ
     B. Consumer index of CQ & NQ
     C. Arming of CQ, NQ & SRQ

   The NIC has a limited-size DB FIFO for storing the host's writes. When many
   CPU cores ring doorbells simultaneously, the DB FIFO can fill up because of
   the delay in HW processing the DBs, and incoming doorbells may be dropped.
   Such dropped doorbells can cause unpredictable behavior (crashes, infinite
   hangs, I/O failures on both L2 and RoCE, missed interrupts/events).

   When doorbell drops occur, the L2 driver logs the following messages in
   the kernel log (dmesg):
     > kernel: bnxt2: One or more MMIO doorbells dropped by the device! epoch: 0x1.
     > kernel: bnxt2: One or more MMIO doorbells dropped by the device! epoch: 0x2.

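   To check whether a system has hit doorbell drops, the kernel log can be
   scanned for the message shown above. A minimal sketch, assuming the
   message text is exactly as quoted (db_drop_counts is a hypothetical
   helper name), that counts drop reports per L2 interface:

```shell
# db_drop_counts: read kernel log lines on stdin and print
# "<interface> <number of doorbell-drop messages>" per bnxt interface.
db_drop_counts() {
    awk '/One or more MMIO doorbells dropped by the device/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^bnxt[0-9]+:$/) {
                dev = substr($i, 1, length($i) - 1)   # strip trailing ":"
                count[dev]++
                break
            }
    }
    END { for (dev in count) print dev, count[dev] }' | sort
}

# Example:
#   dmesg | db_drop_counts
```
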
Known Issues
============
1. There may be a few compilation warnings (but no errors).
2. "ifconfig <interface> down" can be done only after unloading the RoCE driver.
