Switch Port Configuration
- Port Identification
- Physical Port Identification
- Port Administrative State
- Port MTU
- Port Speed
- Port Statistics
- Port Splitting
- Port Module Information
- Further Resources
Port Identification
Management ports do not use the same driver as front panel ports and can therefore be distinguished using the Linux ethtool utility.
The following is an output example of a management port on a Mellanox SN2700 switch:
$ ethtool -i eth1
driver: e1000e
version: 3.2.6-k
firmware-version: 1.10-0
bus-info: 0000:06:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
The following is an output example of a front panel port on a Mellanox SN2700 switch:
$ ethtool -i sw1p1
driver: mlxsw_spectrum
version: 1.0
firmware-version: 13.400.116
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
The output examples above show that the management port uses Intel's e1000e driver, whereas the front panel port uses Mellanox's mlxsw_spectrum driver.
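The same check can be scripted. The following is a minimal sketch, assuming the "driver: <name>" line format of `ethtool -i` shown above. The helper names driver_name and is_front_panel are hypothetical. It treats any driver whose name starts with mlxsw_ as a front panel port.

```shell
#!/bin/sh
# Sketch: tell front panel ports apart from other interfaces by driver name.
# Assumes the "driver: <name>" line format of `ethtool -i` shown above.

# Read `ethtool -i` output on stdin and print the driver name.
driver_name() {
    awk -F': ' '$1 == "driver" { print $2; exit }'
}

# Succeed if the `ethtool -i` output on stdin belongs to an mlxsw port.
is_front_panel() {
    case "$(driver_name)" in
        mlxsw_*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (on the switch): print the names of all front panel ports.
#   for path in /sys/class/net/*; do
#       dev=${path##*/}
#       ethtool -i "$dev" 2>/dev/null | is_front_panel && echo "$dev"
#   done
```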
Physical Port Identification
As of Linux 4.7, it is possible to create udev rules that rename the software interfaces (port netdevs) corresponding to the front panel ports according to the front panel numbering. To do so, create the following rule in /etc/udev/rules.d/10-local.rules:
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="mlxsw_spectrum*", \
NAME="sw$attr{phys_port_name}"
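A minimal sketch for installing the rule from a script follows. The helper name install_port_rule is hypothetical. It writes the rule on a single line instead of using a backslash continuation, and it quotes the here-document delimiter so the shell passes $attr{phys_port_name} to udev unexpanded. The rule only applies when the port netdevs are added, so the new names are expected to appear after a reboot or driver reload.

```shell
#!/bin/sh
# Sketch: append the udev rule shown above to a rules file.
# The path defaults to the file named in the text; pass another path to try
# it out without root.
install_port_rule() {
    rules_file=${1:-/etc/udev/rules.d/10-local.rules}
    # Skip if the rule is already there, so running this twice is harmless.
    grep -qF 'phys_port_name' "$rules_file" 2>/dev/null && return 0
    # Quote the here-document delimiter so the shell does not expand $attr.
    cat >> "$rules_file" <<'EOF'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="mlxsw_spectrum*", NAME="sw$attr{phys_port_name}"
EOF
}

# On the switch (as root):
#   install_port_rule
#   udevadm control --reload-rules
```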
It is possible to make the port LED blink using ethtool and thereby identify the corresponding physical interface:
$ ethtool -p sw1p1
This command makes the LED next to the port blink until ethtool is killed (for example, with Ctrl+C). To blink the LED for a specific number of seconds instead, pass the duration as an additional argument; for example, for 5 seconds:
$ ethtool -p sw1p1 5
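When mapping many ports, the timed form can be run over a list of ports. This is a sketch, and the helper name blink_ports is hypothetical. Because `ethtool -p <port> <seconds>` returns only after the duration expires, the ports blink one after another. The ETHTOOL variable is an assumption added so the loop can be dry-run; it defaults to the real ethtool.

```shell
#!/bin/sh
# Sketch: blink the LED of each given port in turn, so an operator standing at
# the switch can match netdev names to front panel positions.
# ETHTOOL may be overridden for a dry run; it defaults to the real ethtool.
blink_ports() {
    secs=$1; shift
    for port in "$@"; do
        echo "blinking $port for $secs s"
        ${ETHTOOL:-ethtool} -p "$port" "$secs" || echo "failed on $port" >&2
    done
}

# Example: blink_ports 5 sw1p1 sw1p2 sw1p3
```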
systemd 234 can automatically rename the ports according to their front panel numbering without user intervention. This results in names such as enp3s0np5, which represents front panel port 5.
Note: This functionality was backported to systemd 231 in Fedora and is thus available in Fedora 25 and onwards.
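Scripts that need the front panel number can recover it from such a name. This is a minimal sketch, assuming names of the enp3s0np5 form described above. The helper front_panel_port is hypothetical: it prints the digits after the last "np" and prints nothing for names that do not end that way.

```shell
#!/bin/sh
# Sketch: recover the front panel port number from a systemd predictable name
# such as enp3s0np5 (the digits after the last "np" at the end of the name).
# Names of any other form print nothing.
front_panel_port() {
    printf '%s\n' "$1" | sed -n 's/^.*np\([0-9][0-9]*\)$/\1/p'
}

# Example: front_panel_port enp3s0np5   # prints 5
```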
Port Administrative State
After booting the switch or loading the driver, all the ports are administratively down. The following command changes the administrative state of a port to up:
$ ip link set dev sw1p5 up
However, the operational state of the port only changes to up if the port is able to negotiate the link with its partner, in which case the output appears as follows:
$ ip link show dev sw1p5
31: sw1p5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast switchid e41d2d45a9c0 state UP mode DEFAULT group default qlen 1000
link/ether e4:1d:2d:45:a9:f5 brd ff:ff:ff:ff:ff:ff
To set the port to down, run:
$ ip link set dev sw1p5 down
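Since the operational state only becomes up once the link is negotiated, scripts that bring ports up may want to wait for it. The following is a sketch that polls /sys/class/net/<port>/operstate once per second. The helper name wait_oper_up and the SYSFS_NET override (added for testing) are hypothetical.

```shell
#!/bin/sh
# Sketch: after `ip link set dev <port> up`, wait until the link is negotiated
# by polling the port's operational state in sysfs, once per second.
# SYSFS_NET may be overridden for testing; it defaults to /sys/class/net.
wait_oper_up() {
    dev=$1 tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ "$(cat "${SYSFS_NET:-/sys/class/net}/$dev/operstate" 2>/dev/null)" = "up" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    echo "$dev: operational state is not up" >&2
    return 1
}

# Example:
#   ip link set dev sw1p5 up && wait_oper_up sw1p5 30
```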
Port MTU
To set the port MTU, run:
$ ip link set dev sw1p1 mtu 1400
The switch supports jumbo frames, so values higher than 1500 may be used.
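To confirm the value that was actually applied, the MTU can be read back from `ip link show`. This is a sketch; mtu_of and set_mtu are hypothetical helpers. The IP override exists only so the function can be dry-run. The 9000 in the example is an illustrative jumbo size, and the largest accepted value depends on the hardware and driver.

```shell
#!/bin/sh
# Sketch: set a port's MTU and confirm the value reported afterwards.
# IP may be overridden for a dry run; it defaults to the real ip binary.

# Print the "mtu" value from `ip link show` output read on stdin.
mtu_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

set_mtu() {  # usage: set_mtu DEV MTU
    ${IP:-ip} link set dev "$1" mtu "$2" || return 1
    actual=$(${IP:-ip} link show dev "$1" | mtu_of)
    [ "$actual" = "$2" ] || { echo "$1: MTU is $actual, expected $2" >&2; return 1; }
}

# Example (9000 is only an illustrative jumbo size): set_mtu sw1p1 9000
```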
Port Speed
Port speed is configured with the ethtool utility. Assuming the port's operational state is up, its current speed can be queried:
$ ethtool sw1p5 | grep Speed
Speed: 40000Mb/s
In this case the port's speed is 40Gb/s. To set a different speed, run:
$ ethtool -s sw1p5 speed 10000 autoneg off
This sets the port's speed to 10Gb/s. Assuming the administrative state of the port is up, this command makes the port go through link negotiation again by toggling its administrative state to down and then up. However, the port only goes up if its partner also supports the configured speed.
The command also disables auto-negotiation (autoneg off), so the port only uses the single configured speed. To let the switch auto-negotiate and choose the highest speed supported by both link partners, enable auto-negotiation:
$ ethtool -s sw1p5 autoneg on
To query the port speed after speed negotiation, run:
$ ethtool sw1p5 | grep Speed
Speed: 40000Mb/s
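The speed check can be scripted as well. This is a minimal sketch, assuming the "Speed: <N>Mb/s" line format shown above. The helper speed_mbps is hypothetical and prints nothing when ethtool reports "Speed: Unknown!", which it does when no speed is known, for example with the link down.

```shell
#!/bin/sh
# Sketch: read the current speed (in Mb/s) from `ethtool <port>` output on stdin.
# Assumes the "Speed: <N>Mb/s" line format; "Speed: Unknown!" prints nothing.
speed_mbps() {
    sed -n 's/^[[:space:]]*Speed: \([0-9][0-9]*\)Mb\/s$/\1/p'
}

# Example: check that sw1p5 came up at 40Gb/s after auto-negotiation.
#   [ "$(ethtool sw1p5 | speed_mbps)" = 40000 ] || echo "sw1p5 is not at 40G"
```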
Port Statistics
Two types of statistics exist for each port:
- Software
- Hardware
Software statistics account for packets trapped to the CPU or packets sent from the CPU. Hardware statistics account for all packets going through the port, including those not trapped to or originating from the CPU.
The ifstat utility is used to query the port's software statistics:
$ ifstat -x cpu sw1p5
#kernel
Interface RX Pkts/Rate TX Pkts/Rate RX Data/Rate TX Data/Rate
RX Errs/Drop TX Errs/Drop RX Over/Rate TX Coll/Rate
sw1p5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Two utilities can be used to query the port's hardware statistics:
- ip utility
- ethtool utility
Using ip:
$ ip -s link show sw1p5
31: sw1p5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast switchid e41d2d45a9c0 state UP mode DEFAULT group default qlen 1000
link/ether e4:1d:2d:45:a9:f5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
136360 1868 0 1864 0 0
TX: bytes packets errors dropped carrier collsns
776 8 0 0 0 0
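A single counter can be picked out of this output by its column name. This is a sketch; the helper name link_counter is hypothetical. It locates the column in the "RX:" or "TX:" header line and prints the matching field from the line that follows, so it does not depend on the exact column order.

```shell
#!/bin/sh
# Sketch: pull one RX or TX counter out of `ip -s link show` output (stdin) by
# its column name, using the header line that precedes the numbers.
# usage: ip -s link show sw1p5 | link_counter RX dropped
link_counter() {
    awk -v dir="$1:" -v name="$2" '
        want { print $col; exit }
        $1 == dir {
            for (i = 2; i <= NF; i++) if ($i == name) col = i - 1
            want = (col > 0)
        }'
}
```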
Using ethtool:
$ ethtool -S sw1p5
NIC statistics:
a_frames_transmitted_ok: 8500
a_frames_received_ok: 772
a_frame_check_sequence_errors: 0
a_alignment_errors: 0
a_octets_transmitted_ok: 874212
a_octets_received_ok: 67968
a_multicast_frames_xmitted_ok: 308
a_broadcast_frames_xmitted_ok: 0
a_multicast_frames_received_ok: 290
a_broadcast_frames_received_ok: 0
a_in_range_length_errors: 0
a_out_of_range_length_field: 0
a_frame_too_long_errors: 0
a_symbol_error_during_carrier: 0
a_mac_control_frames_transmitted: 0
a_mac_control_frames_received: 0
a_unsupported_opcodes_received: 0
a_pause_mac_ctrl_frames_received: 0
a_pause_mac_ctrl_frames_xmitted: 0
if_in_discards: 0
if_out_discards: 0
if_out_errors: 0
ether_stats_undersize_pkts: 0
ether_stats_oversize_pkts: 0
ether_stats_fragments: 0
ether_pkts64octets: 0
...
ether_pkts65to127octets: 0
...
dot3stats_fcs_errors: 0
dot3stats_symbol_errors: 0
dot3control_in_unknown_opcodes: 0
dot3in_pause_frames: 0
discard_ingress_general: 0
discard_ingress_policy_engine: 0
discard_ingress_vlan_membership: 0
discard_ingress_tag_frame_type: 0
discard_egress_vlan_membership: 0
discard_loopback_filter: 0
discard_egress_general: 0
discard_egress_hoq: 0
discard_egress_policy_engine: 0
discard_ingress_tx_link_down: 0
discard_egress_stp_filter: 0
discard_egress_sll: 0
rx_octets_prio_0: 67968
rx_frames_prio_0: 772
tx_octets_prio_0: 874212
tx_frames_prio_0: 8500
rx_pause_prio_0: 0
rx_pause_duration_prio_0: 0
tx_pause_prio_0: 0
tx_pause_duration_prio_0: 0
...
tc_transmit_queue_tc_0: 0
tc_no_buffer_discard_uc_tc_0: 0
...
- a_frames_transmitted_ok: Includes PAUSE frames transmitted by the port. This applies to a_octets_transmitted_ok as well.
- a_frames_received_ok: Includes packets later discarded due to insufficient space in the port's headroom or not admitted to the switch's shared buffer. This applies to a_octets_received_ok as well.
- a_pause_mac_ctrl_frames_received: Includes both PAUSE and PFC frames. This applies to a_pause_mac_ctrl_frames_xmitted as well.
- As part of RFC 2863:
  - if_in_discards - The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent them from being deliverable to a higher-layer protocol.
  - if_out_discards - The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent them from being transmitted.
  - if_out_errors - The number of outbound packets that could not be transmitted because of errors.
- As part of RFC 2819:
  - ether_stats_undersize_pkts - The total number of packets received that were less than 64 octets long (excluding framing bits, but including FCS octets) and were otherwise well formed.
  - ether_stats_oversize_pkts - The total number of packets received that were longer than MTU octets (excluding framing bits, but including FCS octets) but were otherwise well formed.
  - ether_stats_fragments - The total number of packets received that were less than 64 octets in length (excluding framing bits but including FCS octets) and had either a bad FCS with an integral number of octets (FCS error) or a bad FCS with a non-integral number of octets (alignment error).
  - ether_pkts64octets - The total number of packets (including bad packets) received that were 64 octets in length (excluding framing bits but including FCS octets).
  - ether_pkts<X>to<Y>octets - The total number of packets (including bad packets) received that were between X and Y octets in length (excluding framing bits but including FCS octets).
- As part of RFC 3635:
  - dot3stats_fcs_errors - A count of frames received that are an integral number of octets in length but do not pass the FCS check. This count does not include frames received with frame-too-long or frame-too-short errors.
  - dot3stats_symbol_errors - The number of times the receiving media is non-idle (a carrier event) for a period of time equal to or greater than minFrameSize, and during which there was at least one occurrence of an event that causes the PHY to indicate 'Receive Error'.
  - dot3control_in_unknown_opcodes - A count of MAC Control frames received that contain an opcode that is not supported.
  - dot3in_pause_frames - A count of MAC Control frames received with an opcode indicating the PAUSE operation.
- Hardware specific discard counters:
  - discard_egress_general - In Spectrum, counts only MTU discards.
  - discard_egress_hoq - Head-of-Queue time-out discards.
  - discard_egress_sll - Number of packets dropped because the Switch Lifetime Limit was exceeded.
- rx_pause_prio_X: Number of PFC frames received from the far-end port with priority X. PAUSE frames increment the counters of all priorities.
- rx_pause_duration_prio_X: The total time in microseconds in which transmission of packets with priority X to the far-end port has been paused. PAUSE frames increment the counters of all priorities.
- tx_pause_prio_X: Number of PFC frames sent to the far-end port with priority X. PAUSE frames increment the counters of all priorities.
- tx_pause_duration_prio_X: The total time in microseconds that transmission of packets with priority X from the far-end port has been requested to pause.
- tc_transmit_queue_tc_X: The transmit queue depth in bytes of traffic class X.
- tc_no_buffer_discard_uc_tc_X: The number of unicast packets with traffic class X dropped due to lack of shared buffer resources.
The port's statistics are never reset while the driver is loaded. They can only be reset by removing and re-inserting the driver.
However, it is possible to see the difference in the hardware statistics using the ifstat utility from iproute2. Each invocation shows the difference between the previous call and the current one:
$ ifstat sw1p5
#kernel
Interface RX Pkts/Rate TX Pkts/Rate RX Data/Rate TX Data/Rate
RX Errs/Drop TX Errs/Drop RX Over/Rate TX Coll/Rate
sw1p5 1 0 1 0 98 0 114 0
0 0 0 0 0 0 0 0
(... after some time passes ...)
$ ifstat sw1p5
#kernel
Interface RX Pkts/Rate TX Pkts/Rate RX Data/Rate TX Data/Rate
RX Errs/Drop TX Errs/Drop RX Over/Rate TX Coll/Rate
sw1p5 9 0 9 0 882 0 1026 0
0 0 0 0 0 0 0 0
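ifstat covers the counters reported by ip. For the `ethtool -S` counters, a similar "difference since last time" view can be had by comparing two saved snapshots. This is a sketch; the helper counter_deltas is hypothetical. It prints only the counters whose values changed, together with the amount they changed by.

```shell
#!/bin/sh
# Sketch: print "name delta" for every `ethtool -S` counter that changed
# between two saved snapshots.
# usage: counter_deltas BEFORE_FILE AFTER_FILE
counter_deltas() {
    awk 'NR == FNR { if (NF == 2) before[$1] = $2; next }
         NF == 2 && ($1 in before) && $2 != before[$1] {
             d = $2 - before[$1]      # compute before stripping the colon key
             sub(/:$/, "", $1)
             print $1, d
         }' "$1" "$2"
}

# Example:
#   ethtool -S sw1p5 > /tmp/before; sleep 10; ethtool -S sw1p5 > /tmp/after
#   counter_deltas /tmp/before /tmp/after
```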
Port Splitting
As of Linux 4.6, it is possible to split and unsplit the front panel ports using the devlink utility, which is part of the iproute2 package.
Note that devlink is available in iproute2 starting with version 4.6.0.
The following command splits the first front panel port into 4 ports:
$ devlink port split pci/0000:03:00.0/61 count 4
Here, pci/0000:03:00.0/61 is the DEV/PORT_INDEX handle used by devlink. It can be retrieved using the devlink port show command:
$ devlink port show
...
pci/0000:03:00.0/61: type eth netdev sw1p1
...
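In scripts, the handle can be looked up from the netdev name. This is a sketch; the helper name port_handle is hypothetical. It reads `devlink port show` output in the format shown above and prints the handle without its trailing colon, or nothing if the netdev is not found.

```shell
#!/bin/sh
# Sketch: look up the DEV/PORT_INDEX handle of a netdev in `devlink port show`
# output read on stdin.
# usage: devlink port show | port_handle sw1p1
port_handle() {
    awk -v n="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == "netdev" && $(i + 1) == n) { h = $1; sub(/:$/, "", h); print h; exit }
    }'
}

# Example:
#   devlink port split "$(devlink port show | port_handle sw1p1)" count 4
```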
Assuming the previously described udev rule is used, sw1p1 disappears and the following net devices are created:
$ devlink port show
...
pci/0000:03:00.0/61: type eth netdev sw1p1s0 split_group 0
pci/0000:03:00.0/62: type eth netdev sw1p1s1 split_group 0
pci/0000:03:00.0/63: type eth netdev sw1p1s2 split_group 0
pci/0000:03:00.0/64: type eth netdev sw1p1s3 split_group 0
...
Note: In SN2700 and SN2410, splitting a port by four disables the adjacent port in the front panel column. In the case above, both sw1p1 and sw1p2 disappear.
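After a split, the resulting netdevs can be identified by the split_group attribute that devlink reports for them. This is a sketch; the helper name split_netdevs is hypothetical. It prints the netdev name and split group of every port that reports a split_group, and skips ports that are not split.

```shell
#!/bin/sh
# Sketch: list the netdevs created by a split, i.e. the ports that report a
# split_group in `devlink port show` output read on stdin.
split_netdevs() {
    awk '{
        dev = ""; grp = ""
        for (i = 1; i < NF; i++) {
            if ($i == "netdev") dev = $(i + 1)
            if ($i == "split_group") grp = $(i + 1)
        }
        if (dev != "" && grp != "") print dev, grp
    }'
}

# Example: devlink port show | split_netdevs
```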
The following command unsplits the previously split sw1p1 port:
$ devlink port unsplit pci/0000:03:00.0/62
The DEV/PORT_INDEX handle of any of the split ports can be used when unsplitting. The unsplit command re-spawns the previously present front panel ports: sw1p1 and sw1p2.
Port Module Information
To access the internal EEPROM information of an SFP+/QSFP module, use the ethtool -m command. For example:
$ ethtool -m sw1p7
Identifier : 0x0d (QSFP+)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x23 (No separable connector)
Transceiver codes : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 40G Ethernet: 40G Base-CR4
Transceiver type : 100G Ethernet: 100G Base-CR4 or 25G Base-CR CA-L
Encoding : 0x00 (unspecified)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 1m
Transmitter technology : 0xa0 (Copper cable unequalized)
Attenuation at 2.5GHz : 2db
Attenuation at 5.0GHz : 3db
Attenuation at 7.0GHz : 4db
Attenuation at 12.9GHz : 7db
Vendor name : Mellanox
Vendor OUI : 00:02:c9
Vendor PN : MCP1600-E00A
Vendor rev : A2
Vendor SN : MT1526VS05742
Revision Compliance : SFF-8636 Rev 1.5
Module temperature : 0.00 degrees C / 32.00 degrees F
Module voltage : 0.0000 V
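Individual fields of this output can be extracted for inventory purposes. This is a sketch; the helper name module_field is hypothetical. It splits each line on the first " : " rather than on every ":", because values such as the vendor OUI contain colons themselves. Labels that appear more than once, such as "Extended identifier description", print one value per line.

```shell
#!/bin/sh
# Sketch: print one field of `ethtool -m <port>` output (stdin) by its label,
# e.g. "Vendor PN". Splits on the first " : " so values with colons survive.
module_field() {
    awk -v k="$1" '{
        i = index($0, " : "); if (i == 0) next
        key = substr($0, 1, i - 1)
        sub(/^[ \t]+/, "", key); sub(/[ \t]+$/, "", key)
        if (key == k) print substr($0, i + 3)
    }'
}

# Example: list the cable part number of a few ports.
#   for p in sw1p1 sw1p2; do
#       echo "$p $(ethtool -m "$p" | module_field 'Vendor PN')"
#   done
```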
Further Resources
- SN2700/SN2400 Hardware User Manual (PDF)
- man ethtool
- Writing udev rules
- man ip
- man devlink
- man devlink-dev
- man devlink-port
- man ifstat