Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[202405][port] | Link is not up within 40 seconds after port flap #20381

Open
nazariig opened this issue Sep 30, 2024 · 1 comment
Open

[202405][port] | Link is not up within 40 seconds after port flap #20381

nazariig opened this issue Sep 30, 2024 · 1 comment
Assignees
Labels
Bug 🐛 MSFT Triaged this issue has been triaged

Comments

@nazariig
Copy link
Collaborator

Description

The bug is caused by SWSS due to busy state while processing large number of NextHopGroup/Route objects.

Due to the fact that swss is a single threaded application, the link notifications by syncd might not be processed on time by swss, which eventually caused a link delay in 30 sec. The issue can be observed during CMIS interfaces bulk toggle stress test.

This is CMIS timing issue caused by existing SONiC architecture.

Port startup flow:

system log:

2024 Sep  6 13:42:14.975387 sonic NOTICE PORT-ACTION: => Ethernet144 START
2024 Sep  6 13:42:18.062154 sonic NOTICE swss#orchagent: :- doPortTask: Set port Ethernet144 admin status to up
2024 Sep  6 13:42:18.116118 sonic NOTICE swss#orchagent: :- setHostTxReady: Setting host_tx_ready status = false, alias = Ethernet144, port_id = 0x100000000001f
2024 Sep  6 13:42:18.117390 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'false', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 13:42:18.218562 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144 Forcing Tx laser OFF

sairedis log:

2024-09-06.10:42:18.050615|s|SAI_OBJECT_TYPE_PORT:oid:0x100000000001f|SAI_PORT_ATTR_ADMIN_STATE=true
2024-09-06.10:42:18.117167|n|port_host_tx_ready|[{"host_tx_ready_status":"SAI_PORT_HOST_TX_READY_STATUS_READY","port_id":"oid:0x100000000001f","switch_id":"oid:0x21000000000000"}]|

system log:

2024 Sep  6 13:42:50.093799 sonic NOTICE swss#orchagent: :- setHostTxReady: Setting host_tx_ready status = true, alias = Ethernet144, port_id = 0x100000000001f
2024 Sep  6 13:42:50.094425 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 13:43:02.678329 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: READY
2024 Sep  6 13:43:20.050878 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet144 oper state set from down to up

sairedis log:

2024-09-06.10:43:20.050269|n|port_state_change|[{"port_id":"oid:0x100000000001f","port_state":"SAI_PORT_OPER_STATUS_UP"}]|

Time diff: 2024 Sep 6 13:43:20.050878 - 2024 Sep 6 13:42:18.062154 ~ 62 sec

Next Hop Group & Route configuration:

system log:

2024 Sep  6 13:42:20.146180 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[1346]- mlnx_create_next_hop_group: Create NEXT_HOP_GROUP: #0 TYPE=DYNAMIC_ORDERED_ECMP
2024 Sep  6 13:42:20.146180 sonic NOTICE syncd#SDK: [SAI_NEXT_HOP_GROUP.NOTICE] ./src/mlnx_sai_nexthopgroup.c[1381]- mlnx_create_next_hop_group: Created NEXT_HOP_GROUP [OID:0x4800000005] [ID:1]
2024 Sep  6 13:42:20.155723 sonic NOTICE swss#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.1@PortChannel102,10.0.0.5@PortChannel105,10.0.0.13@PortChannel1011

...

2024 Sep  6 13:42:50.063670 sonic NOTICE swss#orchagent: :- addRoutePost: Update Nexthop Group 10.0.0.1@PortChannel102,10.0.0.5@PortChannel105
2024 Sep  6 13:42:50.064166 sonic NOTICE swss#orchagent: :- removeNextHopGroup: Delete next hop group 10.0.0.1@PortChannel102,10.0.0.5@PortChannel105

sairedis log:

2024-09-06.10:42:20.140774|c|SAI_OBJECT_TYPE_NEXT_HOP_GROUP
2024-09-06.10:42:20.164004|C|SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER
2024-09-06.10:42:20.239426|c|SAI_OBJECT_TYPE_NEXT_HOP_GROUP
2024-09-06.10:42:20.312432|C|SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER
2024-09-06.10:42:20.462982|c|SAI_OBJECT_TYPE_NEXT_HOP_GROUP
2024-09-06.10:42:20.464480|C|SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER
2024-09-06.10:42:20.653645|R|SAI_OBJECT_TYPE_ROUTE_ENTRY
2024-09-06.10:42:20.681447|C|SAI_OBJECT_TYPE_ROUTE_ENTRY

...

2024-09-06.10:42:50.064071|R|SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER
2024-09-06.10:42:50.065928|r|SAI_OBJECT_TYPE_NEXT_HOP_GROUP

Time diff: 2024-09-06.10:42:50.065928 - 2024-09-06.10:42:20.140774 ~ 30 sec

Port standalone flows:

Shutdown:

2024 Sep  6 12:52:00.664847 sonic WARNING pmon#xcvrd: message repeated 4 times: [ $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}]
2024 Sep  6 12:52:00.664847 sonic NOTICE swss#buffermgrd: :- doSpeedUpdateTask: Removing PG Ethernet144|3-4 from port Ethernet144 which is administrative down
2024 Sep  6 12:52:00.664847 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'admin_status': 'down', 'alias': 'etp19', 'autoneg': 'off', 'description': 'ARISTA05T2:Ethernet2', 'index': '19', 'lanes': '144,145,146,147,148,149,150,151', 'mtu': '9100', 'pfc_asym': 'off', 'speed': '400000', 'subport': '0', 'tpid': '0x8100'}
2024 Sep  6 12:52:00.664847 sonic WARNING pmon#xcvrd: *** ('Ethernet144', 'CONFIG_DB', 'PORT') handle_port_update_event() fvp {'admin_status': 'down', 'alias': 'etp19', 'autoneg': 'off', 'description': 'ARISTA05T2:Ethernet2', 'index': '19', 'lanes': '144,145,146,147,148,149,150,151', 'mtu': '9100', 'pfc_asym': 'off', 'speed': '400000', 'subport': '0', 'tpid': '0x8100', 'port_name': 'Ethernet144', 'asic_id': 0, 'op': 'SET'}
2024 Sep  6 12:52:00.666954 sonic NOTICE swss#orchagent: :- processPriorityGroup: Remove buffer PG Ethernet144:3-4
2024 Sep  6 12:52:00.668670 sonic NOTICE swss#portmgrd: :- doTask: Configure Ethernet144 MTU to 9100
2024 Sep  6 12:52:00.674532 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:0 oper:0 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:52:00.676439 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:down) to state db
2024 Sep  6 12:52:00.677792 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:0 oper:0 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:52:00.679322 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:down) to state db
2024 Sep  6 12:52:00.682672 sonic NOTICE swss#portmgrd: :- doTask: Configure Ethernet144 admin status to down
2024 Sep  6 12:52:00.686374 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=INSERTED, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:52:00.687453 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:0 oper:0 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:52:00.688537 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:down) to state db
2024 Sep  6 12:52:00.703081 sonic NOTICE swss#orchagent: :- setDistributionOnLagMember: Disable distribution on LAG member Ethernet144
2024 Sep  6 12:52:00.704441 sonic NOTICE swss#orchagent: :- setCollectionOnLagMember: Disable collection on LAG member Ethernet144
2024 Sep  6 12:52:00.704946 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port PortChannel108 oper state set from up to down
2024 Sep  6 12:52:00.705784 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting appl=1
2024 Sep  6 12:52:00.724955 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting host_lanemask=0xff
2024 Sep  6 12:52:00.785819 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting media_lanemask=0xf
2024 Sep  6 12:52:00.785819 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144 Forcing Tx laser OFF
2024 Sep  6 12:52:00.790379 sonic NOTICE swss#orchagent: :- doPortTask: Set port Ethernet144 admin status to down
2024 Sep  6 12:52:00.844970 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'down', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 12:52:01.117556 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet144 oper state set from up to down
2024 Sep  6 12:52:01.124283 sonic NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet144
2024 Sep  6 12:52:01.124283 sonic NOTICE swss#orchagent: :- publish: EVENT_PUBLISHED: {"sonic-events-swss:if-state":{"ifname":"Ethernet144","status":"down","timestamp":"2024-09-06T09:52:01.124194Z"}}
2024 Sep  6 12:52:03.138877 sonic NOTICE swss#orchagent: :- setHostTxReady: Setting host_tx_ready status = false, alias = Ethernet144, port_id = 0x100000000001f
2024 Sep  6 12:52:03.139522 sonic WARNING pmon#xcvrd: message repeated 2 times: [ $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'down', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}]
2024 Sep  6 12:52:03.139522 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'down', 'mtu': '9100', 'host_tx_ready': 'false', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 12:52:03.139593 sonic WARNING pmon#xcvrd: *** ('Ethernet144', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet144', 'asic_id': 0, 'op': 'SET'}
2024 Sep  6 12:52:03.266412 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=INSERTED, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:52:03.286542 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting appl=1
2024 Sep  6 12:52:03.305046 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting host_lanemask=0xff
2024 Sep  6 12:52:03.345213 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting media_lanemask=0xf
2024 Sep  6 12:52:03.345213 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144 Forcing Tx laser OFF

Startup:

2024 Sep  6 12:53:11.852801 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'admin_status': 'up', 'alias': 'etp19', 'autoneg': 'off', 'description': 'ARISTA05T2:Ethernet2', 'index': '19', 'lanes': '144,145,146,147,148,149,150,151', 'mtu': '9100', 'pfc_asym': 'off', 'speed': '400000', 'subport': '0', 'tpid': '0x8100'}
2024 Sep  6 12:53:11.852801 sonic WARNING pmon#xcvrd: *** ('Ethernet144', 'CONFIG_DB', 'PORT') handle_port_update_event() fvp {'admin_status': 'up', 'alias': 'etp19', 'autoneg': 'off', 'description': 'ARISTA05T2:Ethernet2', 'index': '19', 'lanes': '144,145,146,147,148,149,150,151', 'mtu': '9100', 'pfc_asym': 'off', 'speed': '400000', 'subport': '0', 'tpid': '0x8100', 'port_name': 'Ethernet144', 'asic_id': 0, 'op': 'SET'}
2024 Sep  6 12:53:11.856779 sonic NOTICE swss#orchagent: :- processPriorityGroup: Set buffer PG Ethernet144:3-4 to BUFFER_PROFILE_TABLE:pg_lossless_400000_300m_profile
2024 Sep  6 12:53:11.857398 sonic NOTICE swss#portmgrd: :- doTask: Configure Ethernet144 MTU to 9100
2024 Sep  6 12:53:11.861449 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:1 oper:0 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:53:11.862933 sonic NOTICE swss#portmgrd: :- doTask: Configure Ethernet144 admin status to up
2024 Sep  6 12:53:11.862933 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:down) to state db
2024 Sep  6 12:53:11.864060 sonic INFO kernel: [ 2612.936248] 8021q: adding VLAN 0 to HW filter on device Ethernet144
2024 Sep  6 12:53:11.874085 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=INSERTED, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:11.892492 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting appl=1
2024 Sep  6 12:53:11.910885 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting host_lanemask=0xff
2024 Sep  6 12:53:11.950593 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting media_lanemask=0xf
2024 Sep  6 12:53:11.950593 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144 Forcing Tx laser OFF
2024 Sep  6 12:53:11.957779 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'false', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 12:53:11.998239 sonic NOTICE swss#orchagent: :- doPortTask: Set port Ethernet144 admin status to up
2024 Sep  6 12:53:12.061394 sonic NOTICE swss#orchagent: :- setHostTxReady: Setting host_tx_ready status = true, alias = Ethernet144, port_id = 0x100000000001f
2024 Sep  6 12:53:12.062106 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 12:53:12.062255 sonic WARNING pmon#xcvrd: *** ('Ethernet144', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'true', 'index': '-1', 'port_name': 'Ethernet144', 'asic_id': 0, 'op': 'SET'}
2024 Sep  6 12:53:12.078053 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=INSERTED, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:12.097342 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting appl=1
2024 Sep  6 12:53:12.116415 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting host_lanemask=0xff
2024 Sep  6 12:53:12.157667 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Setting media_lanemask=0xf
2024 Sep  6 12:53:12.222790 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: force Datapath reinit
2024 Sep  6 12:53:13.240006 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=DP_DEINIT, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:14.269950 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: DpDeinit duration 0.05 secs, modulePwrUp duration 5.0 secs
2024 Sep  6 12:53:15.286650 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=AP_CONFIGURED, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:15.308016 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Apply Optics SI found for Vendor: Mellanox          PN: MMS1V00-WM       lane speed: 50G
2024 Sep  6 12:53:16.433306 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=DP_INIT, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:16.448088 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: DpInit duration 10.0 secs
2024 Sep  6 12:53:17.468250 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=DP_TXON, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:22.993946 sonic NOTICE pmon#xcvrd: message repeated 5 times: [ CMIS: Ethernet144: 400G, lanemask=0xff, state=DP_TXON, appl 1 host_lane_count 8 retries=0]
2024 Sep  6 12:53:23.009388 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=DP_TXON, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:23.019059 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: Turning ON tx power
2024 Sep  6 12:53:24.036077 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: 400G, lanemask=0xff, state=DP_ACTIVATION, appl 1 host_lane_count 8 retries=0
2024 Sep  6 12:53:24.038387 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: READY
2024 Sep  6 12:53:24.063382 sonic NOTICE pmon#xcvrd: CMIS: Ethernet144: updated TRANSCEIVER_INFO_TABLE [('active_apsel_hostlane1', '1'), ('active_apsel_hostlane2', '1'), ('active_apsel_hostlane3', '1'), ('active_apsel_hostlane4', '1'), ('active_apsel_hostlane5', '1'), ('active_apsel_hostlane6', '1'), ('active_apsel_hostlane7', '1'), ('active_apsel_hostlane8', '1'), ('host_lane_count', '8'), ('media_lane_count', '4')]
2024 Sep  6 12:53:24.064336 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:TRANSCEIVER_INFO fvp {'is_replaceable': 'True', 'ext_rateselect_compliance': 'N/A', 'supported_min_tx_power': 'N/A', 'active_apsel_hostlane3': '1', 'host_electrical_interface': '400GAUI-8 C2M (Annex 120E)', 'active_apsel_hostlane1': '1', 'serial': 'MT2235JT00160   ', 'vendor_oui': '00-02-c9', 'active_apsel_hostlane7': '1', 'connector': 'MPO 1x12', 'active_apsel_hostlane6': '1', 'media_lane_count': '4', 'hardware_rev': '1.0', 'active_apsel_hostlane4': '1', 'supported_max_laser_freq': 'N/A', 'ext_identifier': 'Power Class 6 (12.0W Max)', 'dom_capability': 'N/A', 'cable_type': 'Length Cable Assembly(m)', 'media_interface_code': '400GBASE-DR4 (Cl 124)', 'nominal_bit_rate': '0', 'supported_min_laser_freq': 'N/A', 'supported_max_tx_power': 'N/A', 'encoding': 'N/A', 'active_apsel_hostlane2': '1', 'active_apsel_hostlane8': '1', 'media_lane_assignment_option': '1', 'media_interface_technology': '1310 nm EML', 'application_advertisement': "{1: {'host_electrical_interface_id': '400GAUI-8 C2M (Annex 120E)', 'module_media_interface_id': '400GBASE-DR4 (Cl 124)', 'media_lane_count': 4, 'host_lane_count': 8, 'host_lane_assignment_options': 1, 'media_lane_assignment_options': 1}, 2: {'host_electrical_interface_id': '100GAUI-2 C2M (Annex 135G)', 'module_media_interface_id': '100GBASE-DR (Cl 140)', 'media_lane_count': 1, 'host_lane_count': 2, 'host_lane_assignment_options': 85, 'media_lane_assignment_options': 15}}", 'cable_length': '0.0', 'host_lane_assignment_option': '1', 'model': 'MMS1V00-WM      ', 'vendor_rev': 'A3', 'type': 'QSFP-DD Double Density 8X Pluggable Transceiver', 'manufacturer': 'Mellanox        ', 'cmis_rev': '4.0', 'active_apsel_hostlane5': '1', 'host_lane_count': '8', 'vendor_date': '2022-08-26   ', 'specification_compliance': 'sm_media_interface'}
2024 Sep  6 12:53:42.404133 sonic INFO kernel: [ 2643.478597] IPv6: ADDRCONF(NETDEV_CHANGE): Ethernet144: link becomes ready
2024 Sep  6 12:53:42.404368 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet144 oper state set from down to up
2024 Sep  6 12:53:42.408352 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:1 oper:1 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:53:42.408352 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:up) to state db
2024 Sep  6 12:53:42.408352 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:1 oper:1 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:53:42.409300 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:up) to state db
2024 Sep  6 12:53:42.410335 sonic NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status UP to host interface Ethernet144
2024 Sep  6 12:53:42.410432 sonic NOTICE swss#orchagent: :- publish: EVENT_PUBLISHED: {"sonic-events-swss:if-state":{"ifname":"Ethernet144","status":"up","timestamp":"2024-09-06T09:53:42.410249Z"}}
2024 Sep  6 12:53:42.411624 sonic WARNING pmon#xcvrd: $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}
2024 Sep  6 12:53:42.414023 sonic NOTICE swss#orchagent: :- doTask: Ethernet144 oper speed is 400000
2024 Sep  6 12:53:44.431706 sonic NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet144 admin:1 oper:1 addr:1c:34:da:bf:b7:00 ifindex:110 master:86 type:sx_netdev
2024 Sep  6 12:53:44.432243 sonic NOTICE swss#portsyncd: :- onMsg: Publish Ethernet144(ok:up) to state db
2024 Sep  6 12:53:44.437548 sonic NOTICE swss#orchagent: :- setCollectionOnLagMember: Enable collection on LAG member Ethernet144
2024 Sep  6 12:53:44.440077 sonic NOTICE swss#orchagent: :- setDistributionOnLagMember: Enable distribution on LAG member Ethernet144
2024 Sep  6 12:53:44.441751 sonic WARNING pmon#xcvrd: message repeated 4 times: [ $$$ Ethernet144 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'host_tx_ready': 'true', 'supported_speeds': '400000,200000,100000,50000,40000,25000,10000,1000', 'supported_fecs': 'none,rs,fc,auto', 'speed': '400000', 'fec': 'rs'}]
2024 Sep  6 12:53:44.450544 sonic NOTICE swss#orchagent: :- updatePortOperStatus: Port PortChannel108 oper state set from down to up

Time diff: 2024 Sep 6 12:53:42.404368 - 2024 Sep 6 12:53:11.862933 ~ 31 sec

Steps to reproduce the issue:

Automation:

  1. Run test
sonic-mgmt/tests/hash/test_generic_hash.py
  1. Observe failure between
2024 Aug 31 03:04:24.282117 sonic INFO start-LogAnalyzer-test_lag_member_flap[CRC-SRC_IP-ipv6-None-None].2024-08-31-00:04:23
2024 Aug 31 03:09:32.207163 sonic INFO end-LogAnalyzer-test_lag_member_flap[CRC-SRC_IP-ipv6-None-None].2024-08-31-00:04:23

Manual:

  1. Deploy t1-32-lag topo
  2. Run script
root@sonic:/home/admin# cat test.sh
#!/bin/bash

sudo config interface shutdown Ethernet112
sudo config interface startup Ethernet112
sudo config interface shutdown Ethernet120
sudo config interface startup Ethernet120
sudo config interface shutdown Ethernet144
sudo config interface startup Ethernet144
sudo config interface shutdown Ethernet160
sudo config interface startup Ethernet160
sudo config interface shutdown Ethernet112
sudo config interface startup Ethernet112
sudo config interface shutdown Ethernet120
sudo config interface startup Ethernet120
sudo config interface shutdown Ethernet144
sudo config interface startup Ethernet144
sudo config interface shutdown Ethernet160
sudo config interface startup Ethernet160
sudo config interface shutdown Ethernet112
sudo config interface startup Ethernet112
sudo config interface shutdown Ethernet120
sudo config interface startup Ethernet120
logger -t PORT-ACTION -p NOTICE "=> Ethernet144 STOP"
sudo config interface shutdown Ethernet144
logger -t PORT-ACTION -p NOTICE "=> Ethernet144 START"
sudo config interface startup Ethernet144
sudo config interface shutdown Ethernet160
sudo config interface startup Ethernet160

Describe the results you received:

hash/test_generic_hash.py:493: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

duthost = <MultiAsicSonicHost sonic>
interfaces = ['Ethernet112', 'Ethernet120', 'Ethernet144', 'Ethernet160']
portchannels = dict_keys(['PortChannel102', 'PortChannel105', 'PortChannel108', 'PortChannel1011'])
times = 3

    def flap_interfaces(duthost, interfaces, portchannels=[], times=3):
        """
        Flap the specified interfaces. Assert when any of the interfaces is not up after the flapping.
        Args:
            duthost (AnsibleHost): Device Under Test (DUT)
            interfaces: a list of interfaces to be flapped
            portchannels: a list of portchannels which need to check the status after the flapping
            times: flap times, every interface will be shutdown/startup for the value number times
        """
        logger.info(f"Flap the interfaces {interfaces} for {times} times.")
        # Flap the interface
        for _ in range(times):
            for interface in interfaces:
                shutdown_interface(duthost, interface)
                startup_interface(duthost, interface)
        # Check the interfaces status are up
        for interface in interfaces:
>           pytest_assert(wait_until(30, 2, 0, duthost.is_interface_status_up, interface),
                          f"The interface {interface} is not up after the flapping.")
E           Failed: The interface Ethernet144 is not up after the flapping.

_          = 2
duthost    = <MultiAsicSonicHost sonic>
interface  = 'Ethernet144'
interfaces = ['Ethernet112', 'Ethernet120', 'Ethernet144', 'Ethernet160']
portchannels = dict_keys(['PortChannel102', 'PortChannel105', 'PortChannel108', 'PortChannel1011'])
times      = 3

hash/generic_hash_helper.py:334: Failed

Describe the results you expected:

No errors are expected

Output of show version:

  • N/A

Output of show techsupport:

  • N/A

Additional information you deem important (e.g. issue happens only occasionally):

Platform:

Platform: x86_64-mlnx_msn4700-r0
HwSKU: Mellanox-SN4700-O32
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2119X03331
Model Number: MSN4700-WS2F_QP1
Hardware Revision: A2
@prsunny
Copy link
Contributor

prsunny commented Oct 9, 2024

Please check with the fix PR - sonic-net/sonic-swss#3269. Please note that the fix is in master and not 202405

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐛 MSFT Triaged this issue has been triaged
Projects
None yet
Development

No branches or pull requests

3 participants