NSX-T v3.0.1 and v3.1.3 were used to test the steps described below.
As always with network engineers, even when working with SDN/SDDC solutions, sooner or later you will be asked to troubleshoot connectivity across your hops. If you work with the VMware NSX-T platform, the next hop for north-south datacenter traffic will almost always be an NSX-T EDGE Transport Node VM. It is then really useful to be able to get packet traces out of that box to troubleshoot traffic issues in detail.
One example would be simple routing or some sort of load-balancing traffic that does not seem to reach the backend hosts behind the NSX-T EDGE.
On the NSX-T EDGE VM it’s fairly simple to capture traffic directly. You can print the capture to the console or save it to a file on the EDGE and then pull it off with SCP.
If you have an EDGE Cluster, normally built out of 2 VMs, you first need to find out on which node the T0 or T1 router you want to capture traffic on is active.
Let’s say we want to capture traffic on “T0-router” shown in the image below. You can go to that T0 router from the UI and check the High Availability Mode output:
We usually use Policy mode for NSX-T configuration as it’s the preferred and suggested way. Unfortunately, I still haven’t found a way to check in Policy mode which EDGE VM the T0 instance is active on, so I switched to Manager mode and got the status there, as shown above. For T1 it is easy from the Policy view, as shown below:
For the T1 instance, open the status details and select the Auto Allocated EDGE option to get the info about the EDGE VM cluster member on which that T1 is active:
SSH to the EDGE VM on which the T0 in question is active and list the VRFs that the EDGE uses internally for T0 and T1 routers:
This is done with the get logical-router command. You need it to find the VRF that your T0-router uses for forwarding traffic.
edge2> get logical-router
Thu Sep 30 2021 UTC 10:06:25.453
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbef33386666   0      0                                        TUNNEL                      4       6/5000
43bfd5fc-4428-40e1-a3b5-fb6368453330   1      2067   DR-pks-1da5c83b-3309-46da-bae2-   DISTRIBUTED_ROUTER_TIER1    11      15/50000
eeba015c-03a9-45ab-a056-3313332ad0c2   2      1032   DR-pks-10ab3fff-3386-4935-add7-   DISTRIBUTED_ROUTER_TIER1    12      28/50000
1a70b114-4d36-4395-8ed1-675ef333916c   3      1      T0-router                         DISTRIBUTED_ROUTER_TIER0    8       9/50000
9ea09b8a-5f5c-402f-8d4a-53cc33396832   4      1029   SR-pks-39429a63-3319-4e5f-bd9b-   SERVICE_ROUTER_TIER1        5       2/50000
b67af562-af30-4d03-9fe9-5bff3336eff9   5      1027   SR-T0-router                      SERVICE_ROUTER_TIER0        5       1/50000
b90b4e6e-55a9-41da-b34f-70014333e892   7      5      DR-pks-39423333-7519-4e5f-bd9b-   DISTRIBUTED_ROUTER_TIER1    10      4/50000
56cb0949-339f-4074-8de6-c9333ab8c6a1   8      2068   SR-pks-1da3333b-9b09-46da-bae2-   SERVICE_ROUTER_TIER1        5       2/50000
96e0d296-4ecd-40e2-ac21-9582333f008e   9      3      SR-T1-MGMT                        SERVICE_ROUTER_TIER1        5       2/50000
c723835b-2747-497d-ba0c-b287ac333663   10     1033   SR-pks-10333fff-0386-4935-add7-   SERVICE_ROUTER_TIER1        5       2/50000
1cd5df1b-2b5d-4b43-8780-c73338fce5d8   11     2      DR-T1-MGMT                        DISTRIBUTED_ROUTER_TIER1    6       8/50000
You can then see that your T0-router is in the fourth line and that it uses VRF3. Enter VRF3 to get access to T0-router interfaces:
edge2> vrf 3
edge2(vrf)>
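If you save the get logical-router output to a text file, the VRF lookup can also be scripted instead of counted by eye. The following is only a rough Python sketch: the function names parse_logical_routers and vrf_for are my own, the sample is an abbreviated copy of the output above, and it assumes router names contain no spaces and that only the TUNNEL row has an empty Name column.

```python
import re

# Abbreviated copy of the `get logical-router` output shown above.
SAMPLE = """\
Logical Router
UUID                                   VRF    LR-ID  Name           Type                        Ports   Neighbors
736a80e3-23f6-5a2d-81d6-bbef33386666   0      0                     TUNNEL                      4       6/5000
1a70b114-4d36-4395-8ed1-675ef333916c   3      1      T0-router      DISTRIBUTED_ROUTER_TIER0    8       9/50000
b67af562-af30-4d03-9fe9-5bff3336eff9   5      1027   SR-T0-router   SERVICE_ROUTER_TIER0        5       1/50000
"""

UUID_RE = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_logical_routers(text):
    """Return one dict per table row; non-table lines are skipped."""
    routers = []
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a UUID and has at least six columns.
        if len(fields) < 6 or not UUID_RE.match(fields[0]):
            continue
        # Assumption: a row with only six fields has an empty Name column.
        name = fields[3] if len(fields) >= 7 else ""
        routers.append({
            "uuid": fields[0],
            "vrf": int(fields[1]),
            "lr_id": int(fields[2]),
            "name": name,
            "type": fields[-3],  # Type is always third from the end
        })
    return routers


def vrf_for(routers, name):
    """Return the VRF number for an exact router name, or None."""
    for router in routers:
        if router["name"] == name:
            return router["vrf"]
    return None


if __name__ == "__main__":
    print(vrf_for(parse_logical_routers(SAMPLE), "T0-router"))
```

Matching on the exact name matters here, because SR-T0-router also contains the string T0-router but lives in a different VRF.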
Inside the T0/T1 router that you are interested in, list the interfaces so you can see the ID of each of them:
edge2(vrf)> get interfaces
All interfaces of that T0-router will be listed with all the details for each. Interface ID is the thing you need to find for a specific interface you want to capture traffic on. It looks like this:
Interface : 054b184b-c30a-4aba-b4f4-ec3c8999ad81
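With many interfaces the listing gets long, so it can help to pull just the IDs out of a saved copy of the output. A minimal Python sketch, assuming every interface block contains a line of the form "Interface : <UUID>" as shown above; the function name interface_ids is made up for this example, and the second sample line is a placeholder, not real CLI output.

```python
import re

# Matches lines such as "Interface : 054b184b-c30a-4aba-b4f4-ec3c8999ad81".
INTERFACE_RE = re.compile(
    r"^\s*Interface\s*:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s*$",
    re.MULTILINE,
)


def interface_ids(text):
    """Return all interface UUIDs found in `get interfaces` output."""
    return INTERFACE_RE.findall(text)


SAMPLE = """\
Interface : 054b184b-c30a-4aba-b4f4-ec3c8999ad81
(placeholder for the other per-interface detail lines)
"""

if __name__ == "__main__":
    for interface_id in interface_ids(SAMPLE):
        print(interface_id)
```

You still need to read the details next to each ID to know which interface is the one you want. The script only saves you from copy-paste mistakes in the UUID.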
Start the capture carefully: use an expression to filter the traffic you are interested in (expression) or limit the capture to a specific number of packets (count). Be careful, as always. You could crash the VM or make it unresponsive if it forwards huge amounts of traffic and you try to capture all of it and show it on the terminal. A wrongly written filter expression can cause the same thing by mistake. At first, use count so that you capture no more than a few thousand packets, and start from there.
Copy the interface ID and leave the T0-router VRF submenu with the exit command.
See if some capture sessions are already defined. You will want to use a new one or clear existing ones to start a new capture.
edge2> get capture sessions
Thu Sep 30 2021 UTC 10:25:24.052
Packet Capture Session ID : 0
PORTS :
Packet Capture Session ID : 1
PORTS :
Packet Capture Session ID : 2
PORTS :
Packet Capture Session ID : 3
PORTS :
edge2>
I don’t have any, but if you do, you can delete it or just pick an empty one.
edge2> del capture session 1
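Picking an empty session can be checked with a small script over a saved copy of the get capture sessions output. This Python sketch rests on an assumption: a session counts as free when nothing follows its "PORTS :" label, which is how the empty sessions look in my output. I have not shown what a populated PORTS field looks like, so the second sample entry is a made-up placeholder, and the function name free_sessions is my own.

```python
import re

# One match per session: the ID, then whatever follows "PORTS :" up to the
# next session header or the end of the text.
SESSION_RE = re.compile(
    r"Packet Capture Session ID\s*:\s*(\d+)\s*PORTS\s*:(.*?)"
    r"(?=Packet Capture Session ID|\Z)",
    re.DOTALL,
)


def free_sessions(text):
    """Return the IDs of sessions whose PORTS field is empty."""
    free = []
    for session_id, ports in SESSION_RE.findall(text):
        if not ports.strip():
            free.append(int(session_id))
    return free


SAMPLE = """\
Packet Capture Session ID : 0
PORTS :
Packet Capture Session ID : 1
PORTS : placeholder-port
"""

if __name__ == "__main__":
    print(free_sessions(SAMPLE))
```

Strip the trailing CLI prompt from the saved output before feeding it in, otherwise the last session would look as if it had something in PORTS.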
Define a new capture session with the interface and the direction of traffic that you want to trace (dual for both directions):
edge2> set capture session 1 interface 054b184b-c30a-4aba-b4f4-ec3c8999ad81 direction dual
Start the capture with an expression that filters on the source/destination host, port and so on that you are interested in, and with a count for the number of packets you want to capture.
You can also save the capture to a .pcap file with the file subcommand, but then you will not see the capture on the console. The following example prints to the console:
edge2> set capture session 1 count 100 expression host 10.10.10.10
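Since a capture without limits can hurt the EDGE, I find it safer to generate the two set capture session commands from a small helper that refuses obviously risky input before anything is pasted into the CLI. This is only a sketch in Python: capture_commands and MAX_COUNT are names I made up, the limit of 5000 packets is an arbitrary choice, and the input and output direction values are an assumption on my side (only dual is used in this post).

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")

# "dual" is the value used in this post; "input" and "output" are assumed.
DIRECTIONS = {"dual", "input", "output"}

# Arbitrary safety limit for this sketch, not an NSX-T limit.
MAX_COUNT = 5000


def capture_commands(session, interface_id, count, expression, direction="dual"):
    """Build the two CLI lines used above, rejecting unsafe input."""
    if not UUID_RE.fullmatch(interface_id):
        raise ValueError("interface_id does not look like an interface UUID")
    if direction not in DIRECTIONS:
        raise ValueError("unknown direction: " + direction)
    if not 0 < count <= MAX_COUNT:
        raise ValueError("count must be between 1 and %d" % MAX_COUNT)
    if not expression.strip():
        raise ValueError("refusing to capture without a filter expression")
    return [
        "set capture session %d interface %s direction %s"
        % (session, interface_id, direction),
        "set capture session %d count %d expression %s"
        % (session, count, expression.strip()),
    ]


if __name__ == "__main__":
    for line in capture_commands(
        1, "054b184b-c30a-4aba-b4f4-ec3c8999ad81", 100, "host 10.10.10.10"
    ):
        print(line)
```

The helper only builds strings. It does not check that the filter expression itself is valid, so a first run with a small count is still a good idea.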
Of course, if your T0 router is configured as active-active, traffic can pass through either EDGE VM, so to capture everything on a specific interface you should run the same capture on both EDGE VMs.