Category: Troubleshooting

Cisco DNA Upgrade Issues – Application Update Stuck

After initiating Cisco DNA Appliance version 2.1.2.4 and starting an upgrade towards 2.2.2.8 in order to get to 2.2.3.5 I got a strange issue where the appliance system update went fine but the switch to 2.2.2.8 was disabled until Application Updates did not finish.

The real issue here was that Application Updates of Cloud Connectivity – Data Hub got stuck on 12% for 4 days without timing out or finishing. Tried several appliance reboots from CIMC which didn’t help.

Below are the steps that helped sort out Application Updates issues with container pods being stuck at the point of pooling images from the cloud and creation on the Cisco DNA appliance. Of course, this was a long troubleshoot and here I listed only maglev CLI commands that were useful.

I SSH into the appliance and started digging the maglev CLI tool.

NSX-T Edge Transport Node Packet Capture

NSX-T v3.0.1 and v3.1.3 were used to try the stuff described below

As always with network engineers, even when working with SDN/SSDC solutions, sooner or later you will be asked to troubleshoot connectivity across your hops. And if working with VMware NSX-T platform, your next-hop for the North-South Datacenter traffic will almost always be NSX-T EDGE Transport Node VM. It will be really useful then to be able to get some packet traces out of that box in order to troubleshoot the traffic issues in detail.

One of the examples would be simple routing or some sort of Loadbalancing traffic that seems not to reach the backend hosts behind NSX-T edge.

On the NSX-T EDGE VM it’s fairly simple to capture traffic directly. It’s possible to get the output out on the console or to save it to the file on the EDGE and then pull it out with SCP.

If you have an EDGE Cluster, normally build out of 2 VMs, first, you need to see on which node the T0 or T1 router you want the traffic to be captured is active.

Let’s say we want to capture traffic on “T0-router” shown in the image below. You can go to that T0 router from the UI and check the High Availability Mode output:

NSX-T EDGE VM Active T0

Troubleshooting EIGRP Neighbor Relationships

EIGRP internals and getting hands dirty in debugging routing adjacency and solving EIGRP neighboring issues.

What is sequence TLV and Conditional Receive CR-mode and CR flag

Couple of days ago I got a strange network behavior in my CCIE lab. Something was wrong between a router and L3 switch connection and there was EIGRP neighbor relationship reset every few minutes. It was happening all the time so I decided to debug a little.

It was looking something like this and I was very confused about it:

*Mar 1 01:00:32.135: EIGRP: Received Sequence TLV from 155.1.1.2
*Mar 1 01:00:32.135: 155.1.1.1
*Mar 1 01:00:32.139: address matched
*Mar 1 01:00:32.139: clearing CR-mode
*Mar 1 01:00:32.139: EIGRP: Received CR sequence TLV from 155.1.1.2, sequence 5
*Mar 1 01:00:32.139: EIGRP: Received UPDATE on FastEthernet0/0 nbr 155.1.1.2
*Mar 1 01:00:32.139: AS 1, Flags 0xA, Seq 5/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/1, not in CR-mode, packet discarded
………………………….
*Mar 1 01:10:00.123: EIGRP: FastEthernet0/0 multicast flow blocking cleared

The thing that was happening here is basically just some EIGRP internals doing their job. To be precise this was reliable routing information delivery over both multicast and unicast doing his reliable update delivery.

Normal EIGRP synchronization is done using multicast.

If we have two EIGRP routers that are trying to make initial sync and become neighbors they will intentionally see each other as “laggard” in the beginning. The term laggard is important here as it symbolize a neighbor to witch we are sending EIGRP updates in separate unicast communication.

Why?

If one EIGRP router is sending updates to, let’s say, 5 other neighbors. It will send that update to address 224.0.0.10 and it will include the update sequence number inside. Let’s say Seq=25.

When it sends the update Seq=25 to the network segment it will get ready to send next update with Seq=26 and wait for the acknowledgement of sequence 25 update. The problem is that the router will put newly prepared update Seq=26 onto the transmission lists for all 5 neighbors and it will not send it out to anybody until he acknowledges sequence 25 update from all 5 neighbors. That means that if one of 5 routers does not send back the acknowledgement for Seq=25 update our router will not continue sending multicast update Seq=26 to anybody and he will lose the neighbors after hold timer expires.

Packet capture in Cisco IOS

Packet capture on IOSThis will be a brief article but a good one. It will save you some walking time to server room. I have the need to capture traffic on the switch or on the router several times every week. That action needed from me to be physically near the switch and to configure SPAN port so that I can connect to the switch with my machine and capture some packets with wireshark. Okay, I could use RSPAN to get captured packets to the closest switch but this altogether is not good enough. It’s too time consuming for short packets captures in troubleshooting sessions.

Recently in my CCIE study I came across the info that Cisco IOS is able to capture packets on the device itself and on more interfaces in once. You can later export that capture to your PC and analyze it with wireshark.

You can do it like this

When I connect to device with telnet or SSH I don’t see logging?

Cisco device will not send log messages to your terminal session by default. That means that you will see log messages in putty only if you are connected to console port but if you connect with telnet or SSH you will not get log messages. Here’s how to resolve this:

There are three types of logging: Console, Monitor and Buffer.
Console logging is seen if you connect to console port directly. Monitor logging is what you will se on your screen when you are connected to CLi in some other way. And the last is Buffer logging that means all the logs that are going into buffer memory to be viewed later in time. Every of the logging type has its own setting on which loggs will be registered.
There are this types of looging levels with their severity numbers where greather number means smaller severity case, from severity 1 to 7: