In Part 1 of this series, I described the most common steps to follow when troubleshooting a total lack of communication between a Layer 2 device (a Cisco switch) and a connected end-user device. As promised, here is the second part, in which I’ll show what you can check when the connection itself works but the service is degraded. By degraded service I mean a scenario with, for example, packet loss or an intermittent connection, which affects communication and will almost certainly make your users unhappy.

We will stick with the same scenario, in which an end-user device is connected to a Cisco switch. Remember that so far we have only been troubleshooting at Layer 1 and Layer 2. Today we stay in the same area, so nothing directly related to IP, routing protocols or complex networking environments.

Scenario 2: You have an end device connected to a switch and you have degraded communication

a) Check for errors on the interface:


In this example there are no errors, but if you find any, keep an eye on this port. Issue the above command a couple of times to see whether the errors are increasing in real time; that is the worst case, and you should take action immediately. Errors on the interface can be caused by a faulty interface on the switch or on the other end, an Ethernet cable issue or a wrong configuration.
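
To make the "run it a couple of times" advice concrete, here is a minimal Python sketch that compares two snapshots of the error counters and reports which ones grew. The counter names (`input_errors`, `crc`, `collisions`) and the values are made up for illustration, and how you collect the numbers from the switch is outside the scope of this sketch.

```python
def increasing_counters(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the counters that grew between two snapshots, with their deltas."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


# Hypothetical values noted down from two runs of the same command.
first = {"input_errors": 12, "crc": 12, "collisions": 0}
second = {"input_errors": 40, "crc": 38, "collisions": 0}

print(increasing_counters(first, second))
```

An empty result suggests the errors are historical. Anything listed is still increasing and deserves immediate attention. Keep in mind that clearing the counters between the two samples would hide an increase.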

b) Check the interface queue and dropped packets


The interface queue is very important and you should check it during your troubleshooting process. With the above command you can see how many packets are in the input and output queues, what the transmit and receive rates are and, most importantly, whether packets have been dropped from the input or output queue. Usually this happens when the switch is under heavy load and cannot process all the packets as quickly as needed. This leads us to the next step.

c) Check the CPU load on the switch


The command output is longer, but the most interesting parts for this example are the first 2 rows, which show the load over the last 60 seconds and the last 60 minutes. If you see peaks up to 100 there, that is bad and the device has issues that need to be fixed.
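
If you copy the CPU values into a script, a simple threshold check is enough to spot such peaks. This is only a sketch: the 90% threshold and the sample values are assumptions made for this example, not something the switch defines.

```python
def cpu_peaks(samples: list[int], threshold: int = 90) -> list[tuple[int, int]]:
    """Return (position, value) for every CPU sample at or above the threshold."""
    return [(i, value) for i, value in enumerate(samples) if value >= threshold]


# Hypothetical CPU load percentages, one per second.
last_seconds = [12, 15, 11, 97, 100, 100, 14, 13]

print(cpu_peaks(last_seconds))
```

A single short spike is usually less worrying than several consecutive samples near 100, so look at the positions as well as the values.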

d) Identify what process is keeping the CPU busy


Most of the time this output is easy to read, and you can see which process is taking all your CPU power. When you see Fifo Error Detection at 100%, you should suspect that something is wrong with the queue on one of the interfaces and try to find which one has the problem. This is not straightforward and you have to check a lot of things, but it can be helpful. To be honest, I see a lot of engineers just reload the device, and then the problem is solved (if it was due to a hardware issue and not a configuration mistake).

e) Check for memory issues on the switch


Again, if you run out of memory, bad things can happen to your device and to the communication with the devices connected to the switch. Reloading the device solves about 90% of this kind of problem. Still, I don’t recommend unplugging the power cable as soon as you see a memory problem. First have a look; maybe there is something you can fix without reloading the device.

f) Check for problems with storm-control implementation


In one of my previous posts I explained how you can use storm-control to limit the available bandwidth on a Cisco switch interface. In the example above I set this limit to 1% of the available one gigabit (I know it’s stupid, but imagine a typo). Imagine what effect this will have on the traffic: storm-control does not queue the excess, so controlled traffic above 1% is silently discarded until the rate falls back below the threshold.
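
To see how little 1% of a gigabit link really is, here is the arithmetic as a small Python sketch. The function name is hypothetical, and it only illustrates the percentage calculation, not how the switch measures traffic internally.

```python
def storm_control_threshold_bps(link_bps: int, level_percent: float) -> float:
    """Convert a storm-control level (percent of link bandwidth) to bits per second."""
    return link_bps * level_percent / 100


GIGABIT = 1_000_000_000  # 1 Gbit/s link speed

threshold = storm_control_threshold_bps(GIGABIT, 1.0)
print(f"{threshold / 1_000_000} Mbit/s")
```

So a level of 1% on a gigabit port leaves only 10 Mbit/s for the controlled traffic, which a busy end device can exceed easily.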

g) As a general rule, have a look at the logs (maybe this should be the first step!)

If there are a lot of Spanning Tree reconfigurations, interface flaps or anything else that looks suspicious, be sure to investigate, as you may find the root cause of your problems there.

Do you have any other tips on this topic? Is there anything else you check that could be added here? Comment below and your suggestion will be taken into consideration.

Cisco tips: Track down communication issues – Part 2

3 thoughts on “Cisco tips: Track down communication issues – Part 2”

  • October 9, 2011 at 18:08

    Hi… I would add issuing the command “show mac-address-table dynamic interface mod/port”

    Sometimes the interface shows a status of connected, but with the command above you realize there is no MAC address associated with the switch port at issue.

    Then the possible cause for that has been already discussed in your article: Speed/Duplex mismatch or a defective network adapter on the attached device.

    I am new to your website, I think it is great. Thanks for sharing your knowledge.

    Keep up the great work!

    Have a good day,

    David M.

    • October 13, 2011 at 10:29

      Hello David and thanks for your comment. 

      I don’t believe what you describe is an actual issue, but rather logical behavior. If you connect a device to a Cisco switch port, the interface will come up showing connected. This is Layer 1 in the OSI stack, the physical connection. If there is zero communication from the device, the switch receives no frames from it and will not learn any MAC address on that port. As soon as there is any active communication (like a DHCP request), the MAC will appear on that particular port. Of course there are some strange situations in which something is misconfigured and, even with active communication, there is no MAC on the Cisco switch port, but these fall into another category.

    • January 25, 2012 at 08:47

      Hi

      Also remember that if you use port-security with the maximum option on your interfaces, the learned mac-address-table entries will be static and consequently not listed with the dynamic option!

      Ex:
       switchport port-security maximum 3
