Tuesday, December 14, 2010

Switch Flooding 101 - Troubleshooting Case Study

Remember the first time you learned the basics of bridging? Dig deep in your memory and think back to the basics. With help from my co-workers and Aaron Conaway (on Twitter as @aconaway), I confirmed that some "crazy" behavior I saw today on our network was, in fact, "normal," albeit undesired.

I've been troubleshooting some very strange behaviors on our network lately. I suspect some (all?) of them have to do with our fairly old Cisco Catalyst 6500s with Sup2's and Sup1a's in our data center, as well as the dinosaur Catalyst 2948 access switches in our closets. There are times when our monitoring system throws alerts saying it can't ping certain devices. But minutes later, things return to normal. (Don't you just love intermittent problems?)  One tool that any good network engineer will consider when dealing with such a problem is a packet capture product such as the ever-popular Wireshark.

When I fired up Wireshark on my desktop computer, I had to filter through the muck to see what was going on. By "muck" I'm referring to the traffic I don't care about, such as the traffic my box is generating, as well as broadcast and multicast. I slowly added more and more exceptions to my capture filter (see below) to narrow the scope of my capture.

My Wireshark Capture Filter: not host [my IP address] and not host [directed broadcast for my subnet] and not broadcast and not host and not host and not host and not host and not host and not ether proto 0x0806 [for CDP] and not ether host 01:00:0c:cc:cc:cc [for HSRP] and not host and not host and not host and not host and not host and not stp and not host and not host
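A filter this long is easier to maintain when it is generated from lists of exclusions. Here is a minimal sketch in Python, assuming made-up addresses from the 192.0.2.0/24 documentation range (the real ones are not shown above) and a hypothetical helper named build_capture_filter. For reference, EtherType 0x0806 is ARP, and 01:00:0c:cc:cc:cc is the multicast MAC address that Cisco CDP frames are sent to.

```python
def build_capture_filter(excluded_hosts, excluded_macs=(),
                         excluded_ethertypes=(), extra_terms=()):
    """Join exclusion terms into one libpcap-style capture filter string."""
    terms = [f"not host {ip}" for ip in excluded_hosts]
    terms += [f"not ether host {mac}" for mac in excluded_macs]
    terms += [f"not ether proto 0x{etype:04x}" for etype in excluded_ethertypes]
    terms += [f"not {term}" for term in extra_terms]
    return " and ".join(terms)


if __name__ == "__main__":
    print(build_capture_filter(
        excluded_hosts=["192.0.2.10", "192.0.2.255"],  # hypothetical: my IP, subnet broadcast
        excluded_macs=["01:00:0c:cc:cc:cc"],           # Cisco CDP multicast MAC
        excluded_ethertypes=[0x0806],                  # ARP
        extra_terms=["broadcast", "stp"],
    ))
```

The resulting string can be pasted into the Wireshark capture filter box; adding another noisy host is then a one-line change to the list.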

Once I filtered out enough to see more clearly, I noticed a TON of syslog (UDP 514) traffic destined for another host on my subnet. After scratching my head and consulting with co-workers, I started looking at the MAC address tables (or CAM tables). My upstream switch didn't have a CAM table entry for the MAC address of the syslog server. Neither did its upstream switch. In fact, even the Cat 6500 directly connected to the syslog server didn't have a CAM table entry for it.

Checking the timeouts for the CAM table on one of the CatOS switches gave us this:
CatOS-Switch> (enable) sh cam agingtime

VLAN    1 aging time = 300 sec
VLAN    2 aging time = 300 sec
VLAN    9 aging time = 300 sec
VLAN   17 aging time = 300 sec
VLAN   18 aging time = 300 sec
VLAN   20 aging time = 300 sec
VLAN   21 aging time = 300 sec
VLAN   25 aging time = 300 sec

Similarly, the Cat6500 running Native IOS showed this:
NativeIOS-Switch#sh mac-address-table aging-time 
Vlan    Aging Time
----    ----------
Global  300
no vlan age other than global age configured

Apparently, this syslog server is so quiet, so stealthy, that it doesn't transmit ANY traffic for more than 5 minutes (300 sec) at a time. After 5 minutes, the CAM table entries time out, and all traffic destined for that server gets flooded to every port in the VLAN throughout our trunked network.
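The aging behavior is easy to reproduce in a toy model. The sketch below is a deliberately simplified learning switch, not how CatOS or IOS is implemented: the class name, port numbers, and MAC labels are all made up, and only the 300-second aging time comes from the output above.

```python
AGING_TIME = 300  # seconds, matching the aging time shown above


class ToySwitch:
    """Simplified learning switch: learn source MACs, flood unknown unicast."""

    def __init__(self, ports, aging_time=AGING_TIME):
        self.ports = set(ports)
        self.aging_time = aging_time
        self.cam = {}  # MAC -> (port, time the MAC was last seen as a source)

    def receive(self, src_mac, dst_mac, in_port, now):
        """Learn the source, then return the set of ports the frame goes out."""
        self.cam[src_mac] = (in_port, now)
        entry = self.cam.get(dst_mac)
        if entry is not None and now - entry[1] <= self.aging_time:
            return {entry[0]}              # known destination: one port only
        self.cam.pop(dst_mac, None)        # aged out (or never learned)
        return self.ports - {in_port}      # unknown unicast: flood


if __name__ == "__main__":
    sw = ToySwitch(ports=[1, 2, 3, 4])
    sw.receive("syslog-server", "router", in_port=1, now=0)  # server talks once
    print(sorted(sw.receive("router", "syslog-server", in_port=2, now=100)))
    print(sorted(sw.receive("router", "syslog-server", in_port=2, now=301)))
```

While the server's entry is fresh, syslog traffic leaves on a single port. Once the server has been silent for longer than the aging time, the same traffic goes out every other port, which is the flooding seen in the capture.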

One way to prevent the flooding would be to put static CAM table entries in all the affected switches. Perhaps an easier solution is to configure the syslog server to generate some traffic at least once every 5 minutes.
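For the second option, anything that makes the server transmit a frame more often than the aging time should do. Here is a minimal sketch in Python, assuming a hypothetical destination such as the server's default gateway and UDP port 9 (discard); the function names and the 120-second interval are illustrative choices. Note that a unicast datagram only refreshes the CAM entries on switches along the path to that destination.

```python
import socket
import time


def send_keepalive(dest_ip, dest_port=9, payload=b"keepalive"):
    """Send one small UDP datagram so switches on the path relearn our MAC."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (dest_ip, dest_port))


def keepalive_loop(dest_ip, interval=120, count=None):
    """Send a keepalive every `interval` seconds; run forever if count is None."""
    sent = 0
    while count is None or sent < count:
        send_keepalive(dest_ip)
        sent += 1
        if count is None or sent < count:
            time.sleep(interval)  # keep this well under the 300 sec aging time
    return sent
```

Calling keepalive_loop("192.0.2.1") from a startup script (the address here is a placeholder for the real gateway) would keep the entry alive; a scheduled ping would work just as well.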

I'm not sure if the flooding is causing the other strange behaviors we're seeing on our network, but this has been a good learning experience and reminder for me about the basics of Layer-2 networking.

Any other troubleshooting ideas you would use for a situation like this? Comment here and/or hit me up on Twitter (@swackhap).

Friday, December 3, 2010

Splunk "host" Field Enhancement For Syslog-ng

We are very fortunate where I work to have Splunk. It's an incredibly powerful indexing tool that can "eat all your IT data" and report on it in many different ways. We mostly use it to do simple searches for troubleshooting, but we're always building more expertise as time permits.

Splunk is set up to index syslog messages very nicely by default. It takes each syslog message and intelligently recognizes the date/time stamp, then "extracts" all the fields and names them things like "host", "eventtype", "event_desc", "error_code", "log_level", and so on.  This post focuses on the "host" field, which is the IP address of the end device (router, switch, firewall, etc).

In our environment, we send all our syslogs to a Linux server running a free open-source tool called syslog-ng. With it, we do two things: (1) save a copy of each syslog message on the local server in a flat text file named for the source IP address where it came from, and (2) forward a copy to our Splunk indexing server using TCP port 9998.

For a while I’ve noticed that our Splunk lists all syslog messages with a “host” field that is the IP of the syslog-ng server. I was able to do some research this morning and “fixed” this so now all the syslog-ng forwarded messages have their host field set to the source IP address of their original sending device (router/switch/firewall).

Here’s how I did it:
1. Created a props.conf file in /san/splunk/etc/system/local with the following contents:
TRANSFORMS = syslog-header-stripper-ts-host syslog-host

2. Then restarted splunk with this command:
service splunk restart
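
To illustrate the idea behind the fix (this is not Splunk's code, just a sketch of the concept), the forwarded message starts with a timestamp and the original sender's address, so the host can be pulled out of that header instead of being taken from the TCP connection. The regular expression, function name, and sample line below are assumptions for illustration.

```python
import re

# Assumed header layout: "Mon DD HH:MM:SS <host> <rest of message>"
SYSLOG_HEADER = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<message>.*)$"
)


def split_syslog_header(line):
    """Return (host, message); host is None if no syslog header is found."""
    match = SYSLOG_HEADER.match(line)
    if match is None:
        return None, line
    return match.group("host"), match.group("message")


if __name__ == "__main__":
    sample = "Dec  3 09:15:02 192.0.2.21 %SYS-5-CONFIG_I: Configured from console"
    print(split_syslog_header(sample))
```

With the header parsed this way, the "host" value is the address of the router, switch, or firewall that sent the message rather than the syslog-ng relay.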

Happy Splunking!

Thursday, November 11, 2010

Solarwinds Orion Network Performance Monitor Bug

I am *scary* good at finding bugs in software. Just ask the Cisco TAC. Or in today's case, ask Solarwinds support. This is a duplicate posting that I've also added to Solarwinds' Thwack.com user community site. If you use Orion NPM and send SNMP traps to another network management tool, READ AND HEED.

Thwack Post Title: NPM 10.0.0 SP1 Bug: Alert Action To Send SNMP Traps Actually BROADCASTS On Local Network

Many thanks to Mariusz from the Support team for helping me pin this down. I wanted to share with all since this might be happening under your nose!

We have Orion NPM 10.0.0 SP1 and have the "Alert me when a node goes down" alert configured with two trigger actions:

  1. Log Alert to NetPerfMon Event Log
  2. Send SNMP Trap to two hosts (Microsoft Operations Manager and Orion NCM).
A DBA told me earlier today that he noticed a server was receiving traps from our Orion poller. He noticed this in that server's Event Viewer Application Log.

With help from Mariusz and Wireshark, we found that the Orion NPM poller was actually broadcasting the SNMP traps on the local network! The workaround seems to be to create a separate trigger action for each SNMP trap destination. In other words, we changed our trigger actions to this:
  1. Log Alert to NetPerfMon Event Log
  2. Send SNMP Trap to Microsoft Operations Manager
  3. Send SNMP Trap to Orion NCM
As a matter of fact, for each additional valid IP destination we added to the trigger action, it appears that the Orion poller actually generated duplicate broadcasts for each SNMP trap.

If you use this feature of Orion, I recommend you check your settings and maybe run Wireshark on your poller to be sure you're not spewing broadcasts out to your entire server subnet.
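
If you export the trap destinations seen in a capture, a few lines of Python can flag the broadcasts for you. This is a minimal sketch using the standard ipaddress module; the subnet and addresses are placeholders from the documentation range, and the function names are made up.

```python
import ipaddress


def is_broadcast(dest_ip, local_network):
    """True if dest_ip is the limited broadcast or the subnet's directed broadcast."""
    addr = ipaddress.ip_address(dest_ip)
    net = ipaddress.ip_network(local_network)
    limited = ipaddress.ip_address("255.255.255.255")
    return addr == limited or addr == net.broadcast_address


def broadcast_trap_destinations(destinations, local_network):
    """Return only the destinations that are broadcast addresses."""
    return [ip for ip in destinations if is_broadcast(ip, local_network)]


if __name__ == "__main__":
    seen = ["192.0.2.10", "192.0.2.255", "255.255.255.255"]  # hypothetical capture data
    print(broadcast_trap_destinations(seen, "192.0.2.0/24"))
```

An empty list means every trap went to a unicast address, which is what you want.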

Mariusz is filing this as a bug, and I'm not sure which versions of Orion are impacted. Feel free to add your comments to this thread.


Friday, October 15, 2010

The Case of the Mysterious Disappearing VPN

Many of us in the networking world use IPSEC VPNs over the Internet. The ISP connection is, or at least can be, cheaper than alternatives like MPLS, and of course we all need to connect our networks to the Internet (unless you're the DoD, CIA, or some other secretive organization with a classified network). This mystery begins with a VPN outage.

Refer to the reference network shown below.  For these two sites, the primary connectivity is the IPSEC VPN over the Internet. The MPLS VPN is a secondary connection.

Problem: IPSEC VPN Down
At 2:44am CT the primary 10Mbps IPSEC VPN went down, but the 3Mbps MPLS worked flawlessly after route reconvergence.  As the day progressed, the level of traffic between the two sites increased and began causing performance problems for users at Site B.

As we continued to troubleshoot what had happened, we found this syslog entry in Splunk that came from FW A:

Oct 14 02:44:33 fw.fw.fw.21 Oct 14 2010 02:44:33: %ASA-4-106023: Deny protocol 47 src inside:a.a.a.1 dst outside:b.b.b.254 by access-group "inside_access_in"
(Note: IP addresses have been changed here for security reasons.)

Nobody had made any changes at 2:44am. So what changed? After digging some more into our change management system, we found this change to FW A that was made back on 9/23:

Last Month - 9/23/2010 12:00:18 AM
access-list inside_access_in extended permit gre host a.a.a.1 host b.b.b.254
access-list inside_access_in extended permit gre host a.a.a.254 host b.b.b.254

This change was logged during a nightly config backup/compare, hence the midnight timestamp. It turns out that on that day we added another VPN that connects from another site (we'll call it Site C) back to Site A.  For that VPN, we chose to use a.a.a.254 as the GRE endpoint on RTR A. We prefer to use .1 addresses to manage routers, and with .1 as a GRE endpoint we can't ping it.  Unfortunately, we didn't realize the other VPN to Site B was active.  Apparently, the IPSEC security association (SA) remained active, as did the stateful firewall connection in FW A, until 2:44am CT.  So we ask ourselves again: What changed at that time?

Splunk to the Rescue
Diving more into the logs that we index with Splunk, we found visually when the problem started--it's where the histogram suddenly goes from 17 events per hour to over 1500.
Clicking on the 2AM timeframe brings up many iterations of the "Deny protocol 47" message that was shown above. Immediately prior to that stream of messages we see these three events:
  • Oct 14 02:44:26 fw.fw.fw.21 Oct 14 2010 02:44:26: %ASA-3-713123: Group = [FW B InternetIP], IP = [FW B InternetIP], IKE lost contact with remote peer, deleting connection (keepalive type: DPD)
  • Oct 14 02:44:26 fw.fw.fw.21 Oct 14 2010 02:44:26: %ASA-5-713259: Group = [FW B InternetIP], IP = [FW B InternetIP], Session is being torn down. Reason: Lost Service
  • Oct 14 02:44:26 fw.fw.fw.21 Oct 14 2010 02:44:26: %ASA-4-113019: Group = [FW B InternetIP], Username = [FW B InternetIP], IP = [FW B InternetIP], Session disconnected. Session Type: IPsec, Duration: 21d 15h:00m:15s, Bytes xmt: 181785169, Bytes rcv: 3049561298, Reason: Lost Service
Correct me if I'm wrong, but it appears there may have been some connectivity problem on the Internet that happened just long enough for dead-peer-detection (DPD) to take effect and tear down the existing session. When that happened, a new IPSEC SA was created, still using the GRE endpoint of a.a.a.1. Since the firewall was previously changed to allow a.a.a.254 instead of a.a.a.1, this traffic got denied on the inside interface of FW A and prevented the GRE tunnel from coming up.
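
The per-hour event count that made the problem jump out is simple to reproduce outside Splunk if all you have is the flat syslog files. A minimal sketch in Python follows, assuming each line begins with a "Mon DD HH:MM:SS" timestamp like the entries above; the function name and the year default are assumptions, since that leading timestamp carries no year.

```python
from collections import Counter
from datetime import datetime


def events_per_hour(lines, year=2010):
    """Count log lines per hour, keyed by the start of each hour."""
    counts = Counter()
    for line in lines:
        try:
            # The first 15 characters hold a timestamp such as "Oct 14 02:44:33"
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        counts[stamp.replace(minute=0, second=0)] += 1
    return counts
```

Sorting the result by key and looking for the hour where the count explodes gives the same answer as the histogram.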

To fix, we added a rule to FW A allowing GRE from a.a.a.1 to b.b.b.254.
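
The failure and the fix come down to first-match ACL evaluation with an implicit deny at the end. Here is a toy model in Python, using the same placeholder addresses as above; the tuple layout and function name are made up for illustration and ignore everything a real ASA does beyond exact host matching.

```python
def evaluate(acl, protocol, src, dst):
    """Return the action of the first matching rule, or the implicit deny."""
    for action, rule_proto, rule_src, rule_dst in acl:
        if (rule_proto, rule_src, rule_dst) == (protocol, src, dst):
            return action
    return "deny"  # nothing matched: implicit deny


# After the 9/23 change, only the .254 endpoint was permitted
acl_after_change = [("permit", "gre", "a.a.a.254", "b.b.b.254")]

# The fix adds the .1 endpoint back alongside it
acl_fixed = acl_after_change + [("permit", "gre", "a.a.a.1", "b.b.b.254")]

if __name__ == "__main__":
    print(evaluate(acl_after_change, "gre", "a.a.a.1", "b.b.b.254"))
    print(evaluate(acl_fixed, "gre", "a.a.a.1", "b.b.b.254"))
```

The first call lands on the implicit deny, which matches the "Deny protocol 47" messages (protocol 47 is GRE); the second matches the added rule.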

Mystery solved!

Thursday, October 14, 2010

Contacts Consolidation

I don't know about you, but I have contacts everywhere. I've got Exchange with Outlook at work, Google Contacts (to go along with Gmail and Google Voice), Facebook, Twitter, and LinkedIn.  There may be others, but last night I spent about 30 minutes pulling together all my current contacts from these sources. Here's how I did it:

  1. Outlook: Exported all contacts as a CSV file. Cleaned it up and imported into Google Contacts.
  2. Facebook: I found a post that explained how to use a Yahoo account to import Facebook contacts. I then exported as a CSV and, again, imported into Google Contacts.
  3. LinkedIn: Under the Contacts listing, there's an easy-to-use "Export Connections" link. Exported to CSV and, you guessed it, imported into Google Contacts.
  4. Twitter: Found a nice service called MyTweeple.com that has a handy tool to export all contacts to a CSV file. Imported into Google Contacts.
By now you see a pattern developing.  Since I use Gmail and Google Voice so heavily, Google Contacts is a natural repository for all my contacts.  It also allowed me to import custom column fields, like "TwitterName", so I have all my tweeps listed in my Google Contacts with their "twittername" as a Note attached to their details. 

Another great thing about Google Contacts is how well it finds and merges duplicate contacts. As you might guess, there are many people that I follow on multiple social networks, so merging duplicates is a must for me.
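
If you would rather merge the CSV exports yourself before importing, the idea is straightforward. The sketch below is a rough illustration in Python, not what Google Contacts does internally; the column names ("Name", "Email", "Phone", "TwitterName") and sample rows are assumptions, and real exports will use different headers.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into a list of dicts, one per contact row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_contacts(rows):
    """Merge rows that share an e-mail address (or a name, if no e-mail)."""
    merged = {}
    for row in rows:
        key = (row.get("Email") or row.get("Name") or "").strip().lower()
        if not key:
            continue  # nothing to match on
        contact = merged.setdefault(key, {})
        for field, value in row.items():
            if field is None or not value:
                continue  # skip empty cells and stray extra columns
            if not contact.get(field):
                contact[field] = value.strip()  # first non-empty value wins
    return list(merged.values())


if __name__ == "__main__":
    outlook = "Name,Email,Phone\nPat Example,pat@example.com,555-0100\n"
    twitter = "Name,Email,TwitterName\nPat Example,PAT@example.com,@pat\n"
    print(merge_contacts(read_csv(outlook) + read_csv(twitter)))
```

Matching on a lowercased e-mail address catches the common case; people who use different addresses on different networks would still need a manual pass.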

How do you keep your contacts organized?

Find me on Twitter at @swackhap.

Tuesday, October 12, 2010

Who Said Catholics Don't Have A Sense Of Humor?


Catholic or not you have to laugh at this one.

A Catholic priest and a nun were taking a rare afternoon off and enjoying a round of golf.

The priest stepped up to the first tee and took a mighty swing. He missed the ball entirely and said, "Shit, I missed."

The good Sister told him to watch his language.

On his next swing, he missed again.

"Shit, I missed."

"Father, I'm not going to play with you if you keep swearing," the nun said tartly.

The priest promised to do better and the round continued.

On the 4th tee, he misses again. The usual comment followed.

Sister is really mad now and says, "Father John, God is going to strike you dead if you keep swearing like that."

On the next tee, Father John swings and misses again. "Shit, I missed."

A terrible rumble is heard and a gigantic bolt of lightning comes out of the sky and strikes Sister Marie dead in her tracks.

And from the sky comes a booming voice...

"Shit, I missed."

Monday, September 27, 2010

Google Is Great For More Than Just Searching

I've recently been discovering (or in some cases re-discovering) some of the awesome free stuff that Google has to offer. My Google Dashboard lights up like a Christmas tree now that I'm using so many of their tools. Here are a few that I've started (re)using lately.

Gmail - After looking at the web-based interface on and off for a while, I decided to take the leap. My primary e-mail address, which uses my own domain (swackhammer.net), automatically forwards all e-mail to my Gmail account. Advantages I love include speed, the ability to quickly search all e-mails for what I need, and integration with all my contacts.

Google Voice - I give out one number to everyone, then can customize what phone will ring and when based on who is calling me. Annoying call from recruiter or telemarketer? Just tell Google Voice to send them to voicemail. Or better yet, play a message that indicates your number is no longer in service. :-) And when you do get a voicemail, you can read a transcript of it via SMS or in your e-mail so you don't even have to listen to it. (Although some people's accents make for some very interesting transcripts.)

Google Contacts - Integration with Gmail and Google Voice--all your important contacts in one place, all easily reachable from any web browser.

Google Reader - RSS (Really Simple Syndication) feed-reader allows me to sign up for all the news and blogs I care about and read them at my leisure. I also use the NewsRack app on my iPhone which syncs with Google Reader. Any article I read on my iPhone gets marked as "read" so I won't waste time reading it a second time if I'm using Google Reader in a web browser.

Blogger - I've heard many people say they like WordPress better, but until I need features that WordPress offers, this works great for me.

Best of all, these services are FREE. I know, I know--you may be one of those people that hate Google and don't want them tracking your every move. I'm aware of my online footprint, and as a techie I fully understand that if someone really wants to find out more about me, they will anyway.

How do you use Google? What non-Google services do you love in place of these and why?

Friday, September 24, 2010

Don't Drink and Drive; DO Geekout and Drive

I've been listening to Pandora on my iPhone while driving to and from work for weeks now, and I love it. I am very musically oriented. But I've saturated myself with awesome music for now. I wanted something different to occupy my time in the car. So I started searching for some interesting technical podcasts to listen to. Here are some great ones that I found:

Packet Pushers Podcast (http://packetpushers.net/) - Roundtable of network engineers talking about the week's happenings in the networking industry

Tech News Today (http://twit.tv/tnt) - Amusing daily look at technology news from different sources, quite professionally done

vChat (http://www.vmwarevideos.com/vchat) - Fantastic discussions about VMware

Wireless LAN Professionals (http://wirelesslanprofessionals.com/category/podcasts/wlw/) - Helps me keep up with wireless technology in the enterprise

What other podcasts do you recommend? Tell me on Twitter @swackhap!

Tuesday, August 17, 2010

A rancher hired an architect, an engineer, and a mathematician to design the largest animal pen possible using only a limited number of fence segments.

The architect arranged all the fence pieces in a perfect square. "Making all sides equal in length maximizes the space," he explained to the rancher, who looked on with interest.
Next, the engineer took the fence pieces and arranged them in a large circle. "Eliminating sides and making the pen round produces a shape with even greater area than a square," he told the rancher, who was even more impressed.
Finally, the mathematician took only three fence pieces and arranged them in a triangle with himself in the middle. "I am outside the pen," he declared.

Tuesday, April 6, 2010

Scam Against Older Men

Gentlemen, beware! Ladies, warn your men! Here is a scam that has recently come to my attention. Women often receive warnings about protecting themselves at the mall, in dark parking lots, etc. This is the first warning I have seen for men. It’s a “heads up” for those men who may be regular Lowe's, Home Depot, Costco, or even Wal-Mart customers. This one caught me totally by surprise. I wanted to pass it on in case you haven't heard about it. Below is one man’s account of his terrifying experience.

Over the last month I became a victim of a clever scam while out shopping. Simply going out to get supplies has turned out to be quite traumatic. Don't be naive enough to think it couldn't happen to you or your friends.

Here's how the scam works: Two nice-looking, college-aged girls will come over to your car or truck as you are packing your shopping into your vehicle. They both start wiping your windshield with a rag and Windex, with their breasts almost falling out of their skimpy T-shirts. (It's impossible not to look.) When you thank them and offer them a tip, they say no but instead ask for a ride to McDonalds. You agree and they climb into the vehicle. On the way, they start undressing. Then one of them starts crawling all over you, while the other one steals your wallet.

I had my wallet stolen Mar. 4th, 9th, 10th, twice on the 15th, 17th, 20th, 24th, & 25th. Also Feb. 1st & 4th, twice on the 8th, 16th, 23rd, 26th & 28th, three times last Monday and very likely again this upcoming weekend.

Warn your friends to be vigilant. What a horrible way to take advantage of us older men!

Please send this on to all the older men that you know and warn them to be on the lookout for this scam.

P.S. Wal-Mart has wallets on sale for $2.99 each. I found even cheaper ones for $.99 at the Dollar Store and bought out their stock in three of their stores. Also, you never will get to eat at McDonalds. I've already lost 11 pounds just running back and forth from Lowe's, to Home Depot, to Costco, etc.

P.P.S. The best times are just before lunch and around 4:30 in the afternoon.

Lenten Prayer

Dear Lord,
In the past year you have taken away my favorite actor (Patrick Swayze)
my favorite actress (Farrah Fawcett)
my favorite musician (Michael Jackson) and
my favorite salesperson (Billy Mays).
I just wanted to let you know that my favorite legislator is Nancy Pelosi.

Thursday, March 4, 2010

The Way Of The Dinosaur

It's been a long time. I can't remember how long, and I'm too lazy/busy to look it up. But somewhere around two (yep, count 'em, TWO!) years ago we had a major problem at work. One of our Cisco Catalyst 6509 core Ethernet switches had major problems. Turns out we had some bent pins on the backplane in slot 2. In layman's terms, the place where you plug the brains into the switch was broken. We still had one "brain" (a.k.a. supervisor module) but the redundant one couldn't be used. The only solution to get our redundancy back? Replace the whole chassis.

Replacing an entire switch chassis is NOT a small job. There were literally hundreds of servers connected to this switch in the data center. So we set out on a very. long. journey. We got a replacement chassis from Cisco and sloooooooowly began moving one server network connection at a time from the old switch to the new switch.

Fast forward to today. Thanks to a big push in the last few days by some coworkers and me, we currently have only 7 more connections on this switch. And if things go according to plan, they'll all be changed to the new switch by Saturday afternoon. (Yeah, I have to go to work on Saturday. And it's supposed to be nice weather, too! Bummer...)

Some might not see the significance of this accomplishment, but those of us that have worked on it over these many months are psyched! We've scheduled a ceremonial power-off for Monday afternoon. Two of us will switch off the dual redundant power supplies, and everyone present will have the opportunity to disconnect one of the many ancient RJ-21 Ethernet cable connections. It will be stupendous when this switch goes extinct, and we can move on to our other, more exciting projects.

Thursday, February 25, 2010

Does anyone know who I am?

It happened at the Denver Airport. This is hilarious. I wish I had the guts of this girl. An award should go to the United Airlines gate agent in Denver for being smart and funny, while making her point, when confronted with a passenger who probably deserved to fly as cargo.

A crowded United Airlines flight was canceled. A single agent was re-booking a long line of inconvenienced travelers. Suddenly, an angry passenger pushed his way to the desk. He slapped his ticket on the counter and said, "I HAVE to be on this flight and it has to be FIRST CLASS."

The agent replied, "I'm sorry, sir. I'll be happy to try to help you, but I've got to help these folks first; and then I'm sure we'll be able to work something out."

The passenger was unimpressed. He asked loudly, so that the passengers behind him could hear, "DO YOU HAVE ANY IDEA WHO I AM?"

Without hesitating, the agent smiled and grabbed her public address microphone. "May I have your attention, please?", she began, her voice heard clearly throughout the terminal: "We have a passenger here at Gate 14 WHO DOES NOT KNOW WHO HE IS. If anyone can help him find his Identity, please come to Gate 14".

With the folks behind him in line laughing hysterically, the man glared at the United agent, gritted his teeth, and said, "F*** You!" Without flinching, she smiled and said, "I'm sorry sir, you'll have to get in line for that, too."

Life isn't about how to survive the storm, but how to dance in the rain.

Friday, February 12, 2010

Friday Joke

Father O'Malley rose from his bed. It was a fine spring day in his new Washington DC parish. He walked to the window of his bedroom to get a deep breath of air and to see the beautiful day outside. He then noticed there was a jackass lying dead in the middle of his front lawn.

He promptly called the US House of Representatives for assistance.

The conversation went like this: "Good morning. This is Speaker Pelosi. How might I help you?"

"And the best of the day te yerself. This is Father O'Malley at St. Brigid's. There's a jackass lying dead in me front lawn. Would ye be so kind as to send a couple o' yer lads to take care of the matter?"

Speaker Pelosi, considering herself to be quite a wit, replied with a smirk, "Well now father, it was always my impression that you people took care of last rites!" There was dead silence on the line for a long moment.

Father O'Malley then replied: "Aye, that's certainly true, but we are also obliged to first notify the next of kin."

Thursday, January 28, 2010

Putting Sacrifice In Perspective

An interesting letter in the Australian Shooter Magazine this week:

"If you consider that there has been an average of 160,000 troops in the Iraq theater of operations during the past 22 months, and a total of 2112 deaths, that gives a firearm death rate of 60 per 100,000 soldiers.

"The firearm death rate in Washington, DC is 80.6 per 100,000 for the same period.

"That means you are about 25 percent more likely to be shot and killed in the U.S. capital, which has some of the strictest gun control laws in the U.S., than you are in Iraq."

Conclusion: The U.S. should pull out of Washington, DC.

To Blog Or Not To Blog...

That is the question! I've been toying with the idea for a while, because I feel like there are some things that are worth saying that take more than 140 characters (Twitter) and don't fit neatly into a Facebook or LinkedIn status update. I won't promise to write often, but at least now I have a place to express myself. I humbly present my blog to you. Now, go SWACK YOURSELF!