Tuesday, June 26, 2012

Network Disruption Causes vCenter DB Corruption


First off, I am NOT a VMware expert by any stretch of the imagination. I AM, however, learning a lot by working with some smart folks on virtualized servers and desktops.

A network engineer (who shall remain nameless) was making some changes to the network infrastructure last night and unfortunately caused an outage. Due to an ongoing network migration from Cat6500 to Nexus 7k/5k/2k, all ESX hosts are now connected to Nexus FEX, but the iSCSI storage is still on the old Cat6500. The outage cut connectivity between the Nexus-connected hosts and the iSCSI storage.

As users started trying to log in to their desktops in the morning, we started getting reports of problems. Our VDI vCenter showed 4 of our 20+ hosts as disconnected or not responding. We ended up power-cycling those, one at a time, and once they came up we were able to reconnect them to vCenter.

The next big problem was that the profile server, which runs as a VM in the VDI infrastructure, was hung while attempting to migrate. We rebooted vCenter, which orphaned the profile server, and we then found we could not browse the LUN holding that VM's datastore in order to add the VM back into vCenter. At that point, we engaged VMware Support and spent several hours on WebEx troubleshooting storage connectivity problems (tail -f /var/log/vmkernel and some other stuff). By the time I left in the early afternoon, we had identified half a dozen hosts that seemed to be having iSCSI problems based on what VMware Support was seeing in the logs, and we rebooted those hosts one at a time to minimize end-user impact.
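For what it's worth, here is a rough sketch of how the "which hosts look unhappy" part could be scripted instead of eyeballing tail output. It is not what VMware Support actually ran. It assumes you have already pulled the vmkernel log lines from each host, and the search term ("iscsi") and the function names are my own placeholders, not real VMware message strings or tools.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumed search terms; NOT the actual strings VMware Support looked for.
DEFAULT_KEYWORDS = ("iscsi",)


def matching_lines(lines: Iterable[str], keywords=DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines that mention any keyword, case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def count_by_host(logs_by_host: Dict[str, Iterable[str]],
                  keywords=DEFAULT_KEYWORDS) -> Counter:
    """Count matching lines per host so the noisiest hosts sort first."""
    return Counter({
        host: len(matching_lines(lines, keywords))
        for host, lines in logs_by_host.items()
    })
```

Calling count_by_host(...).most_common() on a dict of host name to log lines would list the hosts with the most matching lines first, which is roughly the order you would want to look at them in.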

I had to leave before the fun was all over, but found out afterwards that a couple of the hosts apparently ended up with duplicate datastore IDs when they recovered from the outage overnight. Once that happened, the vCenter database was somehow corrupted with the wrong datastore information. It was apparently cleared by removing those two hosts from vCenter and adding them back in, thus giving them new datastore information.

Like I said, I'm not a VMware expert, but I'm learning more each day. Have you ever experienced something like this? Who else is doing VDI? Leave your comments below or find me on Twitter (@swackhap).

Saturday, June 23, 2012

Cisco Live Tips and Tricks

Hard to believe it's been over a year since my last post here. As I've learned in life, though, sometimes you have to forgive yourself for your failings (in this case, not blogging for a while) so that you can keep improving.

I recently attended Cisco Live 2012 in San Diego. After attending 9 times (thereabouts), I figured I'd share some ideas/thoughts/tips.

First off, have a 10-foot extension cord when traveling and when attending sessions.  Many breakout sessions and labs are in rooms that have power strips available, but some do not. If your extension cord has a 3-prong plug, have a 3-prong to 2-prong adapter with you just in case you need to plug into an old outlet.

The World of Solutions (WoS) is the area where Cisco and their partners set up booths with all sorts of goodies.  The first night it may be okay to wander a bit, but at some point you need to HAVE A PLAN. Look over the list of exhibitors. Think about your goals for the conference. Are there particular problems at work that you're trying to solve?  The WoS is THE PLACE to find the solution.  Print a map of the booths and circle the ones you want to visit. Then cross them off after you've been there.  Stay focused!

Some of my favorite places in the World of Solutions:

  • Walk-In Hands-On Labs - Great place to spend a few minutes learning new skills and practicing configurations on a plethora of systems.
  • Cisco Booth - Incredible opportunity to learn about almost every product/system/solution that they sell.
  • Social Media Hub - For the first time this year, the folks behind all the social networking for the event, such as the @CiscoLive Twitter account, set up shop to show off the top Tweeters and give people a place to lounge for a bit.
  • Technical Solutions Clinic - Basically an engineer's Heaven-on-Earth, there are several dozen whiteboards surrounded by some of Cisco's smartest Technical Marketing Engineers and TAC folks. What problem did you have at work you've been trying to fix? They'll solve it for you.

The Cisco Live mobile app makes navigating the conference a snap. View your schedule of sessions, browse WoS exhibitor listings and conference maps, and complete evaluations of sessions you've attended, right on your phone or tablet.  The evaluations are incredibly important and Cisco takes them very seriously.

I'm very excited to have attended Cisco Live once again, and hope to continue doing so. I consider a week at Cisco Live equivalent to about 3 weeks' worth of training.

If you have any questions, comment below or hit me up on Twitter (@swackhap). Cheers!