Thursday, June 26, 2014

Cisco Nexus 7000 - Basic Design Case Study and Lessons Learned

As a senior-level engineer with my company, I have the opportunity to do some basic system design. It’s not the kind of experience I would get with a VAR or a larger enterprise, but I count my blessings every chance I get to install and play with new gear.

We deployed a Nexus 7000 in our main datacenter three years ago for 10Gbps connectivity, and we’re now getting around to doing the same thing in our collocated DR site.  Due to tech advancements, though, it doesn’t make sense for us to use identical hardware for the DR location.  Here I’ll compare the old to the new and some of the lessons learned while getting the new one set up.

Our older Nexus 7010 uses Sup1 supervisor engines and M1-series line cards.  We started with a single VDC (virtual device context) model, then later added an L2-only VDC to introduce mass in-line firewall functionality. Having all M1 line cards made this really easy. We’re still running NX-OS v5.1.5 because we’ve had no particular reason to upgrade. Installation was made easier with help from our Cisco partner.

Now there are M1, M2, F1, F2, F2e, and F3 line card models that use different architectures.  I’ve been reading entire slide decks from Cisco Live that talk about how certain features can be implemented with particular combinations of models of line cards. Combining that plethora of information, along with our requirements, presents a formidable challenge. Add on the fact that we MAY WANT to do certain things in the future (like OTV for instance) and it’s even more interesting.

Our new N7K, which is also a 7010, has dual Sup2 supervisors along with M1 and F2e line cards. The M1 cards (model M148GT-11L) provide 48 copper 1Gbps RJ45 ports, and the F2e cards (model F248XP-25E) are for 1/10Gbps connections using either fiber-optic transceivers or twinax cables. One key thing I’ve learned in my cram course on N7K modules is that we will need NX-OS v6.2 in order to support the same VDC model we already use in production, this time with M1 and F2e cards sharing a VDC. When running in this “proxy routing” mode, the F2e ports defer the L3 decisions to the M1 cards in the same VDC. The key takeaway for us: we cannot connect other routers to F2e ports when using M1 for proxy routing.

All our existing routers in the same location are 1Gbps only so can be connected to the M1 cards, but we’ll have to keep this in mind for future connections. We may need to create an F2e-only VDC in the future if we want to terminate 10Gbps routers. I welcome your comments if you have experience with this.
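
To keep the rule straight when planning future connections, here is a minimal Python sketch of the constraint as I've described it above. The card-family strings and the `can_attach_router` helper are made up for illustration; this is a planning aid, not anything from NX-OS or the design guide.

```python
# Hypothetical planning helper; names are illustrative, not NX-OS constructs.

def can_attach_router(port_card, vdc_cards):
    """Return True if a router (L3 neighbor) may be plugged into a port on
    `port_card`, given the set of card families present in the same VDC.

    Rule as described in this post: when M1 cards proxy-route for F2e ports
    in the same VDC, routers must not be connected to the F2e ports.
    """
    proxy_routing = "M1" in vdc_cards and "F2e" in vdc_cards
    if port_card == "F2e" and proxy_routing:
        return False
    return True


mixed_vdc = {"M1", "F2e"}   # our planned VDC layout
f2e_only_vdc = {"F2e"}      # possible future VDC for 10Gbps routers

print(can_attach_router("M1", mixed_vdc))      # 1Gbps router on an M1 port
print(can_attach_router("F2e", mixed_vdc))     # 10Gbps router on F2e with proxy routing
print(can_attach_router("F2e", f2e_only_vdc))  # 10Gbps router in an F2e-only VDC
```

Only the middle case is rejected, which is why an F2e-only VDC may be in our future.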

The resources I’ve been using include some very smart folks on Twitter such as Ron Fuller (@ccie5851) and David Jansen (@ccie5952). Ron and David, as well as countless others, referred me to the F2e and M Series Design Guide for NX-OS 6.2. Honestly, I might not have known about this doc had it not been for Ron’s apparent omnipresence on Twitter.  Many made references to http://ciscolive.com/online and the great presentations there.  Also, here’s a relevant discussion on Cisco’s Support Forums site: https://supportforums.cisco.com/discussion/11673636/nexus-f2e-series-modules

 As always, hit me up on Twitter @swackhap if you have questions or comments. Or leave them below this post.

Tuesday, June 3, 2014

Using Aruba ClearPass for iPod Mobile Point-Of-Sale (POS) with EAP TLS and Aruba Instant (IAP)

I'm happy to report that, with a lot of help, I was able to get a basic framework in place and working yesterday for our new Mobile POS effort to connect to a store's IAP. We'll be onboarding these iPod units with ClearPass Onboard, which downloads a unique cert per device as well as the network settings that enforce the use of EAP TLS. The device will then auto-connect to the same SSID with a different role on the IAP.
 
Couple things I still need to work on:
1. Why isn't forced redirect working for the onboarding role specified on the IAP (ClearPass is handing it back to IAP correctly)?
2. Need to set up API account on AirWatch MDM and configure CPPM to point to it, then lock down the authentication to require the device to be enrolled in the MDM.
3. Lock down firewall rules on the IAP for the onboarding and mobile-pos roles. If you have a captive portal enforcement redirecting to an external site, do you have to allow traffic to that site? Or is it inferred automatically that traffic is allowed? 
 
What am I forgetting? Any hints/tips/tricks? Thanks to @sethfiermonti and others for the help!
 
Swack
Twitter: @swackhap

Tuesday, May 27, 2014

A10 Load Balancer Default Health Checks

If you work with load balancers, you know that one of the keys to setting up a virtual server (VIP) is the health check that is used to monitor the health of the servers being balanced.  My original experience with load balancers was with F5 LTMs, but in the last few years I’ve added A10 AX to my vocabulary.  

For a long time I assumed that the health check assigned to the server pool (F5 lingo), or service group (A10 parlance), was THE health check that determined the status of the VIP.  However, it turns out that there are two default health checks that A10 uses that I wasn’t aware of (or perhaps I knew at one point and just forgot).

Each server (not virtual server, but actual server) on an A10 AX has a default L3 health check (ICMP), and each port that is defined for the server has a default L4 health check (TCP 3-way handshake).  The overall up/down status of the pool/service group is the logical AND of the L3, L4, and, if defined, L7 health check for each server. If there is one web server in a pool, and the AX cannot ping it, even if it can do an HTTP GET and sees “200 OK”, the pool status will be DOWN and thus the VIP will be DOWN.

To get around this, you can easily disable the default health checks, as the following example shows. Consider these two real web servers.

slb server WebServerA 192.168.1.10
   port 80 tcp

slb server WebServerB 192.168.1.11
   no health-check
   port 80 tcp
      no health-check

In the case of WebServerA, there is a default L3 health check, which periodically pings the server at 192.168.1.10, as well as a default L4 health check, which establishes and tears down a TCP connection to 192.168.1.10:80. If either of these checks fails, then the service group (pool) that this server belongs to will flag the server as down.

For WebServerB, the first “no health-check” command disables the default L3 check and the second iteration of the command disables the L4 test. In this case, the only health check that matters will be the L7 health check assigned to the service group.
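
The way these checks combine can be sketched in a few lines of Python. This is only a toy model of the logical AND described above, not A10 code; the `server_is_up` function and its arguments are invented for illustration.

```python
# Illustrative model only; this is not A10 code or an A10 API.

def server_is_up(l3_ok, l4_ok, l7_ok=None):
    """Combine health-check results with a logical AND.

    Pass None for a check that is disabled ("no health-check") or,
    for L7, not defined. Checks set to None are skipped.
    """
    results = [r for r in (l3_ok, l4_ok, l7_ok) if r is not None]
    return all(results)


# WebServerA: the default ping fails even though the TCP handshake
# and the HTTP GET both succeed, so the server is marked down.
print(server_is_up(l3_ok=False, l4_ok=True, l7_ok=True))  # False

# WebServerB: L3 and L4 checks are disabled, so only L7 counts.
print(server_is_up(l3_ok=None, l4_ok=None, l7_ok=True))   # True
```

One side effect of this toy model: if every check is skipped, `all([])` returns True and the server counts as up.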

I hope this information can prove to be useful to someone else before they pull their hair out as I did before learning about it.

Got questions? Hit me up on Twitter (http://twitter.com/swackhap) or comment below.

Friday, August 30, 2013

VMworld Wednesday Lessons Learned

One of the strengths of a conference such as VMworld is being able to direct questions to strangers across the table at meals and often get a useful answer.  At lunch Wednesday I struck up a conversation with the folks at the table about PowerCLI to see if I could accomplish this task:
 
3. Learn some basic functions of PowerCLI
 
It turns out they were easily able to get me pointed in the right direction.  PowerCLI is a tool available for download from VMware that an administrator can run on their workstation to help with mundane and repetitive tasks related to vSphere management.  It is based on Microsoft's PowerShell, which is available on most (or all?) modern Windows OS versions.  PowerGUI, as the name suggests, is a free graphical front-end for PowerShell that can incorporate components to manage vSphere.  One of the top 10 VMworld sessions this year was "VSVC4944: PowerCLI Best Practices: A Deep Dive" (available on YouTube here).
 
I attended "Key Lessons Learned from Deploying a Private Cloud Service Catalog" (OPT5051), presented by two consultants from Greenpages Technology Solutions who implemented such a system for one of their customers. In their case study, five people spent 6-8 months working with their corporate customer, building consensus between different groups within the company on what should be in the service catalog, what could be automated, and what was deemed too complicated or too much effort to implement in the initial engagement.
 
They initially gathered all the requirements up front and attempted to implement them in one pass. But there was so much "mission creep" after the first few integrations that they changed their approach and used individual "Sprints" of 2-3 weeks to build functionality incrementally.
 
The idea of having a service catalog implies the use of on-demand procurement by end-users. Setting up such a system inevitably leads to higher demand, so the system should have usage monitoring in place. When the available pool drops below a certain threshold, it should be agreed in advance that IT will procure new resources, either for the internally based "private cloud" or by taking advantage of "hybrid cloud" technology such as VMware's recently announced vCloud Hybrid Service (vCHS).
Service catalog offerings are meant to provide on-demand service, but it's important to include financial management tools that will track costs and either "show-back" or "bill-back" the costs to the lines of business using the service.
 
Finally, I was able to complete the NSX hands-on lab. Not surprisingly, this particular lab was the most taken lab of the week with about 6500 sittings.  Of course, the NSX lab was so long it required 2 sittings, but it's still impressive that over 3000 people presumably took that lab.
NSX Lab Stats

Wednesday, August 28, 2013

VMworld Tuesday Lessons Learned

Today's accomplishments are focused around these particular goals I mentioned in my "Swack's VMworld To-Do List" post:
 
1. Gain better understanding of NSX (came from vCNS/vShield and Nicira) and dive more into details of VMware networking

4. What is DevOps all about?

"An Introduction to Network Virtualization" (NET5516)
For NSX, I attended an excellent session titled "An Introduction to Network Virtualization" (NET5516) with Eric Lopez and Thomas Kraus (@tkrausjr) from VMware, both formerly of Nicira.  Following are some notes I took down from their slides.

Cloud Consumers want the following, and these are driving network virtualization:

  • Ability to deploy apps at scale and with little preplanning (provisioning speed and efficiency)
  • Mobility to move workloads between different geographies and providers (investment protection and choice)
  • Flexibility to create more diverse architectures in a self service manner (rich L3-L7 network services)
NSX System Architecture consists of 3 planes familiar to most network engineers: Management, Control, and Data Planes
  • Management Plane = NSX Manager - programmatic web services api to define logical networks
  • Control Plane = Control Cluster
    • Clustered App runs on x86 servers, controls and manages 1000s of edge switching devices, does NOT sit in data plane
  • Data Plane = OVS/NVS
    • Open vSwitch (OVS) is a VMware-led open source project
    • NSX vSwitch (NVS) is a software vSwitch in ESXi kernel
  • Switch software designed for remote control and tunneling installed in hypervisors, NSX gateways or hardware VTEP devices
  • Can work with vSphere, KVM, XenServer
  • vSwitch in each hypervisor controlled through API by Controller Cluster
  • NSX Manager uses this API, as do CloudStack, OpenStack, CMS/CMP, VMware
  • To get between physical and virtual networks, Open vSwitch NSX Gateway or HW Partner VTEP Device is used
  • NSX Controller Cluster establishes an overlay network
  • Multiple tunneling protocols including STT, GRE, VXLAN
  • Packets are encapsulated with Logical Switch info
  • The tunneling protocol is NOT network virtualization, rather, it is a component of it 
NSX use cases include:
  1. Automated network provisioning
  2. Inter rack or inter DC connectivity
  3. P2V and V2V migration
  4. Burst or migrate enterprise to cloud 

NSX Whiteboard Sketch

The Whiteboard snapshot above was drawn to demonstrate the basic components of NSX and how VMs communicate using the virtual overlay network.

The example uses ESXi on left and KVM hypervisor on right (HV1 and HV2)

  • Each connected to IP fabric
  • 3 controllers drawn in the middle
  • Intelligent Edge NVS installed on ESXi and OVS installed on KVM
  • Controllers talk with ESXi on vmkernel management interface, something similar with KVM
  • Addresses are assigned for encapsulation and direct communication between hypervisors: 172.16.20.11/24 on left, 172.16.30.11/24 on right
  • Customer A is green, they have a VM on each hypervisor (192.168.1.11 on left, 192.168.1.12 on right)
  • Customer B is red, they have VM on each hypervisor with SAME IP ADDRESSES - logically separated similar to VRFs (I didn't get a picture of this--sorry) 
  • Controller cluster controls virtual ports, so they can programmatically control QoS, Security, Distributed Routing
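
To make the overlapping-address point concrete, here is a rough Python sketch of the whiteboard example. It is a toy model, not NSX code: the addresses come from my notes above, while the table layout, the "customer-a"/"customer-b" labels, and the `encapsulate` function are invented for illustration.

```python
# Toy model of the whiteboard example; not NSX code.

# (logical network, VM IP) -> tunnel endpoint IP of the hosting hypervisor
FORWARDING_TABLE = {
    ("customer-a", "192.168.1.11"): "172.16.20.11",  # HV1 (ESXi)
    ("customer-a", "192.168.1.12"): "172.16.30.11",  # HV2 (KVM)
    ("customer-b", "192.168.1.11"): "172.16.20.11",  # same VM IPs, different tenant
    ("customer-b", "192.168.1.12"): "172.16.30.11",
}


def encapsulate(logical_network, src_vm_ip, dst_vm_ip, payload):
    """Wrap a VM-to-VM packet in an outer header addressed between hypervisors."""
    return {
        "outer_src": FORWARDING_TABLE[(logical_network, src_vm_ip)],
        "outer_dst": FORWARDING_TABLE[(logical_network, dst_vm_ip)],
        "logical_network": logical_network,  # keeps tenants separate on the wire
        "inner": {"src": src_vm_ip, "dst": dst_vm_ip, "payload": payload},
    }


pkt_a = encapsulate("customer-a", "192.168.1.11", "192.168.1.12", b"hello")
pkt_b = encapsulate("customer-b", "192.168.1.11", "192.168.1.12", b"hello")

print(pkt_a["outer_src"], "->", pkt_a["outer_dst"])
print(pkt_a["inner"] == pkt_b["inner"])                      # identical inner addressing
print(pkt_a["logical_network"] != pkt_b["logical_network"])  # told apart by logical network
```

The IP fabric only ever sees the outer hypervisor addresses, which is why both customers can reuse 192.168.1.x without colliding.
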
NSX Hands-On-Lab HOL-SDC-1303, continued
I was able to continue, but not yet finish, the NSX lab I started yesterday in the VMworld Hands-on-Labs (HOL-SDC-1303). This portion of the lab went into more technical detail surrounding the following diagram:

[Screenshot: lab network diagram of the 3-tier web application]

The network drawing depicts a 3-tier web application which includes web, application, and database servers. Each server tier is on a different subnet, and thus connected to a different port group. The NSX Edge shown acts as the external layer 3 (L3) gateway for each subnet shown in blue, green, and orange.  At the beginning of this lab section we verify the web app is working properly by connecting to the website and verifying data is served from the back 2 tiers (application and database servers).  Then we disconnect the NSX Edge from the App and DB subnets/port groups and validate that the website is broken (we can get to the web servers but get an HTTP error saying the service is not working).  Next, we connect to the vCenter web client and verify that each cluster is configured and loaded with the virtual router and virtual firewall components of the NSX suite, and we configure the router and firewall to connect to the App and DB tiers and allow the appropriate traffic. Finally we verify that service is restored on the website. Part of the configuration includes OSPF connectivity between the virtual distributed router on the ESXi hosts and OSPF running in the NSX Edge routing engine. Looking at the snapshot below of the NSX Edge you can see the similarities with Cisco IOS. For instance, the "show ip ospf neighbor" and "show ip route" commands are identical.
[Screenshot: NSX Edge command-line output]
 
I hope to complete this lab tomorrow.
 
What is DevOps?
While spending some time in the Solutions Exchange I discussed what DevOps means with someone involved in that space at the Cisco booth.  As I understand it, companies usually first get virtualized, then they implement a service catalog, then they implement a "cloud" such that it's self-service enabled. DevOps refers to IT working closely with developers such that they create the development environment as well as production environment that the developers will deploy to. If you know more about DevOps and I've misunderstood, please keep me honest.
 
VMware IT Business Management Suite 
Finally, in the VMware booth I learned about the VMware IT Business Management Suite. It enables companies to understand costs and, as I understand it, implement chargeback to IT's internal customers. The demo looked pretty impressive, and I think there is a lot of value in such a tool. It can pull General Ledger data directly from standard systems such as Oracle and SAP and presents data in a well-thought-out manner. It's something to share with the CIO and/or accounting folks back home.

Tuesday, August 27, 2013

VMworld Monday Lessons Learned

Started out a productive day with my first-ever frittata and some delicious croissants at breakfast in Moscone South.  After the debacle of "breakfast" at last year's VMworld, the seating this year was at least an improvement, with areas available in both Moscone South and West.

I went to the General Session at 9am, but as I was seated towards the back I couldn't see the bottom of the screens. There were no screens overhead, only 3 or 4 large screens up front. In addition, the vmworld2013 wireless SSID was nowhere to be seen. The Press SSID (vmwaremedia) was available but locked down. Attempts to use my AT&T MiFi were stifled by the overwhelming RF interference in the area, and I had AT&T cell coverage but no throughput.  Having seen how well wireless CAN be delivered at Cisco Live, even in this kind of space for 20,000+ people, I was very disappointed.  I decided to go watch the Keynote from the Hang Space, but that was full to capacity with a line waiting to get in. I finally gave up and walked over to Moscone West, 3rd floor, and sat at a charging station watching the live stream while waiting for my first breakout session. (Kudos at least for the stream working.)

My first session was "Moving Enterprise Application Dev/Test to VMware’s internal Private Cloud -- Operations Transformation (OPT5194)." This was a great story of how leadership from the top pushed VMware to implement Infrastructure as a Service (IaaS). Kurt Milne (@kurtmilne) (VMware Director of CloudOps) and Venkat Gopalakrishnan (VMware Director of IT) shared lessons learned during VMware's internal implementation of a service catalog and the automation of processes which used to require manual intervention by cross-functional teams over the course of weeks.  The process of standing up a new Software Development Life Cycle (SDLC) series of dev/test/uat/stage/prod environments has been largely automated, with provisioning time reduced from 4 weeks to 36 hours, and they plan to get it down to 24 hours in the near future.  If you're going through a similar journey in your organization, this session is a must-see when recordings and slides are released after the conference. I believe the session was also live-tweeted by @vmwarecloudops.

The other session I attended today was the very popular "What's New in VMware vSphere" presented by Mike Adams (http://blogs.vmware.com/vsphere/author/madams). We reviewed some of the new features released in vSphere 5.1 last year as well as some of the changes made for vSphere 5.5 this year.  Some key takeaways for me (your mileage may vary):

  1. vSphere is now wrapped up with Operations Management, i.e., vCenter Operations Manager (vCOPS). Referred to as "vSphere with Operations Management" it's now available in the Standard, Enterprise, and Enterprise+ flavors, each of which includes vCOPS Standard. See snapshot of feature breakout and license cost.
    VSphere with Ops Mgmt Cost Features Chart
  2. vCloud Suite variations all include vSphere Enterprise+, vCloud Director (vCD), and vCloud Networking and Security (vCNS). The individual flavors depend on the version of vCOPS and vCloud Automation Center (vCAC) which are Standard, Advanced, and Enterprise. In addition, the Enterprise SKU also includes vCenter Site Recovery Manager (vC SRM).
  3. vSphere Web Client is replacing vSphere Windows Client, so we "better get comfortable with it." If I understand correctly, in vSphere 5.5 the Web Client supports all functionality, while the Windows Client does not.
  4. New features in vSphere 5.5 include: VMDK file support up to 62TB, 4TB memory per host, 4096 vCPUs per host.
  5. vSphere Replication allows full copying of workloads, including the VMFS files, without shared storage. This perhaps saves the cost of more expensive synchronous or asynchronous storage replication, but has a somewhat limited Recovery Point Objective (RPO) of about 15 minutes.  Still, this may be a good fit for some organizations for DR (including mine).

In addition to the sessions I was able to complete three labs (between yesterday and today) all related to VMware's recently announced vCloud Hybrid Service (vCHS). HOL-HBD-1301, HOL-HBD-1302, and HOL-HBD-1303 give a good introduction to the components and steps necessary to migrate workloads from a vSphere or vCloud Director environment in your own datacenter to the vCHS environment, as well as networking & security components and managing the service. 

One big announcement during the morning General Session/Keynote was the release of VMware's network virtualization product called NSX.  This is the marriage of Nicira (an earlier VMware acquisition) and vCNS/vShield in a new product.  As a network engineer by background and training, this is particularly interesting to me. I was able to start the NSX lab (HOL-SDC-1303) but couldn't yet finish as I ran out of time. I plan to finish tomorrow. More to come on that.

I have to give a big thumbs-down to VMworld's requirement that we all get our badges scanned as we enter lunch.  I don't remember this last year, nor have I ever seen this at any other conference I've attended.  What gives?  It's hard to hold a herd of hungry humans back from the food!

Finally, I visited with some fine folks at the Rackspace booth in the Solutions Exchange, including Waqas Makhdum (@waqasmakhdum). I now understand that Rackspace's OpenStack platform uses a different hypervisor solution than VMware or Amazon EC2. They offer guaranteed uptime with a phone number to call for support, and apparently pretty reasonable costs, whether you run a VM you control or they host the VM and you just run your application on it. Also, I learned they offer VMware-based Managed Virtualization to allow you to "Set up a single-tenant VMware environment at our data center, rapidly provision VMs, and retain full control using the orchestration tools you’re familiar with." (Ref: http://www.rackspace.com/managed-virtualization/)

I'm failing to mention all the great people I met and the conversations I had, but one would expect nothing less from a great conference!

Sunday, August 25, 2013

Swack's VMworld To-Do List

It's time for VMware's 10th Annual VMworld conference in beautiful San Francisco!  This is my second trip to VMworld and I'm looking forward to making it my best one yet. As such, I'd like to share some of my goals for this week. I find that publishing my objectives tends to keep me motivated.

1. Gain better understanding of NSX (came from vCNS/vShield and Nicira) and dive more into details of VMware networking

2. Better understand OpenStack and maybe take a test drive

3. Learn some basic functions of PowerCLI

4. What is DevOps all about?

5. Set up vCloud Director and/or vCenter Orchestrator and try it out

6. Learn about VMware's Internal Private Cloud for dev/test workloads

7. What is Cloud Foundry and how does it relate to my company?

If you have insights or can point me in the right direction please do! Comment below or find me on Twitter (@swackhap).

-Swack