Thursday, June 26, 2014

Cisco Nexus 7000 - Basic Design Case Study and Lessons Learned

As a senior-level engineer with my company, I have the opportunity to do some basic system design. It’s not the kind of experience I would get with a VAR or a larger enterprise, but I count my blessings every chance I get to install and play with new gear.

We deployed a Nexus 7000 in our main datacenter three years ago for 10Gbps connectivity, and we’re now getting around to doing the same thing in our colocated DR site. Due to technology advancements, though, it doesn’t make sense for us to use identical hardware for the DR location. Here I’ll compare the old to the new and share some of the lessons learned while getting the new one set up.

Our older Nexus 7010 uses Sup1 supervisor engines and M1-series line cards. We started with a single VDC (virtual device context) model, then later added an L2-only VDC to introduce large-scale in-line firewall functionality. Having all M1 line cards made this really easy. We’re still running NX-OS v5.1.5 because we’ve had no particular reason to upgrade. Installation was made easier with help from our Cisco partner.

Now there are M1, M2, F1, F2, F2e, and F3 line card models, each built on a different architecture. I’ve been reading entire slide decks from Cisco Live about which features can be implemented with which combinations of line cards. Distilling that plethora of information down against our requirements presents a formidable challenge. Add on the fact that we MAY WANT to do certain things in the future (like OTV, for instance) and it’s even more interesting.

Our new N7K, which is also a 7010, has dual Sup2 supervisors along with M1 and F2e line cards. The M1 cards (model M148GT-11L) provide 48 copper 1Gbps RJ45 ports, and the F2e cards (model F248XP-25E) provide 1/10Gbps ports using either fiber-optic transceivers or twinax cables. One key thing I’ve learned in my cram course on N7K modules is that we will need NX-OS v6.2 in order to support the same VDC model we already use in production. When running in this “proxy routing” mode, the F2e ports defer L3 forwarding decisions to the M1 cards in the same VDC. In my case there’s also a key takeaway: we cannot connect other routers to F2e ports when using M1 for proxy routing.
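To make that concrete, here’s a rough sketch of what the mixed-module VDC configuration might look like under NX-OS 6.2. The VDC name, slot numbers, and port ranges here are hypothetical, so adjust for your own chassis layout:

vdc DR-CORE id 2
   ! permit both module types in this VDC (required for M1/F2e proxy routing)
   limit-resource module-type m1 f2e
   ! hypothetical M1 copper ports in slot 1 and F2e fiber/twinax ports in slot 2;
   ! the F2e ports will proxy their L3 lookups to the M1 forwarding engines
   allocate interface Ethernet1/1-48
   allocate interface Ethernet2/1-48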


All our existing routers in the same location are 1Gbps only, so they can be connected to the M1 cards, but we’ll have to keep this in mind for future connections. We may need to create an F2e-only VDC down the road if we want to terminate 10Gbps routers (a rough sketch of that idea follows). I welcome your comments if you have experience with this.
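If it comes to that, my understanding is the F2e-only VDC would look something like this; names and port ranges are again hypothetical. F2e cards can route on their own in an F2e-only VDC, just with smaller forwarding-table capacities than M1:

vdc DR-EDGE id 3
   ! restrict this VDC to F2e modules so the F2e forwarding engines handle L3 natively
   limit-resource module-type f2e
   ! hypothetical ports facing future 10Gbps routers
   allocate interface Ethernet2/41-48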

The resources I’ve been using include some very smart folks on Twitter such as Ron Fuller (@ccie5851) and David Jansen (@ccie5952). Ron and David, as well as countless others, referred me to the F2e and M Series Design Guide for NX-OS 6.2. Honestly, I might not have known about this doc had it not been for Ron’s apparent omnipresence on Twitter. Many also pointed me to http://ciscolive.com/online and the great presentations there. Also, here’s a relevant discussion on Cisco’s Support Forums site: https://supportforums.cisco.com/discussion/11673636/nexus-f2e-series-modules

As always, hit me up on Twitter @swackhap if you have questions or comments. Or leave them below this post.

Tuesday, June 3, 2014

Using Aruba ClearPass for iPod Mobile Point-Of-Sale (POS) with EAP-TLS and Aruba Instant (IAP)

I'm happy to report that, with a lot of help, I was able to get a basic framework in place and working yesterday for our new Mobile POS effort to connect to a store's IAP. We'll be onboarding these iPod units with ClearPass OnBoard, downloading a unique certificate per device as well as network settings to enforce the use of EAP-TLS. Then, on the same SSID, the device will auto-connect with a different role on the IAP.
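For anyone trying to picture the IAP side, here's a minimal sketch of the sort of Instant config involved. The server name, ESSID, and address are hypothetical, and Instant CLI syntax varies by release, so treat this as illustrative rather than definitive:

wlan auth-server ClearPass
   ip 10.1.1.10
   port 1812
   key <radius-shared-secret>

wlan ssid-profile MPOS
   enable
   essid MPOS
   type employee
   opmode wpa2-aes
   auth-server ClearPass
   set-role Aruba-User-Role value-of

The intent is that a device mid-onboarding lands in an onboarding role (captive-portal redirect to ClearPass), and once it has its unique cert and EAP-TLS settings, it reconnects to the same ESSID and ClearPass hands back the mobile-pos role.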
 
A couple of things I still need to work on:
1. Why isn't forced redirect working for the onboarding role specified on the IAP (ClearPass is handing it back to IAP correctly)?
2. Need to set up API account on AirWatch MDM and configure CPPM to point to it, then lock down the authentication to require the device to be enrolled in the MDM.
3. Lock down firewall rules on the IAP for the onboarding and mobile-pos roles. If you have a captive portal enforcement redirecting to an external site, do you have to allow traffic to that site? Or is it inferred automatically that traffic is allowed? 
 
What am I forgetting? Any hints/tips/tricks? Thanks to @sethfiermonti and others for the help!
 
Swack
Twitter: @swackhap

Tuesday, May 27, 2014

A10 Load Balancer Default Health Checks

If you work with load balancers, you know that one of the keys to setting up a virtual server (VIP) is the health check that is used to monitor the health of the servers being balanced.  My original experience with load balancers was with F5 LTMs, but in the last few years I’ve added A10 AX to my vocabulary.  

For a long time I assumed that the health check assigned to the server pool (F5 lingo), or service group (A10 parlance), was THE health check that determined the status of the VIP.  However, it turns out that there are two default health checks that A10 uses that I wasn’t aware of (or perhaps I knew at one point and just forgot).

Each server (not virtual server, but actual server) on an A10 AX has a default L3 health check (ICMP), and each port that is defined for the server has a default L4 health check (TCP 3-way handshake).  The overall up/down status of the pool/service group is the logical AND of the L3, L4, and, if defined, L7 health check for each server. If there is one web server in a pool, and the AX cannot ping it, even if it can do an HTTP GET and sees “200 OK”, the pool status will be DOWN and thus the VIP will be DOWN.

To get around this, you can easily disable the default health checks. Consider the following definitions for two real web servers.

slb server WebServerA 192.168.1.10
   port 80 tcp

slb server WebServerB 192.168.1.11
   no health-check
   port 80 tcp
      no health-check

In the case of WebServerA, the default L3 health check will periodically ping the server at 192.168.1.10, and the default L4 check will establish and tear down a TCP connection to 192.168.1.10:80. If either of these checks fails, the service group (pool) that this server belongs to will flag the server as down.

For WebServerB, the first “no health-check” command disables the default L3 check, and the second disables the default L4 check. In this case, the only health check that matters is the L7 health check assigned to the service group.
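To round out the picture, here’s a sketch of how an L7 monitor might tie into the service group and VIP. The monitor, pool, and VIP names are hypothetical, and the exact syntax varies between AX software releases, so verify against your version:

health monitor hm-http
   method http url GET / response-code 200

slb service-group WEB-POOL tcp
   health-check hm-http
   member WebServerA:80
   member WebServerB:80

slb virtual-server WEB-VIP 192.168.1.100
   port 80 http
      service-group WEB-POOL

With the defaults disabled, WebServerB’s status rides solely on hm-http, while WebServerA still has to pass the ping and TCP handshake in addition to the L7 check.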

I hope this information proves useful to someone else before they pull their hair out, as I did before learning about it.

Got questions? Hit me up on Twitter (http://twitter.com/swackhap) or comment below.