SwackNet: 2011

Friday, April 8, 2011

Aruba Mobility Bootcamp Experience and Random Cisco Wireless Comparisons

This post is about wireless technology. However, I'm not a wireless expert. I've worked with Cisco Wireless LAN Controllers (WLCs) for a few years and have been quite happy with them. That said, I've seen Aruba's prices and they're very competitive. I have the opportunity to work with both Aruba and Cisco now in my current position.

I attended the Aruba Mobility Bootcamp (MBC), a week-long class including PowerPoint instruction as well as hands-on labs with Aruba model 3200 controllers, AP125s, and RAP2s. The class was very well taught by an experienced Aruba instructor (Ken Elwell). The material was well designed and Ken did a great job boiling down some of the more complicated slides, saying things like "This is an overly complex slide that really is just trying to tell you X."

Topics covered included the following, and there was a hands-on lab for each one. I've included my own interpretation for most of them.

Architecture
Initial Controller Setup
AP Provisioning - APs come online using Aruba Discovery Protocol, which uses things like DHCP option 43 and/or a lookup of the DNS "aruba-master" record; APs come up in the default AP group, then are provisioned to the desired group, assigned a useful name, and rebooted for the changes to take effect
Authentication - MAC-based, Captive Portal, 802.1x with different EAP types
Firewall Policies - Aruba controller can be licensed with additional stateful firewall with policies that can be applied to individual devices and users (They also mention it's ICSA certified)
Roles - Every device and user has a role associated with it; a role can be derived in several ways, such as from the MAC address, 802.1x authentication, Captive Portal login credentials, or the SSID the user is associated with
RF Plan - Decent application available on the controller as well as standalone for Windows that allows import of floor plans and automatic placement of APs on map; can then print a bill of materials for order placement (I'm sure that's a favorite feature of Aruba SEs :-) )
Adaptive Radio Management (ARM) - automatic detection of channel-based WiFi interference and automatic channel and power-level changes to maximize coverage
Captive Portal Operations - web-based authentication for guest networks
Remote Access Points (RAPs) - useful for SOHO, can tunnel all traffic and/or do split-tunnel for employee SSID; can also provide additional SSID for non-employee Internet access for personal/family use
Remote AP Installation with ZeroTouch Deployment - administrator adds a RAP's MAC address to a "white list", then user takes RAP home, plugs it in, and enters basic info allowing it to "phone home" to the controller and get its config policies
Virtual Intranet Access (VIA) - remote-access client for PCs running Windows 32-bit; future support for Win 64-bit and Mac
Wired Access Control - apply security policies used for wireless users to wired ports on APs; particularly useful for SOHO running a RAP with additional Ethernet ports
Site-To-Site VPN - Compatible with other Aruba controllers as well as Netscreen, Sonicwall, Microsoft, and Cisco
Master Redundancy - VRRP active/standby redundancy
Master and Local Operation - APs can be associated to a controller on-prem ("Local") and fail over to the Master (back at the datacenter) if the Local controller fails
Local Redundancy - VRRP active/standby; N+1 failover, where one controller acts as VRRP standby for multiple other controllers; active/active redundancy, where each controller in a pair is the active VRRP member for different VRRP groups
Mobility - Keep same IP even while roaming between different controllers, useful for dense deployments on large campuses
Mesh - Outdoor or indoor
Wireless Intrusion Protection (WIP)

One of the most critical things I learned this week is the level of abstraction involved with configuring Aruba Mobility Controllers. In order to configure something as simple as a set of access points with multiple SSIDs (e.g., employee and guest), you actually create two different "Virtual APs" or VAPs. Then you associate those two VAPs with an "AP Group". Then you provision particular APs to that group. It's a little challenging to get used to after working with Cisco for so long, but it's a very powerful way of configuring the controller. The concept of object-oriented programming comes to mind.
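To make the object-oriented analogy a bit more concrete, here is a rough Python model of the relationships described above. This is only a conceptual sketch, not Aruba configuration syntax; the class names, SSIDs, and AP names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualAP:
    """One SSID definition (hypothetical model, not Aruba syntax)."""
    name: str
    ssid: str


@dataclass
class APGroup:
    """A named bundle of Virtual APs that member APs inherit."""
    name: str
    virtual_aps: list = field(default_factory=list)


@dataclass
class AccessPoint:
    """A physical AP; it sits in the default group until provisioned."""
    name: str
    group: APGroup

    def ssids(self):
        # An AP advertises whatever SSIDs its group's Virtual APs define.
        return [vap.ssid for vap in self.group.virtual_aps]


# Made-up example objects: two VAPs bundled into one AP group.
employee = VirtualAP("employee-vap", "CorpEmployee")
guest = VirtualAP("guest-vap", "CorpGuest")
default_group = APGroup("default")
office_group = APGroup("office", [employee, guest])

ap = AccessPoint("lobby-ap", default_group)
ap.group = office_group  # "provisioning" the AP into the desired group
print(ap.ssids())
```

The point of the indirection is that changing a VAP or an AP group once changes every AP that references it, which is what makes the model feel object-oriented.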

Keeping in mind that I am NOT A WIRELESS EXPERT, here are some of my thoughts on Aruba vs Cisco:

Random Comparisons between Aruba and Cisco (Swack's $0.02):

Aruba ARM vs Cisco CleanAir - Aruba's current ARM technology appears to be limited to seeing channel-based interference, whereas Cisco CleanAir incorporates a special chip designed to see the entire RF environment including interference not caused by 802.11 sources (think microwave ovens, analog jammers, radar, etc.). CleanAir is more expensive, but is much more advanced. Depends how critical your wireless environment is and how much you're willing to pay for the added functionality.

Aruba RAP vs Cisco OfficeExtend - Aruba's RAP2 provides 802.11b/g and retails for $99. Cisco OfficeExtend uses 1140 or 1130AG APs which I think are more than $99 (correct me if I'm wrong). These costs don't take into account the licensing you'll need on the controllers.

Aruba Policy Enforcement Firewall (PEF) vs Cisco SSID-based ACLs - Stateful firewall policies based on user and/or device vs. non-stateful ACLs.

Aruba RF Plan software vs Cisco WCS Planning Tool - Aruba's RF Plan software is available on their controllers as well as through a Windows-based executable. We got it for free from our Aruba SE. Cisco WCS is not cheap, and I'm not aware of another source for the planning software.

Swack's Take:

I learned a ton this week that I can apply at my current job. Also, thanks to some folks I interact with on Twitter, I was able to learn more about Cisco's wireless solutions. In the end, it's up to the individual engineer at a particular company to decide what is best for their environment.

Please comment below or hit me up on Twitter (@swackhap) with your comments/questions/snarky remarks about the competition.

Monday, February 28, 2011

Scripting On-Demand Network Changes with Solarwinds Orion NCM

Getting called at 2am is never fun, even if you are the Network On-Call person. Any chance I can prevent a call like that, I'll take it! In this case, there's a "failover pair" of servers, one in each data center (DC). Each server has a locally unique admin/replication IP address on one interface that is always active and a second interface that shares the same IP address as the server in the other DC. Whichever server is active enables the highly-available (HA) interface while the other server's HA interface is disabled. We can then make network changes to routers and switches to "switch" the server from one DC to the other. And instead of my having to manually make those changes at 2am, we can script the changes with a configuration management tool. Our tool of choice is Solarwinds Orion Network Configuration Manager (NCM).

In this particular use of NCM, there are 5 individual NCM jobs, one for each device that must be touched. The changes include enabling/disabling switch ports and adding/removing route advertisements in EIGRP and BGP. Assume the names of the 5 jobs are AutoJob1a, AutoJob2a, ..., AutoJob5a. In addition, there are 5 jobs for the reverse direction named AutoJob1b, AutoJob2b, ..., AutoJob5b. Each of these jobs has an NCM Job ID associated with it seen under the "Job ID" column when viewing Scheduled Jobs from the NCM GUI.

At this point, we've saved ourselves from having to individually log in to each of the devices to make the required changes. But we can take it a step further by combining all the jobs and launching them from a Windows Batch (.bat) file. On the NCM server we created the file d:\RemoteJobs\AutoJob-A.bat which contains these 5 lines, one per NCM job:

"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-318696.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-631858.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-713828.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-272305.ConfigMgmtJob"
"D:\Program Files\SolarWinds\Configuration Management\configmgmtjob.exe" "D:\Program Files\SolarWinds\Configuration Management\Jobs\Job-777458.ConfigMgmtJob"

Note that the Job ID for each job shows up in the name of the .ConfigMgmtJob file that is called in each line of the .bat file.
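If you wanted the wrapper to stop as soon as one job fails, a small script could replace the .bat file. The following is a minimal Python sketch of that idea, not what we actually run. The paths and Job IDs are the ones from the .bat file above; the function names are made up, and it assumes configmgmtjob.exe returns a nonzero exit code on failure, which I have not verified.

```python
import subprocess

NCM_DIR = r"D:\Program Files\SolarWinds\Configuration Management"
JOB_IDS_A = [318696, 631858, 713828, 272305, 777458]  # from AutoJob-A.bat


def build_commands(job_ids):
    """Return one [exe, job_file] command per NCM Job ID."""
    exe = NCM_DIR + r"\configmgmtjob.exe"
    return [[exe, NCM_DIR + rf"\Jobs\Job-{job_id}.ConfigMgmtJob"]
            for job_id in job_ids]


def run_jobs(job_ids, runner=subprocess.run):
    """Run the jobs in order and stop at the first nonzero exit code.

    ASSUMPTION: the NCM executable signals failure through its exit code.
    Returns the list of Job IDs that completed successfully.
    """
    done = []
    for job_id, cmd in zip(job_ids, build_commands(job_ids)):
        result = runner(cmd)
        if result.returncode != 0:
            break
        done.append(job_id)
    return done


# Dry run: print what would be executed without touching any device.
for cmd in build_commands(JOB_IDS_A):
    print(" ".join(f'"{part}"' for part in cmd))
```

Calling `run_jobs(JOB_IDS_A)` on the NCM server would actually launch the jobs; the dry-run loop only prints the same five command lines the .bat file contains.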

At this point, any monkey with a login to the NCM server could just double-click on the .bat file to kick off those five NCM jobs. But there's a better way, at least in our environment: Tidal Scheduler. With a Tidal agent on the NCM server, Tidal can be configured to launch d:\RemoteJobs\AutoJob-A.bat or the reverse d:\RemoteJobs\AutoJob-B.bat on-demand by the Operator-On-Duty. This allows the event to be properly audited and standardizes the action required by the Operator.

In addition, we can configure each NCM job such that it generates an e-mail notification when it completes, so when all 5 have completed we get 5 e-mails that show exactly what commands were entered and the corresponding output from the router/switch that was modified. The e-mail can be sent to the Network team as well as the Operations team so they have a better understanding of success than simply a "completed job" message from Tidal.

In the end, instead of getting a wake-up call at 2am, the server admin team can now simply call the Operator-On-Duty and ask them to run NCM Job "AutoJob-A" or "AutoJob-B". They then use a simple traceroute to determine if the network "thinks" the server is in DC A or DC B.

Ahh, now I can go back to sleep.

Zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.

Wednesday, February 23, 2011

Juniper Launches QFabric To Compete In The Data Center

In case you missed it, Juniper held an official launch event at 12N Central time (US) today for their new QFabric platform. What intrigued me the most was the maximum end-to-end delay of 5 microseconds. Here are a couple links that show the marketing pump-up-the-adrenaline type of advertisement they led the event with, as well as a 2:24 video with a brief explanation of the QFabric architecture. Also, here's an Infographic: The 7 Defining Characteristics of QFabric.

Based on what I know of the Brocade VCS/VCX platform, this sounds similar in many ways. But I'm certainly not a Brocade expert!

Did you watch the launch event? What was your take on it?

Here are some screenshots I took of the more technical slides (which I think most engineers agree are more interesting than the marketing hype).

Thursday, February 17, 2011

Life Without Caffeine

Title catch your attention? I thought so. Try to imagine it for a minute. I've been living it. Well, not quite, but I've been living on a limited caffeine intake since January 1, 2011 as part of my 100-day challenge. Before Jan 1, I would drink 6-8 Diet Cokes per day. Since then, just 1. What's worth giving up so much caffeine for, you ask? This:

The Apple iPad. Better yet, by the time my 100-day challenge is done on April 10 (but who's counting), there should be an iPad 2 available.

In this world of self-indulgence it's very rewarding, albeit quite challenging, to replace instant self-gratification with self-denial. It makes the prize that much sweeter in the end.

What will you give up for 100 days, and what will your reward be? Join me in the challenge!

Friday, February 4, 2011

Splunk Field Extraction and Report for Cisco AnyConnect VPN Failures

At the peak of Snowmageddon and Icemageddon this week our remote-access VPN resources were getting some major exercise. Our office was even closed for a day, something that doesn't happen often. The 100 simultaneous AnyConnect SSL VPN licenses on our Cisco ASA were used up by 9am three days in a row, preventing many people from getting connected. In a previous post I mentioned our secondary process, where we have users download and install the IPSEC VPN client. But for those who know the products, that's not as convenient as AnyConnect.

After the fact I was discussing options for increasing our remote access VPN capacity, all of which require money. To justify the cost to the money holders, it's always useful to have data to back you up. So we started asking questions:

How many people had problems connecting to the VPN?
How many times were individual users failing to connect due to our license limit?

After some digging I was able to find the perfect ASA log entry:

%ASA-4-716023: Group name User user Session could not be established: session limit of maximum_sessions reached.

In our case it looks more like this:

%ASA-4-716023: Group <SSLVPNUsers> User <swackhap> IP <24.107.10.23> Session could not be established: session limit of 100 reached.

With our Splunk log analysis tool we were able to dig even deeper to analyze the data and get some good statistics to justify our request for added VPN capacity. Within Splunk, I first ran a search for the above log entry:

So in this case you can see we had 1071 occurrences of that log entry. But how many people were affected? Splunk normally does a great job extracting fields of data it considers to be useful. But in our case we want to extract the actual userIDs, such as ea900503 and nbf shown above, and Splunk hasn't done it for us.

To extract a new field in Splunk, simply click on the small gray box with the downward facing triangle to the left of the event, then select "Extract Fields" as shown below.

In the "Example values" box I typed the two sample userIDs and clicked Generate, but in this particular case Splunk failed to generate a regex. So, I was forced to come up with one on my own.

After messing around with a free tool called RegExr, and after much wailing and gnashing of teeth, I was able to come up with a regular expression to extract the proper field:

(?:Group <SSLVPNUsers> User <)(?P<AnyConnectUser>[^>]*)

In Splunk, I clicked the gray Edit button and entered my own regex, then saved the new field extraction. Now we're able to see "AnyConnectUser" as an interesting field on the left side of the search screen. (You may have noticed it in earlier screenshots, since I had already created the field extraction before writing this blog post.)
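If you want to sanity-check a regex like this outside of Splunk, Python's `re` module accepts the same `(?P<name>...)` named-group syntax. Here is a quick sketch using the sample log line from earlier in this post; the variable names are just for illustration.

```python
import re

# The same pattern entered into Splunk's field extraction editor.
PATTERN = re.compile(
    r"(?:Group <SSLVPNUsers> User <)(?P<AnyConnectUser>[^>]*)"
)

line = (
    "%ASA-4-716023: Group <SSLVPNUsers> User <swackhap> IP <24.107.10.23> "
    "Session could not be established: session limit of 100 reached."
)

match = PATTERN.search(line)
# [^>]* grabs everything up to (but not including) the closing bracket.
print(match.group("AnyConnectUser"))  # swackhap
```

The leading non-capturing group anchors the match to the text right before the username, and the named group is what Splunk turns into the new field.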

Clicking on the "AnyConnectUser" field shows a list of the top 10 hits, including the number of occurrences for each. (Note that I've obfuscated many of the usernames for security). But at this point we still don't know how many users had problems connecting (we just know it's more than 100). So we use some more Splunk magic--generate a report based on the search.

Clicking on "top values overall" brings up the report generation wizard.

After creating and saving the report, we can now get to it anytime from the main Search screen under the "Searches & Reports" drop-down menu:

Here's the finished product:

After scrolling down we can see a table of the raw data:

We can then go to the last page of the table, scroll to the bottom, and see the total number of users that had at least one failure connecting to the VPN:

We had 194 users experience VPN connection problems due to our existing license limit.
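The same two numbers (total failures and distinct users affected) can also be pulled from raw syslog text without Splunk. Here is a minimal Python sketch of that counting logic; the usernames and IP addresses in the sample are made up, and the function name is hypothetical.

```python
import re
from collections import Counter

USER_RE = re.compile(r"Group <SSLVPNUsers> User <(?P<user>[^>]*)>")


def failures_per_user(lines):
    """Count session-limit failures (message 716023) per user."""
    counts = Counter()
    for line in lines:
        if "%ASA-4-716023" not in line:
            continue  # skip anything that is not a session-limit failure
        m = USER_RE.search(line)
        if m:
            counts[m.group("user")] += 1
    return counts


# Made-up sample data in the same format as the log entry shown earlier.
TEMPLATE = (
    "%ASA-4-716023: Group <SSLVPNUsers> User <{user}> IP <{ip}> "
    "Session could not be established: session limit of 100 reached."
)
sample = [
    TEMPLATE.format(user="alice", ip="192.0.2.10"),
    TEMPLATE.format(user="bob", ip="192.0.2.11"),
    TEMPLATE.format(user="alice", ip="192.0.2.10"),
    "an unrelated log line that should be ignored",
]

counts = failures_per_user(sample)
print("total failures:", sum(counts.values()))
print("distinct users:", len(counts))
print("top offenders:", counts.most_common(10))
```

In our real data the total was 1071 events and the distinct-user count was 194, which is the number that mattered for the budget conversation.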

Hit me up on Twitter (@swackhap) if you have questions or ideas on how to do this better. Or leave a comment below.

Thursday, January 20, 2011

Snowmageddon vs. The Corporate Network

A major winter storm can make for some very interesting statistics. Let's look at the primary firewall for Company XYZ, also used for remote access VPN. We've got a failover pair of Cisco ASA5510s licensed for 100 simultaneous AnyConnect WebVPN connections as well as 750 IPSEC VPN connections. Our "road warriors" are set up with the IPSEC VPN on their laptops, but folks who work from home using their own personal computers usually come in using the AnyConnect WebVPN (SSL-based).

You can see from the IPSEC VPN Connections chart below that we apparently have about 80-100 "road warriors" that just keep their home computers connected all the time (based on the lowest number of connections each day). Over the last week we've peaked around 160-180 except for today, which has taken us up close to 200. One reason for today's jump shows up in the next chart.

The WebVPN Connections chart below shows on most days we have up to 30 connections at our peak times. Since the sky opened up and dumped snow on us overnight, you can see that we've more than maxed out our connection limit for WebVPN. For days like this, our WebVPN page has a message that says something like "If there is inclement weather today and you are having problems connecting, there may be too many other people trying to connect at the same time. You may connect using a different method, by downloading an alternate VPN client using the appropriate link below." Then there are links for 3 .zip files: Windows XP/2000, Windows Vista/Win7, and Macintosh. Each zip file contains the Cisco IPSEC VPN client EXE as well as two PCF files that provide limited-access profiles for the IPSEC VPN.

Unfortunately, there doesn't seem to be any nice error message that says "no more connections available" to indicate a user is running into a connection limit. Is there some way to do that I don't know about?

The chart that got all this analysis started this morning also generated an e-mail telling my team the ASA VPN appliance was running high on CPU. (Well, the chart didn't generate the e-mail--the network monitoring system did.) Take a look at the following Average CPU Load and you'll see we're running about 80% today vs. a typical day at or below 60%.

The next chart shows the bandwidth impact all this VPN traffic has on our DS3 circuit. The green line shows uplink to the Internet and is peaking close to the 45Mbps mark today. I wonder how many of those users are RDP'd to their desktops and the screensaver has kicked in, causing high bandwidth utilization. *sigh*

In case you're wondering, all these graphs were pulled from Solarwinds Orion Network Performance Monitor (NPM). In particular, the first two charts showing connection numbers utilize Orion's Universal Device Poller (UnDP) functionality. There wasn't any built-in way I could find to measure what I wanted, so I found ideas on Thwack.com (Solarwinds' user community site) to use SNMP polling via UnDP to get those numbers.
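Once a poller has collected connection counts, the "always connected" baseline and the daily peak mentioned earlier are just the lowest and highest sample for each day. Here is a small Python sketch of that arithmetic. The sample values are invented for illustration and are not our real polling data; the function name is hypothetical.

```python
from collections import defaultdict


def daily_min_max(samples):
    """Map each day to (lowest, highest) connection count seen that day."""
    by_day = defaultdict(list)
    for day, count in samples:
        by_day[day].append(count)
    return {day: (min(c), max(c)) for day, c in by_day.items()}


# Invented (day, connection count) samples, as a poller might record them.
samples = [
    ("2011-01-19", 85), ("2011-01-19", 170), ("2011-01-19", 120),
    ("2011-01-20", 95), ("2011-01-20", 198),
]

stats = daily_min_max(samples)
for day, (low, high) in sorted(stats.items()):
    print(f"{day}: baseline {low}, peak {high}")
```

The daily low approximates the machines that never disconnect, and the daily high is what you compare against the license limit.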

So who's winning the battle...Snowmageddon or The Corporate Network? You decide! Let me know on Twitter (@swackhap) or in the comments below.

Tuesday, January 18, 2011

RSA SecurID Soft Token for iPhone - A Better Deployment Method

Working in a retail environment makes you think really hard about security, especially in light of what happened with TJ Maxx a few years ago. Using credit cards in retail is a privilege that we only get to keep if we follow the Payment Card Industry Data Security Standard (PCI DSS). One of the requirements of PCI is related to two-factor authentication for remote-access to your corporate network, and one solution for this is RSA's SecurID authentication product.

RSA SecurID supports many form factors, both hardware fobs/cards and software-based on PCs and mobile devices. This post focuses on mobile device soft tokens, particularly iPhones.

For quite some time, the process to get a soft token on an iPhone looked something like this:

1. User downloads RSA app from App Store
2. Administrator logs in to RSA SecurID appliance and assigns soft token to user
3. Generate CT-KIP credentials for web download, e-mail special link to user
4. Connect user's iPhone to internal corporate network
5. Have user open e-mail on the native iPhone app and tap the link
6. iPhone communicates directly with RSA appliance
7. Token is now present on iPhone

Step 4 is required because of the way RSA has locked down its current appliance. The only way for an iPhone to connect to the RSA appliance from outside the corporate firewall would be to somehow expose the appliance itself to the Internet, either directly or through a Microsoft ISA proxy server. This is one of my big gripes about the appliance, but it's a great solution for the most part.

The most recent update to RSA's iPhone app has greatly improved the token deployment process. Now the process looks like this:

1. User downloads RSA app from App Store (no change)
2. Administrator logs in to RSA SecurID appliance and assigns soft token to user (no change)
3. Issue token file (.sdtid) and save to desktop
4. Use RSA-provided TokenConverter.exe on command line to convert .sdtid file to a long string of characters, then e-mail that to user
5. Have user open e-mail on the native iPhone app and tap the link (no change)
6. Token is now present on iPhone

The new method removes the requirement for the iPhone to communicate directly with the appliance, which is a huge improvement. TokenConverter.exe is available for download from RSA's website for both Windows and Linux, and also works with Android and Windows Mobile, though I'm not sure if it works yet for Windows Phone 7. Of course, the same token deployment process I've described above works for any iOS device (iPod Touch, iPad).

Kudos to RSA for improving the token deployment process! Comment below or look for me on Twitter (@swackhap).