Friday, February 4, 2011

Splunk Field Extraction and Report for Cisco AnyConnect VPN Failures

At the peak of Snowmageddon and Icemageddon this week, our remote-access VPN resources got some major exercise.  Our office was even closed for a day, something that doesn't happen often.  Our 100 simultaneous AnyConnect SSL VPN licenses on our Cisco ASA were used up by 9 a.m. three days in a row, preventing many people from getting connected.  In a previous post I mentioned our secondary process, in which users download and install the IPsec VPN client. But as those who know the products will tell you, that's not as convenient as AnyConnect.

After the fact I was discussing options for increasing our remote access VPN capacity, all of which require money.  To justify the cost to the money holders, it's always useful to have data to back you up.  So we started asking questions:
  • How many people had problems connecting to the VPN?
  • How many times were individual users failing to connect due to our license limit?
After some digging I was able to find the perfect ASA log entry:

     %ASA-4-716023: Group name User user Session could not be established: session limit of maximum_sessions reached.

In our case it looks more like this:

     %ASA-4-716023: Group <SSLVPNUsers> User <swackhap>  IP <24.107.10.23> Session could not be established: session limit of 100 reached.

With our Splunk log analysis tool we were able to dig deeper into the data and get some good statistics to justify our request for added VPN capacity. Within Splunk, I first ran a search for the above log entry.
The search returned 1071 occurrences of that log entry.  But how many people were affected? Splunk normally does a great job of extracting the fields it considers useful. But in our case we want the actual user IDs, such as ea900503 and nbf in the search results, and Splunk hasn't extracted them for us.

To extract a new field in Splunk, click the small gray box with the downward-facing triangle to the left of the event, then select "Extract Fields".
In the "Example values" box I typed the two sample userIDs and clicked Generate, but in this particular case Splunk failed to generate a regex. So, I was forced to come up with one on my own.  
After messing around with a free tool called RegExr, and after much wailing and gnashing of teeth, I was able to come up with a regular expression to extract the proper field:

     (?:Group <SSLVPNUsers> User <)(?P<AnyConnectUser>[^>]*)
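
If you'd rather sanity-check a regex like this outside Splunk before saving it, here is a minimal Python sketch. It assumes Python's re module, which accepts the same (?P<name>...) named-group syntax; the helper name extract_user is just something I made up for illustration, and the sample line is the example event from earlier in this post.

```python
import re

# The extraction regex from this post, unchanged.
ANYCONNECT_USER = re.compile(
    r"(?:Group <SSLVPNUsers> User <)(?P<AnyConnectUser>[^>]*)"
)

# Sample event copied from the example earlier in the post.
sample = (
    "%ASA-4-716023: Group <SSLVPNUsers> User <swackhap>  IP <24.107.10.23> "
    "Session could not be established: session limit of 100 reached."
)


def extract_user(line):
    """Return the AnyConnect user ID from a log line, or None if no match.

    extract_user is a hypothetical helper name, not anything Splunk provides.
    """
    match = ANYCONNECT_USER.search(line)
    return match.group("AnyConnectUser") if match else None


print(extract_user(sample))
```

Note that the group name SSLVPNUsers is hard-coded in the pattern, so events from any other group won't match. That's what I wanted here, but it's worth keeping in mind if you have more than one group.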

In Splunk, I clicked the gray Edit button and entered my own regex, then saved the new field extraction.  Now we're able to see "AnyConnectUser" as an interesting field on the left side of the search screen. (You may have noticed it in earlier screenshots, since I had already created the field extraction before writing this blog post.)
Clicking on the "AnyConnectUser" field shows a list of the top 10 hits, including the number of occurrences for each.  (Note that I've obfuscated many of the usernames for security.) But at this point we still don't know how many users had problems connecting (we just know it's more than 100).  So we use some more Splunk magic--generate a report based on the search.
Clicking on "top values overall" brings up the report generation wizard.
After creating and saving the report, we can get to it anytime from the main Search screen under the "Searches & Reports" drop-down menu.
Scrolling down in the finished report shows a table of the raw data. On the last page of that table, at the very bottom, is the total number of users that had at least one failure connecting to the VPN.
We had 194 users experience VPN connection problems due to our existing license limit.
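
For anyone without Splunk handy, the same two numbers (failures per user and the count of distinct users affected) can be approximated from raw syslog text. This is only a rough Python sketch of the idea, not what Splunk does internally. The function name failures_per_user, the usernames user_a and user_b, and the sample lines are all made up for illustration; only the message format and the regex come from this post.

```python
import re
from collections import Counter

# Same extraction regex used for the Splunk field in this post.
USER_RE = re.compile(r"(?:Group <SSLVPNUsers> User <)(?P<AnyConnectUser>[^>]*)")


def failures_per_user(lines):
    """Count session-limit failures (message 716023) per AnyConnect user."""
    counts = Counter()
    for line in lines:
        if "%ASA-4-716023" not in line:
            continue  # skip anything that isn't the session-limit message
        match = USER_RE.search(line)
        if match:
            counts[match.group("AnyConnectUser")] += 1
    return counts


# Made-up sample lines in the same format as the example event above.
TEMPLATE = (
    "%ASA-4-716023: Group <SSLVPNUsers> User <{user}>  IP <{ip}> "
    "Session could not be established: session limit of 100 reached."
)
log_lines = [
    TEMPLATE.format(user="user_a", ip="192.0.2.10"),
    TEMPLATE.format(user="user_b", ip="192.0.2.11"),
    TEMPLATE.format(user="user_a", ip="192.0.2.10"),
    "some unrelated log line",
]

counts = failures_per_user(log_lines)
print(counts.most_common(10))  # rough equivalent of the top 10 list
print(len(counts))             # number of distinct users with a failure
```

The length of the Counter is the number that matters for the budget conversation: it counts each affected user once, no matter how many times they retried.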

Hit me up on Twitter (@swackhap) if you have questions or ideas on how to do this better.  Or leave a comment below.  
