I’m the first to admit—I love regular expressions.  It’s kind of a hammer and nail situation.  I see text, I immediately think:

using System.Text.RegularExpressions;

They’re just so useful.  How can you not like them?  Okay, they’re a bit obfuscated and horrid to debug (and that’s true even for the person who wrote them).  I’m always thinking of my coworkers, so here are a few regurgitated thoughts about how to improve the readability and maintainability of Regular Expressions.

Since Regular Expressions can be so difficult to understand, it helps to properly document the individual tokens in the pattern.  There are a few reasons behind this:

  1. Check your work up front
  2. Clearly state what the expression intends to match
  3. Clearly state how the expression intends to match

I have this new friend... uh, buddy.  His name is RegexBuddy.  Damn, I love this tool.  You can type in an expression and some input text, and see the results in real time.  But there’s something invaluable about its presentation.  As you write an expression, it generates this great explanation.  The best part is that it is in plain English.  Looking at this got me thinking a little.

[Image: RegexBuddy's plain-English explanation of an expression]

RegexBuddy even allows you to export the explanation to various places.  Seems like the clipboard could come in handy.  Wait, what if we could get this kind of information into comments?  Maybe it could be pasted and massaged into comments to look something like this:

//---------------------------------------------------------------------
/// <summary>
///     This Regular Expression can be used to extract
///     a customer name from the salutation of a form letter.
/// </summary>
#region Regex Explanation
// Expression:
// Dear (?<name>[A-Za-z ]*),
//
// Explanation:
//     Match the characters “Dear ” literally «Dear »
//     Match the regular expression below and capture its match into backreference with name “name” «(?<name>[A-Za-z ]*)»
//         Match a single character present in the list below «[A-Za-z ]*»
//             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//             A character in the range between “A” and “Z” «A-Z»
//             A character in the range between “a” and “z” «a-z»
//             The character “ ” « »
//     Match the character “,” literally «,»
//
// Sample Input:
// "Dear Loyal Reader,
//
// Thanks for reading John Coder!  :-)"
//
// Captured in the “name” group:
// "Loyal Reader"
//
// Created with RegexBuddy
#endregion
public const string GetCustomer = @"Dear (?<name>[A-Za-z ]*),";
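
To check the sample input from the comment against the pattern, here is a minimal sketch of how the constant might be consumed.  The `SalutationParser` class and the `TryExtractName` helper are hypothetical names made up for this illustration; only the pattern itself comes from the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SalutationParser
{
    // Same pattern as the GetCustomer constant above.
    public const string GetCustomer = @"Dear (?<name>[A-Za-z ]*),";

    // Hypothetical helper (not part of the original snippet): reports whether
    // a salutation was found and, if so, returns the captured "name" group.
    public static bool TryExtractName(string letter, out string name)
    {
        Match match = Regex.Match(letter, GetCustomer);
        name = match.Success ? match.Groups["name"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string letter = "Dear Loyal Reader,\n\nThanks for reading John Coder!  :-)";

        if (TryExtractName(letter, out string name))
        {
            Console.WriteLine(name); // Loyal Reader
        }
    }
}
```

Note that the whole match is "Dear Loyal Reader," while the named group holds only "Loyal Reader", which is the value the comment block lists.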

Is it overkill?  Maybe.  Although, there are a couple of things to note.  Jamming all of this “stuff” into the <summary> XML comment tag will inevitably break it.  Plus, you probably don’t want that much information to pop up in a tooltip anyway.  So, I wrapped it in a region to make it collapsible.  The summary is really a summary, and the drawn-out explanation is deferred to a less obtrusive location.
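
A related option, not the approach used above, is to put the per-token notes inside the pattern itself using .NET's `RegexOptions.IgnorePatternWhitespace`, which ignores unescaped whitespace and treats `#` as the start of a comment.  The sketch below is an assumption about how the same expression could be rewritten that way; `CommentedPattern`, `GetCustomerVerbose`, and `GetCustomerRegex` are made-up names.  The literal space after "Dear" has to move into a character class, because a bare space would otherwise be ignored.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPattern
{
    // Assumption: same intent as GetCustomer, rewritten for free-spacing mode.
    public const string GetCustomerVerbose = @"
        Dear[ ]            # the literal text 'Dear' followed by one space
        (?<name>           # capture into the group called 'name'
            [A-Za-z ]*     #   letters and spaces, zero or more times (greedy)
        )
        ,                  # the comma that ends the salutation
    ";

    // The option must be supplied wherever the pattern is compiled,
    // otherwise the whitespace and comments are treated as literal text.
    public static readonly Regex GetCustomerRegex =
        new Regex(GetCustomerVerbose, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        Match match = GetCustomerRegex.Match("Dear Loyal Reader,");
        Console.WriteLine(match.Groups["name"].Value); // Loyal Reader
    }
}
```

The trade-off is that the constant is no longer a plain pattern string: anyone who uses it has to remember the option, whereas the region-comment approach leaves the expression untouched.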

Is this easy enough to understand?  Leave a comment and let me know what you think.

Friday, July 17, 2009 12:34:24 AM (Eastern Daylight Time, UTC-04:00)

John Nelson

I am a passionate C# Developer working in ASP.NET on an e-commerce solution for ticketing software. I work across all of the application layers, including server-side functionality and client-side programming with jQuery and MS Ajax. Although my full-time job is in WebForms, I spend many of my off hours working with MVC. I am especially interested in productivity and good programming practices.

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010
johncoder.com