Review of a Great Regular Expression Development Tool, Regex Buddy

By: geoffc

March 19, 2008 11:15 am

One of the many things you will often have to do in a Novell Identity Manager driver is a compare or replace that is based on a regular expression, also known as a regex.

Regular expressions have been around for a long time, and for some people they are so obvious and intuitive that they look at you funny when you have no idea what “s/\AMrs|\AMiss|\AMr|\AMs//” does. I mean, come on, it’s so obvious that it’s a substitute function (a Perl-ism that in Identity Manager would be a Replace All token) that finds Mrs, Miss, Mr, or Ms when it occurs at the beginning of the string (“\A” means beginning of the string) and replaces it with nothing.
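For readers who do not speak Perl, here is a minimal sketch of the same substitution in Python. Python is only my choice for illustration (Identity Manager would use a Replace All token instead), and the function name and sample names are made up.

```python
import re

# The pattern from the Perl example. Every alternative starts with \A,
# so it can only match at the very start of the string.
HONORIFIC = re.compile(r"\AMrs|\AMiss|\AMr|\AMs")

def strip_honorific(full_name: str) -> str:
    """Remove a leading Mrs, Miss, Mr, or Ms (the replacement is empty)."""
    return HONORIFIC.sub("", full_name)

print(repr(strip_honorific("Mrs Jane Smith")))  # the leading "Mrs" is removed
print(repr(strip_honorific("Jane Mrs Smith")))  # unchanged, "Mrs" is not at the start
```

Note that the space after the honorific is left behind, because the pattern itself contains no space.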

In Novell Identity Manager, regular expressions can be used in any compare condition, and they can be used in Replace tokens. This means that any time you have a condition in DirXML Script that you are defining via Policy Builder and you choose “equals” (or “not equals”), there are several comparison options. Case-insensitive is the usual default, but regex is one of the choices. (Numeric, binary, case-sensitive, case-insensitive, Source DN, and Destination DN are the other compare modes.) When you use a Replace All or Replace First token in Argument Builder, your search is implicitly regular-expression-based.

The example above is something I had to use the other day, and it can be quite powerful when your data feed with Full Name includes an honorific such as Mr or Mrs. (Oh, I suppose I should add Dr. to that, shouldn’t I? It is a silly approach, but I can’t think of a much better one … Did I miss any other honorifics? I will have to get a report run to see which other possibilities are currently in use.)

There are lots of different products that use Regular expressions; Perl is one I happen to be familiar with. In fact, when I need to do a regular expression I usually pull out my O’Reilly Perl “camel” book to figure out what I need to do.

When I run on Linux, I use Regex Coach (http://weitz.de/regex-coach/) as a tool to help me figure out if my regular expression will correctly match my search string. Regex Coach is pretty good, and very helpful. But there is an even better tool.

For the Windows platform (and actually having nothing really to do with Identity Manager explicitly) there is an amazingly powerful product called Regex Buddy (http://www.regexbuddy.com). It is commercial but pretty cheap for a one-user license. You will find that the time it saves you on a single moderately complex regular expression design will be worth the $50 or so that it costs. I highly recommend the product.

In this screen shot:

you can see that I am testing Replace mode for my example above. The format of the “s” (substitution operator) in Perl that I used above is “s/search/replace/”, where the first value between the forward slashes is the search string and the second is the replace string. So a simple example like “s/Mr/mR/” would find “Mr” and replace it with “mR” (only the first occurrence, unless you add the “g” modifier). I like the “s///” format as an example precisely because the extra slashes make it harder to read, which shows how complex this can get.
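The difference between replacing only the first match and replacing every match (Replace First versus Replace All, in Argument Builder terms) can be sketched in Python. This is just an illustration with a made-up sample string, not how Identity Manager implements its tokens.

```python
import re

text = "Mr Smith met Mr Jones"

# Plain s/Mr/mR/ in Perl changes only the first match; count=1 mimics that.
first_only = re.sub(r"Mr", "mR", text, count=1)

# s/Mr/mR/g in Perl changes every match, which is also re.sub's default.
every_match = re.sub(r"Mr", "mR", text)

print(first_only)   # mR Smith met Mr Jones
print(every_match)  # mR Smith met mR Jones
```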

In the top pane I have the search string that I am building. In the big middle pane I have my sample data, and you can see in the yellow highlighting what is matched. This is important, since had I simply used “Mr” as the search string in my silly contrived example, it would have matched on the “Mr” in the middle of “AlexandMrer” and at the beginning of “Mroskvin”. What I needed was to insert a token or something that says, “start at the beginning of the string and only look at the beginning.” It turns out that is “\A” followed by the string.

Thus my search for “\AMr ” (note the space at the end) will find only the “Mr” followed by a space, only at the beginning of the string – which makes this actually work for my example.
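The same comparison can be reproduced outside of Regex Buddy. Here is a small Python sketch that runs the unanchored and the anchored search strings against sample names like the ones in my contrived example:

```python
import re

samples = ["Mr Alexander Moskvin", "AlexandMrer Moskvin", "Mroskvin Alexander"]

unanchored = re.compile(r"Mr")    # matches "Mr" anywhere in the string
anchored = re.compile(r"\AMr ")   # start of string only; note the trailing space

for name in samples:
    # Show what is left after removing the match with each pattern.
    print(f"{name!r}: unanchored -> {unanchored.sub('', name)!r}, "
          f"anchored -> {anchored.sub('', name)!r}")
```

Only the first sample is changed by the anchored pattern; the unanchored one mangles all three.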

Now we come back to a vocabulary issue. I can never remember all the silly switches, values, and tokens in Regular expressions like some of the pros can. The beauty of Regex Buddy is that you do not have to!

I right-clicked and got an Insert Token menu.

Now it helps if you have read a little bit about regular expressions, so that you know what a character class or lookaround means. But for 99% of the things you will need to do, you should be able to fake it pretty well. The good news is that context-sensitive help for Regex Buddy includes a couple of Regular Expression reference guides, so you can try and read about the meaning of the terms used in the online help. This, too, is a valuable part of the product.

I selected from the Anchors menu:

and selected the Beginning of string anchor. (Note the cute little boat anchor imagery. Someone has WAY too much spare time on his hands!) This pushed in a “\A” – and that part was done.

Now what you can do, once you have your sample regular expression built, is to dump some samples in the big middle pane and confirm for yourself that it really does what you want. This is one of the most powerful things. It is nice for a tool to tell you how to build a search string, but it is just as important that it lets you test it against silly examples. You could, of course, run it through the Simulator in Designer for Identity Manager, but for timeliness, it is hard to beat Regex Buddy!

One interesting issue I need to get resolved is which exact flavor of regular expressions is used by the Identity Manager engine. Regex Buddy recognizes that, like any standard, there are some variations in implementations, and so it can handle the flavor that you need (such as JGsoft, Tcl, Perl, POSIX BRE, and even XPATH) along with the weird specific quirks that come with it. I will have to play around with the XPATH one, since that could be even more powerful in an Identity Manager world. Of course, you need to be careful: Identity Manager only supports XPATH 1.0, not XPATH 2.0, and they are apparently quite different and not really compatible. The concrete issue that this would bring up is that XPATH only supports two anchors, neither of which is the “\A” that I am using. So while this test will work in a condition test compare, it probably would not work in an XPATH expression test. I wish standards would be used in a standardized way!
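To get a feel for how two anchors that look interchangeable can still differ, here is a sketch in Python comparing “\A” with the caret (“^”), which most flavors support. This only shows Python's behavior with a made-up two-line value; whether the Identity Manager engine or an XPATH expression accepts either anchor is exactly the open question above.

```python
import re

value = "Dr Who\nMr Smith"   # hypothetical value containing a line break

# With no flags, ^ matches only at the start of the string, the same as \A.
caret_default = re.findall(r"^Mr ", value)
start_default = re.findall(r"\AMr ", value)

# In multi-line mode ^ also matches right after each newline; \A never does.
caret_multiline = re.findall(r"^Mr ", value, flags=re.MULTILINE)
start_multiline = re.findall(r"\AMr ", value, flags=re.MULTILINE)

print(caret_default, start_default)      # [] []
print(caret_multiline, start_multiline)  # ['Mr '] []
```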

The next incredibly powerful feature is shown in this screen shot. I left the language at XPATH (instead of the default of JGsoft from the past examples) and continued with my example.

Now, instead of Test mode, where I can test my search string against a sample, I am in Create mode where I try to figure out what my regular expression should be.

This is where the power of Regex Buddy shines. Say, for example, you post a question on the forums. (You are using the Novell Support Forums, http://forums.novell.com or nntp://forums.novell.com for a newsreader, aren’t you? If not, why aren’t you? There are a number of Novell engineers, support critters (like “ab”), and just plain busybodies like me, posting questions and answers to all sorts of weird, real-world questions.) Suppose you ask in a forum how to do what is needed in my example, and “ab” or Father Ramon replies, “That’s as easy as pi! Just use a regex of ‘\AMrs |\AMiss |\AMs |\AMr ’.” That’s nice that they helped (very nice, by the way, and we thank them for continuing to do so!), but they sort of handed you a fish instead of teaching you to fish. In other words, what if you need to add “Dr” to that string? Can you figure it out? Well, this is an easy example, and you should be able to figure it out.
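As a hint for the “Dr” question, here is one way to think about it, sketched in Python: the suggested regex is just a list of alternatives, each anchored with “\A” and ending in a space, so adding a title means adding one more alternative. The variable names are made up for illustration.

```python
import re

# Titles from the suggested regex, plus the new "Dr".
honorifics = ["Mrs", "Miss", "Ms", "Mr", "Dr"]

# Build \AMrs |\AMiss |\AMs |\AMr |\ADr  (every alternative ends in a space,
# so "Mr " cannot accidentally match the start of "Mrs ").
pattern_text = "|".join(r"\A" + re.escape(title) + " " for title in honorifics)
pattern = re.compile(pattern_text)

print(pattern_text)
print(pattern.sub("", "Dr Alexander Moskvin"))  # Alexander Moskvin
print(pattern.sub("", "Drake Moskvin"))         # Drake Moskvin (no space after "Dr")
```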

But with Regex Buddy, you can look smarter than you are by pasting the string that they suggested into the Create mode screen. Regex Buddy will then break it down, token by token, to do its best to explain what is going on.

In our example in the screenshot, we see that XPATH does not support the “\A” anchor. Oops! Well, that is great to know. Regardless, Regex Buddy tells us that it should try to match the string “Mrs ”. Then it tells us that the pipe symbol (|) is an OR operator and that it tries to match the next string in the list, and so on.

As you can imagine, this can be immensely helpful. You can dissect a suggestion someone has made and try to understand what each token is doing. Then in the worst case, by trial and error, you can try different tokens and combinations in Test mode to see if it works to match what you need on your test data.
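If you do not have Regex Buddy at hand, some flavors let you write this kind of token-by-token explanation into the pattern itself. Here is a sketch using Python's verbose mode, with my own comments standing in for what Create mode would tell you:

```python
import re

# re.VERBOSE allows comments inside the pattern. Because it ignores plain
# whitespace, the literal space after each title is written as [ ].
EXPLAINED = re.compile(r"""
    \A Mrs  [ ]   # start of string, then the literal text 'Mrs' and a space
  | \A Miss [ ]   # OR: start of string, then 'Miss' and a space
  | \A Ms   [ ]   # OR: start of string, then 'Ms' and a space
  | \A Mr   [ ]   # OR: start of string, then 'Mr' and a space
""", re.VERBOSE)

print(EXPLAINED.sub("", "Miss Jane Smith"))  # Jane Smith
```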

What I need to check into is whether I can run it in a batch mode. That is, via an LDAP tool say, I could extract all the values of Full Name from my data source, then run Regex Buddy with my current matching string on it, and generate a list of what matches, just to be sure that I did not miss a test case. That would just be icing on the cake for a great product.
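Until I find out whether Regex Buddy can do that, the batch check itself is easy to sketch in a few lines of Python. This is not a Regex Buddy feature, and the function name and sample values are made up; in practice the names could be read from a file produced by an LDAP export, one value per line.

```python
import re

PATTERN = re.compile(r"\AMrs |\AMiss |\AMs |\AMr ")

def split_by_match(full_names):
    """Return (matched, unmatched) lists for a batch of Full Name values."""
    matched, unmatched = [], []
    for name in full_names:
        (matched if PATTERN.search(name) else unmatched).append(name)
    return matched, unmatched

# Hypothetical sample values standing in for the exported data.
names = ["Mr Alexander Moskvin", "Dr Jane Smith", "Mrs Ann Lee", "Bob Jones"]
matched, unmatched = split_by_match(names)

print("matched:  ", matched)
print("unmatched:", unmatched)   # review these for honorifics you missed
```

Scanning the unmatched list is where a missed case such as “Dr” would show up.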

I highly recommend this to anyone who is NOT a regular expression wizard. Even for those who are, I am sure there are always things you need to try out and figure out, and this tool can be a great helping hand!

2 Comments

  1. By:alekz

    There is a tool called Regex Designer that is free and is very nice.

    http://www.radsoftware.com.au/regexdesigner/

  2. By:geoffc

    Thanks for the link! I tried it. What I like better about Regex Buddy is subtle. I dislike having to keep clicking on Match Expression in Regex Designer.

    Also, probably the biggest thing Regex Buddy has going for it is the case where someone provides you a regex that does exactly what you need, but it looks like gibberish. There is a mode that explains what each token means, to try and help you understand what is going on.

    The Linux tool, Regex Coach, that I mentioned in the original article is less useful, as it does not even have an easy way to insert a token by its English name; you have to know the token in advance.
