Monday, June 13, 2011

Regexp::Grammars for more DWIM regexes

Perl is famous for its emphasis on DWIM (do what I mean, otherwise known as the principle of least surprise). Unfortunately, one of Perl's other great features is its regex power, and regexes are emphatically "do exactly what I say". Regexp::Grammars is not a silver bullet, and it still only does what you say, but it goes a long way toward making regexes look like what you mean.

For example, this post on SO was looking for a way to implement a rather pathological parse of each line of text. The OP didn't seem to understand that the specification was rather odd (though several other posters didn't hesitate to inform him). It appeared to me that the OP was trying to recover code that had been serialized onto single lines. This is complicated by the fact that the lines appeared to contain line comments that started with `//` and ended with multiple spaces. To me this seemed an excellent time to pull out regexes that are more DWIM.

By defining the types of structure I hoped to match and then having them automatically parsed into a hash structure (`%/`), the job becomes far less painful and far more readable. Certainly this would have to be expanded and debugged to fully do what the OP probably needs, but it is a nice example of a more modern Perl for more modern Perl programmers.
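
To make the idea concrete, here is a minimal sketch of what such a grammar might look like. It is not the code from the SO answer. The rule names (`Chunk`, `Comment`, `Code`), the helper `parse_line`, and the sample line are all made up for illustration, and it assumes a comment runs from `//` up to the next run of two or more spaces (or the end of the line), with everything else treated as code.

```perl
use strict;
use warnings;

# The grammar is built inside a do-block so that Regexp::Grammars only
# affects this one regex. Assumed structure, for illustration only:
#   a line is one or more chunks, collected in order into an array;
#   a chunk is either a Comment or a piece of Code;
#   a Comment starts at // and stops before a run of two or more spaces;
#   Code is anything up to the next //.
my $line_parser = do {
    use Regexp::Grammars;
    qr{
        <nocontext:>

        \A <[Chunk]>+ \z

        <token: Chunk>
            <Comment> | <Code>

        <token: Comment>
            // (?: (?! \s{2,} ) . )*

        <token: Code>
            (?: (?! // ) . )+
    }x;
};

# Hypothetical helper: return the chunks of one line, in order,
# as [type, text] pairs with surrounding whitespace trimmed.
sub parse_line {
    my ($line) = @_;
    return [] unless $line =~ $line_parser;

    # On a successful match the parse tree is in %/; copy it right away,
    # because the next grammar match will overwrite it.
    my @chunks = @{ $/{Chunk} };

    my @result;
    for my $chunk (@chunks) {
        # Each chunk hash is keyed by the subrule that matched.
        my ($type) = grep { length } keys %$chunk;
        (my $text = $chunk->{$type}) =~ s/^\s+|\s+$//g;
        push @result, [ $type, $text ] if length $text;
    }
    return \@result;
}

# Sample input, invented for this sketch.
my $line = 'my $x = 1; // set x   my $y = 2; // set y   print $x + $y;';
printf "%-8s %s\n", @$_ for @{ parse_line($line) };
```

The point is less the exact rules than the shape: each kind of structure gets a name, and the match hands back a nested data structure instead of a pile of numbered captures.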
