Perl FAQ 4.25: Can I use Perl regular expressions to match balanced text?

No, or at least, not by themselves.

Regexps by themselves aren't powerful enough to keep track of nesting depth. Although Perl's patterns aren't strictly regular, because they support backreferences (the \1 notation), that still doesn't let them match arbitrarily nested delimiters. You need to employ auxiliary logic. A simple approach is to keep a bit of state around, something vaguely like this (note that it doesn't handle both patterns appearing on the same line):

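my $inpat = 0;    # nesting depth: pat1 lines seen minus pat2 lines seen
while (<>) {
    if (/pat1/) {
        if ($inpat++ > 0) { warn "already saw pat1" }
        next;     # not redo: redo would re-test the same line forever
    }
    if (/pat2/) {
        if (--$inpat < 0) { warn "never saw pat1" }
        next;
    }
}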

A rather more elaborate subroutine to pull out balanced and possibly nested single-character delimiters, like ` and ', { and }, or ( and ), can be found on convex.com in /pub/perl/scripts/pull_quotes.
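
To give a feel for what such a subroutine involves, here is a minimal sketch of the depth-counting idea. It is not the pull_quotes script; the name extract_balanced and its interface are made up for illustration, and it assumes the opening and closing delimiters are two different single characters.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not the pull_quotes script).
# Returns the text between the first $open in $text and the $close that
# balances it, or undef if there is no opener or the text is unbalanced.
# Assumes $open and $close are two different single characters.
sub extract_balanced {
    my ($text, $open, $close) = @_;

    my $start = index($text, $open);
    return undef if $start < 0;            # no opening delimiter at all

    my $depth = 0;
    for my $pos ($start .. length($text) - 1) {
        my $ch = substr($text, $pos, 1);
        if ($ch eq $open) {
            $depth++;                      # one level deeper
        }
        elsif ($ch eq $close) {
            $depth--;
            # Back at depth zero: this closer matches the first opener.
            return substr($text, $start + 1, $pos - $start - 1)
                if $depth == 0;
        }
    }
    return undef;                          # ran out of text while still nested
}

print extract_balanced("f(a, (b + c), d) + g(x)", "(", ")"), "\n";
# prints: a, (b + c), d
```

The counter plays the same role as $inpat in the loop above, just applied character by character instead of line by line. A sketch like this makes no attempt to skip delimiters inside quoted strings or after backslashes; you would have to add that yourself.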

