Sunday, April 3, 2011

Lazy, Greedy, or What? Looking for a definitive Regex reference

Recently, somewhere on the web*, I found a reference for regular expressions which described a "third way" of greediness, different both from greedy (.*) and lazy (.*?) matching.

I've now tried searching SO, Googling, and even searching my browser history, but to no avail.

Can anyone make a good guess at what it was I saw?


Clarification: it referred to what was for me a new construct (something like .*+), and I believe it even had a name for it (something like, but probably not, "passively greedy").


* I appreciate that "somewhere on the web" is about as helpful as "in the Library of Babel" or "in the Mandelbrot set", but please try to help.

From stackoverflow
  • Well, it's not exactly a reference, but it's still good: Mastering Regular Expressions.

    There is also a "reference" book from O'Reilly, but I can't vouch for it; I've only just seen it for the first time.

  • This maybe? http://www.regular-expressions.info/repeat.html

    An Alternative to Laziness

    In this case, there is a better option than making the plus lazy. We can use a greedy plus and a negated character class: <[^>]+>.

    Brent.Longborough : That's a good reference, but doesn't appear to have what I thought I saw... Thanks
    Brent.Longborough : It had, in fact, what I was looking for, a bit further down the tutorial. Sorry I can't upvote a second time...
    Alan Moore : I only see one plus sign; there should be two: <[^>]++>
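
The difference between the lazy version, the negated character class, and the doubled plus sign from the comment above can be tried out with a short Java sketch. The class name `TagMatchDemo`, the helper `firstMatch`, and the sample string are made up for illustration; only `java.util.regex` is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagMatchDemo {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "This is a <EM>first</EM> test";

        // Greedy plus: .+ runs to the end of the line, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", html));     // <EM>first</EM>

        // Lazy plus: expands one character at a time until '>' can match.
        System.out.println(firstMatch("<.+?>", html));    // <EM>

        // Greedy plus with a negated class: cannot run past the first '>'.
        System.out.println(firstMatch("<[^>]+>", html));  // <EM>

        // Possessive variant: same match here, but it never backtracks into the class.
        System.out.println(firstMatch("<[^>]++>", html)); // <EM>
    }
}
```

On this input the last three patterns find the same tag; the possessive form mainly matters when the match fails, because the engine then gives up without retrying shorter repetitions.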
  • I think you are referring to "possessive" matching. Java describes it on this page: http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

    Possessive quantifiers, which greedily match as much as they can and do not back off, even when doing so would allow the overall match to succeed.

    The syntax is the same as what you described (.*+).

    Brent.Longborough : Sorry to have un-accepted your answer, but I believe Bennett McElwee has the better reference.
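
To see the three behaviours side by side, here is a minimal Java sketch that applies the greedy, lazy, and possessive forms of .* to the same input. The class name `QuantifierDemo`, the helper `firstMatch`, and the sample text are illustrative choices, not something taken from the Java documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "\"quoted\" and \"again\"";

        // Greedy: .* takes everything, then gives characters back until the final quote fits.
        System.out.println(firstMatch("\".*\"", text));   // "quoted" and "again"

        // Lazy: .*? takes as little as possible, so it stops at the first closing quote.
        System.out.println(firstMatch("\".*?\"", text));  // "quoted"

        // Possessive: .*+ takes everything, including the last quote, and never gives
        // anything back, so the trailing quote in the pattern can never match.
        System.out.println(firstMatch("\".*+\"", text));  // null
    }
}
```

The last case is the "do not back off, even when doing so would allow the overall match to succeed" part of the description quoted above.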
  • I always keep a copy of this regular expressions cheat sheet handy in my cube.

  • There are many different regex packages. PCRE-style (Perl-compatible) regular expressions are used, more or less, in Perl, Java, PHP, and probably other languages. The PCRE man page might be regarded as the definitive reference. It describes possessive quantifiers (e.g. *+ and ++), which are shorthand for atomic groups.
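
The "shorthand for atomic groups" point can be checked with a small sketch. It is written in Java rather than against PCRE itself, on the assumption that `java.util.regex` treats `a++` and `(?>a+)` the same way for this purpose; the class name `AtomicGroupDemo` and the helper `matchesWhole` are hypothetical.

```java
import java.util.regex.Pattern;

final class AtomicGroupDemo {
    // True if regex matches the entire input string.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String input = "aaab";

        // Greedy: a+ first takes "aaa", then gives one 'a' back so that "ab" can match.
        System.out.println(matchesWhole("a+ab", input));      // true

        // Possessive: a++ keeps all three 'a's, so the following 'a' has nothing to match.
        System.out.println(matchesWhole("a++ab", input));     // false

        // Atomic group: (?>a+) behaves the same way as a++ here.
        System.out.println(matchesWhole("(?>a+)ab", input));  // false

        // When no backtracking is needed, both forms still match.
        System.out.println(matchesWhole("a++b", input));      // true
        System.out.println(matchesWhole("(?>a+)b", input));   // true
    }
}
```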

  • Thank you all. The key to getting my memory back was "possessive", not "passive".

    Here are a couple of useful references:
