May 9, 2012

Capturing in Regular Expressions

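Parentheses in a regular expression both group and capture. They group a sub-pattern so that a quantifier such as ? applies to it as a whole, and they capture the text that the sub-pattern matched.
Let me explain with an example that checks for a signed number with an optional decimal part, followed by the letter c or f.

Example: matching a value such as +345.34f and capturing its parts: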

        $input = "+345.34f";
        if ($input =~ /([-+]?[0-9]+(\.[0-9]*)?)([cf])$/) {
            # after a successful match:
            #   $1 = "+345.34"   captured by ([-+]?[0-9]+(\.[0-9]*)?)
            #   $2 = ".34"       captured by (\.[0-9]*)
            #   $3 = "f"         captured by ([cf])
        }
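To try the example, here is a small self-contained Perl script. It is only a sketch: the helper name parse_value and the extra inputs "-12c" and "98.6x" are made up for illustration. It runs the same pattern on three strings and prints what each group captured.

```perl
use strict;
use warnings;

# Same pattern as in the example: signed number, optional decimal
# part, then the letter c or f at the end of the string.
my $re = qr/([-+]?[0-9]+(\.[0-9]*)?)([cf])$/;

# Hypothetical helper: returns ($1, $2, $3) on a match, an empty list otherwise.
sub parse_value {
    my ($input) = @_;
    if ($input =~ $re) {
        return ($1, $2, $3);
    }
    return;
}

for my $input ("+345.34f", "-12c", "98.6x") {
    my @parts = parse_value($input);
    if (@parts) {
        # $2 is undef when the optional (\.[0-9]*) group did not take part
        my ($number, $decimal, $unit) = @parts;
        printf "%-9s => \$1=%s \$2=%s \$3=%s\n",
            $input, $number, defined $decimal ? $decimal : 'undef', $unit;
    } else {
        print "$input => no match\n";
    }
}
```

Note that $1 is "+345.34", sign included, because [-+]? sits inside the first pair of parentheses. For "-12c" there is no decimal part, so $2 is undefined, and "98.6x" does not match because it does not end in c or f.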

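Notation used in the example above:
1) $1, $2, $3 are capture variables. Each one holds the text matched by one pair of capturing parentheses, numbered by the position of its opening parenthesis from left to right, so you can use that data later in the program.
2) * and + are greedy quantifiers: * matches the preceding item zero or more times and + one or more times. Both match as much text as they can, giving some back only when the rest of the pattern would otherwise fail.
3) ? placed after an item, as in [-+]? and (\.[0-9]*)?, makes that item optional (it matches zero or one time). Placed after another quantifier (*?, +?), it makes that quantifier non-greedy, so it matches as little as possible.
4) [] is a character class: it matches any single character listed inside it, e.g. [cf] matches c or f, and [0-9] matches any digit.
5) $2 lies inside $1, yet they are distinct captures: $1 holds the whole number including its decimal part, while $2 holds only the decimal part.
6) $ anchors the match at the end of the string (in Perl it also matches just before a newline at the end of the string).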
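Points 2 and 3 are easiest to see side by side. A minimal sketch; the HTML-like sample string and the word list are invented for illustration:

```perl
use strict;
use warnings;

my $text = "<b>bold</b> and <i>italic</i>";

# Greedy: .* grabs as much as it can, so the capture runs to the LAST '>'
my ($greedy) = $text =~ /<(.*)>/;

# Non-greedy: .*? stops at the FIRST '>' that lets the match succeed
my ($lazy) = $text =~ /<(.*?)>/;

# A '?' after a single item makes that item optional (zero or one time)
my @spellings = grep { /^colou?r$/ } qw(color colour colouur);

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "optional u matched: @spellings\n";
```

The greedy capture is everything between the first < and the last >, the non-greedy one is just "b", and only "color" and "colour" pass the optional-u test.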

Advantages of capturing:
 a) You can capture the matched data and use it later in the program.
 b) It makes it easy to pull specific pieces out of a complex match, e.g. the number and the unit from "+345.34f".
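A sketch of both advantages, assuming some made-up input lines: in list context a successful match returns all captured groups at once, so they can be stored and used later. The undef in the assignment skips the decimal-part group, which this sketch does not need.

```perl
use strict;
use warnings;

# Hypothetical input lines; only some end in a number followed by c or f.
my @lines = (
    "item A: +345.34f",
    "item B: offline",
    "item C: -12c",
);

my @results;
for my $line (@lines) {
    # In list context the match returns ($1, $2, $3); undef discards $2.
    if (my ($number, undef, $unit) =
            $line =~ /([-+]?[0-9]+(\.[0-9]*)?)([cf])$/) {
        # Adding 0 turns the captured string into a number for later use.
        push @results, { value => $number + 0, unit => $unit };
    }
}

# Later in the program: the captured pieces are ordinary Perl values.
for my $r (@results) {
    printf "%s %s\n", $r->{value}, $r->{unit};
}
```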

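Disadvantages of unnecessary capturing:
 a) Wasted memory: the engine has to record what every capturing group matched, even if you never use it.
 b) Slower matching: that bookkeeping is extra work on every successful match.
 c) As a result, the regex is more expensive than an equivalent one without the unneeded captures.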
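As a brief preview of the non-capturing form: in our example the decimal part only needs parentheses for grouping, so that ? can make it optional. Writing it as (?:...) keeps the grouping but drops the capture. This minimal sketch only compares what the two forms capture; whether the saving is measurable in practice depends on the input and the Perl version.

```perl
use strict;
use warnings;

my $input = "+345.34f";

# All three groups capture, even though the decimal part is only
# grouped so that '?' can make it optional.
my @capturing = $input =~ /([-+]?[0-9]+(\.[0-9]*)?)([cf])$/;

# (?:...) still groups for the '?' but does not capture,
# so the unit is now the second capture instead of the third.
my @non_capturing = $input =~ /([-+]?[0-9]+(?:\.[0-9]*)?)([cf])$/;

print scalar(@capturing), " captures: @capturing\n";
print scalar(@non_capturing), " captures: @non_capturing\n";
```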


I hope you liked today's topic.

Please share your views with me if you have any doubts or suggestions. Feel free, don't hesitate :-)

I will explain non-capturing parentheses in my next post. Keep watching my blog.

Thanks for your valuable time. Have a good day :-)