An Introduction to Regular expressions

May 27, 2015

Yamagata Europe

Page 1: An Introduction to Regular expressions
Page 2: An Introduction to Regular expressions

And are they contagious?

Page 3: An Introduction to Regular expressions

There is no official standard for regular expressions, so no real definition.

Simply put, you can call it a text pattern to search and/or replace text.

Easy peasy!

Page 4: An Introduction to Regular expressions

Perl programming language

Perl-compatible

.NET

Java

JavaScript

… What, no cherry flavour?

Page 5: An Introduction to Regular expressions

Back to grammar school!

Page 6: An Introduction to Regular expressions

The regex a matches any occurrence of that character, as in: Jack is a boy.
The regex cat matches the cat in: About cats and dogs.

Page 7: An Introduction to Regular expressions

square bracket [
backslash \
caret ^
dollar sign $
period or dot .
vertical bar or pipe symbol |
question mark ?
asterisk or star *
plus sign +
opening round bracket (
closing round bracket )
opening curly bracket {

Page 8: An Introduction to Regular expressions

Special characters are reserved for special use. They need to be preceded by a backslash if you want to match them as literal characters. This is called escaping. If you want to match 1+1=2, the correct regex is 1\+1=2.
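
The escaping rule above can be tried out in a few lines. This is only a sketch using Python's re module (one flavour among many); the sample sentence and variable names are made up for illustration.

```python
import re

text = "We all know that 1+1=2."

# Unescaped, the + is a quantifier: "one or more 1s, then 1=2".
unescaped = re.search(r"1+1=2", text)

# Escaped, \+ stands for a literal plus sign.
escaped = re.search(r"1\+1=2", text)

# re.escape() adds the needed backslashes to a literal string.
auto_escaped = re.search(re.escape("1+1=2"), text)

print(unescaped)             # None
print(escaped.group())       # 1+1=2
print(auto_escaped.group())  # 1+1=2
```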

Page 9: An Introduction to Regular expressions

tab \t
carriage return \r
line feed \n
beginning of line ^
end of line $
word boundary \b
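
A small sketch of the anchors and the word boundary, again assuming Python's re module. The three-line sample text and the helper name starts are invented for this example. Note that in Python, ^ and $ only match at every line when the re.MULTILINE flag is set; other flavours may behave differently.

```python
import re

text = "cat food\ncopycat\ncat"

def starts(pattern, flags=0):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

# With re.MULTILINE, ^ and $ match at the start and end of each line.
print(starts(r"^cat", re.MULTILINE))  # [0, 17]
print(starts(r"cat$", re.MULTILINE))  # [13, 17]

# \b needs a word boundary on both sides, so the cat in copycat is skipped.
print(starts(r"\bcat\b"))             # [0, 17]
```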

Page 10: An Introduction to Regular expressions

If regular expressions are Unicode enabled, you can search for any character using its Unicode value.
Depending on the syntax: \u0000 or \x{0000}
Hard space: \u00A0 or \x{00A0}
® sign: \u00AE or \x{00AE}
...
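
As an illustration, Python's re module uses the \uXXXX form inside the pattern. The sample string is made up; this is a sketch, not the only way to search for these characters.

```python
import re

text = "Yamagata\u00a0Europe\u00ae"  # contains a hard space and a ® sign

# The pattern is a raw string, so re itself interprets \u00A0.
hard_spaces = re.findall(r"\u00A0", text)
cleaned = re.sub(r"\u00A0", " ", text)
has_registered = re.search(r"\u00AE", text) is not None

print(len(hard_spaces))  # 1
print(cleaned)           # Yamagata Europe®
print(has_registered)    # True
```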

Page 11: An Introduction to Regular expressions

Quantifiers allow you to specify the number of occurrences to match against.
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

Page 12: An Introduction to Regular expressions

The regex colou?r matches both colour and color.
You can also group items together by using brackets: Nov(ember)? will match Nov and November.
The regex a+ is the same as a{1,} and matches a or aaaaa.
The regex w{3} matches the www in www.qa-distiller.com.
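
The quantifier examples above, run through Python's re module as a quick check. The helper all_matches and the sample strings are assumptions made for this sketch.

```python
import re

def all_matches(pattern, text):
    """Return the full text of every match (not just the groups)."""
    return [m.group() for m in re.finditer(pattern, text)]

print(all_matches(r"colou?r", "colour, color, colr"))       # ['colour', 'color']
print(all_matches(r"Nov(ember)?", "Nov 5 and November 6"))  # ['Nov', 'November']
print(all_matches(r"w{3}", "www.qa-distiller.com"))         # ['www']

# a+ and a{1,} find exactly the same matches.
same = all_matches(r"a+", "a aaaaa") == all_matches(r"a{1,}", "a aaaaa")
print(same)                                                 # True
```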

Page 13: An Introduction to Regular expressions

Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey.
A character class matches only a single character, and the order of the characters inside the brackets is not important.
You can also use ranges: [0-9] matches a single digit between 0 and 9.

Page 14: An Introduction to Regular expressions

Typing a caret after the opening square bracket will negate the character class. q[^u] means: "a q followed by a character that is not a u". It will match the q and the space after the q in "Iraq is a political quagmire." but not the q of quagmire, because it is followed by the letter u.
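
A short sketch of character classes, ranges and negation in Python's re module. The sample strings other than the Iraq sentence are made up for the example.

```python
import re

grays = re.findall(r"gr[ae]y", "gray, grey, groy")
digits = re.findall(r"[0-9]", "Room 42")

# The negated class still consumes one character: here, the space.
q_not_u = re.findall(r"q[^u]", "Iraq is a political quagmire.")

print(grays)    # ['gray', 'grey']
print(digits)   # ['4', '2']
print(q_not_u)  # ['q ']
```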

Page 15: An Introduction to Regular expressions

\d digit [0-9]
\w word character [A-Za-z0-9_]
\s whitespace [ \t\r\n]
Negated versions:
\D not a digit [^\d]
\W not a word character [^\w]
\S not a whitespace character [^\s]
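
The shorthand classes in action, as a sketch in Python. One assumption to flag: Python's \d, \w and \s are Unicode-aware by default, so the re.ASCII flag is used here to get the ASCII definitions listed above.

```python
import re

text = "Step 45, page 15"

digits = re.findall(r"\d+", text, re.ASCII)      # runs of digits
words = re.findall(r"\w+", text, re.ASCII)       # runs of word characters
non_digits = re.findall(r"\D+", text, re.ASCII)  # runs of anything else
fields = re.split(r"\s+", text)                  # split on whitespace

print(digits)      # ['45', '15']
print(words)       # ['Step', '45', 'page', '15']
print(non_digits)  # ['Step ', ', page ']
print(fields)      # ['Step', '45,', 'page', '15']
```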

Page 16: An Introduction to Regular expressions

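The dot matches a single character, without caring what that character is. The regex e. matches an e plus the character after it: in "Houston, we have a problem" it matches the e and the following space in we and in have, and the em in problem.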
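
A quick check of the dot in Python's re module. The last two lines illustrate one caveat of this particular flavour: by default the dot does not match a line feed unless the re.DOTALL flag is set.

```python
import re

dot_matches = re.findall(r"e.", "Houston, we have a problem")
print(dot_matches)  # ['e ', 'e ', 'em']

# By default the dot skips line feeds; re.DOTALL changes that.
without_flag = re.findall(r"e.", "e\n")
with_flag = re.findall(r"e.", "e\n", re.DOTALL)
print(without_flag)  # []
print(with_flag)     # ['e\n']
```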

Page 17: An Introduction to Regular expressions

If you want to search for cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog matches the cat in "Are you sure you want a cat?"
You can add more options like this: green|black|yellow|white
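
Alternation in a minimal Python sketch. The second sample sentence is invented for the example.

```python
import re

first = re.search(r"cat|dog", "Are you sure you want a cat?").group()
colours = re.findall(r"green|black|yellow|white",
                     "A black and white cat on green grass")

print(first)    # cat
print(colours)  # ['black', 'white', 'green']
```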

Page 18: An Introduction to Regular expressions

Which of the following completely matches the regex a(ab)*a
1) abababa
2) aaba
3) aabbaa
4) aba
5) aabababa

Page 19: An Introduction to Regular expressions

Which of the following completely matches the regex ab+c?
1) abc
2) ac
3) abbb
4) bbc
5) abbcc

Page 20: An Introduction to Regular expressions

Which of the following completely matches the regex a.[bc]+
1) abc
2) abbbbbbbb
3) azc
4) abcbcbcbc
5) ac
6) asccbbbbcbcccc

Page 21: An Introduction to Regular expressions

Which of the following completely matches the regex (very )+(fat )?(tall|ugly) man
1) very fat man
2) fat tall man
3) very very fat ugly man
4) very very very tall man
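
If you want to check your answers to the four quiz questions, one possible approach is Python's re.fullmatch, which only succeeds when the whole string matches. The dictionary layout and the helper name complete_matches are assumptions of this sketch. The expected output is deliberately left out so the quiz is not spoiled.

```python
import re

# Each regex from the quiz, with its candidate strings in the original order.
quizzes = {
    r"a(ab)*a": ["abababa", "aaba", "aabbaa", "aba", "aabababa"],
    r"ab+c?": ["abc", "ac", "abbb", "bbc", "abbcc"],
    r"a.[bc]+": ["abc", "abbbbbbbb", "azc", "abcbcbcbc", "ac", "asccbbbbcbcccc"],
    r"(very )+(fat )?(tall|ugly) man": [
        "very fat man",
        "fat tall man",
        "very very fat ugly man",
        "very very very tall man",
    ],
}

def complete_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

for pattern, candidates in quizzes.items():
    print(pattern, "->", complete_matches(pattern, candidates))
```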

Page 22: An Introduction to Regular expressions

Still awake?

Page 23: An Introduction to Regular expressions

Positive lookahead: X(?=Y)
Match something that is followed by something else.
Yamagata(?= Europe) matches the Yamagata in Yamagata Europe, but not the one in Yamagata Intech Solutions.
Negative lookahead: X(?!Y)
Match something that is not followed by something else.
Yamagata(?! Europe) matches the Yamagata in Yamagata Intech Solutions, but not the one in Yamagata Europe.
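
A sketch of both lookaheads in Python's re module, using the two company names as one sample string. The match offsets show which Yamagata was found; the text inside the lookahead is checked but is not part of the match.

```python
import re

text = "Yamagata Europe, Yamagata Intech Solutions"

positive = [m.start() for m in re.finditer(r"Yamagata(?= Europe)", text)]
negative = [m.start() for m in re.finditer(r"Yamagata(?! Europe)", text)]

print(positive)  # [0]   the Yamagata before " Europe"
print(negative)  # [17]  the Yamagata before " Intech Solutions"

# The lookahead itself is not consumed.
matched_text = re.search(r"Yamagata(?= Europe)", text).group()
print(matched_text)  # Yamagata
```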

Page 24: An Introduction to Regular expressions

Positive lookbehind: (?<=Y)X
Match something that follows something else.
(?<=a)b matches the first b in thingamabob (the one after an a).
Negative lookbehind: (?<!Y)X
Match something that does not follow something else.
(?<!a)b matches the last b in thingamabob (the one after an o).
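
The same idea for lookbehind, sketched in Python. The offsets are positions in the word thingamabob, counted from 0.

```python
import re

word = "thingamabob"

after_a = [m.start() for m in re.finditer(r"(?<=a)b", word)]
not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", word)]

print(after_a)      # [8]   the b in "ab"
print(not_after_a)  # [10]  the final b, which follows an o
```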

Page 25: An Introduction to Regular expressions

Round brackets create a capturing group that you can refer back to (a backreference). You use the backreference with a backslash plus the number of the group.
The regex Java(script) is a \1ing language
matches Javascript is a scripting language
The regex (Java)(script) is a \2ing language that is not the same as \1
matches Javascript is a scripting language that is not the same as Java
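
Both backreference examples as a Python sketch. The third test string is made up to show a sentence that fails because \1 must repeat exactly what the group captured.

```python
import re

one_group = r"Java(script) is a \1ing language"
two_groups = r"(Java)(script) is a \2ing language that is not the same as \1"

m1 = re.fullmatch(one_group, "Javascript is a scripting language")
m2 = re.fullmatch(
    two_groups,
    "Javascript is a scripting language that is not the same as Java",
)
# \1 holds "script", so "Javaing" cannot match.
m3 = re.fullmatch(one_group, "Javascript is a Javaing language")

print(m1.group(1))  # script
print(m2.groups())  # ('Java', 'script')
print(m3)           # None
```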

Page 26: An Introduction to Regular expressions

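Use the regex \b(\w+) \1\b to find doubled words.
In the Dutch sentence Ze streelde haar haar in in de auto. ("She stroked her hair in in the car.") it matches haar haar and in in.
With exceptions: \b(?!haar\b)(\w+) \1\b
In the same sentence it now matches only in in, because haar haar ("her hair") is a legitimate repetition.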
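
The doubled-word check on the Dutch sentence, sketched in Python. The helper name doubled is an assumption of this example.

```python
import re

sentence = "Ze streelde haar haar in in de auto."

def doubled(pattern):
    """Return every doubled-word match of pattern in the sentence."""
    return [m.group() for m in re.finditer(pattern, sentence)]

all_doubles = doubled(r"\b(\w+) \1\b")
# The negative lookahead rejects a match that starts with the word haar.
without_haar = doubled(r"\b(?!haar\b)(\w+) \1\b")

print(all_doubles)   # ['haar haar', 'in in']
print(without_haar)  # ['in in']
```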

Page 27: An Introduction to Regular expressions

You want to add brackets around step numbers:
This is step 5 from chapter 1. Continue with step 45 from page 15.
Use the regex ([sS]tep) (\d+) to find all instances. Replace it by \1 (\2)
Or alternatively, replace (?<=[sS]tep )\d+ by (\0), in flavours where \0 refers to the whole match.
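
Both replacements as a Python sketch using re.sub. One flavour difference to be aware of: Python does not use \0 for the whole match in the replacement text, so \g<0> is used instead.

```python
import re

text = "This is step 5 from chapter 1. Continue with step 45 from page 15."

# Variant 1: capture the word and the number, then rebuild with brackets.
with_groups = re.sub(r"([sS]tep) (\d+)", r"\1 (\2)", text)

# Variant 2: match only the number that follows "step " and wrap it.
with_lookbehind = re.sub(r"(?<=[sS]tep )\d+", r"(\g<0>)", text)

print(with_groups)
# This is step (5) from chapter 1. Continue with step (45) from page 15.
print(with_groups == with_lookbehind)  # True
```

Note that the numbers after chapter and page are left alone in both variants.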

Page 28: An Introduction to Regular expressions

Powerful, for individual text-based files

More powerful, batch operations, command line

No back references

RegEx Text File Filter

RegEx search

Very limited

Powerful, called GREP

Page 29: An Introduction to Regular expressions

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-> Do not try to do everything in one uber-regex
-> Regular expressions are not parsers