Regular Expressions MacSysAdmin Göteborg, September 2012
101

Regex - Clean - MacSysAdmindocs.macsysadmin.se/2013/pdf/regex.pdf · 2013-09-21 · Aani aardvark aardwolf ... grep -l “Just list the matching file names ... # Match a 20th or 21st

Jul 09, 2018


Regular Expressions

MacSysAdmin Göteborg, September 2012


“It’s called grep because it greps for things.”

-- Rob Pike, Bell Labs


/usr/share/dict/words
Introducing My Favourite Unix File

235,886 English words in alphabetical order.


A Quick Demo
Grepping for Things


A pattern. Matched against a line.

Did it match at all?

Where, specifically, did it match?

Can we change the part that matched?


A little history.


Theoretical Computer Science
Formal Language and Automata Theory

1950s
• Stephen Cole Kleene
  – Foundations of Recursion Theory
  – Mathematical notation of “regular sets”
  – Kleene star - unary operation on a set of strings, known in mathematics as the “free monoid construction.”
  – Application of the Kleene star to a set V is written as V*


QED

Ordinary character Matches itself

^ Beginning of line

$ End of line

. Any character

[string] Any character in the string

[^string] Any character not in the string

* Zero or more occurrences

a|b Either of the expressions

(expr) Grouping


UNIX and the “ed” editor

• Hey, I’ll show you.

• You should know this editor. Some day you’ll need it.


March 3, 1973

grep

$ grep pattern file1 file2 file3....

And soon, “egrep” and “fgrep”.


Kernighan and Pike, 1984
Early UNIX Regular Expressions

c        Any non-special character matches itself
\c       Turn off any special meaning of c
^        Beginning of line
$        End of line
.        Any single character
[...]    Any one of the characters in the range
[^...]   Any one character not in the range
\(r\)    Tagged regular expression (grep only)
\x       What the x’th \(expression\) matched (grep only)
r*       Zero or more occurrences of r
r+       One or more occurrences of r (egrep only)
r?       Zero or one occurrences of r
r1r2     r1 followed by r2
r1|r2    r1 or r2 (egrep only)


Extended Regular Expressions - egrep

Implemented first by “egrep”

?

+

|


Early Unix - THREE grep programs

grep
• Early. Based on “ed”

egrep
• Extended Grep. Fancier regular expressions. Runs faster, but starts slower.

fgrep
• “Fixed Grep”. No patterns, static strings only. Very efficient.
• fgrep -f fileOfWordsICannotSpell myDocument

“The distinction is hard to justify”.


Surely we don’t have three grep programs any more. It’s 2012.

Today’s UNIX. Much better.

$ ls -l /usr/bin/*grep
-rwxr-xr-x  6 root  wheel  29664 23 Jul 20:58 /usr/bin/bzegrep
-rwxr-xr-x  6 root  wheel  29664 23 Jul 20:58 /usr/bin/bzfgrep
-rwxr-xr-x  6 root  wheel  29664 23 Jul 20:58 /usr/bin/bzgrep
-rwxr-xr-x  3 root  wheel  29664 23 Jul 20:57 /usr/bin/egrep
-rwxr-xr-x  3 root  wheel  29664 23 Jul 20:57 /usr/bin/fgrep
-rwxr-xr-x  3 root  wheel  29664 23 Jul 20:57 /usr/bin/grep
-rwxr-xr-x  2 root  wheel  25632 23 Jul 20:58 /usr/bin/pgrep
-rwxr-xr-x  6 root  wheel  29664 23 Jul 20:58 /usr/bin/zegrep
-rwxr-xr-x  6 root  wheel  29664 23 Jul 20:58 /usr/bin/zfgrep
-rwxr-xr-x  6 root  wheel  29664 23 Jul 20:58 /usr/bin/zgrep
-rwxr-xr-x  1 root  wheel   1188 23 Jul 20:58 /usr/bin/zipgrep



Perl-Compatible Regular Expressions

Inspired by a billion things Perl 5 added to regular expressions
• lookahead, lookbehind, lookaround

Used by
• Apache

• PHP

• Postfix

• Safari

• ack

A C library used by many other tools


2003
The Real Triumph of Grep


2003
The Triumph of Grep


It can be simple.

    /* forward declarations so the three functions compile in this order */
    int matchhere(char *regexp, char *text);
    int matchstar(int c, char *regexp, char *text);

    /* match: search for regexp anywhere in text */
    int match(char *regexp, char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp+1, text);
        do {    /* must look even if string is empty */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    /* matchhere: search for regexp at beginning of text */
    int matchhere(char *regexp, char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
            return matchhere(regexp+1, text+1);
        return 0;
    }

    /* matchstar: search for c*regexp at beginning of text */
    int matchstar(int c, char *regexp, char *text)
    {
        do {    /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

    c    matches any literal character c
    .    matches any single character
    ^    matches the beginning of the input string
    $    matches the end of the input string
    *    matches zero or more occurrences of the previous character


Using Today’s “grep”


About “grep”

$ grep pattern file1 file2 file3 ....

Read each file in turn
• One line at a time

If the pattern matches anywhere in the line,
• print the line.

$ grep steve /usr/share/dict/words
stevedorage
stevedore
stevedoring
stevel
steven
Transteverine
Trastevere
Trasteverine
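The loop described on this slide (read a line, test the pattern, print the line if it matches) can be sketched in a few lines of Python. This is only an illustration of the idea, not how grep is implemented; the helper name `grep_lines` and the short word list standing in for /usr/share/dict/words are made up for the example.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines in which pattern matches anywhere (hypothetical helper)."""
    regex = re.compile(pattern)
    # re.search looks for a match anywhere in the line, as grep does
    return [line for line in lines if regex.search(line)]

# A tiny stand-in for /usr/share/dict/words
words = ["aardvark", "stevedore", "steven", "Steven", "Trastevere", "zebra"]

matches = grep_lines("steve", words)
print("\n".join(matches))
```

Note that “Steven” is not reported: matching is case-sensitive unless you ask otherwise.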


grep -v: “Print the lines that DON’T match”

Getting fancy with grep

$ grep -v steve /usr/share/dict/words
A
a
aa
aal
aalii
aam
Aani
aardvark
aardwolf
Aaron


grep -c: “Just count the matches”

Getting fancy with grep

$ grep -c steve /usr/share/dict/words
8

Tip: Remember this one!


grep -o: “Just print the matching part”

Getting fancy with grep

$ grep -o steve /usr/share/dict/words
steve
steve
steve
steve
steve
steve
steve
steve


grep -q: “Be quiet.”

Getting fancy with grep

$ grep -q steve /usr/share/dict/words
$
$ if grep -q steve /usr/share/dict/words; then echo Not a good password; fi
Not a good password


grep -C n: “Print some context.”

Getting fancy with grep

$ grep -C 2 steve /usr/share/dict/words
--
transsubjective
transtemporal
Transteverine
transthalamic
transthoracic
--
--
trashy
trass
Trastevere
Trasteverine
trasy
traulism

2 lines before, 2 lines after each match.


grep --color: “Colour the matches.”

Getting fancy with grep

$ grep --color steve /usr/share/dict/words
stevedorage
stevedore
stevedoring
stevel
steven
Transteverine
Trastevere
Trasteverine


grep -i: “Case-insensitive matching.”

Getting fancy with grep

$ grep -i steve /usr/share/dict/words
Steve
stevedorage
stevedore
stevedoring
stevel
Steven
steven
Stevensonian
Stevensoniana
Transteverine
Trastevere
Trasteverine
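For readers who think in code, here is a rough Python analogue of the options from the last few slides (-v, -c, -o, -q and -i). It is a sketch of what each option reports, not of grep itself, and the five-word list is an invented stand-in for the dictionary file.

```python
import re

# Invented sample data standing in for /usr/share/dict/words
words = ["Steve", "stevedore", "steven", "Trastevere", "zebra"]
pattern = re.compile("steve")

# grep -v: the lines that do NOT match
non_matching = [w for w in words if not pattern.search(w)]

# grep -c: count the matching lines
count = sum(1 for w in words if pattern.search(w))

# grep -o: only the part of each line that matched
parts = [m.group(0) for w in words for m in pattern.finditer(w)]

# grep -i: case-insensitive matching
insensitive = [w for w in words if re.search("steve", w, re.IGNORECASE)]

# grep -q: no output at all, just a yes/no answer
found = any(pattern.search(w) for w in words)

print(non_matching, count, parts, insensitive, found)
```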


grep -r “Recursively go through subdirectories”
grep -l “Just list the matching file names”

Getting fancy with grep

$ grep -r -l -i Tennis ~/Data
/Users/me/Data/Medallists.csv
/Users/me/Data/Shakespeare.txt


A pattern.
That you match against text. (Usually against a single line.)

Pattern         Matches this                     Doesn’t match this
steve           steve, steven, Emilio Estevez    stephen, Steve
[0123456789]    Can$200                          Two hundred dollars
.               Steve


Anchors
Beginning or end of the line

Pattern   Matches this                    Doesn’t match
^Ste      Steve, Stephen, Stegosaurus     Hello Steve
hen$      Stephen, lichen, strengthen     Hello Steve, Chicken
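The anchor table can be checked mechanically. The snippet below is a small Python sketch that runs the two patterns from the table against the table's own example strings; the variable names are made up for the illustration.

```python
import re

# The example strings from the table above
lines = ["Steve", "Stephen", "Stegosaurus", "Hello Steve",
         "lichen", "strengthen", "Chicken"]

# ^ anchors the match to the beginning of the line
starts_with_ste = [s for s in lines if re.search(r"^Ste", s)]

# $ anchors the match to the end of the line
ends_with_hen = [s for s in lines if re.search(r"hen$", s)]

print(starts_with_ste)
print(ends_with_hen)
```

“Hello Steve” contains “Ste”, but not at the start, so ^Ste rejects it; “Chicken” ends in “ken”, so hen$ rejects it.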


Repetition
Repeat the previous pattern

Pattern   Matches this                               Meaning
S*        Steve; “HISSSS” said the snake; |Tycho     Zero or more
S?        |Steve; |Tycho                             Zero or one (optional)
S+        Steve; SSSSnake                            One or more
S{3}      SSSnake                                    Exactly three
S{3,4}    SSSnake; SSSSnake                          Three or four
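To see what each repetition operator actually grabs, a short Python sketch follows. The helper `first_match` is a hypothetical convenience function written for this example; it returns the matched text, which makes the zero-length matches of S* visible.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(repr(first_match(r"S*", "Tycho")))         # zero S's: an empty match
print(repr(first_match(r"S+", "Steve")))         # one or more S's
print(repr(first_match(r"S{3}", "SSSnake")))     # exactly three
print(repr(first_match(r"S{3,4}", "SSSSnake")))  # greedy: takes all four
print(repr(first_match(r"S{3}", "Snake")))       # only one S: no match
```

The empty match at the very start of “Tycho” is why S* and S? “match” a word that contains no S at all.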


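Note: * is “greedy”
Matches as much as it can

Suppose you want to match HTML tags.

You have a string

<HTML> <HEAD> <TITLE>My Page</TITLE> </HEAD> </HTML>

You have a pattern <.*>

Because * is greedy, the pattern matches the whole string, from the first < to the last >, rather than one tag at a time.

You can say <.*?> to make it “lazy” instead of “greedy”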
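The difference is easy to see by running both patterns over the string from this slide. The sketch below uses Python's `re` module, which supports the lazy `*?` form; lazy quantifiers are a Perl-style extension, so do not expect plain grep or egrep to understand them.

```python
import re

html = "<HTML> <HEAD> <TITLE>My Page</TITLE> </HEAD> </HTML>"

# Greedy: .* runs to the end of the line, then backs up to the last >
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first > it can
lazy = re.findall(r"<.*?>", html)

print(greedy)
print(lazy)
```

The greedy pattern finds one match (the whole line); the lazy pattern finds the six individual tags.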


Combinations

Pattern         Matches lines that...
.*              Anything at all.
^x              Start with x
y$              End with y
^x...           Start with x; at least 3 more characters
^x.*y$          Start with x, anything in the middle, end with y
..........      Have at least 10 characters
^..........$    Have exactly 10 characters
^.{10}$         Have exactly 10 characters

Tip: Remember this one!
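A few of these combinations, tried in Python against an invented five-word list (the words are chosen for their lengths and are not taken from the slides):

```python
import re

# Invented sample words: lengths 3, 8, 10, 10 and 13
words = ["cat", "aardvark", "stevedores", "xylography", "extraordinary"]

# Anchored at both ends: the whole line must be exactly 10 characters
exactly_ten = [w for w in words if re.search(r"^.{10}$", w)]

# Unanchored: any 10 characters in a row will do, so longer lines match too
at_least_ten = [w for w in words if re.search(r".{10}", w)]

# Starts with x, anything in the middle, ends with y
x_to_y = [w for w in words if re.search(r"^x.*y$", w)]

print(exactly_ten, at_least_ten, x_to_y)
```

Combined with a count (grep -c on the command line, `len()` here), the anchored form answers “how many lines are exactly N characters long?”.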


Advanced Patterns


Backreferences (egrep)

References to an (expression) you already matched.

\1 is the first one, \2 is the 2nd, etc.

Example: Find words with double letters

$ egrep '(.)\1' words
aa
aal
aalii
aam
aardvark
aardwolf
abactinally
abaff
abaissed
abandonee
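The same double-letter search, sketched in Python. The pattern is the one from the slide; the word list is a hand-picked sample, not the real dictionary.

```python
import re

# Hand-picked sample: three words with a doubled letter, two without
words = ["aardvark", "abaff", "steve", "zebra", "abandonee"]

# (.) captures any one character; \1 demands that same character again
double_letter = re.compile(r"(.)\1")

doubles = [w for w in words if double_letter.search(w)]
print(doubles)
```

The backreference matches whatever text the group captured, not the group's pattern: `(.)\1` means “the same character twice”, whereas `..` would mean “any two characters”.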


Substitution
sed, vi, perl, Xcode

You match something with (grouping) and (more grouping)

You replace it with something else, using \1, \2 to refer to the matched groups

s/pattern/replacement/

perl -p -e 's/(.*):(.*)/\2,\1/;' -- Flip the first two colon-separated fields around

foo:bar => bar,foo
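Here is the same substitution as a Python sketch, using `re.sub` with the pattern and replacement from the Perl one-liner. The function name `flip` is made up for the example. One detail worth knowing: because `.*` is greedy, the first group runs up to the last colon on the line, so the swap only behaves as “flip two fields” when the line has exactly one colon.

```python
import re

def flip(line):
    """Swap the text before and after the (last) colon, joined by a comma."""
    # \2 and \1 in the replacement refer to the second and first groups
    return re.sub(r"(.*):(.*)", r"\2,\1", line)

print(flip("foo:bar"))
print(flip("Tycho:Sjogren"))
print(flip("a:b:c"))  # greedy first group captures "a:b"
```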


perl -p -e 's/(.*):(.*)/\2,\1/;'

Input:
Tycho:Sjogren
Arek:Dreyer

Output:
Sjogren,Tycho
Dreyer,Arek


perl -i.bak -p -e 's/(.*):(.*)/\2,\1/;' names.txt

names.txt (before):
Tycho:Sjogren
Arek:Dreyer

names.txt.bak (after):
Tycho:Sjogren
Arek:Dreyer

names.txt (after):
Sjogren,Tycho
Dreyer,Arek


Try this!

$ cd /usr/share/dict
$ egrep '(.)\1' words


Challenge: Find a word with a repeating five letter pattern (like “abcdeabcde”)

$ cd /usr/share/dict
$ egrep '(.)\1' words


Prize Challenge:

How many 14 letter words are in /usr/share/dict/words?


Really Advanced and Ugly Looking Patterns


“Lookaround”
Lookahead and Lookbehind

Pattern Means ...

(?=pattern) Zero-width positive lookahead.

(?!pattern) Zero-width negative lookahead.

(?<=pattern) Zero-width positive lookbehind.

(?<!pattern) Zero-width negative lookbehind.


“U” always follows “Q”, right?
A Negative Lookahead Example

Negative Lookahead: A “q” not followed by a “u”

q(?!u)

Why not just use q[^u] ?

Didn’t we use that already?

A q followed by a non-u character?
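The two patterns are not equivalent, and a small Python experiment shows why. The four sample words are invented for the illustration. `q[^u]` insists on a character after the q (and makes it part of the match), while `q(?!u)` only asserts that the next character, if any, is not a u.

```python
import re

# Invented sample words
words = ["queen", "Iraq", "qintar", "tranquil"]

# Negative lookahead: a q that is not followed by u (nothing after it is fine)
lookahead = [w for w in words if re.search(r"q(?!u)", w)]

# Character class: a q followed by some character that is not u
char_class = [w for w in words if re.search(r"q[^u]", w)]

print(lookahead)
print(char_class)

# The lookahead is zero-width, so it does not become part of the match
print(re.search(r"q(?!u)", "qintar").group(0))
print(re.search(r"q[^u]", "qintar").group(0))
```

A word that ends in q, such as “Iraq”, is found by the lookahead but missed by the character class.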


Words with “q” not followed by “u”

Normal RE          Negative Lookahead


A Problem
Lookahead and Lookbehind don’t work everywhere.

Perl 5

grep -P
• OS X 10.7 and earlier only
• OS X 10.8 uses a different grep. You can use “perl -e”

ack

Nice tutorial here:
• http://www.regular-expressions.info/lookaround.html


Comments and Free-Spacing
Certain tools only (perl)

Match a valid date:

^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$

Or, with comments and free-spacing

# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d                # year (group 1)
[- /.]                     # separator
(0[1-9]|1[012])            # month (group 2)
[- /.]                     # separator
(0[1-9]|[12][0-9]|3[01])   # day (group 3)
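Perl is not the only tool with a free-spacing mode. As a sketch, the same date pattern can be written with Python's `re.VERBOSE` flag, which likewise ignores whitespace and `#` comments outside character classes. The anchors from the one-line version are kept here, and the name `date_re` is made up for the example.

```python
import re

date_re = re.compile(r"""
    ^(19|20)\d\d                # year; group 1 captures the century digits
    [- /.]                      # separator (the space inside [] is literal)
    (0[1-9]|1[012])             # month (group 2)
    [- /.]                      # separator
    (0[1-9]|[12][0-9]|3[01])$   # day (group 3)
""", re.VERBOSE)

m = date_re.search("2012-09-21")
print(m.groups())

# Month 13 is rejected, but the pattern knows nothing about month lengths
print(date_re.search("2012-13-01"))
print(bool(date_re.search("2012/02/31")))
```

The pattern checks the shape of a date, not the calendar: 31 February still matches.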

www.regular-expressions.info


Matching an Email Address

How hard can it be?


Matching an Email Address

It’s easy, right?

Everybody is [email protected], right?

\S+@\S+\.\S+


Can you match this?

[email protected]

steve@[192.0.43.10]

“Steve” <[email protected]>

[email protected] (Steve (Musical Dictator))

[email protected]

Steve\ [email protected]

“steve”@example.com

steve\@[email protected]


Doing it Right Is Hard

Recommended on stackoverflow.com ...

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$
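To get a feel for what this pattern accepts, here is a small Python check. The regular expression is copied from the slide; the test addresses are partly taken from the “Can you match this?” list (the domain literal and the quoted local part, written with straight quotes) and partly hypothetical, as marked in the comments.

```python
import re

simple_email = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)

candidates = [
    "steve@example.com",       # hypothetical plain address
    "steve@[192.0.43.10]",     # domain literal, as on the earlier slide
    '"steve"@example.com',     # quoted local part, as on the earlier slide
    "steve@example.museum",    # hypothetical: six-letter top-level domain
]

accepted = [c for c in candidates if simple_email.search(c)]
print(accepted)
```

Only the plain address gets through. The pattern is fine as a rough sanity check, but it rejects several legal address forms.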


perl - Mail::RFC822::ADDRESS(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ 
\t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)


perl - Mail::RFC822::ADDRESS

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ 
\t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)


RFC3696 - Checking Names

http://tools.ietf.org/html/rfc3696


So how do you really do it?

Ask what their address is.

Maybe look for an “@”

Send a test message.

If it gets through, it’s valid.
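As a rough sketch of the "maybe look for an @" step, the check below only asks whether there is an "@" with at least one character on each side. The function name `looks_like_address` and the sample inputs are made up for illustration; the real test is still whether a message gets through.

```shell
# Hypothetical helper: succeeds if $1 contains an "@" with something on each side.
looks_like_address() {
  printf '%s\n' "$1" | grep -Eq '.@.'
}

for candidate in 'user@example.com' 'no-at-sign' '@'; do
  if looks_like_address "$candidate"; then
    echo "$candidate: plausible, now send a test message"
  else
    echo "$candidate: ask again"
  fi
done
```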


Tips

Or, “Leftover Slides”


Doing anything fancy at all?

Use egrep not grep.

Or, “grep -E” - same thing.


Use egrep not grep.

This is an “egret”


grep handles only Basic Regular Expressions.

egrep handles a lot more.
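One way to see the difference is alternation. In basic syntax an unescaped "|" is an ordinary character, while with -E it means "or". The three-word sample list here is invented for the demonstration.

```shell
# Made-up sample input.
words=$(printf 'cat\ndog\nbird\n')

# Basic syntax: this looks for the literal text "cat|dog" and finds nothing.
# (grep -c exits non-zero on zero matches, hence the "|| true".)
basic=$(printf '%s\n' "$words" | grep -c 'cat|dog' || true)

# Extended syntax: "|" separates two alternatives.
extended=$(printf '%s\n' "$words" | grep -Ec 'cat|dog' || true)

echo "basic: $basic, extended: $extended"   # basic: 0, extended: 2
```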


Quote the pattern!
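A small sketch of why quoting matters: an unquoted pattern that contains shell wildcards may be expanded into file names before grep ever sees it. The directory, the file names and the word list below are invented for the demonstration.

```shell
# Work in a throwaway directory so the glob has something to expand to.
demo_dir=$(mktemp -d)
printf 'aardvark\nab\nzebra\n' > "$demo_dir/animals.txt"
: > "$demo_dir/abacus"

# Unquoted: the shell expands a* to "abacus animals.txt" before grep runs,
# so grep searches animals.txt for the word "abacus" and finds nothing.
unquoted=$(cd "$demo_dir" && { grep a* animals.txt || true; } | wc -l | tr -d ' ')

# Quoted: grep receives the pattern a* itself (zero or more "a"),
# which matches every line.
quoted=$(cd "$demo_dir" && grep 'a*' animals.txt | wc -l | tr -d ' ')

echo "unquoted: $unquoted lines, quoted: $quoted lines"   # unquoted: 0 lines, quoted: 3 lines
rm -rf "$demo_dir"
```

Single quotes are the safest default, since the shell leaves everything inside them alone.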


See exactly what’s matching.

Use “grep --color”
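A quick sketch of what this looks like in practice, using two made-up input words. The -o option is an addition not mentioned on the slide: it prints only the matched text, which helps when colour is not available.

```shell
# --color=always forces highlighting even when the output is not a terminal.
printf 'aardvark\naardwolf\n' | grep --color=always -E 'aa|wol'

# -o prints each matched piece on its own line instead of the whole line.
matched=$(printf 'aardvark\naardwolf\n' | grep -oE 'aa|wol')
echo "$matched"
```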


Consider installing “ack”

betterthangrep.com - based on Perl regular expressions

Use --passthru to print all lines, even those not matching
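If ack is not installed, a similar effect can be approximated with grep by adding an alternative that matches on every line. This is a workaround rather than anything ack does internally, and the sample "log" text is invented.

```shell
# Made-up sample input standing in for a log file.
log=$(printf 'ok\nerror: disk full\nok\n')

# With ack installed, the slide's suggestion would be along the lines of:
#   ack --passthru 'error' some.log

# grep approximation: "error|$" matches either the word or the end of the line,
# so every line is printed and only "error" gets highlighted.
printf '%s\n' "$log" | grep --color=auto -E 'error|$'

# Count the lines that were let through (all of them).
shown=$(printf '%s\n' "$log" | grep -cE 'error|$')
echo "lines shown: $shown"
```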


Surprise!

grep changed in Mountain Lion

GNU Grep in 10.7

BSD Grep in 10.8

– No more “-P” (“--perl-regexp”) for PCRE

Use perl itself - or go get “ack”.

• www.betterthangrep.com


Things I’m Not Proud Of


Bonus

Linguistic Tagging


Beyond Regular Expressions

For Objective-C Programmers

NSRegularExpression

NSDataDetector (iOS 4, OS X 10.7)

• Dates

• Addresses

• Links

• Phone Numbers

• Transit Information

NSLinguisticTagger


WWDC Session 215

Text and Linguistic Analysis


NSLinguisticTagger

Find ...

• Word and sentence boundaries

• Lexical classes: noun, verb, adjective

• Lemmas: root forms of words

• Named entities: personal names, place names, organization names


In Action

NSLinguisticTagger

NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc]
    initWithTagSchemes:[NSArray arrayWithObjects:
        NSLinguisticTagSchemeTokenType,
        NSLinguisticTagSchemeLexicalClass,
        NSLinguisticTagSchemeNameType,
        NSLinguisticTagSchemeNameTypeOrLexicalClass,
        NSLinguisticTagSchemeLemma, nil]
    options:0];
[tagger setString:string];


In Action

NSLinguisticTagger

[tagger enumerateTagsInRange:range
                      scheme:NSLinguisticTagSchemeLexicalClass
                     options:NSLinguisticTaggerOmitWhitespace
                  usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
    // Compare tag strings by value rather than by pointer.
    if ([tag isEqualToString:NSLinguisticTagNoun]) {
        // we have found a noun...
    }
}];


Safe Tweets In School

An Example

Prompt the user for a line of text

Use NSLinguisticTagger to identify the nouns

Remove everything else.


Resources


regular-expressions.info


www.regexpal.com


Prize Challenge:

How many 14-letter words are in /usr/share/dict/words?
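One possible way to approach the challenge is to anchor a pattern at both ends of the line and let grep count the matches. This is a sketch, not the official answer: the function name `count_words_of_length` is made up, and the count it prints depends on the word list installed on the machine.

```shell
# Hypothetical helper: count the lines that are exactly $1 characters long.
# $2 is the word list, defaulting to the file from the challenge.
count_words_of_length() {
  grep -Ec "^.{$1}\$" "${2:-/usr/share/dict/words}"
}

# Only run against the system word list if it is actually present.
if [ -r /usr/share/dict/words ]; then
  count_words_of_length 14
fi
```

Without the ^ and $ anchors the pattern would also count every longer word, since any word of 15 or more letters contains 14 characters somewhere.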