grep (Global REgular Expression Print)

Feb 02, 2016

Bank Keroro

grep (Global REgular Expression Print)

• Operation
  – Search a group of files
  – Find all lines that contain a particular regular expression pattern
  – Write the matching lines to standard output (redirect with > to save them in an output file)
  – grep returns to the prompt with no extra output when it is done
• Syntax: grep [-cilLnrsvwx] pattern [list of files]
• Examples
  – Find information about the user harley:
    >grep harley /etc/passwd
  – Find all lines containing the string xxx in the files of the current directory:
    >grep xxx *
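
The operation above can be seen in a minimal sketch. It assumes a POSIX shell and grep, and it uses a hypothetical users.txt in a scratch directory in place of /etc/passwd. The script shows the match going to standard output, the same match saved with >, and the exit status that a script can test (0 for a match, 1 for none, 2 for an error).

```shell
# Work in a scratch directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical stand-in for /etc/passwd.
printf 'root:x:0:0:root:/root:/bin/sh\nharley:x:1000:1000:Harley:/home/harley:/bin/sh\n' > users.txt

grep harley users.txt                 # the matching line goes to standard output
grep harley users.txt > result.txt    # the same line is saved in result.txt; nothing on screen

# Exit status: 0 = at least one match, 1 = no match, 2 = error.
if grep nobody users.txt > /dev/null; then
    echo "nobody found"
else
    echo "nobody not found"           # this branch runs
fi
```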

grep Flags

1. -c count the matching lines in each file
2. -i ignore case when searching for matches
3. -l list the names of files containing matches
4. -L list the names of files that do not contain a match
5. -n write the line number in front of each matching line
6. -r perform a recursive directory search
7. -s suppress error messages about nonexistent or unreadable files
8. -v select the lines that do not match the pattern
9. -w match only complete words
10. -x select only lines that match the pattern exactly (the whole line)
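
Running the flags on two tiny files puts them side by side. This sketch assumes GNU grep, and BSD grep behaves the same for these flags. fruit.txt and veg.txt are hypothetical sample files created on the spot.

```shell
# Scratch directory, removed on exit.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample files.
printf 'apple pie\nApple juice\npineapple\nbanana\n' > fruit.txt
printf 'carrot\n' > veg.txt

grep -c apple fruit.txt            # 2  ("Apple juice" differs in case)
grep -ic apple fruit.txt           # 3
grep -n banana fruit.txt           # 4:banana
grep -v apple fruit.txt            # Apple juice and banana (two lines)
grep -w apple fruit.txt            # apple pie  (pineapple is not a whole-word match)
grep -x banana fruit.txt           # banana     (the whole line must match)
grep -l apple fruit.txt veg.txt    # fruit.txt
grep -L apple fruit.txt veg.txt    # veg.txt
grep -s apple missing.txt || echo "exit status $?, no error message"   # status 2
```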

Regular Expressions

• Industry standard way to specify patterns
  – In Java: string.matches("pattern");
  – In Java: string.replaceAll("pattern", replacement);
• Meta-characters/operators (some need to be escaped)
  ^ beginning of a line
  $ end of a line
  * match 0 or more of the previous character or group
  + match 1 or more of the previous character or group
  ? match 0 or 1 of the previous character or group
  {n} match exactly n of the previous character or group
  {m,n} match m to n of the previous character or group
  {n,} match n or more of the previous character or group
  | match either the expression before or the expression after
  . match any character except a newline
  [ ] match any one of the enclosed characters, e.g. [a-z]
  \ interpret the following meta-character or operator literally

Note: Many UNIX programs use these (vi, sed, more, grep, awk)
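
You can try the quantifiers and alternation in the list with grep -E. In this extended syntax, + ? { } ( ) | need no backslash. The sketch assumes GNU grep and a hypothetical words.txt.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'color\ncolour\ncolouur\ncat\ndog\n' > words.txt   # hypothetical sample

grep -E '^colou?r$' words.txt      # color, colour    ? = 0 or 1 of the previous character
grep -E '^colou+r$' words.txt      # colour, colouur  + = 1 or more
grep -E '^colou{2}r$' words.txt    # colouur          {n} = exactly n
grep -E '^(cat|dog)$' words.txt    # cat, dog         | = either side
grep -E '^c.t$' words.txt          # cat              . = any one character
grep '^colou*r$' words.txt         # color, colour, colouur (* needs no backslash without -E)
```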

Regular Expression Examples

Regular Expression      String       Match
[a-z](12){3}[c-e]{3}    a121212cde   Yes
a.*e+                   abc12cde     Yes
a.*f                    abc12cde     No
^a.*e$                  abc12cde     Yes
^b*e$                   abc12cde     No
^a*e$                   abc12cde     No
\^.*\$                  ^ab12cd$     Yes
^.*$                    ^ab12cd$     Yes
^*$                     ^ab12cd$     No

Note: To use ( ) { } or + in grep, use the -E (extended) switch or precede each one with \
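
Most rows of the table can be checked mechanically if "Match" is read as "grep -E finds the pattern somewhere in the string". In this sketch, check is a hypothetical helper defined on the spot. The last row (^*$) is left out because a * right after ^ is undefined in POSIX extended syntax, so implementations differ on it.

```shell
# check (hypothetical helper): prints Yes if grep -E finds the pattern ($1)
# anywhere in the string ($2), otherwise No.
check() {
    if printf '%s\n' "$2" | grep -E "$1" > /dev/null; then
        echo Yes
    else
        echo No
    fi
}

check '[a-z](12){3}[c-e]{3}' 'a121212cde'   # Yes
check 'a.*e+'  'abc12cde'                    # Yes
check 'a.*f'   'abc12cde'                    # No
check '^a.*e$' 'abc12cde'                    # Yes
check '^b*e$'  'abc12cde'                    # No
check '^a*e$'  'abc12cde'                    # No
check '\^.*\$' '^ab12cd$'                    # Yes (escaped ^ and $ are literal)
check '^.*$'   '^ab12cd$'                    # Yes
```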

More grep Examples

Contents of a file called homework:
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing: catch at least 10

grep commands
>grep -v DONE homework           displays all but line 2
>grep -c DONE homework           displays 1
>grep -wi ".*a.*" homework       displays all lines
>grep -w "m.*e" homework         displays line 2
>grep -i "d.*e" homework         displays lines 1, 2 and 3
>grep '\(Ma\|DO\).*' homework    displays lines 1 and 2

Note: the last example escapes the parentheses and the vertical bar
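
To reproduce these results, recreate the homework file from the slide and run each command in turn. The sketch assumes GNU grep, because \| in basic syntax is a GNU extension. The final line shows the same search written with -E.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# The homework file from the slide.
cat > homework <<'EOF'
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing: catch at least 10
EOF

grep -v DONE homework            # lines 1, 3 and 4
grep -c DONE homework            # 1
grep -wi ".*a.*" homework        # all four lines
grep -w "m.*e" homework          # line 2 only: "make" is the only whole-word match
grep -i "d.*e" homework          # lines 1, 2 and 3
grep '\(Ma\|DO\).*' homework     # lines 1 and 2 (GNU basic syntax)
grep -E '(Ma|DO)' homework       # the same two lines with -E, no backslashes
```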

Sorting Data
• Background
  – Each line in a file is a record
  – Each line is a series of fields separated by spaces and/or tabs
• Commands
  >sort fileName                      sorts the lines of fileName, comparing whole lines (so the 1st field counts first)
  >sort -k 6 fileName                 sorts on a key that starts at the 6th field and runs to the end of the line (-k 6,6 uses only the 6th field)
  >sort -n -k 5 fileName              sorts on the 5th field numerically
  >sort -t ':' -k4,4r -k3,3 fileName  sorts descending on the 4th field, and then ascending on the 3rd, with ':' as the delimiter
  >sort -t ':' fileName               sorts using ':' as the separator character
  >sort -u -k2r fileName              sorts in reverse on the 2nd field and keeps only the first of any lines whose keys are equal (output must be unique)
  >command | sort -k 3,4              in a pipe, sorts by the key from field 3 through field 4
  >sort -k5n -k8 fileName             sorts numerically by the 5th field and alphabetically by the 8th
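
Here is a small sketch of keyed sorting with a ':' delimiter. grades.txt and its three fields (first name, last name, score) are hypothetical. -k3,3 limits a key to a single field, and modifiers such as n and r can be attached to one key.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical records: first name, last name, score.
printf 'ann:smith:88\nbob:jones:92\ncat:smith:75\ndan:adams:92\n' > grades.txt

sort -t ':' -k3,3n grades.txt           # ascending score: cat, ann, bob, dan
sort -t ':' -k3,3nr -k1,1 grades.txt    # descending score, ties by first name: bob, dan, ann, cat
sort -t ':' -u -k2,2 grades.txt         # one line per last name: adams, jones, smith
```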

SED (Stream Editor)

• SED is a filter
  – Input from stdin or a file
  – Output to stdout or a file
  – Modifies the input to produce the output
  – Non-interactive
• Processing
  – Read from an input stream
  – Perform line-oriented commands
  – Write to an output stream

• Syntax: >sed [-i] command | [-e command] … [file]

Search and Replace

• Search, change, and redirect to newFile: >sed 's/cat/dog/g' file > newFile

• Search, change, and edit the file in place: >sed -i 's/cat/dog/g' file

• Specific range of lines: >sed '5,10s/cat/dog/g' file

• Apply the substitution only to lines containing OK: >sed '/OK/s/cat/dog/g' names

• Apply it only to lines containing two consecutive digits: >sed '/[0-9]\{2\}/s/cat/dog/g' names

• Delete a range of lines: >sed '5,10d' file

Note: single quotes suppress the shell's interpretation of special characters

Note: The same regular expression syntax is used in vi, more and awk (vi also accepts :s/old/new/g)

Note: In sed's default (basic) syntax you must escape +, { and } for them to work, e.g. \{2\}
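
The search-and-replace forms above can be run on a hypothetical pets.txt. This sketch assumes GNU sed. -i with no argument is GNU syntax, and BSD/macOS sed needs -i '' instead.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'cat 1\nOK cat cat\ncat 42\nbird\n' > pets.txt     # hypothetical sample

sed 's/cat/dog/g' pets.txt > newPets.txt   # every cat becomes dog; pets.txt is unchanged
sed '2,3s/cat/dog/g' pets.txt              # only lines 2 and 3 change
sed '/OK/s/cat/dog/g' pets.txt             # only the line containing OK changes
sed '/[0-9]\{2\}/s/cat/dog/g' pets.txt     # only "cat 42" (two digits in a row) changes
sed '2,3d' pets.txt                        # prints lines 1 and 4 only

cp pets.txt copy.txt
sed -i 's/cat/dog/g' copy.txt              # GNU sed: edits copy.txt in place
```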

Complex Commands

sed -i \
    -e 's/mon/Monday/g' \
    -e 's/tue/Tuesday/g' \
    -e 's/wed/Wednesday/g' \
    -e 's/thu/Thursday/g' \
    -e 's/fri/Friday/g' \
    -e 's/sat/Saturday/g' \
    -e 's/sun/Sunday/g' \
    calendar

• The backslash is a continuation character
• Each -e adds another editing command (an expression)
• The g flag at the end of s/old/new/g means change every occurrence on each line, not just the first
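
Two lines of sample input show the effect of chaining -e commands and of the g flag. sample is a hypothetical helper that stands in for the calendar file.

```shell
# sample (hypothetical helper): two short lines of day abbreviations.
sample() { printf 'mon tue\nsun mon mon\n'; }

sample | sed -e 's/mon/Monday/g' -e 's/sun/Sunday/g'
# Monday tue
# Sunday Monday Monday

sample | sed -e 's/mon/Monday/'      # no g flag: only the first mon on each line changes
# Monday tue
# sun Monday mon
```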

AWK

• AWK (Aho, Weinberger, Kernighan)
• Special purpose programming language
  – Interpreted
  – Useful for UNIX scripts
• Purposes
  – Filter text files based on supplied patterns
  – Produce reports
  – Callable from "vi"
  – Create simple databases
  – Simple mathematical operations
  – Creating scripts
• Not good for large complicated tasks
• Other interpreted languages: perl, php

General Syntax

• The single quote causes the shell to ignore special characters

• The various clauses are optional

• Much of the syntax for <action> clauses is C and Java compatible

• The patterns utilize regular expressions

BEGIN {<initialization>}
<pattern> {<action>}
<pattern> {<action>}
<pattern> {<action>}
END {<final actions>}

>awk '<awk program>'
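
A minimal program that uses all three kinds of clause might look like this. sum_alpha and its input lines are hypothetical. BEGIN runs before any input is read, each pattern is tested against every line, and END runs after the last line.

```shell
# sum_alpha (hypothetical helper): totals the second field of lines matching /alpha/.
sum_alpha() {
    printf 'alpha 3\nbeta 5\nalpha 7\n' | awk '
        BEGIN   { total = 0 }                 # runs once, before any input
        /alpha/ { total += $2; count++ }      # runs for each line matching the pattern
        END     { print count, "alpha lines, total", total }   # runs once, after the last line
    '
}
sum_alpha     # 2 alpha lines, total 10
```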

AWK General Operation

• Each file consists of a series of records
• Each record is a series of fields
• Defaults
  – Record separator: newline character
  – Field separator: runs of white space (spaces and tabs)
• Flow of operation
  – Read the input file line by line
  – If the line matches a pattern, run that pattern's action
  – Otherwise skip it
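
You can watch the default record and field splitting directly. In this sketch, fields_demo is a hypothetical helper. Its first input line separates fields with two spaces and a tab, and awk still sees three fields. The second rule fires only for the line that matches it.

```shell
# fields_demo (hypothetical helper): prints record number, field count and first field.
fields_demo() {
    printf 'a  b\tc\nd e\n' | awk '
        { print NR ": " NF " fields, first=" $1 }   # no pattern: runs for every record
        /d/ { print "matched: " $0 }                # lines without d are skipped by this rule
    '
}
fields_demo
# 1: 3 fields, first=a
# 2: 2 fields, first=d
# matched: d e
```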

Some Simple AWK Examples

1. Print fields of records in a file
   >awk '{print $5, $6, $7, $8}' fileName
2. Print lines with a search string
   >awk '/gold/ {print}' fileName
3. Print the number of records
   >awk 'END {print NR, "records"}' fileName
4. Print records using a condition
   >awk '{if ($3 < 1980) print $3}' fileName
   or >awk '$2 > max {print $2}' fileName
5. Compare a field to a regular expression
   >awk '$2 ~ /[0-9]+/ {print $2}' fileName
6. Use variables
   >awk '/gold/ {sum += $2} END {print "value = " sum}' \
    fileName
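
Several of these one-liners can be run against a hypothetical coins.txt whose fields are metal, weight and year. The results in the comments follow from that sample data.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample: metal, weight, year.
printf 'gold 10 1986\nsilver 4 1975\ngold 2.5 1979\ncopper x 1990\n' > coins.txt

awk '/gold/ {print}' coins.txt                    # the two gold lines
awk 'END {print NR, "records"}' coins.txt         # 4 records
awk '{if ($3 < 1980) print $3}' coins.txt         # 1975 and 1979
awk '$2 ~ /[0-9]+/ {print $2}' coins.txt          # 10, 4 and 2.5 (x has no digit)
awk '/gold/ {sum += $2} END {print "value = " sum}' coins.txt   # value = 12.5
```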

A Longer AWK command

>awk -F ';' \
'BEGIN \
{ num_gold=0; wt_gold=0; } \
/[Gg]old/ { num_gold++; wt_gold += $2; } \
END \
{ printf("\n Gold Pieces: %2d %5.2f\n", \
  num_gold, wt_gold); \
}' \
goldFile

goldFile:
Gold;3.5
Silver;2.25
Bronze;5.31
Gold;23.22
gold;0.22

Output: Gold Pieces: 3 26.94

Note: The backslashes are continuation lines

Note: Semicolons delimit the fields in the file (hence -F ';')
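
The same command also works without the continuation backslashes, because a program inside single quotes may span several lines. This sketch assumes goldFile holds the coin data with ';' between the fields. gold_summary is a hypothetical wrapper.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# The coin data, one metal;weight pair per line.
printf 'Gold;3.5\nSilver;2.25\nBronze;5.31\nGold;23.22\ngold;0.22\n' > goldFile

# gold_summary (hypothetical wrapper): counts and weighs the gold pieces.
gold_summary() {
    awk -F ';' '
        BEGIN     { num_gold = 0; wt_gold = 0 }
        /[Gg]old/ { num_gold++; wt_gold += $2 }       # matches both Gold and gold
        END       { printf("\n Gold Pieces: %2d %5.2f\n", num_gold, wt_gold) }
    ' goldFile
}
gold_summary
# (blank line)
#  Gold Pieces:  3 26.94
```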

Execute Program in a file

# awk program summarizing a coin collection
BEGIN { num_gold = 0; wt_gold = 0 }
/[Gg]old/ { num_gold++; wt_gold += $2 }
END {
    val_gold = 485 * wt_gold
    printf("\n Gold Pieces: %2d", num_gold)
    printf("\n Gold Weight: %5.2f", wt_gold)
    printf("\n Gold Value: %7.2f\n", val_gold)
}

>awk -F ';' -f <program> <fileName>

Output
Gold Pieces: 3
Gold Weight: 26.94
Gold Value: 13065.90
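
A sketch of storing the program in a file and running it with -f follows. coins.awk is a hypothetical file name, and goldFile holds the coin data as one metal;weight pair per line.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Gold;3.5\nSilver;2.25\nBronze;5.31\nGold;23.22\ngold;0.22\n' > goldFile

# Save the awk program in its own file (hypothetical name coins.awk).
cat > coins.awk <<'EOF'
# awk program summarizing a coin collection
BEGIN { num_gold = 0; wt_gold = 0 }
/[Gg]old/ { num_gold++; wt_gold += $2 }
END {
    val_gold = 485 * wt_gold
    printf("\n Gold Pieces: %2d", num_gold)
    printf("\n Gold Weight: %5.2f", wt_gold)
    printf("\n Gold Value: %7.2f\n", val_gold)
}
EOF

# -F sets the field separator, -f names the program file.
awk -F ';' -f coins.awk goldFile
```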

Invoking AWK

>awk [-F<ch>] [<program>] [-f <programFile>] [<vars>] [- | <dataFile>]

• <ch> is a field separator (default: space, tab)
• <program> an AWK program
• <programFile> a file containing an AWK program
• <vars> a series of variables to initialize
  >awk -f program f1=file2 f2=file1 > output
• - means accept AWK input from STDIN
• <dataFile> a file containing data to process

Note: AWK is often invoked repeatedly in shell scripts
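
Standard input is a convenient way to try command-line variable assignments and the - operand. In this sketch, scale is a hypothetical helper. The assignments label=row and factor=10 take effect before awk reads the - operand.

```shell
# scale (hypothetical helper): label=... and factor=... are set before "-" (stdin) is read.
scale() {
    printf 'x 1\ny 2\n' | awk '{ print label, $1, $2 * factor }' label=row factor=10 -
}
scale
# row x 10
# row y 20

printf 'a:b:c\n' | awk -F: '{ print $2 }'      # b   (-F sets the field separator)
```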

Search Patterns

• An exact string: /The/
• A string starting a line: /^The/
• A string ending a line: /The$/
• A string ignoring case of the first letter: /[Tt]he/
• Decimal: /[0-9]+\.[0-9]+/
• Alphanumeric: /[a-zA-Z0-9]+/
• Choice between two strings: /(da|De).*/
• Numeric: /[+-]?[0-9]+/
• Any Boolean expression: $4>90 or $4>$5

Note: Some utilities (such as grep and sed in their basic syntax) require \(, \) and \| instead of the ( ) | regular expression characters
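
Each search pattern can be tried on a handful of lines. lines is a hypothetical helper that produces sample input. A pattern with no action prints the lines that match it.

```shell
# lines (hypothetical helper): sample input, one line per case.
lines() { printf 'The cat\nthe dog\nBathe\npi is 3.14\nda capo\nDe facto\n-42\n'; }

lines | awk '/^The/'              # The cat
lines | awk '/[Tt]he/'            # The cat, the dog, Bathe
lines | awk '/[0-9]+\.[0-9]+/'    # pi is 3.14
lines | awk '/(da|De).*/'         # da capo, De facto
lines | awk '/^[+-]?[0-9]+$/'     # -42  (anchored: the whole line is a number)
```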

Built in Variables

• NR: number of records read so far (the total number of records in an END block)
• NF: number of fields in the current record
• FILENAME: the current input file
• FS: input field separator
• RS: input record separator
• OFS: output field separator
• ORS: output record separator
• OFMT: output format for numbers written by print (default "%.6g")
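
The sketch below shows several of these variables on a hypothetical two-line small.txt. The assignment $1 = $1 forces awk to rebuild the record, and that rebuild is when OFS is applied.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'a b c\nd e\n' > small.txt                        # hypothetical sample

awk '{ print FILENAME, NR, NF }' small.txt               # small.txt 1 3 / small.txt 2 2
awk 'BEGIN { OFS = "-" } { $1 = $1; print }' small.txt   # a-b-c / d-e
awk 'BEGIN { ORS = ";" } { print $1 }' small.txt; echo   # a;d;
awk 'BEGIN { OFMT = "%.2f"; x = 3.14159; print x }'      # 3.14
awk 'END { print NR }' small.txt                         # 2
```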

Arrays and control structures

• Indexed and associative arrays
  – By index: months[3] = "March";
  – Associative: debts["Kim"] = 1000;
  – Note: all AWK arrays are associative (numeric indexes are just keys); split() numbers its elements from one, not zero
• Counter controlled: for (i=1; i<100; i++) data[i] = i;
• Iterator: for (i in myArray) print i, myArray[i];
• Pre-test: i=0; while (i<20) { data[i] = i; i++ }
• Condition:
  if (i==1) print debts["Kim"]; else print debts["Joe"];
  print (i==1 ? debts["Kim"] : debts["Joe"]);
• Unconditional control statements
  – break: jump out of a loop
  – continue: go to the next iteration
  – next: skip to the next line of input
  – exit: exit the AWK program
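
The array forms and control statements above fit together in a short report. debt_report and its sample lines are hypothetical. for-in visits keys in an unspecified order, so the sketch only counts them.

```shell
# debt_report (hypothetical helper): sums debts per name, then exercises loops.
debt_report() {
    printf 'kim 100\njoe 50\nkim 25\n' | awk '
        { debts[$1] += $2 }                      # associative array keyed by name
        END {
            for (name in debts) people++         # iterator; visiting order is unspecified
            print people, "people"
            n = split("jan;feb;mar", months, ";")
            print n, months[1]                   # split() numbers elements from 1
            for (i = 1; i <= 3; i++) sq[i] = i * i   # counter-controlled loop
            print sq[3]
            i = 0
            while (i < 3) {                      # pre-test loop
                i++
                if (i == 2) continue             # skip the rest of this pass
                printf("%d ", i)
            }
            print ""
            msg = (debts["kim"] > 100) ? "kim owes more than 100" : "kim owes 100 or less"
            print msg
        }'
}
debt_report
# 2 people
# 3 jan
# 9
# 1 3
# kim owes more than 100
```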

Built-in functions

• Square root: print sqrt(3.6)

• Integer portion: print int(3.2)

• Substring: print substr("abcde", 3,2);

• Split: n = split("a;b;c;d;e", letters, ";");   (fills letters[1] to letters[5] and returns 5)

• Position: print index("gorbachev", "bach");   (prints 4)

Note: index returns 0 if the substring doesn't exist
Note: Strings index from one, not zero
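
Because a BEGIN block needs no input, it is an easy place to try the built-in functions. funcs_demo is a hypothetical wrapper.

```shell
# funcs_demo (hypothetical wrapper): calls each built-in function once.
funcs_demo() {
    awk 'BEGIN {
        print int(3.2)                         # 3
        print substr("abcde", 3, 2)            # cd
        n = split("a;b;c;d;e", letters, ";")
        print n, letters[1], letters[5]        # 5 a e
        print index("gorbachev", "bach")       # 4
        print index("gorbachev", "xyz")        # 0: substring not found
        printf("%.4f\n", sqrt(3.6))            # 1.8974
    }'
}
funcs_demo
```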

printf
• printf(<template>, <arguments>);
  – printf applies the template to the arguments
  – Formats are specified in the templates
    %d for integer output
    %o for octal
    %x for hexadecimal
    %s for string
    %e for exponential format
    %f for floating point format
  – Greater control
    %5.2f means 5 characters wide, with two digits after the decimal point
    %-8.4s means left justify, 8 wide, print at most 4 characters
    %08d means pad with leading zeroes to 8 characters
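
Here are the format specifiers in action, with brackets around each field so the padding is visible. printf_demo is a hypothetical wrapper.

```shell
# printf_demo (hypothetical wrapper): one line per group of specifiers.
printf_demo() {
    awk 'BEGIN {
        printf("[%d] [%o] [%x] [%s]\n", 255, 8, 255, "hi")   # [255] [10] [ff] [hi]
        printf("[%e]\n", 1234.5)                            # [1.234500e+03]
        printf("[%5.2f]\n", 3.14159)                        # [ 3.14]
        printf("[%-8.4s]\n", "abcdefgh")                    # [abcd    ]
        printf("[%08d]\n", 42)                              # [00000042]
    }'
}
printf_demo
```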

Escape Characters

• New line: \n

• Carriage return: \r

• Backspace: \b

• Horizontal tab: \t

• Form feed: \f

• A quote: \"

• A backslash: \\
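
This small sketch uses escape sequences inside an awk string. escapes_demo is a hypothetical wrapper. A tab appears between a and b, and \" and \\ print a literal double quote and a literal backslash.

```shell
# escapes_demo (hypothetical wrapper): \t, \n, \" and \\ inside an awk string.
escapes_demo() {
    awk 'BEGIN { printf("a\tb\n\"quoted\" and a backslash: \\\n") }'
}
escapes_demo
# a<TAB>b
# "quoted" and a backslash: \
```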

AWK redirection and pipes

• Create (or append to) a file with the first field of each line: >awk '{print $1 >> "file"}'

• Pipe output to another utility: >ls -l | awk '{print $9}' | tr '[a-z]' '[A-Z]'

Pipe to a utility to translate from lower to upper case

• Sort the grades file on the 5th field and print the first field: >sort -k5n grades | awk '{print $1}'

• List .txt files < 2000 bytes, sorted descending by size: >ls -l | grep '\.txt$' | awk '$5 < 2000 {print $9, $5}' | sort -k2,2nr
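
The .txt pipeline and awk output redirection can be tested with files of known size. All file names are hypothetical. LC_ALL=C is an added assumption: it keeps the ls -l date in three fields, so the file name stays in $9.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical files of known sizes: printf pads a 0 with zeroes to N bytes.
printf '%0100d' 0 > a.txt
printf '%01500d' 0 > b.txt
printf '%03000d' 0 > c.txt
printf '%010d' 0 > d.log

# small_txt (hypothetical helper): .txt files under 2000 bytes, largest first.
small_txt() {
    LC_ALL=C ls -l | grep '\.txt$' | awk '$5 < 2000 {print $9, $5}' | sort -k2,2nr
}
small_txt
# b.txt 1500
# a.txt 100

# awk writes the first fields to a file; tr then converts them to upper case.
printf 'ann 90\nbob 85\n' | awk '{print $1 > "names.lst"}'
tr 'a-z' 'A-Z' < names.lst       # ANN and BOB
```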

More Examples

• Print Bush's grades: >awk '/Bush/{print $3, $4}' grades

• Print first name, last name, and quiz 3 grade for everyone who got more than 90 on quizzes 1 and 2:
  >awk '{if ($4>90 && $5>90) print $3, $2, $6}' grades
  >awk '$4>90 && $5>90 {print $3, $2, $6}' grades

• Print the username for the user with userid 502:
  >awk -F: '{if ($3==502) print $1}' /etc/passwd
  >awk -F: '$3==502 {print $1}' /etc/passwd
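
Here is a sketch of these commands against hypothetical data. The grades layout (id, last name, first name, quiz 1, quiz 2, quiz 3) is an assumption inferred from the field numbers used above. A small passwd-style file stands in for /etc/passwd.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical grades: id, last name, first name, quiz 1, quiz 2, quiz 3.
printf '1 Bush George 95 92 88\n2 Smith Ann 91 85 99\n3 Lee Kim 93 97 90\n' > grades

awk '/Bush/{print $3, $4}' grades                    # George 95
awk '$4>90 && $5>90 {print $3, $2, $6}' grades       # George Bush 88 and Kim Lee 90

# Hypothetical passwd-style file: fields separated by ':', userid in field 3.
printf 'root:x:0:0::/root:/bin/sh\nkim:x:502:502::/home/kim:/bin/sh\n' > passwd
awk -F: '$3==502 {print $1}' passwd                  # kim
```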