Chapter 12: gawk Yes it sounds funny

Feb 25, 2016


kesler

Page 1: Chapter 12: gawk

Chapter 12: gawk

Yes, it sounds funny

Page 2: Chapter 12: gawk

In this chapter …
• Intro
• Patterns
• Actions
• Control Structures
• Putting it all together

Page 3: Chapter 12: gawk

gawk?
• GNU awk
• awk is named for its authors: Aho, Weinberger and Kernighan
• A pattern-processing language
• Filters data and generates reports

Page 4: Chapter 12: gawk

gawk, cont.
• Syntax:

gawk [options] [program] [file-list]
gawk [options] -f program-file [file-list]

• The program is a list of patterns to match, each paired with the actions to perform on the matching lines

• The program can be given on the command line or stored in a file and loaded with -f

Page 5: Chapter 12: gawk

gawk program
• A gawk program contains one or more lines in the format pattern { action }
• The pattern determines which lines of data are selected
• The action determines what to do with those lines
• Default pattern: all lines
• Default action: print the line
• On the command line, put single quotes around the program

Page 6: Chapter 12: gawk

Patterns
• Simple numeric or string comparisons:

< <= == != >= >

• Regular expressions (see Appendix A)

– The ~ operator tests whether a field or string matches a pattern
– The !~ operator tests whether it does not match

• Combinations using || (OR) and && (AND)

Page 7: Chapter 12: gawk

Patterns, cont.
• BEGIN – before any lines are processed
• END – after all lines are processed
• pattern1,pattern2 – a range that starts with a line matching pattern1 and ends with a line matching pattern2 (both lines included). After matching pattern2, gawk attempts to match pattern1 again

Page 8: Chapter 12: gawk

Variables
• $0 – the current record (line)
• $1 through $NF – fields in the current record
• FS – input field separator (default: runs of SPACEs and TABs)
• NF – number of fields in the current record
• NR – current record number
• RS – input record separator (default: NEWLINE)
• OFS – output field separator (default: SPACE)
• ORS – output record separator (default: NEWLINE)

Page 9: Chapter 12: gawk

Associative Arrays
• A variable type similar to an array, but indexed by strings instead of integers
• Ex.

– myAssocArray["name"] = "Bob"
– myAssocArray["hometown"] = "Austin"

• Ex.
– studentGrades["123-45-6789"] = 75
– studentGrades["987-65-4321"] = 100

Page 10: Chapter 12: gawk

Pattern examples
• $1 ~ /^[A-Z]/

– Matches records where the first field starts with a capital letter

• $3 <= $5
– Matches records where the third field is less than or equal to the fifth field

• $2 > 5000 && $1 !~ /exempt/

– Matches records where the second field is greater than 5000 and the first field does not contain "exempt"

Page 11: Chapter 12: gawk

Functions
• length(str) – returns the length of str

– Returns the length of the current line ($0) if str is omitted
• int(num) – returns the integer portion of num
• tolower(str) – converts characters to lower case
• toupper(str) – converts characters to upper case
• substr(str,pos,len) – returns the substring of str starting at pos (the first character is position 1) with length len

Page 12: Chapter 12: gawk

Actions
• The default action is to print the entire record
• Using print, you can print particular parts (e.g., fields)
– Ex. { print $1 }

• Put literal strings in double quotes
• By default, multiple parameters are concatenated with no separator

– Separate them with commas to insert OFS between them
• Ex. { print $1, $5 }

Page 13: Chapter 12: gawk

Actions, cont.
• Separate multiple actions with semicolons (or newlines)
• Other actions usually involve variables (e.g., counters, accumulators)
• Variables need not be declared or initialized
– An unset variable acts as zero in a numeric context and as the empty string in a string context
• Standard operators work as in C:

* / % + - = ++ -- += -= *= /= %=

Page 14: Chapter 12: gawk

Actions, cont.
• Instead of print, you can use printf (C-style)
• Syntax:

– printf "control-string", arg1, arg2 … argn
– control-string contains one or more conversion specifications of the form %[-][x][.y]conv

• -: left-justify the field
• x: minimum field width
• y: number of decimal places
• conv: d (decimal integer), f (floating point), s (string)
• Ex. %.2f – floating point with two decimal places
• Unlike print, printf does not add a newline; end the control string with \n when you want one

Page 15: Chapter 12: gawk

Control Structures
• gawk programs can use several control structures
• Available: if-else, while, for, break and continue
• All use C-style syntax (what did the K in gawk stand for?)

Page 16: Chapter 12: gawk

if … else
• Syntax:

if (condition)
{
    commands
}
else
{
    commands
}

Page 17: Chapter 12: gawk

while
• Syntax:

while (condition)
{
    commands
}

Page 18: Chapter 12: gawk

for
• Syntax:

for (init; condition; increment)
{
    commands
}

• break and continue work in both for and while loops

Page 19: Chapter 12: gawk

Examples
• gawk '{print}' cars
• gawk '/chevy/' cars
• gawk '{print $3, $1}' cars
• gawk '/chevy/ {print $3, $1}' cars
• gawk '$1 ~ /^h/' cars
• gawk '2000 <= $5 && $5 < 9000' cars
• gawk '/volvo/ , /bmw/' cars
• gawk '{print $3, $1, "$" $5}' cars
• gawk 'BEGIN {print "Car Info"}' cars

Page 20: Chapter 12: gawk

Putting it all together

BEGIN {
    print " Miles"
    print "Make Model Year (000) Price"
    print \
    "--------------------------------------------------"
}
{
    if ($1 ~ /ply/) $1 = "plymouth"
    if ($1 ~ /chev/) $1 = "chevrolet"
    printf "%-10s %-8s %2d %5d $ %8.2f\n",\
        $1, $2, $3, $4, $5
}

Page 21: Chapter 12: gawk

Results

gawk -f printf_demo cars
 Miles
Make Model Year (000) Price
--------------------------------------------------
plymouth fury 1970 73 $ 2500.00
chevrolet malibu 1999 60 $ 3000.00
ford mustang 1965 45 $ 10000.00
volvo s80 1998 102 $ 9850.00
ford thundbd 2003 15 $ 10500.00
chevrolet malibu 2000 50 $ 3500.00
bmw 325i 1985 115 $ 450.00
honda accord 2001 30 $ 6000.00
ford taurus 2004 10 $ 17000.00
toyota rav4 2002 180 $ 750.00
chevrolet impala 1985 85 $ 1550.00
ford explor 2003 25 $ 9500.00

Page 22: Chapter 12: gawk

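Associative Arrays
• gawk '{manuf[$1]++}
  END {for (name in manuf) print name, manuf[name]}' cars | sort

– The output is piped to sort because for (name in manuf) visits the indexes in no particular order

• Output:
bmw 1
chevy 3
ford 4
honda 1
plym 1
toyota 1
volvo 1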

Page 23: Chapter 12: gawk

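Standalone Scripts
• An alternative to running gawk -f at the command line
• Just like making a shell script: the first line names the program that runs the script
– #!/bin/gawk -f
– Adjust the path to wherever gawk is installed on your system
• Then write your patterns and actions
• Make the file executable with chmod +x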

Page 24: Chapter 12: gawk

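Advanced gawk
• getline – reads the next line of input under your program's control

– Useful if you need to loop through the data yourself

• Coprocess – sends output to and reads input from a second process, using the |& operator

• A coprocess can also be a network connection, using special file names such as /inet/tcp/0/host/port (a local port of 0 lets the system choose one)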