Programming Perl in UNIX Course Number : CIT 370 Week 4 Prof. Daniel Chen

Dec 27, 2015

Stewart Hudson
Transcript
Page 1: Programming Perl in UNIX Course Number : CIT 370 Week 4 Prof. Daniel Chen.

Programming Perl in UNIX

Course Number : CIT 370

Week 4

Prof. Daniel Chen

Page 2:

Introduction

Review and Overviews

Chapters 7 and 8

Summary

Lab

Mid-term Exam

Next Week (Week 5)

Page 3:

Topics of Discussion

What Is a Regular Expression?

Expression Modifiers and Simple Statements

Regular Expression Operators

Regular Expression Metacharacters

Unicode

Page 4:

Chapter 7: Regular Expressions – Pattern Matching

7.1 What Is a Regular Expression?

7.2 Expression Modifiers and Simple Statements

7.3 Regular Expression Operators

Page 5:

7.1 What Is a Regular Expression?

A regular expression is really just a sequence or pattern of characters that is matched against a string of text when performing searches and replacements.

Example: 7.1

/abc/

?abc?
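As a minimal sketch of what "matched against a string" means, the following tests two made-up strings (not taken from the textbook examples) against /abc/. Note that in recent Perl releases the question-mark form has to be written m?abc?, and it matches only once until reset is called.

```perl
use strict;
use warnings;

# Hypothetical sample text, not from the course examples.
my $text = "xyzabc123";

# The pattern between the slashes is searched for anywhere in the string.
my $found = ($text =~ /abc/) ? 1 : 0;
print "matched abc\n" if $found;

# No match: the characters must appear in exactly this sequence.
my $missing = ("a-b-c" =~ /abc/) ? 1 : 0;
print "no match in a-b-c\n" unless $missing;
```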

Page 6:

7.2 Expression Modifiers and Simple Statements

Conditional Modifiers

The if Modifier

Format: Expression2 if Expression1;

Examples: 7.2, 7.3, 7.4
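The format above can be sketched as follows. The list of names is hypothetical and stands in for lines that the textbook examples read from a file.

```perl
use strict;
use warnings;

my @lines = ("Tom Jones", "Betty Boop", "Tommy Tucker");  # hypothetical data
my @hits;

for (@lines) {
    # Expression2 (the push) runs only if Expression1 (the match) is true.
    push @hits, $_ if /Tom/;
}

print "$_\n" for @hits;
```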

The DATA Filehandle

Format: __DATA__

The actual data is stored here

Examples: 7.5, 7.6

Page 7:

7.2 Expression Modifiers and Simple Statements

The unless Modifier

Format: Expression2 unless Expression1;

Examples: 7.7, 7.8

Looping Modifiers

The while Modifier

Format: Expression2 while Expression1;

Examples: 7.9

The until Modifier

Example: 7.10

The foreach Modifier

Example: 7.11
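A short sketch of the four modifiers on this page, using made-up values. Each statement follows the same shape as the formats above: the statement comes first and the modifier with its condition comes last.

```perl
use strict;
use warnings;

my @log;

# unless: run the statement only when the condition is false.
my $name = "Betty";
push @log, "not Tom" unless $name =~ /Tom/;

# while: repeat the statement as long as the condition is true.
my $n = 0;
$n++ while $n < 3;

# until: repeat the statement as long as the condition is false.
my $m = 10;
$m-- until $m <= 7;

# foreach: run the statement once per list element, with $_ set to each.
push @log, "item $_" foreach (1, 2);

print "$_\n" for @log;
```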

Page 8:

7.3 Regular Expression Operators

The m Operator and Matching

Format: /Regular Expression/

m#Regular Expression#

m(regular expression)

Table 7.1

Examples: 7.12, 7.13, 7.14, 7.15, 7.16
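The three formats above behave the same way; only the delimiter differs. A small sketch, assuming a sample path string that is not from the textbook:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";   # hypothetical sample string

# With slash delimiters the m is optional, but slashes in the pattern need escaping.
my $a1 = $path =~ /\/usr\/local/ ? 1 : 0;

# With m, another delimiter avoids the backslashes.
my $a2 = $path =~ m#/usr/local# ? 1 : 0;

# Paired delimiters such as parentheses also work.
my $a3 = $path =~ m(/usr/local) ? 1 : 0;

print "$a1 $a2 $a3\n";
```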

Page 9:

7.3 Regular Expression Operators

The g Modifier-Global Match

Format: m/search pattern/g

Example: 7.17

The i Modifier-Case Insensitivity

Format: m/search pattern/i

Example: 7.18

Special Scalars for Saving Patterns

Example: 7.19

The x Modifier-Extended Patterns (whitespace and comments allowed in the pattern)

Example: 7.20
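The modifiers and special scalars on this page can be sketched together. The strings are hypothetical. The special scalars shown are $& (the matched text), $` (the text before it), and $' (the text after it).

```perl
use strict;
use warnings;

my $text = "Tom and tom and TOM";   # hypothetical sample string

# g in list context returns every match, not just the first.
my @exact = $text =~ /Tom/g;        # case-sensitive: one match
my @any   = $text =~ /tom/gi;       # i ignores case: three matches

# After a successful match, the special scalars hold the pieces of the string.
my ($pre, $hit, $post);
if ("I love Perl today" =~ /Perl/) {
    ($pre, $hit, $post) = ($`, $&, $');
}

# x lets the pattern contain whitespace and comments for readability.
my $is_date = "2015-12-27" =~ /
    ^ \d{4}     # year
    - \d{2}     # month
    - \d{2} $   # day
/x ? 1 : 0;

print scalar(@exact), " ", scalar(@any), " [$pre][$hit][$post] $is_date\n";
```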

Page 10:

7.3 Regular Expression Operators

The s Operator and Substitution

Format: s/old/new/;

s/old/new/i;

s/old/new/g;

Table 7-2

Examples: 7.21, 7.22, 7.23

Changing the Substitution Delimiters

Examples: 7.24, 7.25

The g Modifier-Global Substitution

Examples: 7.26, 7.27
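A brief sketch of substitution with and without g, and with a changed delimiter. The strings are invented for illustration.

```perl
use strict;
use warnings;

my $first = "red car, red bike";          # hypothetical sample string
(my $all = $first) =~ s/red/blue/g;       # g replaces every occurrence
$first =~ s/red/blue/;                    # without g, only the first one

# A different delimiter avoids escaping slashes in the pattern.
my $path = "/usr/bin/perl";
$path =~ s#/usr/bin#/usr/local/bin#;

print "$first\n$all\n$path\n";
```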

Page 11:

7.3 Regular Expression Operators

The i Modifier-Case Insensitivity

Format: s/search pattern/replacement string/i;

Examples: 7.28, 7.29

The e Modifier-Evaluating An Expression

Format: s/search pattern/replacement string/e;

Examples: 7.30, 7.31, 7.32, 7.33
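The i and e modifiers on substitution can be sketched as below, with made-up strings. With e, the replacement side is treated as a Perl expression rather than a string.

```perl
use strict;
use warnings;

my $greeting = "HELLO world";             # hypothetical sample strings
$greeting =~ s/hello/Goodbye/i;           # i: match regardless of case

# e: the replacement is evaluated as Perl code and its result is inserted.
# $& holds the text that the pattern matched.
my $price = "cost: 5";
$price =~ s/\d+/$& * 2/e;

print "$greeting\n$price\n";
```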

Pattern Binding Operators

Format: variable = ~ /Expression/

variable !~ /Expression/

Variable =~ s/old/new

Table 7.3

Examples: 7.34, 7.35, 7.36, 7.37, 7.38, 7.39
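A small sketch of the binding operators, assuming a sample name. Without a binding operator, matches and substitutions apply to $_.

```perl
use strict;
use warnings;

my $name = "Daniel";                      # hypothetical sample string

my $has_an = ($name =~ /an/) ? 1 : 0;     # =~ is true when the pattern matches
my $no_z   = ($name !~ /z/)  ? 1 : 0;     # !~ is true when it does not match

# With s///, =~ names the variable to modify; the return value is the
# number of substitutions made.
my $count = ($name =~ s/a/A/);

print "$has_an $no_z $count $name\n";
```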

Page 12:

Chapter 8: Getting Control – Regular Expression Metacharacters

8.1 Regular Expression Metacharacters

8.2 Unicode

Page 13:

8.1 Regular Expression Metacharacters

Regular expression metacharacters are characters that do not represent themselves. They are endowed with special powers to allow you to control the search pattern in some way.

Metacharacters lose their special meaning if preceded by a backslash (\).

Metasymbols – [0-9] = \d

Example: 8.1

/^a...c/

Table 8.1
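The points on this page can be sketched with a few made-up strings: the caret and dot as metacharacters, the backslash that switches off a metacharacter, and the \d metasymbol as shorthand for [0-9].

```perl
use strict;
use warnings;

# ^ anchors at the start; each dot matches any one character except a newline.
my $m1 = "abracadabra" =~ /^a...c/ ? 1 : 0;   # 1: a, three characters, then c
my $m2 = "xabrac"      =~ /^a...c/ ? 1 : 0;   # 0: does not start with a

# A backslash turns a metacharacter back into a literal character.
my $m3 = "3x14" =~ /3\.14/ ? 1 : 0;           # 0: \. matches only a real dot
my $m4 = "3x14" =~ /3.14/  ? 1 : 0;           # 1: the unescaped dot matches x

# The metasymbol \d finds the same digit here as the class [0-9].
my $m5 = "room 7" =~ /\d/    ? 1 : 0;
my $m6 = "room 7" =~ /[0-9]/ ? 1 : 0;

print "$m1 $m2 $m3 $m4 $m5 $m6\n";
```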

Page 14:

8.1 Regular Expression Metacharacters

Metacharacters for Single Characters

Table 8.2

Example: 8.2

The s Modifier-The Dot Metacharacter and the Newline

Example: 8.3

The Character Class

A character class represents one character from a set of characters.

Examples: 8.4, 8.5, 8.6, 8.7, 8.8, 8.9
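A minimal sketch of the s modifier and of character classes, using invented strings.

```perl
use strict;
use warnings;

my $two_lines = "end\nstart";             # hypothetical sample string

# By default the dot does not match a newline; with s it does.
my $plain  = $two_lines =~ /end.start/  ? 1 : 0;   # 0
my $dotall = $two_lines =~ /end.start/s ? 1 : 0;   # 1

# A character class matches exactly one character from the set.
my $class  = "bat" =~ /^[bcr]at$/  ? 1 : 0;        # 1: b is in the set
my $negate = "bat" =~ /^[^bcr]at$/ ? 1 : 0;        # 0: ^ inside [] negates the set

print "$plain $dotall $class $negate\n";
```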

Page 15:

8.1 Regular Expression Metacharacters

The POSIX Character Class

POSIX (the Portable Operating System Interface) is an industry standard used to ensure that programs are portable across operating systems.

Table 8.3

Example 8.11

Whitespace Metacharacters

Table 8.4

Examples: 8.12, 8.13, 8.14
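As a sketch of both topics on this page, the following uses two POSIX classes and the \s whitespace metasymbol on a made-up line. POSIX classes are written inside a bracketed character class, which is why the brackets are doubled.

```perl
use strict;
use warnings;

my $line = "abc 123\tXYZ";                # hypothetical sample string

my @digits = $line =~ /[[:digit:]]/g;     # every digit character
my @upper  = $line =~ /[[:upper:]]/g;     # every uppercase letter

# \s matches a whitespace character such as a space, tab, or newline.
my @fields = split /\s+/, $line;

print "@digits | @upper | @fields\n";
```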

Page 16:

8.1 Regular Expression Metacharacters

Metacharacters to Repeat Pattern Matches

Quantifier – One or more characters

The Greed Factor – the asterisk (*)

The asterisk matches zero or more occurrences of the preceding character, and it matches as many as it can.

$_ = "ab123456783445554437AB"; s/ab[0-9]*/X/; (result: XAB)

Table 8.5

Example 8.15, 8.16, 8.17, 8.18, 8.19, 8.20, 8.21, 8.22
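The greedy substitution from this page, written out as a runnable sketch. The second string is an added assumption that shows the "zero or more" case.

```perl
use strict;
use warnings;

$_ = "ab123456783445554437AB";            # the string from the slide
s/ab[0-9]*/X/;                            # [0-9]* takes every digit it can
my $greedy = $_;

# Zero occurrences also satisfy *, so the pattern still matches with no digits.
(my $none = "abAB") =~ s/ab[0-9]*/X/;

print "$greedy\n$none\n";
```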

Metacharacters That Turn Off Greediness

By placing a question mark after a greedy quantifier, the greed is turned off and the search ends after the first match, rather than the last one.

Table 8.6

Examples: 8.24
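The difference can be sketched with a made-up string that contains two closing angle brackets. $& holds the text matched by the most recent successful match.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # hypothetical sample string

# Greedy: .+ runs on to the last > in the string.
my $greedy = $html =~ /<.+>/  ? $& : undef;

# Non-greedy: .+? stops at the first > it can.
my $lazy   = $html =~ /<.+?>/ ? $& : undef;

print "$greedy\n$lazy\n";
```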

Page 17:

8.1 Regular Expression Metacharacters

Anchoring Metacharacters

Zero-width assertions – Anchors correspond to positions, not actual characters.

Table 8.7

Example 8.25, 8.26, 8.27, 8.28

The m Modifier

The m modifier is used to control the behavior of the $ and ^ anchor metacharacters.

Examples: 8.29

Alternation

Alternation allows the regular expression to contain alternative patterns to be matched.

Example 8.30
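A short sketch of anchors, the m modifier, and alternation, with strings invented for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";     # hypothetical sample string

# Without m, ^ matches only at the very start of the string.
my $no_m   = $text =~ /^second/  ? 1 : 0;   # 0
# With m, ^ and $ also match at the start and end of each line.
my $with_m = $text =~ /^second/m ? 1 : 0;   # 1

# \b is a zero-width word boundary: it matches a position, not a character.
my $word = "concatenate" =~ /\bcat\b/ ? 1 : 0;   # 0: cat is inside a word

# Alternation: either alternative may match.
my @pets = grep { /cat|dog/ } qw(cat cow dog);

print "$no_m $with_m $word @pets\n";
```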

Page 18:

8.1 Regular Expression Metacharacters

Grouping or Clustering

The process of grouping characters together is called clustering.

Example 8.31, 8.32, 8.33, 8.34

Remembering or Capturing

Subpattern – If a regular expression pattern is enclosed in parentheses, the subpattern is saved in special numbered scalar variables ($1, $2, and so on), and these variables can be used later in the program.

Example 8.35, 8.36, 8.37, 8.38, 8.39, 8.40, 8.42
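Grouping and capturing can be sketched as follows. The name string is only an illustration.

```perl
use strict;
use warnings;

# Grouping: the quantifier applies to the whole parenthesized group.
my $grouped = "ababab!" =~ /^(ab)+!$/ ? 1 : 0;

# Capturing: each parenthesized subpattern is saved in $1, $2, and so on.
my ($first, $last);
if ("Chen, Daniel" =~ /(\w+), (\w+)/) {
    ($last, $first) = ($1, $2);
}

# Captured text can be reused in the replacement side of a substitution.
(my $swapped = "Chen, Daniel") =~ s/(\w+), (\w+)/$2 $1/;

print "$grouped $first $last | $swapped\n";
```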

Page 19:

8.1 Regular Expression Metacharacters

Turning Off Capturing

The ?: metacharacter can be used to suppress the capturing of the subpattern.

Example 8.43

Metacharacters That Look Ahead and Behind

Look ahead in the string for a pattern: (?=pattern)

Look behind in the string for a pattern: (?<=pattern)

Table 8.8

Example 8.44, 8.45, 8.46, 8.47
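A sketch of non-capturing groups and of the two positive look-around forms, using invented strings. In both look-around cases the surrounding text is required but is not part of what gets replaced.

```perl
use strict;
use warnings;

# (?:...) groups without capturing, so $1 refers to the digits.
my ($num) = "item42" =~ /(?:item|part)(\d+)/;

# (?=...) requires USD to follow, without including it in the match.
(my $ahead = "price100 price200USD") =~ s/\d+(?=USD)/N/;

# (?<=...) requires a dollar sign to precede, without including it in the match.
(my $behind = 'cost $5 or 5') =~ s/(?<=\$)\d+/N/;

print "$num\n$ahead\n$behind\n";
```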

Page 20:

8.1 Regular Expression Metacharacters

The tr or y Function

The tr function translates characters, in a one-to-one correspondence, from the characters in the search string to the characters in the replacement string.

Table 8.9

Example 8.48

Example 8.49 (tr Delete Option)

Example 8.50 (tr Complement Option)

Example 8.51 (tr Squeeze Option)
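The basic translation and the three options can be sketched with made-up strings. The y function is a synonym for tr and takes the same arguments.

```perl
use strict;
use warnings;

# One-to-one translation: each lowercase letter maps to its uppercase letter.
(my $upper = "perl") =~ tr/a-z/A-Z/;

# d deletes characters in the search list that have no replacement.
(my $deleted = "a1b2c3") =~ tr/0-9//d;

# c complements the search list: everything that is NOT a digit becomes *.
(my $masked = "a1b2c3") =~ tr/0-9/*/c;

# s squeezes runs of the same translated character down to one.
(my $squeezed = "boookkeeper") =~ tr/a-z//s;

print "$upper $deleted $masked $squeezed\n";
```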

Page 21:

8.2 Unicode

The Unicode standard is an effort to solve the problem of representing characters from many languages in a single character set. Its characters are stored using encodings such as UTF-8 and UTF-16.

Unicode has the capacity to encompass all the world's written languages.

Perl and Unicode

Perl 5.6 and later support UTF-8 Unicode.

The utf8 pragma turns on the Unicode settings and the bytes pragma turns them off.

Table 8.10

Example 8.52
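A minimal sketch of the character and byte views of one string, assuming a sample character written as an escape. Because the source text itself stays plain ASCII, the utf8 pragma is not needed in this particular sketch. Only the bytes pragma is shown, and it is limited to one block.

```perl
use strict;
use warnings;

my $smiley = "\x{263A}";        # one Unicode character, U+263A

my $chars = length($smiley);    # length counts characters by default

my $bytes;
{
    use bytes;                  # inside this block, operations work on bytes
    $bytes = length($smiley);   # U+263A takes three bytes in UTF-8
}

print "$chars character, $bytes bytes\n";
```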

Page 22:

Summary

What Is a Regular Expression?

Expression Modifiers and Simple Statements

Regular Expression Operators

Regular Expression Metacharacters

Unicode

Page 23:

Lab

Examples 7.1 – 7.39 (P 163 – 195)

Examples 8.1 – 8.52 (P 197 – 248)

Homework 4

Page 24:

Mid-term Exam Date: Next week

Exam Time: 11:00 AM - 11:30 AM

Contents: Chapter 1- Chapter 8.1.2

No books, no notes, no computer

Page 25:

Next Week

Reading assignment (Textbook chapter 8.1.3 and Chapter 9)

Mid-term Exam (Chapter 1 – Chapter 8.1.2)