LIS651 lecture 5 regular expressions & wotan use Thomas Krichel 2005-11-04

LIS651 lecture 5 regular expressions & wotan use

Thomas Krichel

2005-11-04

remember DOS?

• DOS had the * character as a wildcard. If you said

DIR *.EXE

• it would list all the files ending with .EXE.

• Thus the * wildcard would mean “all characters except the dot”.

• Similarly, you could say

DEL *.*

• to delete all your files.
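
Unix shells have the same kind of wildcard, called a glob pattern. A minimal sketch, assuming a POSIX shell and a grep that supports -E (POSIX extended regular expressions); glob_match is a hypothetical helper name. Unlike the DOS * described above, the shell's * also matches dots. The last line writes the same "ends with .EXE" test as a regular expression, the subject of this lecture.

```shell
#!/bin/sh
# glob_match GLOB STRING: print "yes" if STRING matches the shell
# wildcard pattern GLOB, "no" otherwise (hypothetical helper name).
glob_match() {
    case $2 in
        $1) echo yes ;;   # unquoted $1 is used as a pattern, not as text
        *) echo no ;;
    esac
}

glob_match '*.EXE' 'GAME.EXE'     # yes
glob_match '*.EXE' 'README.TXT'   # no
glob_match '*.*' 'GAME.EXE'       # yes

# The same test as a regular expression: the dot is quoted with \ and
# the end of the string is marked with $.
printf '%s\n' 'GAME.EXE' | grep -Eq '\.EXE$' && echo yes
```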

regular expression

• A regular expression is nothing but a fancy wildcard.

• There are various flavours of regular expressions.
– We will be using POSIX regular expressions here. They themselves come in two flavours, old-style (basic) and extended. We study extended here, aka POSIX 1003.2.
– Perl regular expressions are more powerful and more widely used.

• POSIX regular expressions are accepted by both PHP and mySQL. Details are to follow.

pattern

• The regular expression describes a pattern of characters.

• Patterns are common in other circumstances.
– Query: ‘Krichel Thomas’ in Google
– Query: ‘"Thomas Krichel"’ in Google
– Dates are of the form yyyy-mm-dd.

pattern matching

• We say that a regular expression matches the string if an instance of the pattern described by the regular expression can be found in the string.

• Saying that the regular expression “matches in the string” may make this a little clearer.

• Sometimes people also say that the string matches the regular expression.

• I am confused.

metacharacters

• Instead of just giving the star * special meaning, in a regular expression all the following have special meaning:

\ ^ $ . [ | ( ) * + { } ?

• Collectively, these characters are known as metacharacters. They don't stand for themselves but mean something else, just as

DEL *.EXE

• does not mean: delete the file "*.EXE". It means delete anything ending with .EXE.

simple regular expressions

• Characters that are not metacharacters simply mean themselves.

‘good’ does not match in ‘Good Beer’

‘d B’ matches in ‘Good Beer’

‘dB’ does not match in ‘Good Beer’

‘Beer ’ does not match in ‘Good Beer’

• If there are several matches, the pattern matches at the first occurrence.

‘o’ matches in ‘Good Beer’
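
To try the examples above without PHP, a minimal sketch, assuming a POSIX shell with a grep that supports -E (POSIX extended regular expressions, the flavour studied here) and a sed that supports -E. re_match is a hypothetical helper name, not a standard command.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the POSIX extended regular
# expression PATTERN matches somewhere in STRING, "no" otherwise.
# re_match is a hypothetical helper, not a standard command.
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match 'good' 'Good Beer'     # no: matching is case-sensitive
re_match 'd B' 'Good Beer'      # yes
re_match 'dB' 'Good Beer'       # no
re_match 'Beer ' 'Good Beer'    # no: the string has no trailing space

# sed without the g flag replaces only the first (leftmost) match;
# & stands for the matched text, so this marks where 'o' matched.
printf '%s\n' 'Good Beer' | sed -E 's/o/[&]/'   # G[o]od Beer
```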

the backslash \ quote

• If you want to match a metacharacter in the string, you have to quote it with the backslash.

‘a 6+ pack’ does not match in ‘a 6+ pack’

‘a 6\+ pack’ does match in ‘a 6+ pack’

‘\\’ does match in ‘a \ against boozing’

• A lone ‘\’ is not a valid regular expression, since there is nothing after it to quote.
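
A minimal sketch of quoting, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper. Shell single quotes pass backslashes through unchanged, so what is typed between the quotes is exactly the regular expression grep sees. In PHP source code, backslashes inside string literals are subject to PHP's own string quoting as well, which is a separate layer.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match 'a 6+ pack' 'a 6+ pack'     # no: 6+ means "one or more 6s"
re_match 'a 6+ pack' 'a 666 pack'    # yes, for the same reason
re_match 'a 6\+ pack' 'a 6+ pack'    # yes: \+ is a literal plus sign
re_match '\\' 'a \ against boozing'  # yes: \\ is one literal backslash
```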

other characters to be quoted

• Certain non-metacharacters also need to be quoted. These include some of the usual suspects:
– \n the newline
– \r the carriage return
– \t the tabulation character

• But this quoting is done by PHP (in double-quoted strings); it is not part of the regular expression.

• Remember Sandford’s law.

anchor metacharacters ^ and $

• ^ matches at the beginning of the string.

• $ matches at the end of the string.

‘keeper’ matches in ‘beerkeeper’

‘keeper$’ matches in ‘beerkeeper’

‘^keeper’ does not match in ‘beerkeeper’

‘^$’ matches in ‘’

• Note that in a double-quoted PHP string, $ followed by a name is replaced by the value of the variable of that name (or by nothing if the variable has not been set).
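
A minimal sketch of the anchor examples, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper. The shell has the same trap as PHP: inside double quotes, $ followed by a name is expanded, so the patterns here are written in single quotes.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match 'keeper' 'beerkeeper'    # yes
re_match 'keeper$' 'beerkeeper'   # yes: keeper ends the string
re_match '^keeper' 'beerkeeper'   # no: the string starts with beer
re_match '^beer' 'beerkeeper'     # yes
re_match '^$' ''                  # yes: only the empty string fits
re_match '^$' 'beer'              # no
```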

character classes

• We can define a character class by grouping a list of characters between [ and ].

‘b[ie]er’ matches in ‘beer’

‘b[ie]er’ matches in ‘bier’

‘[Bb][ie]er’ matches in ‘Bier’

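• Within a class, metacharacters need not be escaped, and \ is an ordinary character. In the class only -, ] and ^ are special. They are made literal by their position rather than with the \: put ] first, - last, and ^ anywhere but first.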

dash in the class

• Within a character class, the dash - becomes a metacharacter.

• You can use it to give a range, according to the sequence of characters in the character set you are using. It’s usually alphabetic.

‘be[a-e]r’ matches in ‘beer’

‘be[a-e]r’ matches in ‘becr’

‘be[a-e]r’ does not match in ‘befr’

• If the dash - is the last character in the class, it is treated like an ordinary character.

^ in the character class

• If the caret ^ appears as the first element in the class, it negates the characters mentioned: the class then matches any character that is not listed.

‘be[^i]r’ matches in ‘beer’

‘b[^ie]er’ does not match in ‘bier’

‘be[^a-e]r’ does match in ‘befr’

‘be[e^]r’ matches in ‘beer’

‘beer[^6-9]’ matches in ‘beer0’ to ‘beer5’, but not in ‘beer6’ to ‘beer9’

• Otherwise, it is an ordinary character.
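
The bracket rules of the last three slides can be tried in one go. A minimal sketch, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper, and the extra test strings (b*r, a]c, x-y, be^r) are made up for illustration.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match '[Bb][ie]er' 'Bier'     # yes
re_match 'b[.*]r' 'b*r'          # yes: . and * are literal inside [ ]
re_match 'b[.*]r' 'bar'          # no
re_match 'a[]b]c' 'a]c'          # yes: ] listed first is literal
re_match 'x[0-9-]y' 'x-y'        # yes: - listed last is literal
re_match 'be[a-e]r' 'befr'       # no: f is outside the range a-e
re_match 'be[^a-e]r' 'befr'      # yes: negated range
re_match 'beer[^6-9]' 'beer7'    # no
re_match 'be[e^]r' 'be^r'        # yes: ^ not in first place is literal
```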

standard character classes

• The following predefined classes exist. They are only valid inside a bracket expression, so you write [[:digit:]], not [:digit:].

[:alnum:] any alphanumeric characters

[:digit:] any digits

[:punct:] any punctuation characters

[:alpha:] any alphabetic characters (letters)

[:graph:] any graphic character (any printable character except the space)

[:space:] any whitespace character (space, tab, newline \n, carriage return \r, vertical tab, form feed)

[:blank:] any blank character (space and tab)

[:lower:] any lowercase character

[:upper:] any uppercase character

[:cntrl:] any control character

[:print:] any printable character

[:xdigit:] any hexadecimal digit (0-9, a-f, A-F)

• They are locale and operating system dependent.

• With this discussion we leave character classes.
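
A minimal sketch of the predefined classes, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper. The class names sit inside an outer pair of brackets: a bare [:digit:] would be read as an ordinary bracket expression listing the characters :, d, i, g and t. Since the classes depend on the locale, the examples stick to plain ASCII.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match '^[[:digit:]]+$' '2005'               # yes
re_match '^[[:digit:]]+$' '20o5'               # no: o is a letter
re_match '[[:upper:]]' 'beer'                  # no
re_match '[[:punct:]]' 'beer!'                 # yes
re_match '^[[:xdigit:]]+$' 'CAFE01'            # yes
re_match '^[[:alpha:]][[:alnum:]]*$' 'LIS651'  # yes
re_match '[[:space:]]' 'Good Beer'             # yes
```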

The period . metacharacter

• The period matches any character bar the newline \n.

• The reason why the \n is not counted is historic. In olden days, matching was done line by line, because the computer could not hold much in memory.

‘.’ does not match in ‘’

‘^.$’ does not match in "\n"

‘^.$’ matches in ‘a’

alternative operator |

• This acts like an or.

‘beer|wine’ matches in ‘beer’

‘beer|wine’ matches in ‘wine’

grouping

• You can use ( ) to group.

‘(beer|wine) (glass|)’ matches in ‘beer glass’

‘(beer|wine) (glass|)’ matches in ‘wine glass’

‘(beer|wine) (glass|)’ matches in ‘beer ’

‘(beer|wine) (glass|)’ matches in ‘wine ’

‘(beer|wine) (glass(es|)|)’ matches in ‘beer glasses’

• Yes, groups can be nested.
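
A minimal sketch of alternation and grouping, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper. Two deliberate differences from the slide: the patterns are anchored with ^ and $ so the whole string must fit, and the optional word is written (glass)? with the ? operator rather than (glass|), because POSIX leaves an empty alternative undefined, so not every implementation accepts it.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match 'beer|wine' 'wine'                            # yes
re_match '^(beer|wine) (glass)?$' 'beer glass'         # yes
re_match '^(beer|wine) (glass)?$' 'wine '              # yes: glass is optional
re_match '^(beer|wine) (glass)?$' 'water glass'        # no
re_match '^(beer|wine) (glass(es)?)?$' 'beer glasses'  # yes: nested groups
```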

repetition operators

• * means zero or more times what precedes it.

• + means one or more times what precedes it.

• ? means zero or one time what precedes it.

• The operator applies to the shortest preceding expression, i.e. either a single character (or class) or a group.

‘(beer )*’ matches in ‘’

‘(beer )?’ matches in ‘’

‘(beer )+’ matches in ‘beer beer beer’

‘be+r’ matches in ‘beer’

‘be+r’ does not match in ‘bebe’
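
A minimal sketch of the repetition operators, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper. The anchors ^ and $ force the whole string to fit, which makes the difference between *, + and ? visible; the last two lines show that an operator binds to the single preceding character unless a group is used.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match '^(beer )*$' ''                  # yes: zero repetitions
re_match '^(beer )+$' 'beer beer beer '   # yes
re_match 'be+r' 'beer'                    # yes
re_match 'be+r' 'bebe'                    # no
re_match '^bee?r$' 'ber'                  # yes: ? allows zero e's here
re_match '^bee?r$' 'beeer'                # no: ? allows at most one
re_match '^ab+$' 'abab'                   # no: + applies to b alone
re_match '^(ab)+$' 'abab'                 # yes: + applies to the group
```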

enumeration

• We can use {min,max} to give a minimum min and a maximum max number of repetitions. min and max are non-negative integers. {min,} means at least min times, and {n} means exactly n times.

• ? is just a shorthand for {0,1}

• + is just a shorthand for {1,}

• * is just a shorthand for {0,}

‘be{1,3}r’ matches in ‘ber’

‘be{1,3}r’ matches in ‘beer’

‘be{1,3}r’ matches in ‘beeer’

‘be{1,3}r’ does not match in ‘beeeer’
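
A minimal sketch of intervals, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper. The loop reruns the slide's four examples; the last lines show the {n} and {min,} forms.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

for s in ber beer beeer beeeer; do
    printf '%s: %s\n' "$s" "$(re_match 'be{1,3}r' "$s")"
done
# ber: yes
# beer: yes
# beeer: yes
# beeeer: no

re_match '^be{2}r$' 'beer'      # yes: {2} means exactly two
re_match '^[0-9]{4,}$' '2005'   # yes: {4,} means four or more
re_match '^[0-9]{4,}$' '205'    # no
```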

examples

• US zip code

^[0-9]{5}(-[0-9]{4})?$

• something like a current date in ISO form

^(20[0-9]{2})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$

• ((D[89])|(L[5-9]))IS[0-9]{2}

• <[[:alpha:]]+ */*>
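
A minimal sketch that tests the zip, date and tag examples, assuming a POSIX shell and a grep with -E; re_match is a hypothetical helper and the sample strings are made up. The sketch writes the zip pattern with ? so that at most one -dddd extension is accepted, and lets the day run from 01 to 31. As the slide says, this is only "something like" a date: 2005-02-31 would pass too.

```shell
#!/bin/sh
# re_match PATTERN STRING: print "yes" if the extended regular expression
# PATTERN matches in STRING, "no" otherwise (hypothetical helper).
re_match() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

zip='^[0-9]{5}(-[0-9]{4})?$'
iso_date='^(20[0-9]{2})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
tag='<[[:alpha:]]+ */*>'

re_match "$zip" '12345'              # yes
re_match "$zip" '12345-6789'         # yes
re_match "$zip" '12345-6789-0123'    # no: only one extension allowed
re_match "$iso_date" '2005-11-04'    # yes
re_match "$iso_date" '2005-13-04'    # no: there is no month 13
re_match "$iso_date" '2005-11-4'     # no: the day needs two digits
re_match "$tag" '<br />'             # yes
```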

not using posix regular expressions

• Do not use regular expressions when you want to accomplish a simple task for which a special PHP function is already available.

• A special PHP function will usually do the specialized task more easily and more quickly: parsing and interpreting the regular expression takes the machine time.

ereg()

• ereg(regex, string) searches for the pattern described in regex within the string string.

• It returns false if no match was found.

• If you call the function as ereg(regex, string, matches), the matches will be stored in the array matches. It is a numeric array: $matches[0] holds the whole matched text, followed by the parts of the string matched by the groups (something in ()). The first group match will be $matches[1].
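
ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so here is a rough shell analogue of reading the groups instead. A minimal sketch, assuming a sed that supports -E; date_parts is a hypothetical helper. It prints the three groups of an ISO date, the parts that $matches[1] to $matches[3] would hold, and prints nothing when there is no match, roughly where ereg() would return false.

```shell
#!/bin/sh
# date_parts STRING: print the year, month and day groups of an ISO date.
# Hypothetical helper; \1, \2, \3 refer to the parenthesized groups.
# -n plus the p flag print a line only if the substitution succeeded.
date_parts() {
    printf '%s\n' "$1" |
        sed -nE 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/year=\1 month=\2 day=\3/p'
}

date_parts '2005-11-04'   # year=2005 month=11 day=04
date_parts 'Nov 4 2005'   # prints nothing: no match
```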

ereg_replace

• ereg_replace ( regex, replacement, string ) searches for the pattern described in regex within the string string and replaces occurrences with replacement. It returns the replaced string.

• If replacement contains expressions of the form \\number, where number is an integer between 1 and 9, the text matched by the number-th sub-expression is used.

$better_order = ereg_replace('glass of (Karlsberg|Bruch)', 'pitcher of \\1', $order);
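
In the shell, sed's s command plays the role of ereg_replace(). A minimal sketch, assuming a sed that supports -E; the order text is made up. Inside shell single quotes the back-reference is written \1; the \\1 in the PHP line is written with PHP's own string quoting in mind. The g flag makes sed replace every occurrence, as ereg_replace() does.

```shell
#!/bin/sh
# Replace every "glass of Karlsberg" or "glass of Bruch" with a pitcher,
# keeping the beer name through the back-reference \1 (the first group).
order='a glass of Karlsberg and a glass of Bruch'
better_order=$(printf '%s\n' "$order" |
    sed -E 's/glass of (Karlsberg|Bruch)/pitcher of \1/g')
printf '%s\n' "$better_order"
# a pitcher of Karlsberg and a pitcher of Bruch
```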

split()

• split(regex, string, [max]) splits the string string at the occurrences of the pattern described by the regular expression regex. It returns an array. The matched pattern is not included.

• If the optional argument max is given, it means the maximum number of elements in the returned array. The last element then contains the unsplit rest of the string string.

• Use explode() if you are not splitting at a regular expression pattern. It is faster.
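
The shell counterpart of splitting at a pattern is awk's field separator. A minimal sketch, assuming a POSIX awk, where a -F value longer than one character is read as an extended regular expression; the input line is made up. As with split(), the separators themselves are not kept.

```shell
#!/bin/sh
# Split at a comma or semicolon followed by any number of spaces,
# then print each field with its position.
printf '%s\n' 'beer, wine;cider' |
    awk -F '[,;] *' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# 1: beer
# 2: wine
# 3: cider
```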

case-insensitive functions

• eregi() does the same as ereg() but works case-insensitively.

• eregi_replace() does the same as ereg_replace() but works case-insensitively.

• spliti() does the same as split() but works case-insensitively.
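
grep's -i flag does for grep roughly what eregi() does for ereg(). A minimal sketch, assuming a POSIX shell and a grep with -E; re_match_i is a hypothetical helper name.

```shell
#!/bin/sh
# re_match_i PATTERN STRING: print "yes" if the extended regular
# expression PATTERN matches in STRING, ignoring upper/lower case
# (grep -i), "no" otherwise. Hypothetical helper.
re_match_i() {
    if printf '%s\n' "$2" | grep -Eiq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

re_match_i 'good' 'Good Beer'    # yes
re_match_i 'BEER$' 'Good Beer'   # yes
re_match_i 'wine' 'Good Beer'    # no
```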

Regular expressions in mySQL

• You can use POSIX regular expressions in mySQL in the SELECT command:

SELECT … WHERE column REGEXP ‘regex’

• where column is the column to test and regex is a regular expression.

communication with wotan

• For file editing and manipulation, we use putty.

• For file transfer, we use winscp.

• Both are available on the web.

• The protocol is ssh, the secure shell, based on public-key cryptography.

installing putty

• Go to your favorite search engine to search for putty.

• If you have administrator rights install the installer version.

• Since you have already installed winscp, you should have no further problems.

putty options

• In Window/Translation, always choose UTF-8.

• Find out the window size (columns and rows) that fills the screen you are using with the font you are using, and save it in your session.

• For wotan, the port is 22 (ssh).

issuing commands

• While you are logged in, you talk to the computer by issuing commands.

• Your commands are read by a command line interpreter.

• The command line interpreter is called a shell.

• You are using the Bourne Again Shell, bash.

bash features

• bash lets you browse the command history with the up and down arrow keys.

• bash lets you edit commands with the left and right arrow keys.

• “exit” is the command to leave the shell.

files, directories and links

• Files are chunks of data on disk that software applications read and write.

• A link is a file that contains the address (path) of another file. Microsoft calls it a shortcut.

• Directories are files that contain other files. Microsoft calls them folders.

• In UNIX, the directory separator is “/”.

• The top directory is “/” on its own.

home directory

• When you first log in to wotan, you are placed in your home directory /home/username.

• “cd” is the command that gets you back to the home directory.

• The home directory is also abbreviated as “~”.

• cd ~user gets you to the home directory of user user.

• “cd ~” does what?

~/public_html

• ~/public_html is your web directory. I created it with “mkdir public_html” in your home directory.

• The web server on wotan will map requests to http://wotan.liu.edu/~user to show the file ~user/public_html/index.html

• The web server will map requests to http://wotan.liu.edu/~user/file to show the file ~user/public_html/file

• The server will do this by virtue of a configuration option.

changing directory, listing files

• cd directory changes into the directory directory

• the current directory is “.”

• its parent directory is “..”

• ls lists files

users and groups

• “root” is the user name of the superuser.

• The superuser has all privileges.

• There are other physical users, i.e. persons using the machine.

• There are users that are virtual, usually created to run a daemon. For example, the web server is run by the user www-data.

• Arbitrary users can be put together in groups.

permission model

• Permissions of files are given
– to the owner of the file
– to the group of the file
– and to the rest of the world

• A group is a grouping of users. Unix lets you define any number of groups and make users members of them.

• The rest of the world are all other users who have access to the system. That includes www-data!

listing files

• “ls” lists files

• “ls -l” makes a long listing. It contains
– file type and permissions (see next slide)
– number of links
– owner
– group
– size
– date
– name

first element in ls -l

• Type indicator
– d means directory
– l means link
– - means ordinary file

• 3 letters for permission of owner

• 3 letters for permission of group

• 3 letters for permission of rest of the world

• r means read, w means write, x means execute

• Directories need to be executable for you to get into them…

change permission: chmod

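• usage: chmod permission file

• file is a file

• permission is three numbers, for owner, group and rest of the world.

• Each number is the sum of elementary numbers
– 4 is read
– 2 is write
– 1 is execute
– 0 means no permission.

• Example: chmod 764 file gives the owner read, write and execute (4+2+1), the group read and write (4+2), and the rest of the world read only (4).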
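
The digit arithmetic can be checked in both directions. A minimal sketch, assuming a POSIX shell plus mktemp (widely available, though older POSIX editions do not list it); perm_to_octal is a hypothetical helper that reads the nine permission letters shown by ls -l and only understands r, w, x and -. The second half runs chmod 764 on a scratch file and reads the result back.

```shell
#!/bin/sh
# perm_to_octal RWX: turn a 9-letter permission string such as rwxrw-r--
# (as shown by ls -l) into the three chmod digits, here 764.
# Hypothetical helper; it only understands r, w, x and -.
perm_to_octal() {
    result=''
    for start in 1 4 7; do
        # owner letters are 1-3, group 4-6, rest of the world 7-9
        triple=$(printf '%s\n' "$1" | cut -c "$start-$((start + 2))")
        n=0
        case $triple in r??) n=$((n + 4)) ;; esac
        case $triple in ?w?) n=$((n + 2)) ;; esac
        case $triple in ??x) n=$((n + 1)) ;; esac
        result=$result$n
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxrw-r--    # 764
perm_to_octal rw-r--r--    # 644
perm_to_octal rwxr-xr-x    # 755

# Try chmod on a scratch file and read the result back with ls -l.
tmpdir=$(mktemp -d)
touch "$tmpdir/file"
chmod 764 "$tmpdir/file"
ls -l "$tmpdir/file" | cut -c 1-10    # -rwxrw-r--
rm -r "$tmpdir"
```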

general structure of commands

• commandname -flag --option

• Where commandname is the name of a command

• flag can be a letter

• Several letters set several flags at the same time

• An option can also be expressed with -- and a word; this is more user-friendly than flags.

example command: ls

• ls lists files

• ls -l makes a long listing

• ls -a lists all files, not only regular files but hidden files as well
– all files whose names start with a dot are hidden

• ls -la lists all files in a long listing

• ls --all is the same as ls -a. --all is known as a long option.
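
A minimal sketch of hidden files, assuming a POSIX shell plus mktemp; the file names are made up. It creates one ordinary and one hidden file in a scratch directory and compares plain ls with ls -a. (The long form --all is a GNU extension, so the sketch sticks to -a.)

```shell
#!/bin/sh
# Make a scratch directory with one ordinary and one hidden file,
# then compare plain ls with ls -a.
tmpdir=$(mktemp -d)
touch "$tmpdir/visible.txt" "$tmpdir/.hidden"

plain=$(ls "$tmpdir")        # visible.txt only
all=$(ls -a "$tmpdir")       # also ., .. and .hidden
printf 'ls:\n%s\n\nls -a:\n%s\n' "$plain" "$all"

rm -r "$tmpdir"
```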

copying and removing files

• cp file copyfile copies file file to file copyfile. If copyfile is a directory, it copies into the directory.

• mv file movedfile moves file file to file movedfile. If movedfile is a directory, it moves into the directory.

• rm file removes the file file; there is no recycling bin!!
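
A minimal sketch of cp, mv and rm, assuming a POSIX shell plus mktemp; all file names are made up. It works inside a scratch directory, so nothing of value can be lost to rm.

```shell
#!/bin/sh
# Work in a scratch directory so nothing of value is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

echo 'Karlsberg' > beer.txt
mkdir archive
cp beer.txt copy.txt     # copy to a new file name
mv copy.txt archive      # archive is a directory: copy.txt moves into it
rm beer.txt              # gone for good, there is no recycling bin

top=$(ls)                # archive
inside=$(ls archive)     # copy.txt
printf 'top: %s\narchive: %s\n' "$top" "$inside"

cd / && rm -r "$tmpdir"
```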

directories and files

• mkdir directory makes a directory

• rmdir directory removes an empty directory

• rm -r directory removes a directory and all its files

• more file
– Pages contents of file, no way back

• less file
– Pages contents of file, “u” to go back, “q” to quit
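
A minimal sketch of the difference between rmdir and rm -r, assuming a POSIX shell plus mktemp; the directory names are made up. rmdir removes only an empty directory and refuses one that still holds a file, while rm -r removes the directory together with its contents.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
mkdir "$tmpdir/empty" "$tmpdir/full"
touch "$tmpdir/full/file"

rmdir "$tmpdir/empty"               # works: the directory is empty
after_rmdir=$(ls "$tmpdir")         # full

# rmdir refuses a directory that still contains files.
if rmdir "$tmpdir/full" 2>/dev/null; then
    rmdir_full=removed
else
    rmdir_full=refused
fi

rm -r "$tmpdir/full"                # removes the directory and its files
after_rm=$(ls "$tmpdir")            # nothing left
printf 'after rmdir: %s\nrmdir on full: %s\nafter rm -r: %s\n' \
    "$after_rmdir" "$rmdir_full" "$after_rm"
rmdir "$tmpdir"
```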

file transfer

• You can use winscp to upload and download files to wotan.

• If uploaded files in the web directory remain invisible, that is most likely a permission problem. Refer back to permissions.

• chmod 644 * will put it right for the files

• chmod 755 . (yes, with a dot) will put it right for the current directory

• * is a wildcard for all files.

• rm -r * is a command to avoid.

editing

• There is a plethora of editors available.

• For the neophyte, nano works best.

• nano file edits the file file.

• nano -w switches off line wrapping.

• nano shows the commands available at the bottom of the screen. Note that ^letter, where letter is a letter, means pressing CONTROL and the letter letter at the same time.

emacs

• This is another editor that is incredibly featureful and complex.

• Written by Richard M. Stallman, of GNU and GPL fame.

• Get an emacs cheat sheet off the web before you start it. Or look at the next slide.

emacs commands

• ^x^s saves buffer

• ^x^c exits emacs

• ^g escapes out of a troublesome situation

• control+space sets the mark

• ^w removes until the mark (cut)

• ^y pastes

common emacs/bash commands

• ^k kills until the end of the line or removes empty line

• ^y yank what has been killed (paste)

• ^a get to the beginning of the line

• ^e get to the end of the line

emacs modes

• Just like people get into different moods, emacs gets into different modes.

• One mode that will split your pants is the PHP mode.

• “emacs file.php” gets you into PHP mode.

• Then look how emacs checks for matching parentheses, braces and brackets, and for the ;, and use the Tab key to indent.

copy and paste

• putty lets you copy and paste text between Windows and wotan.

• On the Windows machine, it uses the Windows approach to copy and paste.

• On the wotan machine,
– you copy by highlighting with the mouse's left button
– you paste using the right button (the middle button if putty is set to xterm-style mouse behaviour)

running mySQL

• You can run mySQL in command line mode in wotan. Type

mysql -u user -p

• You will then be prompted for your password. The username and password are your mySQL user name and mySQL password, not your wotan user name and wotan password.

• Don’t forget the semicolon after each command!

http://openlib.org/home/krichel

Thank you for your attention!

Please switch off machines b4 leaving!