1 Textual Data Many computer applications manipulate textual data word processors web browsers online dictionaries

Dec 14, 2015


Holly Price

Textual Data

Many computer applications manipulate textual data:

• word processors
• web browsers
• online dictionaries


Java’s String Class

• in simplest form, just quoted text

"This is a string"

"So is this"

"hi"

• used as parameters to
– Text constructor
– System.out.println


Strings are Objects

• String is a class, not a primitive type

• Java provides many methods for manipulating them

• compare with equals method

• find length with length method
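
A minimal sketch of the two methods named above, using only the standard String class. The class name StringObjectsDemo and the helper sameText are illustrative, not part of the slides. It also suggests why equals, rather than ==, is the usual way to compare the contents of two strings.

```java
class StringObjectsDemo {
    // Compares two strings by their characters, using the equals method.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String greeting = "hi";
        // Built at run time so that it is a separate object with equal contents.
        String other = new String("hi");

        System.out.println(sameText(greeting, other));   // true: same characters
        System.out.println(greeting == other);           // false: different objects
        System.out.println("This is a string".length()); // 16
    }
}
```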


Manipulating Strings

• Java also provides String literals and the + operator
– special features because strings are used in many programs


The Empty String

• smallest possible string

• made up of no characters at all (length is 0)

• ""

• typically used when we want to build something from nothing


Building a String "From Nothing"

Ex. Morse code

• Allow user to display a series of dots and dashes

• Long mouse click signifies dash

• Short click signifies dot

private String currentCode = "";

• currentCode is empty until user begins to enter dots and dashes

• 16.1.rtf
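
One way the Morse code example might grow currentCode from the empty string is sketched below. This is not the program in 16.1.rtf: the class name MorseEntryDemo, the method recordClick, and the 300 ms threshold DASH_MILLIS are all assumptions made for illustration, and the mouse handling itself is left out.

```java
class MorseEntryDemo {
    // Hypothetical threshold: a click held at least this long counts as a dash.
    static final long DASH_MILLIS = 300;

    // Starts as the empty string and grows by one symbol per click.
    private String currentCode = "";

    // Appends a dash for a long click and a dot for a short one.
    void recordClick(long heldMillis) {
        if (heldMillis >= DASH_MILLIS) {
            currentCode = currentCode + "-";
        } else {
            currentCode = currentCode + ".";
        }
    }

    String getCurrentCode() {
        return currentCode;
    }

    public static void main(String[] args) {
        MorseEntryDemo entry = new MorseEntryDemo();
        entry.recordClick(80);   // short click
        entry.recordClick(450);  // long click
        System.out.println(entry.getCurrentCode()); // .-
    }
}
```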


Long Strings

• Strings can be arbitrarily long
– String chapter in your Java text can be 1 big string

• Practical issue for long strings: Readability
– Might want line breaks in a string
– newline character \n

Ex. Let's add instructions to the Morse Code program


Morse Code Instructions

This program will allow you to enter a message in Morse Code.

To enter your message:
Click the mouse quickly to generate a dot;
Depress the mouse longer to generate a dash.


Printing Instructions

1. Series of 5 System.out.println instructions, or

2. Define String constant INSTRUCTIONS; print INSTRUCTIONS

private static final String INSTRUCTIONS =
    "This program will allow you to enter a message in Morse code.\n" +
    "\n" +
    "To enter your message:\n" +
    "Click the mouse quickly to generate a dot;\n" +
    "Depress the mouse longer to generate a dash.";

Note "\n" just has length one!!
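
The sketch below prints the constant and checks the claim about the length of "\n". The class name InstructionsDemo and the helper countLines are illustrative additions. countLines looks at one character at a time with charAt, a method the slides introduce later.

```java
class InstructionsDemo {
    static final String INSTRUCTIONS =
        "This program will allow you to enter a message in Morse code.\n" +
        "\n" +
        "To enter your message:\n" +
        "Click the mouse quickly to generate a dot;\n" +
        "Depress the mouse longer to generate a dash.";

    // Counts lines by counting newline characters; the text after the
    // last newline counts as one more line.
    static int countLines(String text) {
        int lines = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(INSTRUCTIONS);
        System.out.println("\n".length());            // 1
        System.out.println(countLines(INSTRUCTIONS)); // 5
    }
}
```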


Readability and Legality

Java does not allow us to write a String literal with actual line breaks in it!

System.out.println( "The message that you have entered contains

characters that cannot be translated." );

is illegal

System.out.println( "The message that you have entered contains " +

"characters that cannot be translated." );

is legal


Many String Methods

• someString.length()
returns an int that is the number of characters in someString

• someString.endsWith( otherString )
returns true if and only if otherString is a suffix of someString

• someString.startsWith( otherString )
returns true if and only if otherString is a prefix of someString
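
A short sketch of these three methods applied to a URL. The class name PrefixSuffixDemo and its helpers isHttp and hasSuffix are assumptions for illustration.

```java
class PrefixSuffixDemo {
    // True if url begins with the "http://" scheme.
    static boolean isHttp(String url) {
        return url.startsWith("http://");
    }

    // True if url ends with the given suffix, for example ".edu".
    static boolean hasSuffix(String url, String suffix) {
        return url.endsWith(suffix);
    }

    public static void main(String[] args) {
        String url = "http://www.cs.williams.edu";
        System.out.println(url.length());           // 26
        System.out.println(isHttp(url));            // true
        System.out.println(hasSuffix(url, ".edu")); // true
        System.out.println(hasSuffix(url, ".com")); // false
    }
}
```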


More Useful Methods

• Example. Web browsers offer automatic address completion

I type "http://www.a"

My browser suggests "http://www.aol.com"

• Keep track of URLs typed in by users

• Use this to provide suggestions

• Start of a URL History class


Finding a Substring

• someString.indexOf( otherString )
– think of otherString as a pattern to be found
– returns an int giving the first index in someString where otherString is found

• Example. If sentence is

"Strings are objects in Java."

and pattern is "in", then

sentence.indexOf(pattern)

returns 3 (the first match is the "in" inside "Strings", not the separate word "in").


If sentence is

"Strings are objects in Java."

and pattern is "primitive type", then

sentence.indexOf(pattern)

returns -1
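
The two results above can be checked with a sketch like the following. The class name IndexOfDemo and the helper containsPattern are illustrative, not from the slides.

```java
class IndexOfDemo {
    // True if pattern occurs anywhere in sentence.
    static boolean containsPattern(String sentence, String pattern) {
        return sentence.indexOf(pattern) >= 0;
    }

    public static void main(String[] args) {
        String sentence = "Strings are objects in Java.";
        System.out.println(sentence.indexOf("in"));             // 3
        System.out.println(sentence.indexOf("Java"));           // 23
        System.out.println(sentence.indexOf("primitive type")); // -1
        System.out.println(containsPattern(sentence, "objects")); // true
    }
}
```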


Using indexOf to find URLs

// Return true if and only if the history contains the given URL
public boolean contains( String aURL ) {
    // Look for URL terminated by newline separator
    return urlString.indexOf( aURL + "\n" ) >= 0;
}

Why must we add newline to the URL to be found?
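
One way to explore the question is to compare the search with and without the newline. The sketch assumes the history is a single String holding one URL per line, each followed by "\n". The class name UrlHistoryDemo and the methods add and containsWithoutNewline are hypothetical additions for the comparison.

```java
class UrlHistoryDemo {
    // Assumed layout: every stored URL is followed by a newline.
    private String urlString = "";

    void add(String aURL) {
        urlString = urlString + aURL + "\n";
    }

    // As on the slide: look for the URL terminated by the newline separator.
    boolean contains(String aURL) {
        return urlString.indexOf(aURL + "\n") >= 0;
    }

    // For comparison only: the same search without the newline.
    boolean containsWithoutNewline(String aURL) {
        return urlString.indexOf(aURL) >= 0;
    }

    public static void main(String[] args) {
        UrlHistoryDemo history = new UrlHistoryDemo();
        history.add("http://www.aol.com");

        System.out.println(history.contains("http://www.aol.com"));           // true
        System.out.println(history.contains("http://www.a"));                 // false
        System.out.println(history.containsWithoutNewline("http://www.a"));   // true
    }
}
```

Without the newline, a partially typed URL such as "http://www.a" would be reported as already in the history. Note that the newline only rules out matches that stop short of the end of a stored URL; a string that is just the tail of one, such as "www.aol.com", would still be found.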


Another indexOf

• someString.indexOf( pattern, startIndex )
– Searches for pattern in someString, beginning at index given by startIndex

• If someString is

"Strings are objects in Java."

and pattern is "ing", then

someString.indexOf( pattern, 0 ) returns 3

someString.indexOf( pattern, 5 ) returns -1

someString.indexOf( "in", 5 ) returns 20
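
A common use of the two-argument indexOf is to step through every match by restarting the search just past the previous one. The sketch below counts occurrences that way. The class name CountOccurrencesDemo and the method countOccurrences are illustrative, and treating an empty pattern as zero matches is a choice made here to keep the loop finite.

```java
class CountOccurrencesDemo {
    // Counts how many times pattern occurs in text (overlapping matches included).
    static int countOccurrences(String text, String pattern) {
        if (pattern.length() == 0) {
            return 0; // an empty pattern would match at every index forever
        }
        int count = 0;
        int position = text.indexOf(pattern, 0);
        while (position >= 0) {
            count++;
            // Resume the search one character past the last match.
            position = text.indexOf(pattern, position + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        String someString = "Strings are objects in Java.";
        System.out.println(someString.indexOf("ing", 0)); // 3
        System.out.println(someString.indexOf("ing", 5)); // -1
        System.out.println(someString.indexOf("in", 5));  // 20
        System.out.println(countOccurrences(someString, "in")); // 2
    }
}
```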


Case Sensitivity

someString.indexOf( "IN" ) yields -1

if someString is

"Strings are objects in Java."


Dealing with Lower and Upper Case

• sometimes useful and important to distinguish between lower and upper case

• sometimes not

if "http://www.cs.williams.edu" is in our history,

surely we want to recognize

"HTTP://www.cs.williams.edu" as the same

Note: part of URL after domain name may be case sensitive. Will ignore that here.


Methods for Handling Case

• someString.equalsIgnoreCase( otherString )
returns true if someString and otherString are composed of the same sequence of characters, ignoring differences in case

• someString.toLowerCase()
returns a copy of someString with upper case chars replaced by lower case

• someString.toUpperCase()
returns a copy of someString with lower case chars replaced by upper case
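
A brief sketch of these methods on the two URLs from the previous slide. The class name CaseDemo and the helper sameIgnoringCase are illustrative.

```java
class CaseDemo {
    // True if a and b differ at most in the case of their letters.
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String stored = "http://www.cs.williams.edu";
        String typed = "HTTP://www.cs.williams.edu";

        System.out.println(typed.equals(stored));            // false
        System.out.println(sameIgnoringCase(typed, stored)); // true
        System.out.println(typed.toLowerCase());             // http://www.cs.williams.edu
        // The original string is unchanged; toLowerCase returned a copy.
        System.out.println(typed);                           // HTTP://www.cs.williams.edu
        System.out.println(stored.toUpperCase());
    }
}
```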


Improving our contains method

// Return true if and only if the history contains the given URL
public boolean contains( String aURL ) {
    String lowerUrlString = urlString.toLowerCase();
    // Look for URL terminated by newline separator
    return lowerUrlString.indexOf( aURL.toLowerCase() + "\n" ) >= 0;
}

Alternative: Maintain URL History in lower case

• Fig16.6.rtf


Cutting and Pasting

• can paste strings together with concatenation operator (+)

• can also extract substrings

• someString.substring( startIndex, endIndex )
returns substring of someString beginning at startIndex and up to, but not including, endIndex

Ex. If urlString is "http://www.cs.williams.edu"

urlString.substring( 7, 10 ) returns "www"

urlString.substring( 0, 7 ) returns "http://"

urlString.substring( 7, urlString.length() ) returns "www.cs.williams.edu"
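
The sketch below reproduces the three results and combines substring with indexOf so that the index 7 need not be hard-coded. The class name SubstringDemo and the helper stripScheme are assumptions for illustration.

```java
class SubstringDemo {
    // Returns the part of url after "://", or url itself if "://" is absent.
    static String stripScheme(String url) {
        int marker = url.indexOf("://");
        if (marker < 0) {
            return url;
        }
        // Skip the three characters of "://".
        return url.substring(marker + 3, url.length());
    }

    public static void main(String[] args) {
        String urlString = "http://www.cs.williams.edu";
        System.out.println(urlString.substring(7, 10)); // www
        System.out.println(urlString.substring(0, 7));  // http://
        System.out.println(urlString.substring(7, urlString.length())); // www.cs.williams.edu
        System.out.println(stripScheme(urlString));     // www.cs.williams.edu
    }
}
```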


Rules for substring

• startIndex may not be negative or greater than endIndex

• endIndex may not be greater than the length of someString

• breaking either rule causes an IndexOutOfBoundsException


Will use substring to help us find URL completions

• Let prefix be URL entered so far.
• Use indexOf to find prefix in urlString
• Extract full URL from urlString (up to newline)
• Add full URL to list of all possible completions.

• fig 16.7.rtf
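
The four steps above might be sketched as follows. This is not the code in fig 16.7.rtf. It assumes urlString holds one URL per line, each ending in "\n", and it returns the completions as a newline-separated String rather than whatever list type the original uses. The names CompletionDemo and findCompletions are hypothetical. The check that a match begins a line is an addition, so that a prefix found in the middle of a stored URL is not offered as a completion.

```java
class CompletionDemo {
    // Returns every stored URL that begins with prefix, one per line.
    static String findCompletions(String urlString, String prefix) {
        String completions = "";
        if (prefix.length() == 0) {
            return completions; // nothing typed yet, so suggest nothing
        }
        int found = urlString.indexOf(prefix, 0);
        while (found >= 0) {
            // The full URL runs from the match up to the next newline.
            int lineEnd = urlString.indexOf("\n", found);
            if (lineEnd < 0) {
                lineEnd = urlString.length();
            }
            // Only accept matches that begin a stored URL.
            boolean atLineStart = found == 0 || urlString.charAt(found - 1) == '\n';
            if (atLineStart) {
                completions = completions + urlString.substring(found, lineEnd) + "\n";
            }
            // Continue searching at the start of the next line.
            found = urlString.indexOf(prefix, lineEnd + 1);
        }
        return completions;
    }

    public static void main(String[] args) {
        String history = "http://www.aol.com\n"
                       + "http://www.cs.williams.edu\n"
                       + "http://www.amazon.com\n";
        System.out.print(findCompletions(history, "http://www.a"));
    }
}
```

With the sample history above, the call prints the aol.com and amazon.com entries and skips the williams.edu one.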


Trimming Strings

• often want to ignore leading and trailing blanks in a string

"http://www.cs.williams.edu" vs.

"http://www.cs.williams.edu "

• someString.trim()
returns a copy of someString with white space removed from both ends

• Fig 16.8.rtf
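
A small sketch of trim, separate from the code in Fig 16.8.rtf. The class name TrimDemo and the helper sameIgnoringBlanks are illustrative.

```java
class TrimDemo {
    // Compares two strings after removing leading and trailing white space.
    static boolean sameIgnoringBlanks(String a, String b) {
        return a.trim().equals(b.trim());
    }

    public static void main(String[] args) {
        String typed = "  http://www.cs.williams.edu ";
        String stored = "http://www.cs.williams.edu";

        System.out.println(typed.equals(stored));              // false
        System.out.println(sameIgnoringBlanks(typed, stored)); // true
        System.out.println(typed.trim().length());             // 26
    }
}
```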


Comparing Strings

• equals and equalsIgnoreCase

• someString.compareTo( anotherString )
returns
– 0, if someString and anotherString are equal
– positive int, if someString appears after anotherString in lexicographic ordering
– negative int, if someString appears before anotherString in lexicographic ordering


Lexicographic Ordering

if
• 2 strings are made up of alphabetic characters and
• both are all lower case or both are all upper case

then

lexicographic ordering = alphabetical ordering
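
The sketch below shows compareTo on same-case and mixed-case strings. The class name CompareDemo and the helper earlier are illustrative. The mixed-case example indicates why the condition above matters: upper case letters have smaller character codes than lower case ones, so "Zebra" sorts before "apple".

```java
class CompareDemo {
    // Returns whichever of a and b comes first in lexicographic order.
    static String earlier(String a, String b) {
        if (a.compareTo(b) <= 0) {
            return a;
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println("apple".compareTo("apple") == 0); // true
        System.out.println("apple".compareTo("banana") < 0); // true
        System.out.println("banana".compareTo("apple") > 0); // true
        // Mixed case: 'Z' has a smaller code than 'a'.
        System.out.println("Zebra".compareTo("apple") < 0);  // true
        System.out.println(earlier("williams", "aol"));      // aol
    }
}
```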

<maintaining URL history in order>


StringBuffer

• Java Strings are immutable.
• StringBuffer is essentially a mutable String
• Various ways to construct them

// empty with initial capacity 1000
StringBuffer urlStringBuffer = new StringBuffer(1000);

// create StringBuffer from existing String
StringBuffer urlStringBuffer = new StringBuffer(urlString);

• Many useful methods (append, replace, delete)
• Some String methods missing (toLowerCase, toUpperCase)
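
A short sketch of append, replace, and delete, each of which changes the buffer in place. The class name StringBufferDemo and the helper joinUrls are assumptions for illustration.

```java
class StringBufferDemo {
    // Builds a newline-terminated history from an array of URLs.
    static String joinUrls(String[] urls) {
        StringBuffer buffer = new StringBuffer(1000);
        for (int i = 0; i < urls.length; i++) {
            buffer.append(urls[i]);
            buffer.append("\n");
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("http://www.aol.com");
        buffer.replace(0, 4, "HTTP");          // overwrite indices 0 to 3
        System.out.println(buffer.toString()); // HTTP://www.aol.com
        buffer.delete(0, 7);                   // remove indices 0 to 6
        System.out.println(buffer.toString()); // www.aol.com

        String[] urls = { "http://www.aol.com", "http://www.cs.williams.edu" };
        System.out.print(joinUrls(urls));
    }
}
```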


Characters

• Strings are sequences of characters
• Java data type char represents characters
• a primitive data type
• char literal written by putting character in single quotes

'a', 'A', '?', '7', '\n'

Note: these are not the same as

"a", "A", "?", "7", "\n"


Declaration and Use

• To declare variable letter of type char

char letter;

• chars in Java represented internally as integers
• can perform arithmetic operations on them
• can compare them with operators like < and >


1. Determine whether a char represents a digit in the range 0-9.

if ( mysteryChar >= '0' && mysteryChar <= '9' )

works because integers representing '0' to '9' are consecutive numbers

2. Determine whether mysteryChar is a lower-case alphabetical character

if ( mysteryChar >= 'a' && mysteryChar <= 'z')

works because ints representing 'a' to 'z' are consecutive
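
Both tests can be wrapped as small methods, as in the sketch below. The class name CharRangeDemo and the helpers isDigit, isLowerCaseLetter, and digitValue are illustrative. digitValue relies on the same fact as the range tests: because the codes for '0' to '9' are consecutive, subtracting '0' gives the digit's numeric value.

```java
class CharRangeDemo {
    // True if c is one of the characters '0' through '9'.
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // True if c is one of the characters 'a' through 'z'.
    static boolean isLowerCaseLetter(char c) {
        return c >= 'a' && c <= 'z';
    }

    // Converts a digit character to its int value; assumes isDigit(c) is true.
    static int digitValue(char c) {
        return c - '0';
    }

    public static void main(String[] args) {
        System.out.println(isDigit('7'));           // true
        System.out.println(isDigit('a'));           // false
        System.out.println(isLowerCaseLetter('q')); // true
        System.out.println(isLowerCaseLetter('Q')); // false
        System.out.println(digitValue('7'));        // 7
    }
}
```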


Constructing Strings from chars

• can build a String from char components

new String( characterArray )

• If example is the array of char

'a' 'n' ' ' 'e' 'x' 'a' 'm' 'p' 'l' 'e'

then

String aString = new String(example);

creates the String

"an example"
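
A minimal sketch of this constructor. The class name CharArrayDemo and the helper repeat are illustrative additions; repeat fills an array with one character and then turns it into a String.

```java
class CharArrayDemo {
    // Builds a String made of count copies of c.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        for (int i = 0; i < count; i++) {
            chars[i] = c;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] example = { 'a', 'n', ' ', 'e', 'x', 'a', 'm', 'p', 'l', 'e' };
        String aString = new String(example);
        System.out.println(aString);          // an example
        System.out.println(aString.length()); // 10
        System.out.println(repeat('-', 3));   // ---
    }
}
```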


Extracting chars from Strings

• aString.charAt( index )
returns the char at the specified index in aString

• If aString is "Coffee", then

aString.charAt(1)

returns 'o'

• common use for charAt: check whether the characters in a string have some property


Using charAt

• Consider a medical record management program

• Want to treat weight as an int

• If weightField is the weight text field:

String weight = weightField.getText();
int weightValue = Integer.parseInt(weight);

But this only works if weight entered looks like an int


Checking for Integer Conversion

Valid: "154", "016"

Not valid: "154lbs", " 12"

// Returns true if and only if number is a string of
// digits in the range 0-9
public boolean validInt( String number ) {
    for (int i = 0; i < number.length(); i++) {
        char digit = number.charAt( i );
        if (digit < '0' || digit > '9') {
            return false;
        }
    }
    return true;
}
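
The check can be combined with Integer.parseInt so that the conversion is only attempted on strings that pass it. The sketch below makes a few assumptions that are not in the slides: the class name WeightParserDemo, the method parseWeight, the sentinel value -1 for invalid input, and an added guard that rejects the empty string, since Integer.parseInt("") throws NumberFormatException.

```java
class WeightParserDemo {
    // As on the slide, plus a guard for the empty string (an addition).
    static boolean validInt(String number) {
        if (number.length() == 0) {
            return false;
        }
        for (int i = 0; i < number.length(); i++) {
            char digit = number.charAt(i);
            if (digit < '0' || digit > '9') {
                return false;
            }
        }
        return true;
    }

    // Returns the weight as an int, or -1 (hypothetical sentinel) if text is not valid.
    static int parseWeight(String text) {
        if (validInt(text)) {
            return Integer.parseInt(text);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(parseWeight("154"));    // 154
        System.out.println(parseWeight("016"));    // 16
        System.out.println(parseWeight("154lbs")); // -1
        System.out.println(parseWeight(" 12"));    // -1
    }
}
```

A digit string too long to fit in an int would still pass validInt and make Integer.parseInt throw; this sketch does not handle that case. Calling trim before the check would be one way to accept input such as " 12".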


Operations on chars

• ability to perform arithmetic on chars can be extremely useful.

Example. A program that will translate a message into Morse code.
– Make it simple: alphabetic messages only
– Assume all characters upper case.


Translating to Morse Code

I LOVE JAVA

.. .-.. --- ...- . .--- .- ...- .-


High-level Translation

// Converts an alphabetic string into Morse Code
public String toMorseCode( String message ) {
    String morseMessage = "";
    for (int i = 0; i < message.length(); i++) {
        char letter = message.charAt( i );
        if (letter == ' ') {
            morseMessage = morseMessage + WORD_SPACE;
        } else {
            morseMessage = morseMessage + morseCode( letter ) + " ";
        }
    }
    return morseMessage;
}


How Does morseCode work?

• look up code in array

• would be convenient if int value of 'A' was 0, but it isn't
– can calculate appropriate index!

[letter - 'A']

– if letter is 'A', gives 0
– if letter is 'B', gives 1
etc.


Translating a Character to Morse Code

// Returns the sequence of dots and dashes corresponding to
// a letter of the alphabet
public String morseCode( char letter ) {
    return letterCode[letter - 'A'];
}
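
Putting the two methods together gives a sketch like the one below. The slides do not show the contents of letterCode or the value of WORD_SPACE, so both are filled in here as assumptions: letterCode holds the standard International Morse codes for A through Z, and WORD_SPACE is taken to be two spaces. The class name MorseDemo is also illustrative. As on the slides, only upper case letters and spaces are handled; any other character would index outside the array.

```java
class MorseDemo {
    // Assumed value: extra blanks that separate words in the output.
    static final String WORD_SPACE = "  ";

    // Assumed contents: Morse codes for 'A' (index 0) through 'Z' (index 25).
    static final String[] letterCode = {
        ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
        ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
        "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--.."
    };

    // Returns the dots and dashes for one upper case letter.
    static String morseCode(char letter) {
        return letterCode[letter - 'A'];
    }

    // Converts a message of upper case letters and spaces into Morse code.
    static String toMorseCode(String message) {
        String morseMessage = "";
        for (int i = 0; i < message.length(); i++) {
            char letter = message.charAt(i);
            if (letter == ' ') {
                morseMessage = morseMessage + WORD_SPACE;
            } else {
                morseMessage = morseMessage + morseCode(letter) + " ";
            }
        }
        return morseMessage;
    }

    public static void main(String[] args) {
        System.out.println(morseCode('A')); // .-
        System.out.println(toMorseCode("I LOVE JAVA"));
    }
}
```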


Chapter Review

• Java provides String literals and + operator
• But Strings are objects!
• Many useful methods
– equals, equalsIgnoreCase
– compareTo
– toUpperCase, toLowerCase
– indexOf
– substring
– trim
– startsWith, endsWith
and many others


char

• allows us to manipulate characters

• written as individual characters between single quotes

• represented internally as integers - can perform arithmetic on them