COMP 110 Note Set #6: Characters, Strings, Tokens, Tokenizing (or Splitting) Strings

Outline of Noteset #6
• Summary/Review of Algorithms from earlier: accumulator, min/max, linear search
• Program Testing: compile-time checks, run-time checks, logical checks, performance checks
• Text: char and String datatypes
• The ASCII character encoding table
• The char datatype and character constants
• The String datatype and string constants
• Converting Strings to numbers with Integer.parseInt(), Double.parseDouble(), etc.
• Converting numbers to Strings with Integer.toString(), Double.toString(), etc.
• Using the length() method for String (note that this is a method for String but a property for arrays)
• Comparing String content using equals()
• Comparing String references using ==
• Comparing String ordering using compareTo()
• Examining individual characters in a String with charAt()
• Extracting substrings with substring()
• Searching for characters in a String with indexOf()
• Tokenizing Strings with the StringTokenizer class
• Tokenizing Strings with the split() method
• Other String formatting examples
• Command Line Arguments
Summary: Algorithms Based on Loops

Here are a few of the algorithms discussed recently that you should memorize as fundamental “patterns” or “building blocks” that will come up over and over again in future problem solutions (examples use while loops, but for loops are equivalent):

Accumulator

    int sum = 0;
    int[] data = ...;
    int i = 0;
    while (i < data.length) {
        sum = sum + data[i];
        i++;
    }

Min/Max

    int[] data = ...;
    int mi = 0;   // index of max element (initially 0)
    int i = 1;    // index of current element to compare
    while (i < data.length) {
        if (data[i] > data[mi]) mi = i;   // update max index
        i++;
    }
    int max = data[mi];

Linear Search
int[] data = ...; int value = ...; boolean found = false; int i
= 0; while (i
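
The linear search listing above is cut off in this copy of the notes. A minimal sketch of the pattern follows, assuming the usual form of the loop (stop at the first match or at the end of the array). The class name LinearSearchSketch and the method name find are illustrative and not from the original notes.

```java
class LinearSearchSketch {
    // Returns the index of the first element equal to value, or -1 if absent.
    static int find(int[] data, int value) {
        boolean found = false;
        int i = 0;
        while (i < data.length && !found) {
            if (data[i] == value) {
                found = true;   // stop; i is the position of the match
            } else {
                i++;            // keep looking
            }
        }
        return found ? i : -1;
    }

    public static void main(String[] args) {
        int[] data = { 3, 18, 22, 34, 19 };
        System.out.println(find(data, 22));  // 2
        System.out.println(find(data, 7));   // -1
    }
}
```

The boolean flag mirrors the variable found in the original fragment; the loop ends as soon as it becomes true.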
Getting it Right: How do you know when your program is correct?

As you develop your programs, your program passes through several gates or checkpoints. These checks get progressively harder to pass. As your program passes them, you gain increased confidence that your program is correct. But as you now realize, the compiler’s (and even the interpreter’s) ability to find errors is limited. A professional programmer must adopt a skeptical attitude toward his or her software, i.e., you must constantly look for problems and work hard to convince yourself that your program is correct.

The first hurdle: the compiler
The compiler detects the most fundamental kind of programming error, usually thought of as syntax errors: mismatched parens and braces, spelling mistakes, etc. Errors caught by the compiler are said to be detected at compile time. In other words, these are errors found in your program before it ever runs. The compiler won’t let you run your program until all compiler errors are removed.

The second hurdle: the interpreter
Even though the compiler says that your program is free of compiler errors, this alone does not guarantee that the program is correct. The program must be tested by executing it and seeing what happens. If your program runs into an unexpected situation, the JVM (Java Virtual Machine) environment will usually stop your program prematurely and print out an error message. The “unexpected situation” that an incorrect program creates is usually called an exception. You must decipher the message, correct the source code, recompile, and reexecute your program to verify that the exception has been eliminated. Errors caught by exceptions are said to be detected at run time.

The final hurdle: subjecting the output to your own judgment
The hardest of all errors to detect are subtle (or not-so-subtle) logic errors. These don’t cause compiler errors and may not even cause runtime exceptions. Only careful examination of a program’s inputs and outputs can detect such errors. The moral of the story is to adopt a very skeptical attitude when testing your programs.

“Future” hurdles
Actually, there are other measures of quality for programs that you will see in later classes. Already in this class, you are encouraged to write programs that are “general”. In COMP 182/282, you will learn how to solve problems efficiently, i.e., solve the problem in a way that minimizes the use of time and space (memory).

Example: compute the sum of integers a and b
Solution #1: c = a + b;
Solution #2: c = 0; for (int i=0; i
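
Solution #2 is cut off in this copy. As a hedged illustration of the efficiency point, the sketch below contrasts the direct addition with one plausible slow approach that counts up by one. The counting loop is an assumption about what the original showed, it only works for non-negative b, and the names SumSketch, sumDirect, and sumByCounting are illustrative.

```java
class SumSketch {
    // Solution #1: one addition, regardless of the size of a and b.
    static int sumDirect(int a, int b) {
        return a + b;
    }

    // An assumed slow alternative: start at a and add 1, b times.
    // Takes time proportional to b and only works for b >= 0.
    static int sumByCounting(int a, int b) {
        int c = a;
        for (int i = 0; i < b; i++) {
            c++;
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(sumDirect(4, 3));      // 7
        System.out.println(sumByCounting(4, 3));  // 7
    }
}
```

Both versions compile, run without exceptions, and print the same answer, so none of the earlier hurdles tells them apart; only the amount of work differs.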
Characters

The ASCII Character Encoding Standard

As with all data stored in digital form, textual information is ultimately encoded as numbers within a computer program. The basis of this process is a character encoding system. The encoding assigns a number to every character that can appear in printable text (or text that can be entered from a keyboard). The encoding usually includes codes for a few special characters that are not directly displayable. The best-known encoding for English is called ASCII (American Standard Code for Information Interchange), officially ANSI Standard X3.4-1968. See example table at http://asciitable.com. A portion of the ASCII character encoding standard is repeated below:

    Decimal  Octal  Hex  Binary    Value
    -------  -----  ---  --------  -----
    ...
    007      007    007  00000111  BEL (Bell)
    008      010    008  00001000  BS (Backspace)
    009      011    009  00001001  HT (Horizontal Tab)
    010      012    00A  00001010  LF (Line Feed)
    011      013    00B  00001011  VT (Vertical Tab)
    012      014    00C  00001100  FF (Form Feed)
    013      015    00D  00001101  CR (Carriage Return)
    ...
    032      040    020  00100000  SP (Space)
    033      041    021  00100001  !
    034      042    022  00100010  "
    035      043    023  00100011  #
    036      044    024  00100100  $
    037      045    025  00100101  %
    038      046    026  00100110  &
    ...
    048      060    030  00110000  0
    049      061    031  00110001  1
    050      062    032  00110010  2
    ...
    065      101    041  01000001  A
    066      102    042  01000010  B
    067      103    043  01000011  C
    068      104    044  01000100  D
    069      105    045  01000101  E
    ...
    097      141    061  01100001  a
    098      142    062  01100010  b
    099      143    063  01100011  c
    100      144    064  01100100  d
    101      145    065  01100101  e
    ...
It’s hard to remember many numerical codes, so Java provides a set of mnemonics called character constants. Character constants (for printable characters) are created by writing the single character surrounded by single quotes.

    'a' encoded by the number 97
    'b' encoded by the number 98
    'c' encoded by the number 99
    ...
    'A' encoded by the number 65
    'B' encoded by the number 66
    'C' encoded by the number 67
    ...

An interesting usage of the encoding is for characters that represent numerical digits.

    '0' encoded by the number 48
    '1' encoded by the number 49
    '2' encoded by the number 50
    ...

Note: double quotes are used for String constants; double and single quotes are not interchangeable.
• 'a' is a character constant for a single character
• "a" is a String constant for a String of length 1 (a String containing one character).

A single character constant is a primitive value stored in a char variable. A String containing one character is similar to an array with one element. It is stored differently from a single primitive char.

The char Datatype

To further support operations on characters, Java provides a data type called char (an abbreviation of the word “character”, just like “int” is an abbreviation of the word “integer”). This datatype can be used to declare variables of type char. Character constants can be assigned to variables of type char.

    char c;
    c = 'a';

The assignment statement stores the numerical code for the character 'a' in the char variable c. If you were to look at the value actually stored in the variable c, you would find the number 97. To verify this, you can use casting, i.e., cast the char to an int and print out the int value.

    char c;
    c = 'a';
    int x;
    x = (int) c;
    System.out.println(x);

Or to demonstrate the same principle in one line, you can just write:

    System.out.println((int) 'a');

Now look at the following program and try to figure out what it prints to the display:

    public class CharTest {
        public static void main(String[] args) {
            System.out.println('a');        // print character 'a'
            System.out.println((int) 'a');  // print code for 'a'
        }
    }

Output is

    a
    97
Special Characters

Usually, character constants are written as a single char inside single quotes. There are a few special character constants that begin with the backslash character ‘\’. Here are a few of the most common ones:

    '\n'   the newline character
    '\t'   the tab character
    '\"'   the double quote character (single quote + backslash + double quote + single quote)

These are useful for detailed formatting of information output from your program to the display. They are frequently used by embedding them in the middle of a longer String constant containing other text. For example, to print the text

    Value of "x" is 3

to the display, including the double quotation marks, use

    int x = 3;
    System.out.println("Value of \"x\" is " + x);

The special character for double quote is necessary to prevent the compiler from prematurely terminating the String.

Strings

A String is a series of characters that have been linked together for convenience, to represent text that contains more than one character. Informally, you can think of a String as being implemented as an array of characters.

Caution for C/C++ programmers: Java String is not the same as C/C++ string. Conceptually a String is similar to an array of characters, but you cannot treat it as such syntactically in a Java program. In C/C++ a string really is an array of characters.

C/C++ Only (not permitted in Java)

    char str1[] = "this is a string";
    char c = str1[3];          // stores character 's' into variable c
    int x = strlen( str1 );    // calculates number of characters in string str1

In Java, a String must be manipulated using object style syntax. For example, to obtain a single character from a String, the method “charAt()” must be applied to the String, and the method “length()” must be used to obtain the number of characters in the String:

Java (using Object Style Syntax)

    String str1 = "this is a string";
    char c = str1.charAt(3);   // you cannot use the notation str1[3]
                               // to get the character at position 3 in Java
    int x = str1.length();

String Constants

A String constant is a series of characters inside double quotes.

    "a String constant"
    "a String constant that contains \n a newline character"

The backslash can be used to put double quotes inside the String

    "another \"String\" constant"

String Variables

Variables can be created to refer to Strings, in much the same way that variables can be created to refer to arrays. That is, variables that refer to Strings are also reference variables. The statement

    String s;

creates a variable s of type “reference to String”. As with arrays, the String doesn’t yet exist. It’s very common to assign String constants to a String reference variable.

    String s = "this is a string";
Many useful methods return Strings as their result. Such a result can also be assigned to a String variable.

    int x = 22;
    String s = Integer.toString(x);

String Constructors

A String constructor is a special predefined method String(). It is always used in conjunction with the “new” operator (which you already used to create arrays). It takes one parameter of type String (usually a String constant or expression, but other expressions are possible). The result is a reference to a newly created unique String in memory.

    String s = "abcde";               // assign a String constant to reference variable s
    String t = new String("abcde");   // assign a unique String to reference variable t

Class String and Predefined String Methods

The String in Java is actually defined by a class. Non-primitive data types are defined in Java by writing a class definition. The Java language has predefined many useful classes which are organized into packages. The String class is defined inside package “java.lang”. Other packages such as java.io and java.util will be used shortly. Normally, each package your program uses must be imported by your program

    import java.io.*;

or

    import java.util.*;

These statements “import” all the classes inside package “java.io” or package “java.util”. The package “java.lang” is special because it is automatically imported into every Java program. So class String is automatically available to every Java program without importing its definition. There are many predefined methods (operations) that can be used to perform useful operations on Strings.

int length(): The length of a String

The length of a String indicates how many characters it contains. This is similar to the concept of length for arrays. But there is an important difference. For arrays, length is a property. For Strings, length is a method (an operation or a function).

    int[] data = new int[10];
    int x = data.length;       // no parens

    String s = "this is a string";
    int y = s.length();        // with parens

Comparing Strings

There are at least three ways to compare two Strings
• ==: testing to see if two Strings are identically the same String
• boolean equals(): comparing two Strings for equality (same content)
• int compareTo(): comparing two Strings for relative ordering

Why can’t we just use “==” when comparing two Strings? As a general rule, the equality comparison operator “==” should only be used with primitive values such as numbers. Two String values should be compared using the String equals() method.

DO use

    String x = ...;
    String y = ...;
    if (x.equals(y)) { ... }   // recommended

DON’T use

    if (x == y) { ... }        // NOT recommended

It may be appropriate in some cases to use “==”, but usually the result is not what was intended.
    String x = "abcde";
    String y = "abcde";
    String z = "AbCdE";
    String w = x;

    x == x    true
    x == y    true (identical String constants share one String object)
    x == z    false
    x == w    true

• Case “x == x” is trivially true
• Case “x == z” is trivially false
• Case “x == w” is interesting, because it illustrates a property of reference variables. Two reference variables can easily be made to refer to the same value in memory.

The expression “x == y” is also interesting. If the same String constant appears more than once in a program, Java creates the String only once and reuses it as needed. The Java Language Specification requires this sharing (called interning) for String constants, so “x == y” is true here. The guarantee covers only constants: Strings built at run time (read from input, or created with “new”) are normally separate objects even when they contain the same characters.

Here’s a slightly different case with a subtle difference:

    String x = new String("abcde");
    String y = new String("abcde");
    String z = new String("AbCdE");
    String w = x;

    x == x    true
    x == y    false
    x == z    false
    x == w    true

In the case “x == y”, the value is always false. The String constructor always creates a unique copy of its parameter. The key difference is the use of the keyword “new” with the String constructor. This forces a physically different String to be created in memory which happens to contain the same characters as the original.

    x == y         false   // because Strings are physically distinct in memory
    x.equals(y)    true    // because Strings contain same characters
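
The two kinds of comparison can be tried directly. The following is a small sketch, not part of the original notes; the class name StringEqualityDemo and the helper names sameObject and sameContent are illustrative.

```java
class StringEqualityDemo {
    // True only if a and b refer to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True if a and b contain the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String x = "abcde";
        String y = "abcde";              // same constant as x
        String n = new String("abcde");  // distinct object, same characters

        System.out.println(sameObject(x, y));    // true
        System.out.println(sameObject(x, n));    // false
        System.out.println(sameContent(x, n));   // true
    }
}
```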
[Diagrams: reference variables x, y, z, w and the Strings “abcde” and “AbCdE” they refer to]
If we have two String values to compare, we usually don’t care if the two Strings are physically the same String or not. We only want to know if they have the same content. When testing for equivalent content, we want to use “x.equals(y)”.

Note: there is no notequals() method. To test for inequality, use !x.equals(y)

    String x = ...;
    String y = ...;
    if (!x.equals(y)) { ... }

Equality Comparisons that Ignore Case

It might be useful to compare Strings without regard for upper case versus lower case. Not surprisingly, there’s a method for this: equalsIgnoreCase()

    "abcde".equals("abcde")             true
    "abcde".equals("AbCdE")             false
    "abcde".equalsIgnoreCase("AbCdE")   true

What about <, <=, >, >=? No! These comparison operators are designed for use with primitive values (int, double). They cannot be used at all with String values.

Ordered Comparisons: compareTo()

equals() works as intended, but since its result is boolean, it can only be used to answer
• “yes (true), the two Strings are equal” or
• “no (false), the two Strings are not equal”

In some applications we might want to know more than just simple equality or inequality. We might want to know if one String comes before or after another String alphabetically or lexicographically. The compareTo() method allows us to compare two String values lexicographically. Its result is an int whose value indicates the order of the two Strings. Remember the ASCII character encoding discussed earlier. The numerical values of all the character codes can be used to put Strings into a specific order.

Normal alphanumeric or lexicographic comparison:

    "a" comes before "b"
    "aa" comes before "ab"
    ...

When one String is the beginning of another, the shorter String comes first:

    "a" comes before "aa"
    ...

Digits start at 48, upper case letters at 65, lower case letters at 97:

    "0" comes before "A"
    "A" comes before "a"
    ...
[Diagrams: reference variables x, y, z, w and the Strings “abcde”, “AbCdE”, and a second, separate “abcde” they refer to]
The compareTo() method uses character codes to make decisions about String ordering.

    System.out.println("a".compareTo("aa"));
    System.out.println("aa".compareTo("a"));

Output is

    -1
    1

In general, the result is
• < 0 if the first String comes before the second
• == 0 if the two Strings are equivalent
• > 0 if the first String comes after the second

Examples

    "a".compareTo("aa") < 0
    "a".compareTo("a") == 0
    "aa".compareTo("a") > 0

The exact numerical value of a non-zero result actually provides more detailed info about how the Strings are different, but in simple programs, it’s more common to just look for 0, < 0, or > 0.

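
As a hedged illustration of using only the sign of the result, the sketch below picks whichever of two Strings comes first. The class name CompareToDemo and the method name first are illustrative, not from the original notes.

```java
class CompareToDemo {
    // Returns whichever of the two Strings comes first lexicographically.
    static String first(String a, String b) {
        return (a.compareTo(b) <= 0) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(first("pear", "apple"));   // apple
        System.out.println(first("Zebra", "apple"));  // Zebra ('Z' is 90, 'a' is 97)
        System.out.println(first("10", "9"));         // 10 ('1' is 49, '9' is 57)
    }
}
```

The last two lines show that the ordering follows character codes, not dictionary or numeric order: upper case sorts before lower case, and "10" sorts before "9".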
Parsing Strings that contain only Characters that represent numbers

Special Strings that contain only characters for numbers can be converted into the corresponding numerical value. Let’s illustrate the difference between String constants such as "123" and the numerical constant 123.

    int x = 123;        // stored as the single number 123
    String s = "123";   // stored as the characters '1' '2' '3'

Or, knowing what we know about character encodings, the String holds the codes:

    49 50 51

Since this special kind of String comes up so often, we need easy-to-use predefined operations to do the conversions for us. When converting a String to a number, Java provides several operations for parsing, i.e., converting the String into its numerical equivalent. Any attempt to parse a String that contains non-numerical characters generates a runtime exception. How to correctly handle such situations in your program will be covered in the unit on exceptions later in the course.

(exercise: think about how the Integer.parseInt() operation works, given that you now know how Strings representing numbers are actually stored)

When converting a number into a String, the operation (called formatting) is usually performed automatically, although there are a few predefined operations that will do it for you explicitly. For example:

    int x = 3;
    System.out.println("value of x is " + x);

Look at the information given to System.out.println() for output:

    "value of x is " + x

Here is how the compiler deals with this: the left operand "value of x is " is a String, and the right operand x is an int.
In order to make sense out of this expression, the int must be converted into an equivalent String before the append operation can be completed. So do the following:
• take the value of x, which is 3
• convert it into an equivalent String, i.e., "3"
• append it to the first String "value of x is "
• the result is the String "value of x is 3", which is sent to the display

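
Both directions of conversion can be sketched in a few lines. The hand-written parseDigits method below is one possible answer to the parsing exercise, under the stated assumption that the String is non-empty and holds only the characters '0' through '9'; it is not how the notes or the Java library necessarily do it, and the names ParseFormatSketch and parseDigits are illustrative.

```java
class ParseFormatSketch {
    // Hand-written parse of a String of digits, e.g. "123" -> 123.
    // Assumes s is non-empty and contains only '0'..'9' (no sign, no spaces).
    static int parseDigits(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';   // '7' (code 55) - '0' (code 48) = 7
            value = value * 10 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("123") + 1);        // 124
        System.out.println(Integer.parseInt("123") + 1);   // 124
        System.out.println("123" + 1);                     // 1231 (String append)
        System.out.println(Integer.toString(22) + 1);      // 221
    }
}
```

The last two lines show the automatic formatting described above: when either operand of + is a String, the other is converted to a String and appended.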
Operations on Strings Use the Selector Operator “.”

When writing expressions that use Strings, we frequently use the dot or selection operator. Example:

    String s = "xyz";
    String t = "abc";

Instead of writing

    if (s == t) ...

we write

    if ( s.equals(t) ) ...

Other examples

    int x = s.length();
    char c = s.charAt( 2 );

etc.

In an expression like “s.equals(t)”, we say that the method “equals()” is applied to s, and t is a parameter to the method. This is actually an introduction to an OOP style of programming. Strings are a type of object because the String type is defined by a class definition. We create objects from class definitions. When performing an operation on an object, the method is applied to the object, rather than passing the object as a parameter. If other data is required in addition to the original object, then this data is passed as parameters normally.

Examples:

Length: instead of writing

    String s = "xyz";
    int x = length(s);    // wrong

we write

    String s = "xyz";
    int x = s.length();   // correct

Equals: instead of writing

    String s = "xyz";
    String t = "abc";
    if ( equals( s, t ) ) ...   // wrong

we write

    if ( s.equals( t ) ) ...    // correct

This kind of expression is going to be very common when dealing with objects and OOP, which we’ll cover in lecture shortly.
Obtaining Individual Characters at Specific Positions

A String is similar to an array of characters. Each character in a String occupies an indexed position. The indexes start at 0 and go up to the length of the String – 1.

    String x = "Hello, world!";

    H  e  l  l  o  ,     w  o  r  l  d  !
    0  1  2  3  4  5  6  7  8  9  10 11 12

    System.out.println( x.length() );       // 13
    System.out.println( x.charAt( 7 ) );    // w
    System.out.println( x.charAt( 11 ) );   // d

The expression “x.charAt( 7 )” is conceptually similar to the expression “x[ 7 ]”. But remember that Strings are not the same as arrays of characters in Java, so the expression “x[ 7 ]” where x is a String is not allowed in Java. You must use “x.charAt( 7 )”.

Comparison Between Arrays and String

Strings are very similar to arrays of characters, but not identical. Each character in a String occupies a position or index, using 0-based counting.

    int[] x = { 3, 18, 22, 34, 19 };

    x[ 0 ] refers to the number stored in array x at position 0: 3
    x[ 1 ] refers to the number stored in array x at position 1: 18
    x[ 2 ] refers to the number stored in array x at position 2: 22
    x[ 3 ] refers to the number stored in array x at position 3: 34
    x[ 4 ] refers to the number stored in array x at position 4: 19

The analogous code for String data requires the charAt() operation to be applied to the String variable using the selector operator.

    String s = "pxvtmr";

    s.charAt( 0 ) refers to the character at position 0: 'p'
    s.charAt( 1 ) refers to the character at position 1: 'x'
    s.charAt( 2 ) refers to the character at position 2: 'v'
    s.charAt( 3 ) refers to the character at position 3: 't'
    s.charAt( 4 ) refers to the character at position 4: 'm'
    s.charAt( 5 ) refers to the character at position 5: 'r'

So Strings are similar to arrays of characters, but we do not use the square bracket notation with Strings.
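
Because charAt() and length() play the roles that [ ] and .length play for arrays, the loop patterns from the start of this note set carry over to Strings. A minimal sketch, with the illustrative names CharAtLoopSketch and count:

```java
class CharAtLoopSketch {
    // Counts how many times target appears in s by visiting every index.
    static int count(String s, char target) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {   // length() is a method: parens
            if (s.charAt(i) == target) {         // chars are primitives, so == is fine
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("Hello, world!", 'l'));  // 3
        System.out.println(count("Hello, world!", 'z'));  // 0
    }
}
```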
Substrings

Given an original String, the substring() method creates a new String that is a part of the original. Substrings are defined by their starting and stopping index positions. There are several versions.

    String s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    String t = s.substring( 4, 8 );
    System.out.println( t );   // "EFGH"

Note that substring(4,8) includes the characters at positions 4, 5, 6, and 7 (does not include 8).

    String v = s.substring(8);
    System.out.println( v );   // "IJKLMNOPQRSTUVWXYZ"

This 2nd version of substring() only takes one argument. The result is the substring from that position to the end of the string.

    s.substring( x ) is the same as s.substring( x, s.length() )

Method Overloading: another preview of OOP

In this example, we have seen two versions of a method with the same name: one version of substring() that takes two arguments, and one version of substring() that takes one argument. This is an example of method overloading, i.e., defining two methods that have the same name but different numbers and/or types of arguments.

[some examples in this section taken from Hubbard, Programming with Java, Schaum’s Outline Series, McGraw-Hill, 2004].

Locating Characters Within a String

The method indexOf() returns the index position of the first occurrence of a character within a String. This is also an overloaded method, i.e., there are two versions with different numbers of arguments.

    String str = "This is the Mississippi River.";
    int i = str.indexOf( 's' );        // 1st occurrence of 's' in str from the beginning
    System.out.println( i );           // 3
    int j = str.indexOf( 's', i+1 );   // 1st occurrence of 's' in str, starting from position i+1
    System.out.println( j );           // 6
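
The two-argument form of indexOf() is typically used in a loop to visit every occurrence, and indexOf() combines naturally with substring(). The sketch below relies on the fact that indexOf() returns -1 when there is no (further) match; the names IndexOfSketch, countWithIndexOf, and before are illustrative.

```java
class IndexOfSketch {
    // Counts occurrences of c in s by repeatedly searching from just past
    // the previous match. indexOf() returns -1 when there is no further match.
    static int countWithIndexOf(String s, char c) {
        int count = 0;
        int pos = s.indexOf(c);
        while (pos != -1) {
            count++;
            pos = s.indexOf(c, pos + 1);
        }
        return count;
    }

    // Returns the part of s before the first occurrence of c,
    // or all of s if c does not occur.
    static String before(String s, char c) {
        int pos = s.indexOf(c);
        return (pos == -1) ? s : s.substring(0, pos);
    }

    public static void main(String[] args) {
        String str = "This is the Mississippi River.";
        System.out.println(countWithIndexOf(str, 's'));  // 6
        System.out.println(before(str, ' '));            // This
    }
}
```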
Tokens and Tokenization

Many programs deal with the processing of textual data in the form of Strings. So far, we’ve been thinking about Strings as a single data item. Programs that prompt the user to input a numerical value receive a String that is then parsed into the numerical equivalent, for example.

    String s = "45";
    int x = Integer.parseInt(s);

But many String values that are input actually have some internal structure. For example, suppose you wanted to write a simple line-oriented calculator application that works something like this:

    > java Calc
    Please enter an expression: 4 + 3
    7
    Please enter an expression: 15 - 4
    11
    Please enter an expression: quit
    >

Using the Scanner method nextLine(), we can obtain a line of input from the keyboard in the form of a single String:

    Scanner in = new Scanner(System.in);
    String s = in.nextLine();   // assume s contains "4 + 3"

But now we have a problem. We want to parse s to obtain the numerical values that are “buried” inside it, but we can’t just parse s “as is”. It will generate an exception because the String s is not simply a number. It is a mixture of numerical characters, arithmetic symbol characters, and space characters:

    '4'  ' '  '+'  ' '  '3'
     0    1    2    3    4

Clearly what we need is to first break up the single String s into 3 separate Strings a, b, and c as follows:

    s: "4 + 3"   original
    a: "4"       1st token
    b: "+"       2nd token
    c: "3"       3rd token

We need a way to read the characters of the original String s, locate the pieces that we are interested in, discard uninteresting characters such as spaces, and save the remaining pieces as separate String values. This task comes up so frequently that Java provides predefined code to solve the problem for us. We just need to learn a little terminology to use it.

The Scanner class with its nextInt() and nextDouble() methods takes care of this problem automatically. In this example, we are doing it “the hard way”. In some cases the “hard way” might be “the only way” to solve a problem.

The subparts of our original String "4 + 3" that we are interested in separating out are called tokens. In our example, the first token is "4", the second token is "+", and the third token is "3". The process of taking the original String and breaking it apart into its tokens is called tokenization. In order to know where one token stops and the next one begins, we have to decide or agree on the separator characters. In this case, we assume that the space character separates the tokens, but we might want to change the definition of the separators from time to time. The separator character or characters are called delimiters.
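
A hedged sketch of the calculator idea follows. It uses the String split() method, which these notes describe in the next section, and it assumes exactly three tokens separated by single spaces, integer operands, and only the operators + and -. The names CalcSketch and evaluate are illustrative and not from the original notes.

```java
class CalcSketch {
    // Evaluates a line such as "4 + 3" or "15 - 4".
    static int evaluate(String line) {
        String[] tokens = line.split(" ");        // delimiter: one space
        int left = Integer.parseInt(tokens[0]);   // 1st token
        String op = tokens[1];                    // 2nd token
        int right = Integer.parseInt(tokens[2]);  // 3rd token
        if (op.equals("+")) {
            return left + right;
        } else if (op.equals("-")) {
            return left - right;
        }
        throw new IllegalArgumentException("unsupported operator: " + op);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("4 + 3"));   // 7
        System.out.println(evaluate("15 - 4"));  // 11
    }
}
```

Note that the operator token is compared with equals(), not ==, for the reasons given in the section on comparing Strings.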
The “split()” method from class String

The split() method is built into the String class. It takes an initial string and chops or splits it into substrings based on a set of splitting rules provided by the user. The returned result is an array of Strings, with each substring stored in one element of the array. Example:

    String s = "this is a multi word string";
    String[] t = s.split( " " );

The input parameter to the split() method is itself a string. This string specifies the rules for performing the split. In the simplest case as shown, the rule is simply the space character " ". This rule tells the split() method to create a new substring every time it encounters a space character. The complete set of split() rules is very complex. The method can be used to perform sophisticated text processing, but this is outside the scope of the course. After the split() method returns, the array can be examined like any other array:

    for (int i=0; i
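
The loop above is cut off in this copy of the notes. A minimal sketch of examining the result of split(), assuming the loop simply prints each element; the names SplitLoopSketch and words are illustrative.

```java
class SplitLoopSketch {
    // Splits s at every single space character.
    static String[] words(String s) {
        return s.split(" ");
    }

    public static void main(String[] args) {
        String[] t = words("this is a multi word string");
        for (int i = 0; i < t.length; i++) {      // length is a property for arrays
            System.out.println(i + ": " + t[i]);
        }
    }
}
```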
Example

    String s = "This is a test.";

• To obtain the individual tokens, create a StringTokenizer.
• Place this statement near the top of the file with any other import statements

    import java.util.*;

• Create a StringTokenizer object associated with String s.

    StringTokenizer tk = new StringTokenizer(s);

• Obtain the tokens by applying the nextToken() method to the StringTokenizer object tk.

    String t1 = tk.nextToken();   // "This"
    String t2 = tk.nextToken();   // "is"
    String t3 = tk.nextToken();   // "a"
    String t4 = tk.nextToken();   // "test."

Putting all the pieces together gives:

    import java.util.*;

    public class TokenDemo {
        public static void main(String args[]) {
            String s = "This is a test.";
            StringTokenizer tk = new StringTokenizer(s);

            String t1 = tk.nextToken();   // "This"
            String t2 = tk.nextToken();   // "is"
            String t3 = tk.nextToken();   // "a"
            String t4 = tk.nextToken();   // "test."

            System.out.println("token #1 = \"" + t1 + "\"");
            System.out.println("token #2 = \"" + t2 + "\"");
            System.out.println("token #3 = \"" + t3 + "\"");
            System.out.println("token #4 = \"" + t4 + "\"");
        }
    }

Output is

    token #1 = "This"
    token #2 = "is"
    token #3 = "a"
    token #4 = "test."

Limitations with this approach?
• Not general
• What if we have a String with more than four tokens?
• Better to use methods like countTokens() or hasMoreTokens()

The following general version uses loops and built-in operations to test for more tokens.

    String s = "This is another test with a larger number of tokens.";
    StringTokenizer tk = new StringTokenizer(s);
    String[] tokens = new String[tk.countTokens()];
    for (int i=0; i
Output is

    token #1 = "This"
    token #2 = "is"
    token #3 = "another"
    token #4 = "test"
    token #5 = "with"
    token #6 = "a"
    token #7 = "larger"
    token #8 = "number"
    token #9 = "of"
    token #10 = "tokens."

Note: it’s not required for you to first put all the tokens into an array, but it’s sometimes convenient.
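
The listing of the general version (just before the output above) is cut off in this copy. The following is a sketch of what such a version could look like, consistent with the surviving first lines and with the output shown; the loop body is an assumption, and the names TokenArraySketch and tokens are illustrative.

```java
import java.util.StringTokenizer;

class TokenArraySketch {
    // Collects all tokens of s (default delimiters) into an array.
    static String[] tokens(String s) {
        StringTokenizer tk = new StringTokenizer(s);
        String[] result = new String[tk.countTokens()];
        for (int i = 0; i < result.length; i++) {
            result[i] = tk.nextToken();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] t = tokens("This is another test with a larger number of tokens.");
        for (int i = 0; i < t.length; i++) {
            System.out.println("token #" + (i + 1) + " = \"" + t[i] + "\"");
        }
    }
}
```

The loop bound is the array length rather than a fresh call to countTokens(), because countTokens() reports the number of tokens remaining and shrinks each time nextToken() is called.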
• Another common programming style with StringTokenizer is to
use a while loop String s = ...; StringTokenizer tk = new
StringTokenizer(s); while ( tk.hasMoreTokens() ) { String t =
tk.nextToken(); ... } Another example: String s = "4 + 3";
StringTokenizer t = new StringTokenizer(s);
System.out.println(t.nextToken());
System.out.println(t.nextToken());
System.out.println(t.nextToken()); Output is:
4 + 3
There is also a related class called a BreakIterator which
partitions a string into subsets of characters. In the case of the
StringTokenizer, the separator characters are “consumed” by the
tokenizer and are not available for later examination. The
BreakIterator reports information about the tokens by index
position of characters within the original string and does not
consume any characters. In general, the split() method of class
String is the preferred way to tokenize a string. It should be your
default choice. Only use StringTokenizer or BreakIterator for
special purpose problems. Other Delimiters The documentation for
the constructor of the usual StringTokenizer looks like this public
StringTokenizer(String str)
Constructs a string tokenizer for the specified string. The
tokenizer uses the default delimiter set, which is "\t\n\r\f": the
space character, the tab character, the newline character, the
carriage-return character, and the form-feed character. Delimiter
characters themselves will not be treated as tokens.
Parameters: str - a string to be parsed.
But for special situations, you want to be able to create a
StringTokenizer that is customized to recognize other delimiter
characters. Here’s the documentation for the customizable
StringTokenizer: public StringTokenizer(String str, String
delim)
Constructs a string tokenizer for the specified string. The
characters in the delim argument are the delimiters for separating
tokens. Delimiter characters themselves will not be treated as
tokens.
Parameters: str - a string to be parsed delim - the
delimiters.
Example:
String s = "this:is:a:string:with:colon:separators";
StringTokenizer t = new StringTokenizer(s, ":");
The StringTokenizer in this example is initialized to look for the
colon character ":" as the delimiter or separator, rather than the
default characters of space, tab, newline, etc.
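As a rough illustration of the two-argument constructor, the sketch below collects the tokens of a colon-separated string into an array. The class name ColonTokens and the helper tokenize() are hypothetical names chosen for this example; countTokens() is a standard StringTokenizer method that reports how many tokens remain.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the original notes.
class ColonTokens {
    // Returns the tokens of str, treating every character in delims as a separator.
    static String[] tokenize(String str, String delims) {
        StringTokenizer t = new StringTokenizer(str, delims);
        String[] tokens = new String[t.countTokens()];
        int i = 0;
        while (t.hasMoreTokens()) {
            tokens[i] = t.nextToken();
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("this:is:a:string:with:colon:separators", ":");
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i]);   // one token per line
        }
    }
}
```

The delimiter argument may hold more than one character: passing ",;" would treat both the comma and the semicolon as separators.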
More String Processing
The tokenizer approach described above is a useful practice
exercise for beginning programmers, but it’s not the best approach
for solving the general problem. The best approach uses the rules
of regular expressions to break up a string according to a
specified pattern (the regular expression). One way to access this
feature in Java is to use the split operation for Strings.
Example:
String t = "This  is a   test";   // note extra spaces
String[] tokens = t.split(" +");  // " +" is RE for "one or more spaces"
for (int i=0; i
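A complete, runnable version of a split()-based tokenizer might look like the following sketch. The class name SplitDemo and the helper words() are hypothetical, and the call to trim() is an addition not shown in the notes: without it, a line that begins with spaces would produce an empty first token.

```java
// Hypothetical demo class; not part of the original notes.
class SplitDemo {
    // Splits a line into words separated by one or more spaces.
    // trim() removes leading and trailing spaces so that split()
    // does not return an empty first element.
    static String[] words(String line) {
        return line.trim().split(" +");
    }

    public static void main(String[] args) {
        String t = "This  is a   test";   // note extra spaces
        String[] tokens = words(t);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i]);
        }
    }
}
```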
Converting a char to a String
There is a difference between a character constant like 'b' and a
String consisting of a single character like "b". A convenient
operation to convert a character constant into a one-character
String is Character.toString(). For example:
char c = 'x';                      // assigns char constant 'x' to char variable c
String s = Character.toString(c);  // converts 'x' to "x" in String variable s
Shifting Characters by Manipulating their ASCII Code
We can sometimes take advantage of the fact that the ASCII codes
for the characters are assigned sequentially for certain ranges.
From the ASCII table given earlier, here are a couple of lines:
Decimal  Octal  Hex  Binary    Value
-------  -----  ---  --------  -----
097      141    061  01100001  a
098      142    062  01100010  b
099      143    063  01100011  c
...
The statement System.out.print('a') prints the character a on
the screen. But look at the following code:
int mysterycode = (int) 'a';
mysterycode++;
char mysterychar = (char) mysterycode;
System.out.println(mysterychar);
Hopefully, it won’t be too surprising that it prints out the
character b on the screen. The expression 'a' is just a symbol for
the ASCII code value of 97, which is assigned to the int variable.
This int value is then incremented from 97 to 98, and turned back
into a character with the cast operation. As a character code, 98
represents 'b', so this is what is printed out. You will need this
little trick as you write your solution to the next lab.
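The same cast-and-add trick can be wrapped in a small helper. The sketch below is only an illustration; the class name ShiftDemo and the method shift() are hypothetical, and no wrap-around at the end of the alphabet is attempted.

```java
// Hypothetical demo class; not part of the original notes.
class ShiftDemo {
    // Returns the character whose code is offset positions after c.
    // No wrap-around: shifting 'z' by 1 gives '{' (code 123), not 'a'.
    static char shift(char c, int offset) {
        return (char) (c + offset);
    }

    public static void main(String[] args) {
        System.out.println(shift('a', 1));   // prints b
        System.out.println(shift('a', 2));   // prints c
        System.out.println((int) 'a');       // prints 97
    }
}
```

The cast back to char is required because adding an int to a char produces an int.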
Strings are Immutable
Once a Java String has been created, it has a property called
“immutability”, which is just a fancy word for “cannot be
modified”. This is very different in general from C/C++, where
strings are just arrays of characters which can be modified however
you want (unless you use the C++ keyword const).
Example:
String s = "abcde";
char c = s.charAt(0);   // assigns 'a' to variable c
s.setCharAt(0, 'A');    // illegal, can't modify a character within
                        // the String (String has no such method)
Perhaps surprisingly, it’s okay to throw a String away and replace
it with a new one.
String s = "abcde";     // assign a reference to a String to s
s = "pqrst";            // replace the reference to the first
                        // String with a second String;
                        // this effectively discards the first String
But in general if you want to systematically modify an existing
String to produce a new one, you have a couple of options. Some of
the predefined String operations create a new String which can be
used to replace the original.
String s = "abcde";
s = s.toUpperCase();    // s now has value "ABCDE"
To create the new String one character at a time, you can use a
loop and the “append” operation. What does the following code do?
Trace it to find out.
String t = "uvwxyz";
String u = "";
for (int i=0; i
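As one possible illustration of the loop-and-append idea (not necessarily the loop the notes intended), the sketch below builds a reversed copy of a String one character at a time. The class name ReverseDemo and the method reverse() are hypothetical.

```java
// Hypothetical demo class; not part of the original notes.
class ReverseDemo {
    // Builds a new String holding the characters of t in reverse order.
    // Each + creates a new String; t itself is never modified.
    static String reverse(String t) {
        String u = "";
        for (int i = t.length() - 1; i >= 0; i--) {
            u = u + t.charAt(i);
        }
        return u;
    }

    public static void main(String[] args) {
        String t = "uvwxyz";
        String u = reverse(t);
        System.out.println(u);   // prints zyxwvu
        System.out.println(t);   // prints uvwxyz (original unchanged)
    }
}
```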
String Operations (Summary)
Comparisons and Testing for Equality/Inequality
"a".compareTo("b")       result is negative (-1) because a comes before b
"a".compareTo("a")       result is 0 because the two Strings have the same content
"abc".equals("abc")      result is true
!("abc".equals("cde"))   result is true; note that there is no operation
                         called .notequals(); use ! and .equals()
Use “==” only to check if two Strings are physically the same
String in memory (a much less common comparison than comparing
content)
String a = "abc";
String b = "cde";
String c = "abc";
String d = new String("abc");
a == b   // false
a == c   // true (identical String constants are unique in memory)
a == d   // false (constructor builds a distinct String)
Extracting Characters from a String
"abcde".charAt(3)           // result is 'd'
Extracting Substrings
"wxyzabcd".substring(3,5)   // result is "za"
"wxyzabcd".substring(5)     // result is "bcd"
Parsing and Formatting
Some String constants can represent a number, others cannot.
"123"       // can parse to an int
"123.456"   // can parse to a double
"abc"       // cannot parse to any number
A String that contains characters representing a number can be
converted into the corresponding numerical value. This operation is
called parsing. Going in the other direction, converting a number
into a String, and optionally adding other punctuation, is called
formatting. This extra punctuation is added purely for visual
effect. It has no effect on the numerical value of the underlying
data item.
To parse a String into an int, use
• int Integer.parseInt(String s)
To parse a String into a double, use
• double Double.parseDouble(String s)
To format an int into a String, use
• String Integer.toString(int n)
To format a double into a String, use
• String Double.toString(double d)
Many operations that output numerical values automatically convert
the numerical value to a String, so most of the time it is not
necessary to explicitly format a number to convert it to a String.
The exception is when you specifically want to control the
formatting details, such as the number of digits to display to the
right of the decimal point, or whether to separate groups of three
digits on the left with commas. The simple formatting methods shown
above do not provide a way to control the formatting at this level
of detail. Instead, there is a special class called DecimalFormat
that is used to perform detailed formatting on numbers. This class
is defined in the package java.text.
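Stepping back to the basic parse and format methods listed above, the sketch below exercises them and shows what happens when a String cannot be parsed. The class name ParseDemo and the helper isInt() are hypothetical; Integer.parseInt() signals a bad input by throwing a NumberFormatException, which the helper catches.

```java
// Hypothetical demo class; not part of the original notes.
class ParseDemo {
    // Returns true if s can be parsed as an int, false otherwise.
    static boolean isInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int n = Integer.parseInt("123");           // String -> int
        double d = Double.parseDouble("123.456");  // String -> double
        String s = Integer.toString(n);            // int -> String

        System.out.println(n + 1);                 // prints 124 (arithmetic on the int)
        System.out.println(s + 1);                 // prints 1231 (concatenation on the String)
        System.out.println(d > 123.0);             // prints true
        System.out.println(isInt("abc"));          // prints false
    }
}
```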
The operation of a DecimalFormat object has two steps or phases.
The first step is that it must be created or instantiated
(“instantiate” is the OOP term meaning “use a class definition to
build a new non-primitive data item”). At the time it is created,
it is given a formatting “pattern” to work with. The second step
is that, once created, it can be used multiple times to turn
numbers into Strings with the format() operation. The first step
is performed once. The second step is performed as many times as
needed. Here’s how to create a DecimalFormat object:
import java.text.DecimalFormat;
...
DecimalFormat f = new DecimalFormat("#.###");
This statement creates a DecimalFormat object named f. The String
"#.###" is called a format string. It is composed of a small number
of characters including '#', '0', '.', ',' and a few others. The
character '#' is used to indicate that a digit should be printed in
that position unless it is a leading or trailing zero. The
character '0' means that a digit should be printed in that position
even if it is a trailing or leading zero. After the DecimalFormat
object has been created, the format() method is used to convert
numerical values into a String formatted according to the specified
format String.
import java.text.*;   // remember to import the java.text package
DecimalFormat f;
String s;
f = new DecimalFormat("#.###");
s = f.format(5.12345);
System.out.println( s );   // 5.123
s = f.format(5.12);
System.out.println( s );   // 5.12
f = new DecimalFormat("#.000");
s = f.format(5.12345);
System.out.println( s );   // 5.123
s = f.format(5.12);
System.out.println( s );   // 5.120
Once the DecimalFormat object has been created, it can be reused as
many times as desired to convert numerical values to properly
formatted Strings. The resulting Strings can then be used in
whatever way is desired, i.e., print to the monitor, perform
further internal processing, etc.
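The comma grouping mentioned earlier can be requested with a pattern such as "#,##0.00". The following is a minimal sketch: the class name GroupedFormat and the method format() are hypothetical, and the explicit US symbols are an assumption added here so that the separators do not depend on the computer's regional settings.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo class; not part of the original notes.
class GroupedFormat {
    // Created once, reused for every call to format().
    // Locale.US fixes ',' as the grouping separator and '.' as the decimal point.
    private static final DecimalFormat F =
        new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(Locale.US));

    static String format(double value) {
        return F.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.891));   // prints 1,234,567.89
        System.out.println(format(5.1));           // prints 5.10
    }
}
```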
System.out.printf() Method
A more recent addition to the language is the System.out.printf()
method, which is a throwback to the original printf() function
introduced by the C language. printf() uses embedded format
sequences to accomplish formatting similar to DecimalFormat, but in
an arguably simpler way, at least for programmers familiar with the
function from the C language. The format is:
System.out.printf("format string", expr, expr, expr, …);
where the format string contains normal text to be output as-is,
plus format specifiers embedded in the string starting with the
character %. For each format specifier, there should be an
additional expression, separated by a comma, after the end of the
format string. The compiler does not enforce this rule, but a
missing or wrongly typed expression generates a runtime exception.
Format specifiers include:
%d for integers
%f for reals
%b for booleans
In addition to the basic datatype, precision and field width info
can be added. For example:
%.3f     real number with 3 digits to the right of the decimal
%10.3f   real number with a total minimum width of 10 characters
         (including decimal and sign) and 3 digits to the right of
         the decimal
String.format() is a similar method that takes the same inputs as
System.out.printf(), but which produces a string and returns it
rather than immediately printing the string to the output. The
following two code fragments are equivalent:
int x = 123;
double q = 23.456;
boolean r = true;
System.out.printf("x = %d, q = %f, r = %b", x, q, r );
System.out.println();
Or:
int x = 123;
double q = 23.456;
boolean r = true;
String s = String.format("x = %d, q = %f, r = %b", x, q, r);
System.out.println(s);
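The precision and width specifiers described earlier work the same way in String.format(). The sketch below is illustrative only: the class name FormatDemo and its two helpers are hypothetical, and the Locale.US argument is an assumption added so that the decimal point is always '.' regardless of regional settings.

```java
import java.util.Locale;

// Hypothetical demo class; not part of the original notes.
class FormatDemo {
    // Three digits to the right of the decimal point.
    static String fixed(double value) {
        return String.format(Locale.US, "%.3f", value);
    }

    // Same precision, padded on the left with spaces to at least 10 characters.
    static String padded(double value) {
        return String.format(Locale.US, "%10.3f", value);
    }

    public static void main(String[] args) {
        double q = 23.456789;
        System.out.println("[" + fixed(q) + "]");    // prints [23.457]
        System.out.println("[" + padded(q) + "]");   // prints [    23.457]
    }
}
```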
Another Tokenizer Example
Here’s a program that
• Reads a String from the keyboard
• Breaks the line apart into a series of tokens
• Parses each token to convert it to an integer
• Computes the sum of all the integers
• Prints the sum to the monitor.
import java.io.IOException;
import java.util.*;
public class AddInts {
    public static void main(String args[]) throws IOException {
        Scanner in = ...;
        System.out.print("Enter a series of integers: ");
        String line = in.nextLine();
        int sum = 0;
        StringTokenizer tkline = new StringTokenizer(line);
        while (tkline.hasMoreTokens()) {
            String currenttok = tkline.nextToken();
            int currentval = Integer.parseInt(currenttok);
            sum += currentval;
        }
        System.out.println("The sum is " + sum);
    }
}
Output is
C:\> java AddInts
Enter a series of integers: 5 6 7 8
The sum is 26
Exercise: rewrite using string split().
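One possible approach to the exercise is sketched below; it is not the only solution. The class name AddIntsSplit and the helper sum() are hypothetical, reading from System.in is an assumption about where the Scanner gets its input, and the regular expression "\\s+" (one or more whitespace characters) stands in for StringTokenizer's default delimiters.

```java
import java.util.Scanner;

// Hypothetical demo class; one possible answer to the exercise.
class AddIntsSplit {
    // Splits the line on runs of whitespace and adds up the integers found.
    static int sum(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0;                        // nothing to add
        }
        String[] tokens = trimmed.split("\\s+");
        int sum = 0;
        for (int i = 0; i < tokens.length; i++) {
            sum += Integer.parseInt(tokens[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);   // assumption: keyboard input
        System.out.print("Enter a series of integers: ");
        String line = in.hasNextLine() ? in.nextLine() : "";
        System.out.println("The sum is " + sum(line));
    }
}
```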
Command-Line Arguments
Here’s a little more detail on how a Java source file is turned
into an executable application. Strings placed on the command line
after the name of the class are called command-line arguments.
C:\> javac SomeClass.java
C:\> java SomeClass 10 11 12
In this example the command-line arguments are "10", "11", and
"12", i.e., every item of information on the command line that
follows the name of the Java class being executed. Command-line
arguments are passed to your program as an array of Strings. This
array is made available to the main method as the input argument
named args.
public static void main( String[] args) { ... }
In this case:
java   SomeClass    10        11        12
cmd    class file   args[0]   args[1]   args[2]
args[0] = "10"
args[1] = "11"
args[2] = "12"
args.length = 3
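Because every element of args is a String, numeric arguments have to be parsed before they can be used in arithmetic. The sketch below adds up whatever integers are given on the command line; the class name SumArgs and the helper sum() are hypothetical.

```java
// Hypothetical demo class; not part of the original notes.
class SumArgs {
    // Parses each command-line argument as an int and returns the total.
    static int sum(String[] args) {
        int total = 0;
        for (int i = 0; i < args.length; i++) {
            total += Integer.parseInt(args[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("The sum is " + sum(args));
    }
}
```

Run as java SumArgs 10 11 12, this sketch would print The sum is 33; with no arguments, args.length is 0 and the sum is 0.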
The name of the command being executed (“java”) and the name of
the class file (“SomeClass”) are not part of the argument list. The
list starts with the first String or “token” on the command line
after the class file. As a nice simple demo, here’s a program that
does nothing but “echo” the command line arguments, if any, back to
the display: public static void main(String[] args) { for (int i=0;
i