Data Types and the Type System
James Brucker
Important Topics
what is a "type system"?
what are common data types?
how are numeric types stored and operated on?
compound types: arrays, structs, records
enumerated types
character and string types
strong type checking versus not-so-strong
enumerations, type compatibility, and type safety
advantages & disadvantages of compile-time checking
type compatibility
  what are compatibility rules in C, C++, and Java?
  when are user-defined types compatible?
type conversion
  what conversions are automatic in C, C++, Java?
  what conversions are allowed using a cast?
Importance of knowing data types
Need to know the valid range of data values.
for(int k=0; k<9999999; k++) /* do something */;
Need to know rules for operations.
int k = Integer.MAX_VALUE; // = 2,147,483,647
k = k + 1; // overflow?
int m = 7, n = 4;
float x = m / n; // 1.75, 2.0 or 1.0?
Need to know what assignments are valid and how the compiler will convert from one type to another.
Need to know what variables represent: value of data or a reference to a storage location.
Data Type
A data type is a set of possible values and operations on those values
Example: int
set of values: -2,147,483,648, ..., -1, 0, 1, ..., 2,147,483,647
( -2^31 to 2^31 - 1 )
operations:
+ : int x int -> int     * : int x int -> int     etc.
internal representation:
32-bit 2's complement
Data Types define meaning
To the computer, a stored value is just bits. The data type assigns meaning to those bits. Example C function:
#include <stdio.h>

/* "An int is an unsigned is a float" --the cpu */
void rawdata( ) {
    union {
        int i;
        unsigned int u;
        float f;
    } x;
    while( 1 ) {
        printf("Input a value: ");
        if (scanf("%d", &x.i) != 1) break; // read as an int; stop on bad input or end of file
        printf("int %d is unsigned %u is float %g\n", x.i, x.u, x.f);
    }
}
Type System
The type system is
  the collection of all data types
  rules for type equivalence, type compatibility, and type conversion between data types
Type System
Example: int n = 0.5 * 99;
  cannot directly multiply a floating-point value times an int
  type conversion rule: "int" can be automatically converted to floating point (here "double", because the literal 0.5 is a double).
  type system says double * double is double (49.5)
  in C, assignment compatibility rule says that you can convert the floating-point result back to int by truncation.
  in Java or C#, result is double and it is not assignment compatible with int (assignment error)
Memory Concepts
We will cover memory management later, but first...
When the OS runs a program it allocates at least 2 memory segments:
  text segment for program instructions (Read Only)
  data segment for data (variables, constants, ...)
Data segment is divided into 3 parts:
  static area - static data
  stack area - stack oriented data
  heap - dynamic, non-stack data
Memory Concepts
#include <stdio.h>
#include <stdlib.h>

int count = 0;
const int MAXSIZE = 4000;

int *getarray(int n) {
    int *a = (int *)malloc( n*sizeof(int) );
    return a;
}

int main( ) {
    int size;
    scanf("%d", &size);
    int *a = getarray(size);
    scanf("%d", a);   // read a[0]
    a++;
}

Where each item lives:
Static Area:  count, MAXSIZE
Stack Area:   stack frame for main: size, a (pointer only)
              stack frame for getarray
              (unused space between stack and heap)
Heap Area:    a[0], a[1], ...
Virtual Memory
Most OS use virtual memory.
The actual location of memory pages varies.
Accessing memory efficiently affects program speed.
[Figure: the memory manager maps pages n and n+1 of program virtual memory to pages in real memory.]
Integer Data Types
C/C++ support both “unsigned” and “signed” integer types.
Type             # Bytes   Range of values
short int        2         -32,768 (-2^15) to 32,767 (2^15 - 1)
unsigned short   2         0 to 65,535 (2^16 - 1)
int              4         -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1)
unsigned int     4         0 to 4,294,967,295 (2^32 - 1)
long int         same as "int" on Pentium and Athlon CPU
C permits “char” type for integer values, too…
char             1         -128 to 127
unsigned char    1         0 to 255
Example use of unsigned int
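To display the address of a variable in C (the cast assumes an address fits in an unsigned int, as on 32-bit CPUs; portable code prints (void *)&x with %p):
printf("address of %s is %u\n", "x", (unsigned int)&x);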
IEEE 754 Floating Point Standard
Problem: some numerical algorithms would run on one computer, but fail with arithmetic overflow/underflow errors on another.
Worse problem: results from different computers could differ greatly! This reduced trust in the answers from computers.
In fact, when numerical results differ greatly it usually indicates a problem in the algorithm!
Solution: IEEE 754 (1985) defines a standard for computer storage of floating point numbers.
IEEE Floating Point Data Types
-1.011100 x 2^11 =
Sign bit   Biased Exponent   Mantissa
1          1 0 0 0 1 0 1 0   0 1 1 1 0 0 0 0 . . .

          Sign    Exponent   Bias         Mantissa
Float:    1 bit   8 bits     bias= 127    23 bits
Double:   1 bit   11 bits    bias=1023    52 bits

          Range                 Precision
Float:    10^-38 to 10^+38      24 bits =~ 7 dec. digits
Double:   10^-308 to 10^+308    53 bits =~ 15 dec. digits

Stored exponent = actual exponent + bias
Implicit Leading "1"
Floating point numbers are stored in normalized form:
13.3125 = 1101.0101 (binary) = 1.1010101 x 2^3
3/16 = 0.0011 (binary) = 1.1 x 2^-3
Normalized form: the leading digit is always one. So, IEEE 754 doesn't store it.
Rule: if the value's actual exponent is between 1-bias and bias (magnitude roughly 2^-bias to 2^+bias) then the floating point value is stored in normalized format:
1011.0111 (binary) = 1.0110111 x 2^3
mantissa: 0110111000...
exponent: 3+bias = 130 (single prec)
Gradual underflow
To extend the precision for small numbers, very small numbers are not stored in normalized form.
In this case the leading "1" is also stored and the biased exponent has value 0 (smallest exponent)
Value                  Mantissa                   Biased Exp.
1.01101110 x 2^-126    01101110000000000000000    -126+bias = 1
1.01101110 x 2^-127    10110111000000000000000    -127+bias = 0
1.01101110 x 2^-128    01011011100000000000000    -127+bias = 0
1.01101110 x 2^-129    00101101110000000000000    -127+bias = 0
1.01101110 x 2^-130    00010110111000000000000    -127+bias = 0
... as number gets smaller, leading significant digits shift right
1.01101110 x 2^-147    00000000000000000000101    -127+bias = 0
1.01101110 x 2^-148    00000000000000000000010    -127+bias = 0
1.01101110 x 2^-149    00000000000000000000001    -127+bias = 0
(The last rows show the digits simply shifted off the end; with round-to-nearest the final bits can differ.)
IEEE 754 Floating Point Values
The standard defines special values:
+/-Infinity: 1/0 = +Infinity, -3/0 = -Infinity, exp(5000) = +Infinity, Infinity+Infinity = Infinity
NaN (Not-a-Number): 0/0 = NaN, Infinity*0 = NaN, ...

Value              Sign Bit   Exponent       Mantissa
Normalized f.p.    0, 1       1 to 2*bias    any
Denormalized       0, 1       00000000       any non-0
Zero               0, 1       00000000       0
+Infinity          0          11111111       0
-Infinity          1          11111111       0
NaN                0, 1       11111111       any non-0
Floating Point Questions
Question: How do you store 2.50 as a "float"?
2.50 = 1.25 x 2 = 1.01 (binary) x 2^1
Implicit leading 1 rule: mantissa = 01000000000000000000000
Exponent: 1 + bias = 128
Stored value: 0 10000000 01000000000000000000000
Question: What is the decimal value of:
1 10000000 10000000000000000000000
0 00000000 00000000000000000000000
0 11111111 00000000000000000000000
Floating Point Questions (cont'd)
Question: How do you store 0.1 as a "float"?
0.1 = 0.000110011001100110011001100... (binary) = 1.10011001100... x 2^-4
Normalized mantissa = 10011001100110011001101 (rounded to 23 bits)
Exponent: -4 + bias = 123
Stored val: 0 01111011 10011001100110011001101
0.1 has no exact representation in binary!
Question: what decimal values have an exact binary representation (no truncation error)???
Consequence of inexact conversion
0.1 does not have exact binary representation. Therefore, we may have:
10 * 0.1 != 1.0
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 != 1.0
Don't use "==" as test criteria in loops with floats.
This loop never terminates:
double x = 0.1;
while( x != 1.0 ) { // better: ( x <= 1.0 )
System.out.println( x );
x = x + 0.1;
}
Type Compatibility for built-in types
Operations in most languages will automatically convert ("promote") some data types:
2 * 1.75     convert 2 (int) to floating point
Assignment compatibility: what automatic type conversions are allowed on assignment?
int n = 1234567890;
float x = n; // OK in C or Java
n = x; // allowed in C? Java?
char -> short -> int -> long -> double
short -> int -> float -> double
What about long -> float ?
Rules for C/C++ not same as Java.
C/C++ Arithmetic Type Conversion
For +, -, *, /, both operands must be the same type
C/C++ compiler "promotes" mixed type operands to make all operands same using the following rules:

Operand Types      Promote            Result
short op int       short => int       int
long op int        int => long        long
int op float       int => float       float
int op double      int => double      double
float op double    float => double    double
etc...
"op" is any arithmetic operation: + - * /
Assignment Type Conversion is not Arithmetic Type Conversion (1)
What is the result of this calculation?
int m = 15;
int n = 16;
double x = m / n;
Forcing Type Conversion
Since arguments are integer, integer division is used:
double x = 15 / 16; // = 0 !
you must coerce "int" values to floating point.
There are two ways:
int m = 15;
int n = 16;
/** Efficient way: cast as a double */
double x = (double)m / (double)n ;
/** Clumsy way: multiply by a float (ala Fortran) */
double x = 1.0*m / n;
Assignment Type Conversion is not Arithmetic Type Conversion (2)
Many students wrote this in Fraction program:
public class Fraction {
int numerator; // numerator of the fraction
int denominator; // denominator of the fraction
...etc...
/** compare this fraction to another. */
public int compareTo( Fraction frac ) {
double r1 = this.numerator / this.denominator;
double r2 = frac.numerator / frac.denominator;
if ( r1 > r2 ) return 1;
else if ( r1 == r2 ) return 0;
else return -1;
}
Arrays
An array is a series of elements of the same type, with an index, which occupy consecutive memory locations.
float x[10]; // C: array of 10 “float” vars
char [] c = new char[40]; // Java: array of 40 "char"
Array x[ ] in memory: x[0] x[1] x[2] . . . x[9]   (each element is 4 Bytes = sizeof(float))
Array c[ ] in memory: c[0] c[1] . . . c[39]
Array "dope vector"
In C or Fortran an array is just a set of contiguous elements. No type or length information is stored.
Some languages store a "dope vector" (aka array descriptor) describing the array.
/* C language */
double x[10];
x = 01E4820 -> x[0] x[1] x[2] x[3] ...
/* Language with dope vector */
double x[10];
x = dope vector [ double | 0 | 10 | 01E4820 ] -> x[0] x[1] x[2] x[3] ... x[9]
Array as Object
In Java, arrays are objects:
double [ ] x = new double[10];
x is an Object; each element x[k] is a double (primitive type).
x -> double[ ] object: length = 10, elements x[0] x[1] x[2] ...
x.getClass( ).getName( ) returns "[D"
1-Dimensional Arrays
Element of 1-D array computed as offset from start:
float f[20];
address of f[n] = address(f) + n*sizeof(float)
Some languages permit arbitrary index bounds:
Pascal:
var a: array [ 2..5 ] of real;
FORTRAN:
REAL X(100)     array is X(1) ... X(100)
REAL Y(2:5)     array is Y(2) ... Y(5)
In any case, array element can be computed as offset:
address of a[n] = address(a) + (n-start)*sizeof(real)
2-Dimensional Arrays
There are different organizations of 2-D arrays:
Rectangular array in row major order:
float r[4,3];
In memory (row major order):
r[0,0] r[0,1] r[0,2] r[1,0] r[1,1] r[1,2] r[2,0] ...
Rectangular array in column major order (Fortran):
REAL X(4,3)
in memory (column major order):
x(1,1) x(2,1) x(3,1) x(4,1) x(1,2) x(2,2) x(3,2) x(4,2) x(1,3) ...
2-Dimensional Arrays
Computing address of array elements
Rectangular arrays in row major order:
float x[ROWS][COLS];
address of x[j][k] = address(x) + (j*COLS + k) * sizeof(float)
Three dimensional array:
float y[J][K][L];
address of y[j][k][l] = address(y) + (j*K*L + k*L + l) * sizeof(float)
2-D and 3-D arrays require more time to access due to this calculation.
Compiler can optimize when you access consecutive items:
for(k = 0; k<COLS; k++) sum += x[j][k];
Arrays of Pointers: ragged arrays
Each element of a vector is a pointer to a vector
char *days[7] = { "Sunday", "Monday",
"Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday" };
days =
days[0] -> S u n d a y \0
days[1] -> M o n d a y \0
days[2] -> T u e s d a y \0
days[3] -> W e d n e s d a y \0
days[4] -> T h u r s d a y \0
days[5] -> F r i d a y \0
days[6] -> S a t u r d a y \0
Vector of pointers: 7 x 4 bytes = 28 bytes (with 4-byte pointers)
Array of characters: = 57 bytes
Arrays of Pointers: ragged arrays (2)
Compare previous slide with 2-D array:
char days[ ][10] = { "Sunday", "Monday",
"Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday" };
days =
S u n d a y \0
M o n d a y \0
T u e s d a y \0
W e d n e s d a y \0
T h u r s d a y \0
F r i d a y \0
S a t u r d a y \0
2-D array = 7 x 10 bytes = 70 bytes
What is sizeof( ) for 2-D arrays?
Rectangular array in C:
char days[7][10] = { "Sunday", "Monday", ... };
int m = sizeof( days );
int n = sizeof( days[0] );
Array of pointers in C:
char *days[7] = { "Sunday", "Monday", ... };
int m = sizeof( days );
int n = sizeof( days[0] );
Java: always uses array of pointers
2-D arrays in Java are always treated as array of pointers
final int N = 10;
double [][] a;
a = new double[N][ ]; // create row pointers
for(int k=0; k<N; k++)
a[k] = new double[k+1]; // create columns
// array dimensions determined by initial values
int [][] m = { { 1, 2, 3, 4},
{ 5, 6}, { 8, 9, 10}, { 11 }
};
What are the sizes of each row of m ?
C#: rectangular and ragged arrays
A rectangular array in C# (one set of brackets)
const int N1 = 10, N2 = 20, N3=25;
// 2-dimensional array
double [,] a = new double[N1,N2];
// 3-dimensional array
double [,,] a3 = new double[N1,N2,N3];
A ragged array in C# or Java uses multiple brackets:
// create array of row pointers
double [][] b = new double[N1][ ];
// allocate space for each row (can differ)
for (int k=0; k<N1; k++) b[k] = new double[N2];
In Java (but not C#) can write: b = new double[N1][N2]
Accessing Array Elements
Ragged Arrays require multiple levels of dereferencing:
result = b[i][j];
1. get address of b.
2. get b[i]: _addr = valueat( address(b) + i*sizeof( b[ ] ) )
3. result = valueat( _addr + j*sizeof( b[ ][ ] ) )
Here sizeof( b[ ] ) is the size of one row reference and sizeof( b[ ][ ] ) is the size of one element.
Rectangular array computes address as offset:
double [ , ] b = new double[N1,N2];
result = b[i,j];
1. get address of b.
2. result = valueat( address(b) + (i*N2 + j)*sizeof( b[ , ] ) )
Here sizeof( b[ , ] ) is the size of one element.
In Java and C#, arrays are objects, so address is not this simple.
Efficiency and multi-dimensional array
Multi-dimensional array access can be much slower than 1-D array access.
Access in row order is more efficient, and can minimize paging.
// search a[ROWS,COLS] in row major order
for(int r=0; r<ROWS; r++)
    for (int c=0; c<COLS; c++)
        if ( a[r,c] > max ) max = a[r,c];
visits elements in the order they are stored: a[0,0] a[0,1] ... a[0,COLS-1] a[1,0] a[1,1] ...
// search a[ROWS,COLS] in column major order
for(int c=0; c<COLS; c++)
    for (int r=0; r<ROWS; r++)
        if ( a[r,c] > max ) max = a[r,c];
visits a[0,0] a[1,0] ... a[ROWS-1,0] a[0,1] a[1,1] ..., jumping COLS elements in memory at each step
Type Checking
Verifying that the actual value of an expression is valid for the type to which it is assigned.
A strongly typed language is one in which all type errors are detected at compile time or run time.
Example:
Java is strongly typed: most type errors are detected by compiler. Others, like casts, are checked at runtime and generate exceptions:
Object obj = new Double(2.5);
String s = obj; // compile time error
String s = (String) obj; // run-time ClassCastException
Type Compatibility for user types
typedef int type_a;
typedef int type_b;
int main( ) {
type_a a;
type_b b;
b = 5; // assign integer to "type_b" variable OK?
a = b; // assign "type_b" to "type_a" variable OK?
In C, "typedef" defines an alias for a type -- it doesn't create a new type.
Type Compatibility for user types (2)
struct A {
float x;
char c;
};
struct B {
float x;
char c;
};
struct C {
float z;
char c;
};
int main( ) {
struct A a;
struct B b;
struct C c;
a.x = 0.5;
a.c = 'a';
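b = a; // OK under structural equivalence; a compile error in C, where struct A and struct B are distinct types
c = a; // Error
if (b == a) // OK? (C does not define == for structs)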
Type Compatibility for classes (3)
public class A {
public float x;
public char c;
}
public class B {
public float x;
public char c;
}
public class C {
public float z;
public char c;
}
public static void main(...) {
A a = new A();
B b;
C c;
a.x = 0.5f; // 0.5 alone is a double and cannot be assigned to a float in Java
a.c = 'a';
b = a; // Error
c = a; // Error
Type Conversion and Polymorphism
int max( int a, int b) { if ( a > b ) return a; else return b; }
float max( float a, float b) { if ( a > b ) return a; else return b; }

int main( ) {
    int m, n;
    float x, y, z;
    x = 5.5;
    m = x; // OK to convert float to int
    z = max(x, y); // EASY! call max(float,float)
    n = max(m, x); // which max function?
    y = max(x, m); // which max function?
}
Explicit Polymorphism in C++
/* This template generates "max" functions of
 * any parameter type that the program needs. */
template <typename T>
T max( T a, T b ) { if ( a > b ) return a; else return b; }

int main( ) {
    int m = 4, n = 9;
    float x = 0.5, y = 2.7;
    n = max(m, m); // generate max(int, int)
    y = max(x, y); // generate max(float, float)
}