
Storage Types
Display Format
String ↔ Numeric
(Dis)connect Characters

Jeehoon Han
[email protected]

Fall 2017

Jeehoon Han [email protected] Storage Types Display Format String ↔ Numeric (Dis)connect Characters


Storage Types

- Storage types
  - Numbers (digits of accuracy)
    - Integers: byte (2), int (4), long (9)
    - Floating points: float (7), double (16)
  - Strings: str1, str2, ..., str#, where str# can hold strings of up to # characters
- The default storage type is float
- Storing a variable containing numbers with more than 7 digits
  - 8- or 9-digit integers: gen long varname = exp
  - Otherwise: gen double varname = exp
- Changing the storage type of an existing variable: recast type varname
- Use compress to save memory by storing variables in the smallest types that do not lose precision
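Why does float top out at about 7 digits? Assuming Stata's float and double are 4-byte and 8-byte IEEE-754 binary numbers, the limit can be reproduced outside Stata. The Python sketch below uses the standard `struct` module to round a value to single precision; the helper `to_float32` is hypothetical and only plays the role of storing a number in a float variable.

```python
import struct

def to_float32(x):
    """Round a Python float (an IEEE-754 double) to the nearest single-precision
    value, mimicking what happens when a number is stored in a float variable."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Every integer up to 2**24 = 16,777,216 fits exactly in a float...
print(to_float32(16_777_216.0))   # 16777216.0
# ...but the next integer does not: it is rounded back down.
print(to_float32(16_777_217.0))   # 16777216.0
# A 9-digit ID stored as float loses its last digits; a double keeps it.
print(to_float32(123_456_789.0))  # 123456792.0
print(float(123_456_789))         # 123456789.0
```

This is why 8- or 9-digit integers belong in long and anything longer in double.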


Storage Types: Example

- set obs 1
  gen var = 0.2
  tab var if var == 0.2
  ⇒ no observations
- Problem
  - Numbers are stored in binary form, and most decimals have no exact binary representation (0.2 → 0.00110011...)
  - 0.2 is stored as 0.20000000298023224 in float
    and as 0.20000000000000001 in double
  - When you create the variable var, 0.2 is stored as a float, but Stata does all calculations in double precision, so var == 0.2 compares the float value with the double value of 0.2, and the two are not equal
- Two ways to deal with this issue
  - Store the data as double: gen double var = 0.2
  - Compare with the float-rounded value: tab var if var == float(0.2)
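The comparison failure can be replayed in Python, assuming (as the slide states) that the stored value is the single-precision rounding of 0.2 while the constant in the `if` condition is a double. The helper `float32` is a hypothetical stand-in for Stata's `float()` function, built on the standard `struct` module.

```python
import struct

def float32(x):
    """Round x to single precision, playing the role of Stata's float() function."""
    return struct.unpack("f", struct.pack("f", x))[0]

stored = float32(0.2)        # what `gen var = 0.2` keeps in a float variable
print(f"{stored:.17f}")      # 0.20000000298023224
print(f"{0.2:.17f}")         # 0.20000000000000001

# `if var == 0.2`: the float value is compared with the double 0.2
print(stored == 0.2)             # False, hence "no observations"

# Fix 1: store as double, so both sides hold the same double value
stored_double = 0.2
print(stored_double == 0.2)      # True

# Fix 2: round the constant the same way, like `var == float(0.2)`
print(stored == float32(0.2))    # True
```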


Display Format

- Specify the display format: format varlist %fmt
- Numeric formats
  - Fixed format: %w.df
  - General format: %w.dg
  - where w is the total width of the display and d is the number of decimals (fixed format)
  - For the general format, Stata decides the number of decimals to display (if d > 0, d indicates the maximum number of decimal places)
- String format: %ws, where w is the width in characters


Display Format: Example

- Default formats
  - byte, int: %8.0g
  - long: %12.0g
  - float: %9.0g
  - double: %10.0g
- Examples
  clear
  set obs 1
  gen double pi = 3.1415926535
  list pi            ⇒ 3.1415927
  format pi %8.0g
  list pi            ⇒ 3.14159
  format pi %8.5f
  list pi            ⇒ 3.14159
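The fixed format %w.df follows the same width-and-decimals convention as C-style printf, which Python's `%` string operator also uses, so the effect of w and d can be checked there. This sketch covers only the fixed format: Python's `%g` does not behave like Stata's %w.dg, so it is deliberately left out. It also shows that a format changes only the display, not the stored value.

```python
# Fixed format %w.df: total width w, d digits after the decimal point,
# right-justified. Python's printf-style "%" operator uses the same w.d convention.
pi = 3.1415926535

for fmt in ("%8.5f", "%8.2f", "%10.3f"):
    shown = fmt % pi
    print(repr(shown), len(shown))

# A display format changes only how the value looks; the stored value is untouched.
shown = "%8.5f" % pi
print(float(shown) == pi)  # False: the display is rounded, the variable is not
```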


Inspecting Data

- sysuse uslifeexp, clear
  browse
- list


Inspecting Data

- describe
- codebook region


Strings (pure text) ↔ Numerics

- String variable → numeric variable
  encode country, gen(country_code)
- Numeric variable → string variable
  decode country_code, gen(country_str)
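What encode and decode do can be sketched in a few lines of Python. As we understand it, encode gives each distinct string an integer code 1, 2, ... in sorted order and attaches a value label that maps the codes back to the text; decode reverses this through the label. The functions `encode`/`decode` and the country values below are hypothetical illustrations, and the sketch ignores missing (empty) strings.

```python
def encode(values):
    """Sketch of encode: map each distinct string to an integer code 1, 2, ...
    in sorted order and return (codes, value_label)."""
    codes_for = {text: code for code, text in enumerate(sorted(set(values)), start=1)}
    codes = [codes_for[v] for v in values]
    # The value label maps each code back to its original text.
    value_label = {code: text for text, code in codes_for.items()}
    return codes, value_label

def decode(codes, value_label):
    """Sketch of decode: look each code up in the value label."""
    return [value_label[c] for c in codes]

country = ["Korea", "Canada", "Japan", "Canada"]   # hypothetical data
country_code, lbl = encode(country)
print(country_code)               # [3, 1, 2, 1]
print(decode(country_code, lbl))  # ['Korea', 'Canada', 'Japan', 'Canada']
```

Because the codes are just labels, arithmetic on the encoded variable is meaningless; it is only useful for grouping, tabulating, or as a categorical regressor.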


Strings (numeric text) ↔ Numerics

- String variable → numeric variable
  - destring varlist, {gen(newvarlist)|replace} [options]
  - [options]
    - ignore("chars"): remove the specified nonnumeric characters before converting
    - force: treat any values that still contain nonnumeric characters as missing values


Strings (numeric text) ↔ Numerics

- Example: use http://www.stata-press.com/data/r13/destring2
- destring price, gen(priceA) ignore("$ ,")
  destring price, gen(priceB) force
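The difference between ignore() and force can be sketched in Python. The function `destring` below is hypothetical, not Stata code: it strips the characters listed in `ignore`, converts what is left, and uses `None` for Stata's missing value (.). Without `force`, Stata does not convert a variable that still contains nonnumeric characters; this sketch raises an error in that case instead. The price strings are made-up values in the style of the example data.

```python
def destring(values, ignore="", force=False):
    """Sketch of destring: strip the characters in `ignore`, convert to
    numbers, and use None in place of Stata's missing value (.)."""
    out, bad = [], []
    for v in values:
        cleaned = "".join(ch for ch in v if ch not in ignore)
        try:
            out.append(float(cleaned) if cleaned.strip() else None)
        except ValueError:
            bad.append(v)
            out.append(None)
    if bad and not force:
        # Stata would leave such a variable unconverted; the sketch raises instead.
        raise ValueError(f"contains nonnumeric characters: {bad}")
    return out

price = ["$10,002.00", "$1,300.50", "n/a"]   # hypothetical values

# ignore() alone is not enough here: "n/a" is still nonnumeric after stripping
try:
    destring(price, ignore="$ ,")
except ValueError as err:
    print(err)

print(destring(price, ignore="$ ,", force=True))  # [10002.0, 1300.5, None]
print(destring(price, force=True))                # [None, None, None]
```

Note the trade-off: force alone (as for priceB) turns every value with a "$" or "," into missing, so ignore() is usually the safer first step.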


Strings (numeric text) ↔ Numerics

- Numeric variable → string variable
  - tostring varlist, {gen(newvarlist)|replace} [options]
  - [options]
    - format(%fmt): convert using the specified format
    - force: convert to string even if the conversion loses information
- tostring priceA, gen(price_strA)
  tostring priceA, gen(price_strB) format(%8.1f) force
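Why does the second command need force? A format with fewer decimals than the data cannot reproduce the original number. The hypothetical Python function `tostring` below sketches that check: it formats each value, then refuses unless `force` is set if reading the text back does not give the original number. Two assumptions: without a format it uses Python's shortest round-trip text (`repr`), which is not necessarily Stata's default, and it trims the padding a width like %8.1f adds. The values are made up.

```python
def tostring(values, fmt=None, force=False):
    """Sketch of tostring: convert numbers to text; without `force`, refuse
    when the text cannot reproduce the original number."""
    # Assumption: repr() stands in for the default conversion; padding is trimmed.
    out = [(fmt % v).strip() if fmt else repr(v) for v in values]
    lossy = [v for v, text in zip(values, out) if float(text) != v]
    if lossy and not force:
        raise ValueError(f"conversion would lose information: {lossy}")
    return out

priceA = [10002.37, 1300.5]                        # hypothetical values
print(tostring(priceA))                            # ['10002.37', '1300.5']
print(tostring(priceA, fmt="%8.1f", force=True))   # ['10002.4', '1300.5']
```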


Disconnect the Characters of Variables

- substr(str, n, m): extract the substring of str that starts at position n and is m characters long; a negative n counts from the end of the string, and m = . means "through the end of the string"
- gen year = substr(date,1,4)
  gen month = substr(date,6,2)
  gen day = substr(date,-2,.)

- gen length = strlen(price)
  gen decimal = substr(price,-2,.)
  gen integer = substr(price,1,length-3)
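The position rules of substr() map onto Python slicing once the 1-based, count-from-the-end convention is translated. The function `substr` below is a hypothetical Python sketch of that convention, with `None` standing in for Stata's missing value (.); the date and price strings are made-up values.

```python
def substr(s, n, m=None):
    """Sketch of substr(s, n, m): n is 1-based, a negative n counts from
    the end (-1 = last character), and m=None (Stata's .) means "to the end"."""
    start = n - 1 if n > 0 else len(s) + n
    end = len(s) if m is None else start + m
    return s[start:end]

date = "2017-10-15"                  # hypothetical value in YYYY-MM-DD form
year = substr(date, 1, 4)
month = substr(date, 6, 2)
day = substr(date, -2)
print(year, month, day)              # 2017 10 15

price = "10002.37"                   # hypothetical numeric text with two decimals
length = len(price)                  # plays the role of strlen()
decimal = substr(price, -2)
integer = substr(price, 1, length - 3)   # drop the "." and the two decimals
print(integer, decimal)              # 10002 37
```

The `length - 3` trick only works when every value has exactly two decimals; values like "5.5" would be split incorrectly.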


Connect the Characters of Variables

- gen date1 = year + " " + month + " " + day
  egen date2 = concat(year month day), punct(" ")
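The two commands build the same string in different ways: `+` adds strings piece by piece, while egen's concat() joins a list of variables with punct() placed between them. As we understand it, concat() also converts numeric variables to text, whereas `+` needs strings on both sides. The Python sketch below mirrors both; `concat` is a hypothetical helper and the pieces are made-up values.

```python
year, month, day = "2017", "10", "15"   # hypothetical pieces from substr()

# gen date1 = year + " " + month + " " + day   (string addition)
date1 = year + " " + month + " " + day

def concat(*parts, punct=""):
    """Sketch of egen's concat(): convert each part to text and
    join the parts with `punct` between them."""
    return punct.join(str(p) for p in parts)

date2 = concat(year, month, day, punct=" ")
date3 = concat(2017, 10, 15, punct="-")   # numbers are converted to text first
print(date1, date2, date3, sep=" | ")     # 2017 10 15 | 2017 10 15 | 2017-10-15
```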


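Sort

- sort arranges the data in ascending order
- sysuse uslifeexp, clear
- sort le
  sort year le
- gsort -year  (gsort allows descending order: a minus sign before a variable sorts it from largest to smallest)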
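The three orderings can be mimicked with Python's `sorted` on a few rows of (year, le) pairs. The rows below are hypothetical, including a deliberately duplicated year. One difference to keep in mind: Python's sort is stable, whereas Stata's sort does not promise to keep the original order of tied observations unless you ask for a stable sort.

```python
# Hypothetical rows standing in for (year, le) observations.
rows = [(1902, 51.5), (1900, 47.3), (1901, 49.1), (1900, 47.3)]

# sort le: ascending on a single key
by_le = sorted(rows, key=lambda r: r[1])

# sort year le: ascending on year, ties broken by le
by_year_le = sorted(rows, key=lambda r: (r[0], r[1]))

# gsort -year: the minus sign requests descending order
by_year_desc = sorted(rows, key=lambda r: r[0], reverse=True)

print(by_le)
print(by_year_le)
print(by_year_desc)
```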


Sort: Applications

- Create a lagged variable
  sort year
  gen le_lag = le[_n-1]
- Finding duplicates
  sort year
  list if year == year[_n-1]
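Both applications rely on the same idea: after sorting, `x[_n-1]` refers to the value in the row directly above, and for the first observation it is missing. The Python sketch below reproduces that on hypothetical (year, le) rows, using `None` for missing; it illustrates the logic only and is not Stata code.

```python
rows = [(1902, 51.5), (1900, 47.3), (1901, 49.1), (1900, 47.3)]  # hypothetical
rows.sort(key=lambda r: r[0])                  # sort year

# gen le_lag = le[_n-1]: previous row's le; missing (None) for the first row
le_lag = [None] + [r[1] for r in rows[:-1]]

# list if year == year[_n-1]: rows whose year repeats the row just above
dups = [rows[i] for i in range(1, len(rows)) if rows[i][0] == rows[i - 1][0]]

print(le_lag)   # [None, 47.3, 47.3, 49.1]
print(dups)     # [(1900, 47.3)]
```

Sorting first is what makes both steps valid: a lag is only meaningful in time order, and duplicates are only adjacent once equal years sit next to each other.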
