Storage Types, Display Format, String ↔ Numeric, (Dis)connect Characters
Jeehoon Han [email protected]
Fall 2017
Storage Types
- Storage types
  - Numbers (digits of accuracy)
    - Integers: byte (2), int (4), long (9)
    - Floating point: float (7), double (16)
  - Strings: str1, str2, ..., str#
    where str# holds strings of up to # characters
- The default storage type is float
- Storing a variable whose values have more than 7 digits
  - 8-9 digit integers: gen long varname
  - Otherwise: gen double varname
- Changing the storage type of an existing variable: recast type varname
- Use compress to save memory by storing variables in the smallest types that lose no precision
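The points above can be checked with a short sketch. The variable names id_float, id_long, and flag are made up for illustration; the sketch only assumes a 9-digit integer that a float cannot hold exactly.

```stata
clear
set obs 1

* No type given, so generate uses the default storage type (float).
* A float keeps about 7 digits, so the 9-digit value is rounded.
gen id_float = 123456789

* long stores 9-digit integers exactly
gen long id_long = 123456789

* A double holding a small integer wastes memory
gen double flag = 1

display %12.0f id_float[1]
display %12.0f id_long[1]

* compress demotes variables to the smallest type that loses no information
describe
compress
describe
```

Note that recasting id_float to double afterwards would not bring back the lost digits; the type has to be chosen when the variable is created.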
Storage Types: Example
- set obs 1
  gen var = 0.2
  tab var if var == 0.2
  ⇒ no observations
- Problems
  - Numbers are stored in binary form, and most decimals have no exact binary representation (0.2 → 0.00110011...)
  - 0.2 is stored as 0.20000000298023224 in float and as 0.20000000000000001 in double
  - When you create the variable var, 0.2 is stored as a float, but Stata does all calculations in double precision, so the comparison var == 0.2 fails
- Two ways to deal with this issue
  - Store the data as double
  - tab var if var == float(0.2)
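A minimal sketch of both fixes, using count instead of tab so that the number of matches is easy to read off. The variable var_d is a name introduced here for the double-precision copy.

```stata
clear
set obs 1

gen var = 0.2                 // stored as float
count if var == 0.2           // 0 matches: float 0.2 differs from double 0.2
count if var == float(0.2)    // 1 match: the literal is rounded to float first

gen double var_d = 0.2        // stored as double
count if var_d == 0.2         // 1 match: both sides are double

* Show the stored values with many decimals
display %20.18f var[1]
display %20.18f var_d[1]
```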
Display Format
- Specify the display format: format varlist %fmt
- Numeric formats
  - Fixed format: %w.df
  - General format: %w.dg
    where w: the total width of the display
          d: the number of decimals (fixed format)
    For the general format, Stata decides the number of decimals to display (if d > 0, d indicates the maximum number of decimal places)
- String format: %ws
  where w: the display width in characters
Display Format: Example
- Default formats
  byte, int: %8.0g
  long: %12.0g
  float: %9.0g
  double: %10.0g
- Examples
  clear
  set obs 1
  gen double pi = 3.1415926535
  list pi ⇒ 3.1415927
  format pi %8.0g ⇒ 3.14159
  format pi %8.5f ⇒ 3.14159
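One thing the examples above do not show directly is that format changes only how a value is displayed, not what is stored. A small sketch, reusing the pi variable from the slide:

```stata
clear
set obs 1
gen double pi = 3.1415926535

* Show fewer decimals; the stored value is untouched
format pi %8.5f
list pi
assert pi == 3.1415926535

* Even a very coarse display format leaves the data as they were
format pi %4.2f
list pi
assert pi == 3.1415926535
```

Both assert commands pass silently, so rounding for display is safe and can be undone by setting another format.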
Inspecting Data
- sysuse uslifeexp, clear
  browse
- list
- describe
- codebook region
Strings (pure text) ↔ Numerics
- String variable → numeric variable
  encode country, gen(country_code)
- Numeric variable → string variable
  decode country_code, gen(country_str)
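A round trip through encode and decode can be sketched on a tiny made-up dataset. The three country values are hypothetical and only serve to make the example self-contained.

```stata
clear
input str10 country
"Korea"
"USA"
"Korea"
end

* encode numbers the distinct strings in alphabetical order
* and attaches them to the codes as value labels
encode country, gen(country_code)
list                // country_code shows the labels
list, nolabel       // country_code shows the underlying numbers

* decode turns the value labels back into a string variable
decode country_code, gen(country_str)
assert country_str == country
```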
Strings (numeric text) ↔ Numerics
- String variable → numeric variable
  - destring varlist, {gen(varname)|replace} [option]
  - [option]
    - ignore("chars"): remove the specified nonnumeric characters
    - force: treat any values containing nonnumeric characters as missing values
- Example: use http://www.stata-press.com/data/r13/destring2
  - destring price, gen(priceA) ignore("$ ,")
    destring price, gen(priceB) force
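The difference between ignore() and force can also be seen without downloading data. This is a sketch on a hypothetical string variable amount (not part of destring2) whose values contain commas and spaces.

```stata
clear
input str12 amount
"1,234.50"
"99.99"
"12 000"
end

* ignore(): strip commas and spaces, then convert
destring amount, gen(amountA) ignore(", ")

* force: convert what is already numeric text, set the rest to missing
destring amount, gen(amountB) force

list
```

Here amountA keeps all three values, while amountB is missing wherever the original string contained a comma or a space.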
Strings (numeric text) ↔ Numerics
- Numeric variable → string variable
  - tostring varlist, {gen(varname)|replace} [option]
  - [option]
    - format(%fmt): convert using the specified format
    - force: convert to string even if it entails information loss
  - tostring priceA, gen(price_strA)
    tostring priceA, gen(price_strB) format(%8.1f) force
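A self-contained sketch of the two tostring calls above, assuming a small made-up priceA variable in place of the one created from destring2:

```stata
clear
set obs 2
gen double priceA = 1234.56 in 1
replace priceA = 99.9 in 2

* Default conversion: succeeds because no information is lost
tostring priceA, gen(price_strA)

* One decimal only: 1234.56 is rounded, so force is required
tostring priceA, gen(price_strB) format(%8.1f) force

list
describe
```

Without force, the second command would refuse to create price_strB, because the rounded string cannot be converted back to the original number.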
Disconnect the Characters of Variables
- substr(str,n,m): extract a substring of str of length m starting at position n; a negative n counts from the end of the string, and m = . runs to the end of the string
  - gen year = substr(date,1,4)
    gen month = substr(date,6,2)
    gen day = substr(date,-2,.)
  - gen length = strlen(price_strA)
    gen decimal = substr(price_strA,-2,.)
    gen integer = substr(price_strA,1,length-3)
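The substr calls can be tried on a small made-up dataset. The sketch assumes dates of the form "YYYY MM DD" and price strings with exactly two decimals; both the values and the string variable price_strA are illustrative stand-ins.

```stata
clear
input str10 date str8 price_strA
"1999 12 10" "2343.68"
"2000 07 08" "7233.44"
end

* Positive start positions count from the left
gen year  = substr(date, 1, 4)
gen month = substr(date, 6, 2)

* A negative start counts from the right; . means "to the end"
gen day   = substr(date, -2, .)

* Split a price string into the parts before and after the decimal point
gen length  = strlen(price_strA)
gen decimal = substr(price_strA, -2, .)
gen integer = substr(price_strA, 1, length - 3)

list
```

All extracted pieces are still strings; destring would be needed to use them as numbers.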
Connect the Characters of Variables
- gen date1 = year+" "+month+" "+day
  egen date2 = concat(year month day), punct(" ")
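Both approaches should give the same string. A quick check on made-up year, month, and day string variables:

```stata
clear
input str4 year str2 month str2 day
"1999" "12" "10"
"2000" "07" "08"
end

* String addition with an explicit separator
gen date1 = year + " " + month + " " + day

* egen concat() joins the variables, inserting punct() between them
egen date2 = concat(year month day), punct(" ")

list
assert date1 == date2
```

The + operator works only when every piece is a string, whereas concat() also accepts numeric variables and converts them on the fly.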
Sort
- Arrange the data in ascending order
  - sysuse uslifeexp, clear
  - sort le
    sort year le
  - gsort -year (the minus sign sorts in descending order)
Sort: Applications
- Create a lagged variable
  sort year
  gen le_lag = le[_n-1]
- Finding duplicates
  sort year
  list if year == year[_n-1]
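Both applications can be run on the uslifeexp data used earlier. The variable le_change is a name added here to show one use of the lag; it is not part of the original slides.

```stata
sysuse uslifeexp, clear

* _n is the current observation number, so le[_n-1] is the previous row
sort year
gen le_lag = le[_n-1]

* Hypothetical follow-up: year-to-year change in life expectancy
gen le_change = le - le_lag
list year le le_lag le_change in 1/5

* After sorting, duplicates sit next to each other
list if year == year[_n-1]
```

The first observation has no previous row, so le_lag is missing there. Sorting first matters: without sort year, le[_n-1] would simply be whatever row happens to come before.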