Report to Users Alan Riley Vice President, Software Development StataCorp LP 2006 German Stata Users Group meeting, Mannheim, Germany A. Riley (StataCorp) Report to Users March 31, 2006 1 / 37
Report to Users

Alan Riley

Vice President, Software Development

StataCorp LP

2006 German Stata Users Group meeting, Mannheim, Germany

A. Riley (StataCorp) Report to Users March 31, 2006 1 / 37

Outline

1 Stata Press

2 Stata 9

3 New development

Stata 9.1

Stata 9.2 - Mata structures

Stata 9.2 - work faster

Stata Press

Most active year ever

Stata Journal indexed

Two revised editions of existing books

Four new books published

Seven books in progress

Stata Press

Stata Journal

6th year of publication

Special edition - Stata 20th anniversary

Now indexed

Thomson Scientific citation indexes

Science Citation Index Expanded

CompuMath Citation index

Stata Press

More than doubled number of books published

Revised editions, 2005

Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition

by J. Scott Long, Jeremy Freese

Maximum Likelihood Estimation with Stata, 3rd Edition

by William Gould, Jeffrey Pitblado, William Sribney

Stata Press

More than doubled number of books published

New books, 2005

Data Analysis Using Stata

by Ulrich Kohler and Frauke Kreuter

Multilevel and Longitudinal Modeling Using Stata

by Sophia Rabe-Hesketh and Anders Skrondal

A Gentle Introduction to Stata

by Alan Acock

An Introduction to Stata for Health Researchers

by Svend Juul

Stata Press

Forthcoming books, 2006

An Introduction to Modern Econometrics Using Stata

by Christopher F. Baum

Generalized Linear Models and Extensions, 2nd Edition

by James Hardin, Joseph Hilbe

A Guide to Stochastic Frontier Models: Specification and Estimation

by Subal Kumbhakar, Hung-Jen Wang

An Introduction to Forecasting Time Series Using Stata

by Robert Yaffee

The 123s of Survey Statistics with Stata

by Nicholas Winter

Applied Microeconometrics Using Stata

by A. Colin Cameron, Pravin K. Trivedi

Stata Press

Forthcoming books, 2007

Data Management Using Stata

by Michael Mitchell

Stata 9

Released April 2005

20th anniversary

Largest release ever

Stata 9

Stata 1, January 1985

44 commands

175 pages of documentation

Stata 8, January 2003

over 600 commands

4652 pages of documentation

Stata 9, April 2005

over 700 commands including new matrix language Mata

6413 pages of documentation

Stata 9

Ongoing development

Continued release-as-we-go strategy

Stata 9.1

Stata 9.2

Mata structures

Work faster

Stata 9.1

Multiple log files

Faster survey linearization

More stored estimation results

New Mata functions (permutation, string, regular expression, binary I/O)

Sized PNG and TIFF exported graphs

adoupdate

And more...

Stata 9.2

Mata structures

Set of variables tied together under a single name

struct structname {
        declaration(s)
}

Example

struct mystruct {
        real scalar n1, n2
        real matrix x
}

Mata structures

struct myresult {
        real scalar yoverx
        real scalar xovery
}

struct myresult scalar myfunc(real scalar x, real scalar y)
{
        struct myresult scalar res

        res.yoverx = y/x
        res.xovery = x/y
        return(res)
}

...
struct myresult scalar results
...
results = myfunc(3, 4)

Mata structures

You can have vectors and matrices of structures

struct mystruct scalar t
struct mystruct vector t
struct mystruct rowvector t
struct mystruct colvector t
struct mystruct matrix t

t[2,3].n1

Structures can contain vectors and matrices

t[2,3].x[9,2]
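
The subscript-and-member notation above can be tried out with a short sketch like the following. It repeats the mystruct definition from the earlier slide and assumes the constructor function mystruct() that Mata creates automatically when a structure is defined. The function name demo() and the values stored are made up for illustration and are not from the talk.

```mata
mata:
struct mystruct {
        real scalar n1, n2
        real matrix x
}

real scalar demo()
{
        struct mystruct matrix t

        t = mystruct(2, 3)          // 2 x 3 matrix of structures
        t[2,3].n1 = 10              // member of one element
        t[2,3].x  = J(9, 2, 7)      // that element's matrix member, 9 x 2
        return(t[2,3].x[9,2])       // element [9,2] of the matrix member
}

demo()
end
```

The structure variable is declared and used inside a function here, which is the usual way of working with Mata structures.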

Mata structures

Structures can contain other structures

struct myresult {
        real scalar yoverx
        real scalar xovery
}

struct someresults {
        struct myresult scalar res1, res2
}

...
struct someresults scalar myres
...
myres.res1 = myfunc(3, 4)
myres.res2 = myfunc(5, 6)

Mata structures

Advantages of structures

Organization

Convenience (return multiple results)

Abstraction (handles)

Stata 9.2

Moore’s Law

Computer processing power doubles every 18 months

Max transistors per chip has doubled every 24 months

To maintain, industry must improve at rate of 1% per week
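
As a rough check of the 1% figure, assume that "every 18 months" means about 78 weeks and that growth compounds steadily at a weekly rate $r$ (both simplifications introduced here, not taken from the slide):

$$(1 + r)^{78} = 2 \quad\Longrightarrow\quad r = 2^{1/78} - 1 \approx 0.0089$$

That is about 0.9% per week, which rounds to the 1% quoted above. Under the same assumptions, the 24-month doubling time (about 104 weeks) would correspond to $2^{1/104} - 1 \approx 0.0067$, or roughly 0.7% per week.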

Stata/MP

Work faster – work in parallel

new ‘flavor’ of Stata capable of performing symmetric multiprocessing (SMP)

same capabilities as Stata/SE, but faster due to parallelization of central routines

for dual core, multicore, or multiprocessor computers

http://www.stata.com/statamp/

Difference between ‘processor’ and ‘core’

processor: central processing unit, or CPU

core: computation engine of a CPU with integer and floating point processing units

Stata/MP

Design requirements

100% compatible with Stata/SE, Intercooled Stata, and Small Stata

No end-user programming necessary to obtain speed ups

No changes necessary to do-files, user-written programs, or datasets

Priority given to estimation commands

Stata/MP

Supports 2 to 32 processors or cores on

Macintosh OSX (Intel)

32-bit Windows

64-bit Windows (x86-64)

64-bit Windows (Itanium)

32-bit Linux

64-bit Linux (x86-64)

64-bit Linux (Itanium)

64-bit Solaris (Sparc)

Stata/MP

Perfection, in theory

100% efficiency is twice as fast on 2 processors/cores

Speed doubles for every doubling of number of processors

Execution time halves for every doubling of number of processors

Amdahl’s Law

F: sequential/non-parallelizable fraction

N: number of processors

Maximum speed up: $\dfrac{1}{F + \frac{1 - F}{N}}$
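
A worked example may make the bound concrete. The values $F = 0.1$ and $N = 4$ are chosen here purely for illustration and are not measurements from the talk:

$$\frac{1}{F + \frac{1 - F}{N}} = \frac{1}{0.1 + \frac{0.9}{4}} = \frac{1}{0.325} \approx 3.08$$

So even with only 10% sequential code, four processors give roughly a 3.1-fold speed up rather than 4. As $N \to \infty$ the bound tends to $1/F$, which is 10 in this example. With $F = 0$ it reduces to the perfect-scaling case of $N$.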

Stata/MP

How much faster?

Median speed up (overall)

72% efficiency

2 CPUs: 1.4

3 CPUs: 1.75

4 CPUs: 2.0

Median speed up (estimation commands)

88% efficiency

2 CPUs: 1.7

3 CPUs: 2.3

4 CPUs: 2.8

Stata/MP

[Figure: speed compared to single processor (axis ticks 1 to 4) versus number of processors (1 to 4). Plotted: observed upper bound, median performance (estimation), median performance (all commands), observed lower bound, and the observed performance region over all commands.]

Stata/MP - All commands

[Figure: run time (percent of single processor; axis ticks 25, 50, 75, 100) versus number of processors (1 to 4). Plotted: boundary of all commands, median runtime, theoretical maximum.]

Stata/MP - Estimation commands

[Figure: run time (percent of single processor; axis ticks 25, 50, 75, 100) versus number of processors (1 to 4). Plotted: boundary of all commands, median runtime, theoretical maximum.]

Stata/MP

Comments on median results

half of commands run faster

some even faster than theory due to cache effects

half of commands run slower

some not sped up at all

inherently sequential/impossible to parallelize (time series)

no effort made to parallelize (graph, xtmixed)

Stata/MP

Methods

OpenMP API

Core algorithms

generate, replace

X′X

Inverses

‘Summers’

Solvers

Modifications to individual important internal routines

Almost 400 sections of code modified

Stata/MP - All commands

[Figure: run time (percent of single processor; axis ticks from 6 to 100) versus number of processors (1, 2, 4, 8, 16). Plotted: boundary of all commands, median runtime, theoretical maximum.]

Stata/MP - Estimation commands

[Figure: run time (percent of single processor; axis ticks from 6 to 100) versus number of processors (1, 2, 4, 8, 16). Plotted: boundary of all commands, median runtime, theoretical maximum.]

Stata/MP - regress

[Figure: percentage of single processor time (axis ticks 25, 35, 50, 70, 100) versus number of processors (1 to 4). Plotted: observed, modeled, perfect scaling.]

Stata/MP - arima

[Figure: percentage of single processor time (axis ticks 25, 35, 50, 70, 100) versus number of processors (1 to 4). Plotted: observed, modeled, perfect scaling.]

Stata/MP - gllamm

gllamm, i() geqs() link(ologit) family(binom)

[Figure: percentage of single processor time (axis ticks 25, 35, 50, 70, 100) versus number of processors (1 to 4). Plotted: observed, modeled, perfect scaling.]

Stata/MP - gllamm

gllamm, i() geqs() link(logit) family(binom) nocons

[Figure: percentage of single processor time (axis ticks 25, 35, 50, 70, 100) versus number of processors (1 to 4). Plotted: observed, modeled, perfect scaling.]

Stata/MP - gllamm

gllamm, i()

[Figure: percentage of single processor time (axis ticks 25, 35, 50, 70, 100) versus number of processors (1 to 4). Plotted: observed, modeled, perfect scaling.]

Stata/MP - gllamm

gllamm, i() eqs(cons w)

[Figure: percentage of single processor time (axis ticks 25, 35, 50, 70, 100) versus number of processors (1 to 4). Plotted: observed, modeled, perfect scaling.]

Stata/MP

[Figure: run time (percent of single processor; axis ticks from 6 to 100) versus number of processors (1, 2, 4, 8, 16). Plotted: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]
