
Computer Tools for Academic Research, Fall 2011

Miklós Koren, korenm@ceu.hu

https://tools.coauthors.net

Introduction

The goal of the course

- We use computers all the time in our research.
  - downloading data
  - running regressions
  - writing text

- The goal of this course is to make your computer use more effective.


Learning outcomes

At the end of the course
- you will be 30% more productive
- your coauthors will love you (after some initial struggle)


An orderly printing shop


Keeping an orderly shop

- We are learning how to keep an orderly shop for
  - faster, more reliable work
  - writing clean code
  - easier collaboration with others
    - including your future self


Outline

1. Programming principles, philosophies and their relevance for the academic product cycle
2. Version control: keeping your code and other files in check
3. Thinking about data: beyond the Excel spreadsheet
4. Modular programming in Python
5. More Python
6. Testing: trying to break your own code
7. NumPy as a Matlab alternative
8. Symbolic math
9. Reference tools


Example

To illustrate the tools, we will use a sample "research project":
- Hypothetical data on friends and whether or not you downloaded Angry Birds.
- We want to test whether friends' downloads make you more likely to download.
- Simple OLS for now; we can think about instruments later.


Data on people

person,angrybirds
1,0
2,1
3,0


Data on friendships

friend1,friend2
1,2
1,3
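Here is a minimal Python sketch of how the two tables could be combined into the regressor of interest: each person's share of friends who downloaded Angry Birds. It makes two assumptions. First, friendships are mutual, so each row links both people. Second, the CSV text is embedded as strings so the sketch runs on its own. The function names are illustrative and do not come from the course materials.

```python
import csv
import io

# The two example tables, embedded as strings so the sketch is self-contained.
PEOPLE_CSV = """person,angrybirds
1,0
2,1
3,0
"""

FRIENDS_CSV = """friend1,friend2
1,2
1,3
"""


def load_people(text):
    """Map each person id to their download indicator (0 or 1)."""
    return {int(row["person"]): int(row["angrybirds"])
            for row in csv.DictReader(io.StringIO(text))}


def load_friends(text):
    """Build an adjacency list; assumes friendship is mutual (undirected)."""
    friends = {}
    for row in csv.DictReader(io.StringIO(text)):
        a, b = int(row["friend1"]), int(row["friend2"])
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    return friends


def friends_download_share(people, friends):
    """Share of each person's friends who downloaded; None if no friends."""
    share = {}
    for person in people:
        linked = friends.get(person, set())
        share[person] = (sum(people[f] for f in linked) / len(linked)
                         if linked else None)
    return share


people = load_people(PEOPLE_CSV)
friends = load_friends(FRIENDS_CSV)
print(friends_download_share(people, friends))  # prints {1: 0.5, 2: 0.0, 3: 0.0}
```

Person 1's friends are persons 2 and 3, and only person 2 downloaded, so person 1's share is 0.5. Persons 2 and 3 each have a single friend (person 1), who did not download.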


The academic product cycle

The academic product cycle

In a typical empirical project, you
1. download/get the data
2. clean the data
3. run many descriptives and checks
4. run regressions
5. create tables, graphs
6. write the paper (always in LaTeX)
7. write the slides
8. submit to a journal, present at conferences
9. get rejected: go back to 3-6 until published
10. get a request for replication code/data


Similarities to software development

- Much of this is also done by software developers.
  1. plan a project
  2. write code
  3. rewrite code
  4. release a new version
  5. get bug reports and feature requests
  6. go back to 2-4
- These are also done in teams.
- Luckily, (good) software developers have nice tools to make them more productive.


Programming principles

Programming principles

- Many programming courses are about how to optimize computer resources.
- This course is about how to optimize human resources.
  - pragmatic programming
  - agile programming
- Your time is more valuable than CPU time.


The n commandments

1. Don't repeat yourself.
2. Archive your work.
3. Embrace plain text.
4. Embrace the command line.
5. Write modular code.
6. Make no assumptions.


The DRY principle

Don’t Repeat Yourself

Ever. Treat this as dogma.


The DRY principle

- Every piece of information should have a single authoritative source.
- Compare:

    areg lnwage treatment age, a(firmid) cluster(firmid)
    areg lnwage treatment schooling, a(firmid) cluster(firmid)

- To:

    foreach X in age schooling {
        areg lnwage treatment `X', a(firmid) cluster(firmid)
    }

- Why is the second better?
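One answer, sketched in Python: the loop keeps the specification in a single place. Changing the fixed effect or the clustering then takes one edit instead of one per regression. The sketch only builds the Stata command strings from one template and does not call Stata. `SPEC`, `CONTROLS` and `build_commands` are hypothetical names.

```python
# One authoritative template: the fixed effect and the clustering variable
# are written exactly once.
SPEC = "areg lnwage treatment {control}, a(firmid) cluster(firmid)"
CONTROLS = ["age", "schooling"]


def build_commands(template, controls):
    """Return one command per control variable, all built from the same template."""
    return [template.format(control=c) for c in controls]


for command in build_commands(SPEC, CONTROLS):
    print(command)
```

Adding a third control, or switching the clustering variable, touches only `CONTROLS` or `SPEC`.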


Derived principles

- Use version control.
  - Never send "draft_may_14.doc" by email.
- If it's worth repeating, it's worth automating.
  - Use scripts to run regressions etc.


Programming approaches

We talk about three:
1. procedural (Basic, Pascal, Matlab, Stata)
2. functional (Haskell)
3. object oriented (Java, Python)


A procedural code

    x = csvread('file.csv');   % load the data
    for i = 1:length(x)        % loop over every element
        y(i) = 2*x(i);         % double it
    end
    save('y.mat', 'y')         % save the result

- Does what you say in the order you say it. (Great for control freaks.)
- But violates principle 5.
  - Others will have no clue what it does.
  - Impossible to debug.
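Here is a sketch of the same four steps split into small named functions, written in Python rather than Matlab. It assumes `file.csv` has a header row with a numeric column called `x`. The function names and the output file `y.csv` are illustrative.

```python
import csv


def load_column(path, column):
    """Read one numeric column from a CSV file that has a header row."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]


def double_all(values):
    """Return a new list with every value doubled; the input is left untouched."""
    return [2 * v for v in values]


def save_column(path, column, values):
    """Write the values as a one-column CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])
        writer.writerows([v] for v in values)


def main(in_path="file.csv", out_path="y.csv"):
    # Each step is a named function that can be read and tested on its own.
    x = load_column(in_path, "x")
    y = double_all(x)
    save_column(out_path, "y", y)
    return y
```

Calling `main()` reads `file.csv` and writes `y.csv`. Each piece can also be checked separately; for example, `double_all([1, 2])` returns `[2, 4]`.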


Functional programming

y = f(x)

- Just implement the function and don't worry about the rest.
- Well suited to math / economics applications.
- Conforms with principles 5 and 6.

- But we are often far from this ideal.
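In the y = f(x) spirit, the simple OLS from the running example can be written as a pure function. Its result depends only on its arguments, and it changes nothing outside itself. This is a Python sketch for a single regressor; the name `ols` is hypothetical.

```python
def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x.

    Pure function: reads only its arguments and modifies nothing else.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


print(ols([1, 2, 3], [2, 4, 6]))  # prints (0.0, 2.0)
```

Because `ols` has no side effects, calling it twice with the same data always gives the same answer. That property makes it easy to test and to reuse.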


Object-oriented programming (OOP)

- Objects have attributes (data) and methods (functions).
  - regression.y
  - regression.x
  - regression.run()
- Useful to collect data and functions that belong together.
- Also permits more complicated hierarchies among objects.
- Most modular of the three.
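Here is a minimal Python sketch of the regression object named on this slide, assuming a single regressor and plain OLS. The class name `Regression` and the `intercept`/`slope` attributes are illustrative. `run()` stores its results on the object, so the data and the estimates travel together.

```python
class Regression:
    """Simple OLS of y on one regressor x, bundling the data with the method."""

    def __init__(self, y, x):
        if len(y) != len(x) or len(y) < 2:
            raise ValueError("y and x must have the same length, at least 2")
        self.y = list(y)        # dependent variable (attribute)
        self.x = list(x)        # regressor (attribute)
        self.intercept = None   # filled in by run()
        self.slope = None       # filled in by run()

    def run(self):
        """Estimate intercept and slope, store them on the object, return self."""
        n = len(self.y)
        mean_x = sum(self.x) / n
        mean_y = sum(self.y) / n
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(self.x, self.y))
        sxx = sum((a - mean_x) ** 2 for a in self.x)
        if sxx == 0:
            raise ValueError("x has no variation")
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self


regression = Regression(y=[2, 4, 6], x=[1, 2, 3])
regression.run()
print(regression.intercept, regression.slope)  # prints 0.0 2.0
```

A subclass, for instance one that also stores standard errors, could reuse `run()` unchanged. That is the kind of hierarchy the slide alludes to.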

