Basics Exercise Next meetings Big Data and Automated Content Analysis Week 2 – Wednesday »Getting started with Python« Damian Trilling [email protected] @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 8 April 2014 Big Data and Automated Content Analysis Damian Trilling
Page 1: BD-ACA week2

Basics Exercise Next meetings

Big Data and Automated Content Analysis
Week 2 – Wednesday

»Getting started with Python«

Damian Trilling

[email protected]
@damian0604

www.damiantrilling.net

Afdeling Communicatiewetenschap
Universiteit van Amsterdam

8 April 2014

Big Data and Automated Content Analysis Damian Trilling

Page 2: BD-ACA week2

Today

1 The very, very basics of programming with Python
    Datatypes
    Indentation: The Python way of structuring your program

2 Exercise

3 Next meetings

Page 3: BD-ACA week2

The very, very basics of programming

You’ve read all this in chapter 3.

Page 4: BD-ACA week2

Datatypes

Python lingo

Basic datatypes (variables)

int 32

float 1.75

bool True, False

string "Damian"

"5" and 5 are not the same. But you can transform it: int("5") will return 5.
3 * "5" does not calculate 15: it repeats the string and returns "555".
But you can calculate 3 * int("5"), which returns 15.
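
The four basic datatypes and the difference between "5" and 5 can be tried out directly. The following is a minimal sketch; the variable names (age, height, is_teacher, name, converted, repeated) are only illustrative and do not come from the slides.

```python
# Basic datatypes; the variable names are illustrative assumptions.
age = 32            # int
height = 1.75       # float
is_teacher = True   # bool
name = "Damian"     # string (Python calls this type str)

# Show each value together with the name of its datatype.
for value in (age, height, is_teacher, name):
    print(repr(value), type(value).__name__)

# "5" is a string and 5 is an integer, so they are not equal.
print("5" == 5)          # False

# int() turns the string into an integer, so arithmetic works.
converted = int("5")
print(3 * converted)     # 15

# Multiplying a string by an integer repeats the string instead.
repeated = 3 * "5"
print(repeated)          # 555
```

If a number has been read from a file or typed in by a user, it usually arrives as a string, so converting it with int() or float() before calculating is a common first step.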

Page 6: BD-ACA week2

Datatypes

Python lingo

More advanced datatypes

list firstnames = ['Damian', 'Lori', 'Bjoern']
     lastnames = ['Trilling', 'Meester', 'Burscher']

list ages = [18, 22, 45, 23]

dict familynames = {'Bjoern': 'Burscher', 'Damian': 'Trilling', 'Lori': 'Meester'}

dict {'Bjoern': 26, 'Damian': 31, 'Lori': 25}

Note that the elements of a list and the values of a dict can have any datatype! The keys of a dict can have different datatypes as well, as long as they are immutable (e.g., strings or numbers; a list cannot be a key). (It should be consistent, though!)
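
A short sketch of how the lists and dicts above can be used once they are defined. The dict of ages is given the name ages here for illustration (the slide shows it without a name), and first and fullnames are likewise assumed names.

```python
firstnames = ['Damian', 'Lori', 'Bjoern']
familynames = {'Bjoern': 'Burscher', 'Damian': 'Trilling', 'Lori': 'Meester'}
ages = {'Bjoern': 26, 'Damian': 31, 'Lori': 25}   # name "ages" is an assumption

# Lists are ordered; positions start counting at 0.
first = firstnames[0]
print(first)                  # Damian
print(len(firstnames))        # 3

# Dicts are looked up by key rather than by position.
print(familynames['Lori'])    # Meester
print(ages['Damian'])         # 31

# Combine both: use each list element as a key for the dict.
fullnames = []
for name in firstnames:
    fullnames.append(name + ' ' + familynames[name])
print(fullnames)
```

A list fits when the order of the elements matters; a dict fits when each value should be found by a meaningful label.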

Page 10: BD-ACA week2

Datatypes

Python lingo

Functions

functions take an input and return something else.
    int(32.43) returns the integer 32. len("Hello") returns the integer 5.

methods are similar to functions, but directly associated with an object.
    "SCREAM".lower() returns the string "scream".

Both functions and methods end with (). Between the (), arguments can (sometimes have to) be supplied.
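
The difference between calling a function and calling a method is mostly a matter of where the object goes. A minimal sketch using only built-in functions and string methods; the variable names and the example strings for split() and count() are illustrative and not taken from the slides.

```python
# Functions: the input goes between the parentheses.
whole = int(32.43)        # cuts off the decimals, it does not round
length = len("Hello")
print(whole, length)      # 32 5

# Methods: written after the object, separated by a dot.
quiet = "SCREAM".lower()
print(quiet)              # scream

# Some methods work without arguments, others need one.
words = "Big Data and Automated Content Analysis".split()
count_a = "banana".count("a")
print(len(words), count_a)   # 6 3
```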

Page 14: BD-ACA week2

Indentation: The Python way of structuring your program

Page 15: BD-ACA week2

Indentation

Structure
The program is structured by TABs or SPACEs

firstnames = ['Damian', 'Lori', 'Bjoern']
age = {'Bjoern': 27, 'Damian': 32, 'Lori': 26}
print("The names and ages of all BigData people:")
# The indented line belongs to the for loop and runs once per name
for naam in firstnames:
    print(naam, age[naam])

Don't mix up TABs and spaces! Both are valid, but you have to be consistent!!!
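
To see what happens when TABs and spaces are mixed, a program can be handed to Python's built-in compile() function as a string, which checks it without running it. This is only a sketch: the helper name indentation_error and the two example programs are made up for illustration. Python 3 is expected to reject the mixed version, typically with a TabError.

```python
def indentation_error(source):
    """Return the name of the error raised while compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:   # IndentationError and TabError are subclasses
        return type(err).__name__
    return None

# Both lines of the block are indented with four spaces.
consistent = (
    "for naam in ['Damian', 'Lori']:\n"
    "    print(naam)\n"
    "    print(len(naam))\n"
)

# The first line of the block uses a TAB, the second one eight spaces.
mixed = (
    "for naam in ['Damian', 'Lori']:\n"
    "\tprint(naam)\n"
    "        print(len(naam))\n"
)

print(indentation_error(consistent))   # None
print(indentation_error(mixed))
```

In an editor, both versions can look identical on screen, which is why setting the editor to insert spaces when the TAB key is pressed avoids the problem.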

Page 17: BD-ACA week2

Indentation

Structure
The program is structured by TABs or SPACEs

# firstnames and age as defined on the previous slide
firstnames = ['Damian', 'Lori', 'Bjoern']
age = {'Bjoern': 27, 'Damian': 32, 'Lori': 26}

print("The names and ages of all BigData people:")
for naam in firstnames:
    print(naam, age[naam])
    # Exactly one of the following blocks runs for each name
    if naam == "Damian":
        print("He teaches this course")
    elif naam == "Lori":
        print("She was an assistant last year")
    elif naam == "Bjoern":
        print("He helps on Wednesdays")
    else:
        print("No idea who this is")

Page 18: BD-ACA week2

Indentation

The line before an indented block starts with a statement indicating what should be done with the block and ends with a :

Indentation of the block indicates that

• it is to be executed repeatedly (for statement) – e.g., for each element from a list

• it is only to be executed under specific conditions (if, elif, and else statements)

• an alternative block should be executed if an error occurs (try and except statements)

• a file is opened, but should be closed again after the block has been executed (with statement)
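
The four kinds of blocks listed above can be combined in one small program. The following is a sketch under assumptions: the name 'Chris' is a hypothetical entry added to trigger the error case, and the file name bdaca_example.txt and the use of the system's temporary folder are illustrative choices.

```python
import os
import tempfile

ages = {'Bjoern': 27, 'Damian': 32, 'Lori': 26}
lines = []

# for: the indented block runs once per element of the list.
# 'Chris' is a hypothetical name that is missing from the dict.
for naam in ['Damian', 'Lori', 'Chris']:
    # try/except: the except block only runs if the try block fails.
    try:
        age = ages[naam]
        # if/else: only one of the two blocks runs.
        if age > 30:
            lines.append(naam + ": over 30")
        else:
            lines.append(naam + ": 30 or younger")
    except KeyError:
        lines.append(naam + ": age unknown")

# with: the file is closed automatically once the block has finished.
path = os.path.join(tempfile.gettempdir(), "bdaca_example.txt")
with open(path, "w") as outfile:
    for line in lines:
        outfile.write(line + "\n")

print(lines)
print(outfile.closed)   # True

os.remove(path)         # clean up the example file
```

Note how each level of nesting adds one level of indentation: the if block sits inside the try block, which sits inside the for block.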

Page 24: BD-ACA week2

We’ll now together do the exercise “Describing an existing structured dataset”.

Page 25: BD-ACA week2

Next meetings

Page 26: BD-ACA week2

Week 3: Data harvesting and storage

Monday, 13–4
A conceptual overview of APIs, scrapers, crawlers, RSS-feeds, databases, and different file formats

Wednesday, 15–4
Writing some first data collection scripts

Preparation

• Conceptual level: Read the article by Morstatter, Pfeffer, Liu, and Carley (2013) about the limitations of the Twitter API.

• Technical level: Make sure you are comfortable with the techniques we’ve covered so far. Play around. Give yourself some tasks and solve them. Google.