Tue/Thu 1:25-2:40 Intro to Data Science Kimball B11 https://courses.cit.cornell.edu/info2950_2018sp/ Instructor: Paul Ginsparg (242 Gates Hall) Info 2950, Lecture 1 25 Jan 2018

lec1 · Title: lec1.key Created Date: 1/30/2018 5:57:27 PM

Sep 24, 2020

Page 1

Tue/Thu 1:25-2:40

Intro to Data Science

Kimball B11

https://courses.cit.cornell.edu/info2950_2018sp/

Instructor: Paul Ginsparg (242 Gates Hall)

Info 2950, Lecture 1 25 Jan 2018

Page 2

https://courses.cit.cornell.edu/info2950_2018sp/

Final: Tue 22 May 2:00-4:30

cs 2110/2800

Page 3

(last year) https://courses.cit.cornell.edu/info2950_2017sp/

Page 4

https://piazza.com/cornell/spring2018/info2950/home

Page 5

Rough Syllabus

0. Review of basic python / jupyter notebook
1. Counting and probability (factorial, binomial coefficients, conditional probability, Bayes Theorem). Real Data: text classifier, etc. [baby machine learning]
2. Statistics: mean, variance; binomial, Gaussian, Poisson distributions
3. Graph theory (nodes, edges), networks (c.f. Info2040), graph algorithms
4. Power Law data (need exponential and logarithms …)
5. Linear and Logistic regression, Pearson and Spearman correlators
6. Markov and other correlated data

Rosen chapters 2, 6, 7, 10, 11

Easley/Kleinberg chpts 3, 18

+ many other on-line resources [e.g. http://www.cs.cornell.edu/courses/cs1380/2018sp/textbook/, adapted from Berkeley http://data8.org/ (started apr ’16)]

Page 6

I would found an institution where any person can study data science. - Ezra Cornell

CS 1380 + ORIE 1380 + STSCI 1380

Data Science For All

Spring 2018 MWF 10:10-11:00 am No experience required – Open to all – Fulfills MQR-AS

A course for anyone who wants to study data visualization, prediction, machine learning, and programming in Python. We’ll analyze real-world datasets on crime, health, transportation, literature, and more!

https://tinyurl.com/datascienceforall

Page 7

Problem sets will involve both programming and non-programming problems.

Problem sets are not group projects. You are expected to abide by the Cornell University Code of Academic Integrity. It is your responsibility to understand and follow these policies. (In particular, the work you submit for course assignments must be your own. You may discuss homework assignments with other students at a high level, for example by discussing general methods or strategies to solve a problem, but you must cite the other student in your submission. Any work you submit must be your own understanding of the solution, the details of which you personally and individually worked out, and written in your own words.)

You’ll be penalized if you copy an iPython notebook, OR if yours is copied.

Page 8

Will be discussed in section tomorrow:

including instructions for installing Anaconda. We'll standardize on Python 3 due to minor Python 2.7/3.6 compatibility issues (though you are welcome to use Python 2).

course upload site: https://pgcourse.infosci.cornell.edu/cgi-bin/probset.py

N.B.: (former?) known problem with python installations: cs 1110 unfortunately recommended misconfigured software that violates standard practice by surreptitiously adding environment variables to ~/.bashrc file.

(instructions for removing: https://courses.cit.cornell.edu/info2950_2018sp/resources/bashprob.html)

“Problem Set 0”, due Tue 30 Jan 23:59

Page 9

https://www.anaconda.com/download/

Page 10

Definition. A set S is a collection of objects.

The objects of a set are called elements x of the set: x ∈ S, or x ∉ S

Examples:

X = {1, 2, 3, 4, 5}

C = {Ithaca, Boston, Chicago}

Stuff = {1, snow, Cornell, y}

A = {1, 2, 3}

empty set = ∅

Can also be defined by rule or equation:

Example: E is the set of even numbers. E = {x | x is even}

Cardinality |S| is the number of elements of S

Examples: |X| = 5, |C| = 3, |Stuff| = 4, |∅| = 0
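
These definitions map closely onto Python's built-in `set` type. The following is a minimal sketch using the example sets from the slide; writing the non-numeric elements as strings, and the name `E_small` (a finite stand-in for the infinite set of even numbers), are illustrative assumptions rather than part of the lecture.

```python
# Example sets from the slide; non-numeric elements are written as strings (assumption).
X = {1, 2, 3, 4, 5}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}
A = {1, 2, 3}
empty = set()  # note: {} creates an empty dict, not an empty set

# Membership tests correspond to "x is an element of S" and "x is not an element of S".
print(3 in X)            # True
print("snow" not in X)   # True

# A set defined by a rule, restricted to 1..10 so that it is finite.
# E_small is a hypothetical stand-in: the full set E of even numbers is infinite.
E_small = {x for x in range(1, 11) if x % 2 == 0}
print(E_small == {2, 4, 6, 8, 10})  # True

# Cardinality |S| corresponds to len(S).
print(len(X), len(C), len(Stuff), len(empty))  # 5 3 4 0
```

Python sets ignore order and repeated elements, which matches the mathematical definition.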

[Figure: a set S with |S| = 29]

Page 11

A subset T of a set S is a set of elements all of which are contained in S.

T ⊂ S (proper subset) or T ⊆ S

The empty set is a subset of every set: ∅ ⊆ S for all S

Examples:

C′ = {Ithaca, Chicago}

C′ ⊂ C

X′ = {x | x is a whole number between 2 and 5} is a subset of X

The power set P(S) of a set S is the set of all subsets of S.

Example: For the set A = {1, 2, 3}, P(A) = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}}

For a set S with n elements, what is |P(S)|?
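
One way to explore the question about |P(S)| is to enumerate power sets for small n and look at the counts. The sketch below is only an illustration: the helper name `power_set` is hypothetical, and subsets are stored as `frozenset` objects because ordinary Python sets cannot contain other sets.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each represented as a frozenset."""
    items = list(s)
    # combinations(items, k) yields every k-element subset, for k = 0..len(items).
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(sub) for sub in subsets}

A = {1, 2, 3}
C = {"Ithaca", "Boston", "Chicago"}
C_prime = {"Ithaca", "Chicago"}

print(C_prime <= C)  # True: C' is a subset of C
print(C_prime < C)   # True: C' is a proper subset of C
print(set() <= C)    # True: the empty set is a subset of every set

P_A = power_set(A)
print(len(P_A))      # 8

# Subset counts for sets of size n = 0..4, to compare against a guess for |P(S)|.
for n in range(5):
    print(n, len(power_set(range(n))))
```

The printed counts double each time one element is added, which is consistent with |P(S)| = 2^n: each element is independently either in a subset or not.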

[Figure: diagram of sets S and T]

Page 12

difference of two sets: A − B = {x | x ∈ A and x ∉ B}

Examples:

X − Stuff = {2, 3, 4, 5}

Stuff − X = {snow, Cornell, y}

C − ∅ = {Ithaca, Boston, Chicago}

X − E = {1, 3, 5}

symmetric difference: A Δ B = {x | x ∈ A or x ∈ B, and x ∉ A ∩ B}

Examples:

X Δ Stuff = {2, 3, 4, 5, snow, Cornell, y}

C Δ ∅ = {Ithaca, Boston, Chicago}

X Δ A = {4, 5}

Cartesian product of two sets: A × B = {(x, y) | x ∈ A and y ∈ B}

Example:

A × A = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}
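
The three operations above can be checked in Python, where `-` and `^` are the built-in set difference and symmetric difference operators and `itertools.product` builds Cartesian products. This is a sketch under the same assumptions as the slide examples: elements such as snow are written as strings, and `E_small` is a hypothetical finite stand-in for the even numbers.

```python
from itertools import product

X = {1, 2, 3, 4, 5}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}
A = {1, 2, 3}
E_small = {2, 4, 6, 8, 10}  # assumption: finite stand-in for the even numbers

# Difference: elements of the left set that are not in the right set.
print((X - Stuff) == {2, 3, 4, 5})             # True
print((Stuff - X) == {"snow", "Cornell", "y"})  # True
print((C - set()) == C)                         # True
print((X - E_small) == {1, 3, 5})               # True

# Symmetric difference: elements in exactly one of the two sets.
print((X ^ A) == {4, 5})                           # True
print((X ^ Stuff) == (X | Stuff) - (X & Stuff))    # True, matches the definition

# Cartesian product: all ordered pairs (x, y) with x in A and y in A.
AxA = set(product(A, A))
print(len(AxA))        # 9
print((3, 1) in AxA)   # True
```

Note that |A × A| = |A| · |A| = 9, and that the pairs are ordered, so (1, 3) and (3, 1) are distinct elements.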

For two sets to be the same, they must have the same elements.

A = B means that ∀x we have x ∈ A iff x ∈ B

(Equivalently, A = B means that A ⊆ B and B ⊆ A)
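
As a small illustration of set equality, the sketch below compares two Python sets written with different order and a repeated element; the set `B` is a hypothetical example, not taken from the slides.

```python
A = {1, 2, 3}
B = {3, 2, 1, 1}  # hypothetical example: order and repetition do not matter

# Equality holds exactly when each set is a subset of the other.
print(A == B)              # True
print(A <= B and B <= A)   # True

# Element-wise form: every x is in A if and only if it is in B.
print(all((x in A) == (x in B) for x in A | B))  # True
```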

Page 13

Set Operations

union of two sets: A ∪ B = {x | x ∈ A or x ∈ B}

Examples:

X ∪ Stuff = {1, 2, 3, 4, 5, snow, Cornell, y}

C ∪ ∅ = {Ithaca, Boston, Chicago}

A ∪ X = {1, 2, 3, 4, 5} (In this case, A ∪ X = X.)

intersection of two sets: A ∩ B = {x | x ∈ A and x ∈ B}

Examples:

X ∩ Stuff = {1}

C ∩ ∅ = ∅

X ∩ E = {2, 4}

A ∩ X = {1, 2, 3} (In this case, A ∩ X = A.)
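
Union and intersection correspond to the `|` and `&` operators on Python sets. The sketch below re-checks the slide examples; as before, string spellings of the elements and the finite set `E_small` standing in for the even numbers are assumptions made for illustration.

```python
X = {1, 2, 3, 4, 5}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}
A = {1, 2, 3}
E_small = {2, 4, 6, 8, 10}  # assumption: finite stand-in for the even numbers

# Union: elements that are in at least one of the two sets.
print((X | Stuff) == {1, 2, 3, 4, 5, "snow", "Cornell", "y"})  # True
print((C | set()) == C)   # True
print((A | X) == X)       # True, because A is a subset of X

# Intersection: elements that are in both sets.
print((X & Stuff) == {1})        # True
print((C & set()) == set())      # True
print((X & E_small) == {2, 4})   # True
print((A & X) == A)              # True, because A is a subset of X
```

The last line of each group illustrates a general fact: when A ⊆ X, the union is X and the intersection is A.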

[Figure: diagram of sets A and B]

Conditional Probability

Suppose we know that one event has happened and we wish to ask about another.

For two events A and B, the joint probability of A and B is defined as

p(A,B) = p(A ∩ B)

the probability of the intersection of events A and B in the sample space,

equivalently the probability that events A and B both occur

The conditional probability of A relative to B is

p(A|B) = p(A ∩ B)/p(B), “the probability of A given B” (defined when p(B) > 0)

Example: Flip a fair coin 3 times.

B = event that we have at least one H

A = event of getting exactly 2 Hs

What is the probability of A given B?

In this case, A ∩ B = A (getting exactly 2 Hs implies at least one H), p(A) = 3/8, p(B) = 7/8, and therefore p(A|B) = (3/8)/(7/8) = 3/7.
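
The coin-flip example can be verified by listing all eight equally likely outcomes and counting. This is a minimal sketch; the helper names `prob` and `cond_prob` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 2**3 equally likely outcomes of three fair coin flips.
sample_space = set(product("HT", repeat=3))

def prob(event):
    """Probability of an event (a subset of the sample space), all outcomes equally likely."""
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """Conditional probability p(a | b) = p(a and b) / p(b); requires p(b) > 0."""
    return prob(a & b) / prob(b)

B = {w for w in sample_space if w.count("H") >= 1}  # at least one H
A = {w for w in sample_space if w.count("H") == 2}  # exactly two Hs

print((A & B) == A)        # True: A is contained in B
print(prob(A), prob(B))    # 3/8 7/8
print(cond_prob(A, B))     # 3/7
```

Conditioning on B amounts to shrinking the sample space to the 7 outcomes in B and counting the 3 of them that lie in A.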

Notice that the definition of conditional probability also gives us the formula: p(A ∩ B) = p(A|B)p(B). For three events we have: p(A ∩ B ∩ C) = p(A|B ∩ C)p(B|C)p(C). (What is a general rule?)
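
The three-event formula can be checked numerically on a small sample space. In the sketch below, the three events on three coin flips are hypothetical choices made only for illustration, and `prob` and `cond_prob` are assumed helper names.

```python
from fractions import Fraction
from itertools import product

# Sample space: all equally likely outcomes of three fair coin flips.
sample_space = set(product("HT", repeat=3))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """Conditional probability p(a | b); requires p(b) > 0."""
    return prob(a & b) / prob(b)

# Hypothetical events chosen for illustration (not from the slides).
A = {w for w in sample_space if w[0] == "H"}        # first flip is H
B = {w for w in sample_space if w.count("H") >= 2}  # at least two Hs
C = {w for w in sample_space if w[2] == "H"}        # third flip is H

lhs = prob(A & B & C)
rhs = cond_prob(A, B & C) * cond_prob(B, C) * prob(C)
print(lhs, rhs)    # 1/4 1/4
print(lhs == rhs)  # True
```

The product telescopes: each conditional probability's denominator cancels the next factor's numerator, which suggests how the pattern extends to more events.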

We can also use conditional probabilities to find the probability of an event by breaking the sample space into disjoint pieces. If S = S1 ∪ S2 ∪ . . . ∪ Sn and all pairs Si, Sj with i ≠ j are disjoint, then for any event A, p(A) = Σ_i p(A|Si)p(Si) = Σ_i p(A ∩ Si).

Example: Suppose we flip a fair coin twice. Let S1 be the outcomes where the first flip is H and S2 be the outcomes where the first flip is T. What is the probability of A = getting 2 Hs? p(A) = p(A|S1)p(S1) + p(A|S2)p(S2) = (1/2)(1/2) + (0)(1/2) = 1/4.
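
The same decomposition can be reproduced by enumeration. The following sketch splits the two-flip sample space by the first flip and sums p(A|Si)p(Si) over the pieces; the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all equally likely outcomes of two fair coin flips.
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """Conditional probability p(a | b); requires p(b) > 0."""
    return prob(a & b) / prob(b)

S1 = {w for w in sample_space if w[0] == "H"}  # first flip is H
S2 = {w for w in sample_space if w[0] == "T"}  # first flip is T
A = {("H", "H")}                               # getting 2 Hs

# The pieces are disjoint and together cover the whole sample space.
print((S1 & S2) == set(), (S1 | S2) == sample_space)  # True True

total = sum(cond_prob(A, S) * prob(S) for S in (S1, S2))
print(cond_prob(A, S1), cond_prob(A, S2))  # 1/2 0
print(total, prob(A))                      # 1/4 1/4
```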

Two events A and B are independent if p(A ∩ B) = p(A)p(B). This immediately gives: A and B are independent iff p(A|B) = p(A) (when p(B) > 0).
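
Independence can be tested directly from the product definition. In this sketch the events on two coin flips are hypothetical examples chosen to show one independent pair and one dependent pair; `prob`, `cond_prob`, and `independent` are assumed helper names.

```python
from fractions import Fraction
from itertools import product

# Sample space: all equally likely outcomes of two fair coin flips.
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """Conditional probability p(a | b); requires p(b) > 0."""
    return prob(a & b) / prob(b)

def independent(a, b):
    """True when p(a and b) equals p(a) * p(b)."""
    return prob(a & b) == prob(a) * prob(b)

# Hypothetical events chosen for illustration.
first_H = {w for w in sample_space if w[0] == "H"}
second_H = {w for w in sample_space if w[1] == "H"}
both_H = {("H", "H")}

print(independent(first_H, second_H))                 # True
print(cond_prob(first_H, second_H) == prob(first_H))  # True: knowing the second flip changes nothing
print(independent(first_H, both_H))                   # False
print(cond_prob(first_H, both_H), prob(first_H))      # 1 1/2
```

For the dependent pair, learning that both flips are H raises the probability that the first flip is H from 1/2 to 1.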


Page 14

[Figure: diagram of sets A and B]
