
introduction

Outline

Introduction

Course outline

Software packages

Accessing data

References

Data Analytics for Social Science: Introduction

Johan A. Elkink

School of Politics & International Relations

University College Dublin

23 January 2020

Outline

1 Introduction

2 Course outline

3 Software packages

4 Accessing data

Statistics and politics

Trade-offs in research methods

In statistical analysis, observations are quantified and statistical models are used to investigate relationships between variables.

Qualitative              Quantitative
small number of cases    many cases
in-depth                 breadth
more accurate            more generalizable
measurement validity     measurement reliability

(Gerring, 2001)

Typical data

• Survey data, where individuals are asked about demographics, attitudes, behaviour, preferences, etc.

• National data, where characteristics of the institutional regime or economic variables are recorded.

• Organisational data, e.g. political parties, companies, etc.

2016 Brexit Referendum questionnaire

“If you do vote in the referendum on Britain’s membership of the European Union, how do you think you will vote?”

1 Remain in the EU

2 Leave the EU

3 Don’t know
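
To analyse answers to a closed question like this one, the numeric codes are usually mapped to labels and "Don't know" is treated as missing. A minimal Python sketch of that step follows, assuming the codes 1 to 3 listed above. The response vector is made up for illustration and does not come from the actual survey, and the function names are hypothetical. (The course itself works in R; Python is used here only as an illustration.)

```python
from collections import Counter

# Codebook taken from the questionnaire item above.
VOTE_LABELS = {1: "Remain in the EU", 2: "Leave the EU", 3: "Don't know"}
LEAVE = 2
DONT_KNOW = 3

# Hypothetical coded answers from ten respondents (illustration only).
responses = [1, 2, 2, 3, 1, 2, 1, 3, 2, 2]

def tabulate(codes):
    """Count how often each answer label occurs."""
    return Counter(VOTE_LABELS[c] for c in codes)

def proportion_leave(codes):
    """Share of 'Leave' among respondents who gave a substantive answer."""
    decided = [c for c in codes if c != DONT_KNOW]
    if not decided:
        return None  # nobody expressed a preference
    return sum(c == LEAVE for c in decided) / len(decided)

counts = tabulate(responses)
share = proportion_leave(responses)
print(counts)
print(f"Proportion leave among decided respondents: {share:.3f}")
```

Whether "Don't know" answers are dropped or kept as their own category is an analytical choice; the sketch drops them only to show how the choice changes the denominator.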

Example: age and voting in Brexit

[Figure: Support for Brexit by age. Horizontal axis: Age (ticks at 25, 50, 75); vertical axis: Proportion voting for leave (0.0 to 0.6).]
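
The quantity plotted in such a figure, the proportion voting leave at each age, can be computed by grouping respondents and taking a share per group. The Python sketch below shows one way to do this with 20-year age bands. The respondent list is hypothetical and is not taken from the survey, and the band width and function names are assumptions made for the example.

```python
from collections import defaultdict

WIDTH = 20  # assumed width of each age band, in years

# Hypothetical (age, voted_leave) pairs; NOT taken from the actual survey.
respondents = [
    (19, False), (24, False), (31, True), (38, False),
    (45, True), (52, True), (58, False), (66, True),
    (71, True), (79, True),
]

def age_band(age, width=WIDTH):
    """Return the lower bound of the age band, e.g. 45 -> 40 for width 20."""
    return (age // width) * width

def leave_share_by_band(data, width=WIDTH):
    """Proportion voting leave within each age band."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for age, leave in data:
        band = age_band(age, width)
        totals[band] += 1
        leavers[band] += int(leave)
    return {band: leavers[band] / totals[band] for band in sorted(totals)}

shares = leave_share_by_band(respondents)
for band, share in shares.items():
    print(f"{band}-{band + WIDTH - 1}: {share:.2f}")
```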

2016 ‘Brexit’ Referendum questionnaire

“How sure are you about what would happen to the UK if it left the EU or if it remained in the EU?” (asked separately for leaving and for remaining in the EU)

1 Very unsure

2 Quite unsure

3 Quite sure

4 Very sure

5 Don’t know

Example: age and voting in Brexit

[Figure: Support for Brexit by levels of uncertainty. Horizontal axis: Uncertainty remain in EU (1 to 4); vertical axis: Proportion voting for leave (0.00 to 1.00); legend: Uncertainty leaving EU (1 to 4).]

Data science

In academic research, our objective is to understand the social world. We typically want to identify causal relationships between variables. E.g. are voters with less knowledge of politics less likely to vote?

Commercially, the objective is often to predict the social world. E.g. given that you bought this book, and others bought the following books, which book are you most likely to want to buy next?

The high demand for data scientists not only means more work for people who understand social science and statistics; it has also produced a lot of new tools.

E.g. statistical analysis of text, network analysis, deep learning tools.
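
The book example above is a prediction task that can be illustrated with very little code: count which titles are most often bought together with the one just purchased. The Python sketch below is a deliberately naive co-purchase counter, not a description of how any real retailer works. The baskets, titles and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase baskets, one set of book titles per customer.
baskets = [
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book B"},
    {"Book A", "Book C"},
    {"Book B", "Book D"},
    {"Book A", "Book B", "Book D"},
]

def recommend(bought, all_baskets):
    """Return the book most often bought together with `bought`, or None."""
    co_counts = Counter()
    for basket in all_baskets:
        if bought in basket:
            co_counts.update(basket - {bought})
    if not co_counts:
        return None
    # Pick the highest count; ties are broken alphabetically by title.
    best = min(co_counts.items(), key=lambda item: (-item[1], item[0]))
    return best[0]

suggestion = recommend("Book A", baskets)
print(suggestion)
```

Note that nothing here explains why the books are bought together; the prediction can be useful without any causal account.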

social science = substantive expertise

Topics

looking at data

Introduction

Data inspection & visualisation

Comparing through visualisation

classifying groups

Linear regression

Logistic regression

Trees and forests

Networks and geography

mapping data

Cluster analysis

Principal components & multidimensional scaling

Wordscores

Topic models

Topics

visualisation

Introduction

Data inspection & visualisation

Comparing through visualisation

supervised learning

Linear regression

Logistic regression

Trees and forests

Networks and geography

unsupervised learning

Cluster analysis

Principal components & multidimensional scaling

Wordscores

Topic models

Topics

survey data

Introduction

Data inspection & visualisation

Comparing through visualisation

national data

Linear regression

Logistic regression

Trees and forests

Networks and geography

text data

Cluster analysis

Principal components & multidimensional scaling

Wordscores

Topic models

Textbook

Assignments

1 The first analysis will concern survey data and make use primarily of graphical and descriptive statistics, based on the Brexit referendum survey. Deadline: 24 February.

2 The second analysis will focus on the use of regression analysis and classification, based on country-level data. Deadline: 6 April.

3 The third analysis will focus on the statistical analysis of text. Deadline: 5 May.

Note that all assignments will be based on the statistical analyses performed in class during the lab sessions. You will not be required to perform new analyses, although trying some additional variations can improve the quality of the submission.

Plagiarism

Is not allowed. Details can be found in the syllabus ...

Contact

No office this year, so contact me by email: [email protected]

http://www.joselkink.net/DASS-Spring-2020.php

Always check the course website!

Do not hesitate to get in touch when struggling with the module!

Software comparison

Source: http://r4stats.com/articles/popularity/, 12 June 2015

Software comparison (log scale)

Source: http://r4stats.com/articles/popularity/, 12 June 2015

Software and code

For the sake of replicability and transparency, saving commands is key in the use of statistical software.

• Data preparation

• Data transformation

• Descriptives

• Analysis

Including clarifying commentary.

Software    Format
SPSS        .sps
Stata       .do
R           .R
Python      .py
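
As an illustration of what such a saved command file can look like, here is a minimal Python (.py) sketch organised along the four steps above, with comments explaining each one. The inline data are a small illustrative sample, the variable names are assumptions, and the "analysis" is deliberately trivial. An R (.R) script would be organised in the same way.

```python
import csv
import io
import statistics

# --- Data preparation -------------------------------------------------
# Small illustrative data set; in practice this would be read from a file.
RAW = """age,vote
21,Yes
30,No
80,Yes
50,Yes
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# --- Data transformation ----------------------------------------------
# Convert text fields to the types needed for analysis.
for row in rows:
    row["age"] = int(row["age"])
    row["voted"] = row["vote"] == "Yes"

# --- Descriptives -----------------------------------------------------
mean_age = statistics.mean(row["age"] for row in rows)
turnout = sum(row["voted"] for row in rows) / len(rows)

# --- Analysis ---------------------------------------------------------
# Compare the mean age of voters and non-voters.
mean_age_voters = statistics.mean(r["age"] for r in rows if r["voted"])
mean_age_nonvoters = statistics.mean(r["age"] for r in rows if not r["voted"])

print(f"Mean age: {mean_age}, turnout: {turnout:.2f}")
print(f"Mean age voters: {mean_age_voters:.1f}, non-voters: {mean_age_nonvoters:.1f}")
```

Because every step is written down, rerunning the file reproduces the same numbers from the same raw data, which is the point of saving commands rather than clicking through menus.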

R

Developed by statisticians and extensively used in political science, data science, statistics, etc.

Pros                              Cons
Free software                     Variable documentation quality
Very extensive package library    Inconsistent interfaces
Real programming language         Steep learning curve at start
Large and active user-base        No graphical user interface [1]
Multiple data sets
Highest quality graphics

http://www.r-project.org

http://www.rstudio.com

[1] ... but that’s why we use RStudio.

RStudio

RStudio data view

RStudio with RMarkdown

See also the video on using Markdown in RStudio.

Data analysis process

Example data set

     Age  Vote  Party  Education  Sex
1    21   Yes   FF     4          Male
2    30   No           3          Female
3    80   Yes   FG     3          Male
4    50   Yes   Lab    2          Male
5    33   No           5          Female
6    20   No           2          Female
7    43   Yes   FF     5          Female
8    42   Yes   FF     2          Male

FF = Fianna Fail; FG = Fine Gael; Lab = Labour
Education: 1 = none; 2 = primary; 3 = secondary; 4 = tertiary; 5 = post-graduate
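
To make the structure of this data set concrete, the Python sketch below stores the eight rows as one record per respondent and computes a few descriptives. It assumes that a blank Party cell means "missing" (represented as `None`); in this table the blanks coincide with Vote = No. The variable names are illustrative, and in the course the same table would be held in an R data frame.

```python
from collections import Counter

# Value labels from the legend above.
EDUCATION = {1: "none", 2: "primary", 3: "secondary",
             4: "tertiary", 5: "post-graduate"}

# One record per respondent; None marks a blank Party cell (assumed missing).
data = [
    {"age": 21, "vote": "Yes", "party": "FF",  "education": 4, "sex": "Male"},
    {"age": 30, "vote": "No",  "party": None,  "education": 3, "sex": "Female"},
    {"age": 80, "vote": "Yes", "party": "FG",  "education": 3, "sex": "Male"},
    {"age": 50, "vote": "Yes", "party": "Lab", "education": 2, "sex": "Male"},
    {"age": 33, "vote": "No",  "party": None,  "education": 5, "sex": "Female"},
    {"age": 20, "vote": "No",  "party": None,  "education": 2, "sex": "Female"},
    {"age": 43, "vote": "Yes", "party": "FF",  "education": 5, "sex": "Female"},
    {"age": 42, "vote": "Yes", "party": "FF",  "education": 2, "sex": "Male"},
]

def turnout(rows):
    """Share of respondents who voted."""
    return sum(r["vote"] == "Yes" for r in rows) / len(rows)

mean_age = sum(r["age"] for r in data) / len(data)
overall_turnout = turnout(data)
party_counts = Counter(r["party"] for r in data if r["party"] is not None)
turnout_by_sex = {
    sex: turnout([r for r in data if r["sex"] == sex])
    for sex in sorted({r["sex"] for r in data})
}
education_labels = [EDUCATION[r["education"]] for r in data]

print(mean_age, overall_turnout, dict(party_counts), turnout_by_sex)
```

Age is numeric, Vote and Sex are categorical, and Education is an ordered code whose numbers only rank the categories, which is why the label mapping is kept alongside the data.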

Data formats

• data prepared for competing statistical packages such as Stata and SPSS;

• data published in tables on the web, such as on Wikipedia;

• data published in raw text tabular format, especially large surveys;

• data published in Excel or other spreadsheets (see video);

• data stored in relational or non-relational databases, such as SQL databases, Redis, etc.;

• or just plain text files.

−→ package “rio”, command “import()”
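
In R, rio's import() chooses a reader based on the file extension, so a single command covers many of the formats listed above. The Python sketch below imitates that idea using only the standard library. `import_table` and its small reader table are hypothetical names made up for this illustration; they are not part of rio or of any other package, and only three text formats are handled.

```python
import csv
import json
import os
import tempfile

def _read_delimited(path, delimiter):
    """Read a delimited text file into a list of dicts (one per row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))

def _read_json(path):
    """Read a JSON file and return the parsed content."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

# Hypothetical dispatch table: file extension -> reader function.
READERS = {
    ".csv": lambda p: _read_delimited(p, ","),
    ".tsv": lambda p: _read_delimited(p, "\t"),
    ".json": _read_json,
}

def import_table(path):
    """Read a data file, choosing the reader from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file format: {ext!r}") from None
    return reader(path)

# Usage: write a tiny CSV to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "survey.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("age,vote\n21,Yes\n30,No\n")
    rows = import_table(csv_path)

print(rows)
```

Values come back as text here; converting them to numbers or categories is a separate data preparation step.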

Lab

There is more to say about graphs and visualisation of data, but it is best to get familiar with some basic visualisations first, through the use of R(Studio).

Demonstration:

• RStudio interface

• Data view

• Markdown syntax

• Installing R packages

• ... and the error when a package is missing

References

Gerring, John. 2001. Social science methodology: A critical framework. Cambridge: Cambridge University Press.