Course Description

Rapid increases in computational power and the explosion of Internet and mobile phone use have transformed our lives, our behavior and the way we communicate. This digital revolution has also fundamentally changed the social sciences. Online data, social media data, cellphone data, and web experiments offer social scientists the opportunity to address core social questions in new ways.

In this course, we will study how traditional methods used in the social sciences can help us make sense of new data sources, and how computational tools can be used to access, analyze and visualize new (and often very large) data sources. The course covers substantive topics relevant to population research as well as a selection of data science tools to extract Internet data, manage large data sets and analyze them.

The main goals of this short course are (i) to develop critical thinking about the emerging field of “big data” analysis; (ii) to learn some of the methods, approaches and tools of social media and big data analysis; and (iii) to become familiar with some of the literature at the intersection of population research and computational social science.

The course is designed for a diverse group of students with backgrounds in the social sciences (e.g., sociology, demography, economics, political science, statistics, psychology). I will emphasize substance, along with key statistical and computational concepts that are general enough to apply to problems arising in different disciplines.

There will be a mix of lectures, hands-on guided exercises about various computational tools, and general discussion of reading material. Students are expected to bring their own laptops and to have basic familiarity with the statistical software R.

Course Outline

Day 1

Accessing online data: an introduction to the use of APIs (Application Programming Interfaces) to access data from websites and social media

Day 2

Collecting Twitter data and analyzing tweets

An introduction to text analysis and sentiment analysis

Day 3

Research design for the analysis of social media data.

In-depth discussion of examples from the literature about:

-Calibration methods to address selection bias

-Regression discontinuity for the analysis of Yelp reviews

-Difference-in-differences for the study of tweets
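
As a rough illustration of the first item above, the simplest form of calibration (post-stratification) reweights a non-representative sample so that its group composition matches known population shares. The sketch below is written in Python purely for illustration. The function names, group labels, outcome values and population shares are all hypothetical and are not taken from the course readings.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per sampled unit.

    Each weight is the group's population share divided by the group's
    share in the sample, so over-represented groups are down-weighted.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy sample: 8 "young" and 2 "old" users of an online platform.
groups = ["young"] * 8 + ["old"] * 2
# Hypothetical binary outcome: 6 of the 8 young users and none of the old users.
outcome = [1] * 6 + [0] * 2 + [0] * 2
# Assumed (made-up) population shares, e.g. from a census.
population_shares = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(groups, population_shares)
raw_estimate = sum(outcome) / len(outcome)
calibrated_estimate = weighted_mean(outcome, weights)

print(f"raw: {raw_estimate:.3f}, calibrated: {calibrated_estimate:.3f}")
```

In this toy example the raw sample proportion is 0.6, while the calibrated estimate is 0.375, because young users are over-represented in the sample relative to the assumed population. The adjustment only corrects for selection on the variables used to form the groups.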

Day 4

Scalable computational methods for managing very large data sets

Day 5

Interactive Data Visualization

About the instructor:

Emilio Zagheni is an Assistant Professor of Sociology and a Data Science Fellow of the eScience Institute at the University of Washington, Seattle. He is currently co-chairing the IUSSP (International Union for the Scientific Study of Population) Panel on “Big Data and Population Processes.” He received his PhD in Demography and MA in Statistics from the University of California, Berkeley. His research uses mathematical, statistical and computationally intensive approaches to study the causes and consequences of population dynamics. Some of his recent work includes the estimation of international migration patterns using data from Yahoo, Twitter and LinkedIn, as well as the development of methods to extract information from non-representative samples. More information is available at www.zagheni.net.