Professors

Pablo Aragón and Diego Sáez-Trumper.

Description

Information and Communication Technologies and the Internet are core infrastructures of modern society. The World Wide Web and social media have transformed the way we collect, process and exploit information, significantly affecting areas as diverse as finance, labor, politics, education, research and many other aspects of our daily life. 

The goal of this course is to provide students with the theoretical background and practical tools of Web Intelligence based on techniques from Information Retrieval, Natural Language Processing and Semantics. We will review the state of the art in Social Media Analysis and Web Search and Mining, including methods and resources to make sense of data from web platforms.

Prerequisites

Students must have a solid background in mathematics, algorithms and data structures. They are also expected to have programming skills, preferably in Python. Prior knowledge of machine learning is optional but strongly recommended.

Contents

After an introduction to Web Intelligence (history of the Web, the long tail, social media, online data collection, etc.), the contents are divided into two tracks:

  • Social Media Analysis:
    Application Programming Interfaces
    Sentiment Analysis
    Topic Modeling
  • Web Search and Mining:
    The web as resource
    Ontologies
    Knowledge Bases

Additional content will be covered through seminars by guest lecturers.

Methodology

Core sessions of the two tracks will combine seminar lectures with hands-on labs in which students retrieve and analyze data from the Web. The course will also include two seminars in which guest lecturers present cases of interest to the students.

Bibliography

  • Baeza-Yates, R., & Ribeiro-Neto, B. (2010). Modern Information Retrieval (2nd edition). Addison-Wesley.
  • Salganik, M. J. (2019). Bit by Bit: Social Research in the Digital Age. Princeton University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.
  • Jurafsky, D., & Martin, J. H. (2019). Speech and Language Processing (3rd edition draft).

Evaluation

Several sessions will include labs in which students form groups (2 people max.) and put the concepts of the course into practice with Python notebooks. Templates will be released on https://colab.research.google.com.

Throughout the course, each group will build its own notebook to analyze a topic of its choice using the techniques and libraries presented during the labs. The use of additional or novel approaches (new web data sources, complementary libraries, etc.) will be positively valued.

The findings from the notebooks will be presented and evaluated in the final session.