Professors: Diego Sáez-Trumper and Pablo Aragón


Information and Communication Technologies and the Internet are core infrastructures of modern society. The World Wide Web and social media have transformed the way we collect, process and exploit information, significantly affecting areas as diverse as finance, labor, politics, education and research, along with many other aspects of daily life.

The goal of this course is to provide students with the theoretical background and practical tools of Web Intelligence based on techniques from Information Retrieval, Natural Language Processing and Semantics. We will review the state of the art in Social Media Analysis and Web Search and Mining, including methods and resources to make sense of data from web platforms.


Students must have a solid background in mathematics, algorithms and data structures, and are expected to have programming skills in Python. Prior knowledge of machine learning is optional but recommended.


Beyond the introduction to Web Intelligence (history of the Web, the long tail, social media, online data collection, etc.), the contents are divided into two tracks:

  • Social Media Analysis:
    Application Programming Interfaces
    Sentiment Analysis
    Topic Modeling
  • Web Search and Mining:
    The web as resource
    Knowledge Base

Additional content will be covered through seminars by guest lecturers.

Teaching Methods
The core sessions of both tracks will combine lectures with hands-on labs in which students retrieve and analyze data from the Web. The course will also include two seminars by guest lecturers, who will present industrial case studies to the students.


  • Baeza-Yates, R., & Ribeiro-Neto, B. (2010). Modern Information Retrieval (2nd edition). Addison-Wesley.
  • Salganik, M. J. (2019). Bit by Bit: Social Research in the Digital Age. Princeton University Press.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.
  • Jurafsky, D., & Martin, J. H. (2019). Speech and Language Processing (3rd edition draft).



Labs will take place in several sessions. Students will form groups of two and put the concepts of the course into practice through Python notebooks. Templates will be released on 

Throughout the course, each group will build its own notebook to analyze a topic of its choice using the techniques and libraries presented during the labs. Going beyond the lab material (new web data sources, complementary libraries, etc.) will be valued positively.

The notebooks will be presented and evaluated in the final session.