Web Intelligence

Professors: ANA FREIRE, FRANCESCO BONCHI, RICARDO BAEZA-YATES

Description

Study how to gather, process, search and mine data on the Web, and how these techniques apply to search engines. Understand the basic concepts behind information retrieval and data mining.

Prerequisites

Students must have a solid background in mathematics, algorithms and data structures. Prior knowledge of machine learning is optional but recommended.

Contents

The content is divided into three parts: theory of information retrieval, information retrieval on the Web, and Web data mining.

  • Introduction:
    • Characteristics of the Web. Web structure. Retrieval vs. browsing. The long tail. Social networks.
  • Basic concepts of information retrieval and data mining
    • Main document relevance models: Boolean, vector, probabilistic. Browsing models. Precision vs. recall. Quality evaluation. Reference collections.
    • Inverted indexes. Construction. Query processing. Use of compression.
    • Basic data mining algorithms.
  • Information retrieval on the Web
    • Architecture of a Web search engine. The crawler. Indexing systems, queries and ranking. Scalability. Ranking through link analysis. Multimedia search: images, audio and video.
  • Web data mining
    • Mining the content of the Web. Example: opinion mining. Structure mining and social networks. Example: finding communities. Usage mining. Example: query log analysis. Advanced example: Web spam detection.

Bibliography

  • Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval, 2nd edition, Addison-Wesley, 2010 (www.mir2ed.org).
  • Chakrabarti, S. Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann, 2002 (second edition to appear).
  • World Wide Web Consortium, w3c.org, 2011.

Evaluation

Evaluation is based on a written report in which the student surveys a topic related to the course content and includes his/her own thoughts.