Professors: CARLOS CASTILLO

Description

Critical Data Studies is an interdisciplinary subject offered by professors from three disciplines: computing, philosophy, and law.

The course seeks to train students in addressing issues of personal data protection, as well as those arising from two broad applications of intelligent systems: data-driven decision support, and automated decision making.

About half of the sessions cover computing technologies for tasks such as anonymizing data or detecting and mitigating algorithmic bias. The other half study different conceptualizations of power around data processing pipelines, analyze bias and discrimination in computer systems from a moral philosophy perspective, and give an overview of the relevant legal frameworks for data processing.

The course includes 12 theory sessions for delivering and discussing the main concepts and methods. Optionally, students can attend 6 ungraded seminar sessions on case studies and 6 practice sessions to receive help with the data analysis assignments. Evaluation is based on a mid-term exam and a final exam (about the theory part), assignments (data analysis), and a report (algorithmic audit project).

The course covers issues of fairness, accountability, and transparency in data processing from ethical, legal, and technological perspectives:

  1. Personal data processing: privacy, confidentiality, surveillance, recourse, data collection, and power differentials.
  2. Data-driven decision support: biases and transparency in data processing, data-rich communication, and data visualization.
  3. Automated decision making: conceptualizations of power and discrimination in scenarios with different degrees of automation.
  4. External algorithmic auditing in practice: data collection, metrics definition, metric boundaries, reporting.

Format

The course will combine seminar lectures with problem-solving sessions (both analytical and programming). The programming sessions will be in Matlab, following the BRML toolbox. The evaluation will be based on regular exercises and a final exam.

Contents

T01. The risks of data processing: We introduce the main problems we will be dealing with during the course, and present different conceptualizations of these issues.

T02. Ethical frameworks and tools for auditing algorithms: We analyze the possibilities and limits of automatic auditing of data processing.

T03. Data protection regulation: We describe the regulation of data protection in Europe, focusing on data protection issues.

T04. Statistical Disclosure Control: We present technologies for protecting privacy on data, with an emphasis on k-anonymity.

T05. Addressing biases in data processing: We introduce data feminism and a framework to understand biases in social data processing.

T06. Regulation of automated profiling and decision making: We describe the regulation of data protection in Europe, focusing on automated profiling and algorithmic decision making.

T07. Algorithmic Discrimination: We introduce key conceptualizations around discrimination within a computer system.

T08. Measuring algorithmic bias in automatic classification: We introduce metrics for algorithmic fairness in automated systems used for classification and prediction.

T09. Mitigating biases in automatic classification: We introduce methods for mitigating algorithmic bias in automated systems used for classification and prediction.

T10. Fairness in algorithmic processing: We describe ethical concerns of fairness, and how they can be translated into requirements for computer systems.

T11. Fairness and transparency in ranking and recommendation: We study issues of bias and transparency in the context of information systems to rank people or items.

T12. Guest lecturer: data protection: We will host a guest lecture on data protection.
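
As an illustration of the kind of technique covered in T04, the following is a minimal sketch of a k-anonymity check, assuming the usual definition: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The function name, the column names, and the toy records are made up for this example and are not course material.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (hypothetical helper)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Toy, made-up records: age and postal code are already generalized.
records = [
    {"age_range": "20-29", "zip_prefix": "080", "diagnosis": "flu"},
    {"age_range": "20-29", "zip_prefix": "080", "diagnosis": "asthma"},
    {"age_range": "30-39", "zip_prefix": "081", "diagnosis": "flu"},
    {"age_range": "30-39", "zip_prefix": "081", "diagnosis": "diabetes"},
]

quasi_identifiers = ["age_range", "zip_prefix"]
print(is_k_anonymous(records, quasi_identifiers, 2))  # each group has 2 records
print(is_k_anonymous(records, quasi_identifiers, 3))  # no group has 3 records
```

In this toy table each quasi-identifier group contains exactly two records, so the check succeeds for k = 2 and fails for k = 3.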

Teaching Methods

The course is structured around theory classes in which the topics of the course are introduced. In optional practice sessions, students may receive help on the programming assignments, which are to be delivered individually. In optional seminar sessions, students work in small groups analyzing a case or a problem posed by the professor; these are not graded.

Associated skills

CB8. That the students are able to integrate knowledge and face the complexity of making judgments on the basis of information that is incomplete or limited, including reflecting about the social and ethical responsibilities associated with the application of their knowledge and judgment.

Specific skills and learning outcomes

CE1. Apply models and algorithms in machine learning, autonomous systems, natural language interaction, mobile robotics and/or web intelligence to a well-identified problem of intelligent systems.

  1. Solves problems related to interactive intelligent systems. Specifically, students can solve the problem of detecting and mitigating biases in such a system.
  2. Identifies the appropriate models and algorithms to solve a specific problem in the field of interactive intelligent systems. Specifically, students can identify data processing methods to reduce disclosure risk and to mitigate biases.
  3. Evaluates the result of applying a model or algorithm to a specific problem. Specifically, students can use standard metrics of algorithmic fairness, and at the same time understand the limitations of such metrics.
  4. Presents the result of the application of a model or algorithm to a specific problem according to scientific standards. Specifically, students can present in writing the results of an external audit (without the collaboration of the auditee), performed over an existing dataset or an existing online service.

Prerequisites

The course requires students to know basic machine learning methods and basic data mining methods, from data preparation to data modeling and analysis.

Evaluation

Continuous evaluation will be based on the following elements:

  • P = Average of grades in practices/assignments = 25%
  • R = Project = 25%
  • M = Mid-term exam = 20%
  • E = Final exam = 30%

The project corresponds to auditing a specific dataset or a specific online service for algorithmic discrimination and presenting the quantitative and qualitative conclusions in a brief (6 pages) paper.

To pass the course under continuous evaluation:

  • P must be greater than or equal to 5.0.
  • 0.4M + 0.6E must be greater than or equal to 5.0.
  • 0.25P + 0.25R + 0.20M + 0.30E must be greater than or equal to 5.0.

If a student fails to pass, a resit exam is necessary. The resit exam replaces the final exam grade (E in the list above).
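
The passing conditions above can be checked with a short script. This is a minimal sketch, assuming all grades are on a 0 to 10 scale (the syllabus only states the 5.0 thresholds); the function name is hypothetical.

```python
def passes_continuous_evaluation(p, r, m, e):
    """Check the three passing conditions for grades P, R, M, E.

    Assumes each grade is on a 0-10 scale (an assumption, not stated
    in the syllabus). Returns (passed, final_grade).
    """
    exam_component = 0.4 * m + 0.6 * e
    final_grade = 0.25 * p + 0.25 * r + 0.20 * m + 0.30 * e
    passed = p >= 5.0 and exam_component >= 5.0 and final_grade >= 5.0
    return passed, final_grade


# Example: good assignments and project, but weak exams.
passed, grade = passes_continuous_evaluation(p=9, r=9, m=3, e=5)
print(passed, round(grade, 2))

# After a resit, the new exam grade replaces E.
passed_resit, grade_resit = passes_continuous_evaluation(p=9, r=9, m=3, e=7)
print(passed_resit, round(grade_resit, 2))
```

In the first call the weighted total is above 5.0, but the exam component 0.4M + 0.6E is 4.2, so the student does not pass. With a resit grade of 7 replacing E, the exam component becomes 5.4 and all three conditions hold.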

Bibliography

Main books:

  • O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
  • Barocas, S., Hardt, M., Narayanan, A. (TBA). Fairness and machine learning: limitations and opportunities. Work in Progress.

Recommended books:

  • Perez, C. C. (2019). Invisible women: Exposing data bias in a world designed for men. Random House.
  • D'Ignazio, C., & Klein, L. F. (2020). Data feminism. MIT Press.
  • Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press.
  • Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.
  • Moreau, S. (2020). Faces of Inequality: A Theory of Wrongful Discrimination. Oxford University Press, USA.
