Professor: Vladimir Estivill-Castro

Motivation

As artificial intelligence (AI) systems are increasingly deployed in high-stakes domains such as healthcare, justice, and finance, the need for transparency, interpretability, and human trust has become paramount. Despite their strong predictive performance, many state-of-the-art models—particularly black-box approaches such as deep neural networks—lack the level of clarity required for ethical, accountable, and trustworthy decision-making.
In recent years, the machine learning (ML) community has increasingly recognised the critical importance of model interpretability. The body of research on Explainable Artificial Intelligence (XAI) has grown substantially, driven by legal, ethical, social, and usability considerations that demand a deeper understanding of how ML systems operate and influence decisions. While several XAI techniques have reached a level of maturity—most notably post-hoc explanation methods such as LIME and SHAP—these approaches often struggle to provide explanations that are fully faithful, stable, and aligned with human understanding.
In parallel, alternative approaches have emerged, including inherently interpretable and sparse models, as well as Human-in-the-Loop Learning (HILL) techniques that integrate human knowledge and feedback into the learning process. This course aims to provide a comprehensive overview of the challenges and opportunities in XAI, addressing both the regulatory landscape—where transparency and value alignment are increasingly mandated—and the technical foundations of explainability. Through the study of algorithms, methodologies, and practical case studies, students will gain a solid understanding of this now well-established and rapidly evolving field.

Course Description

This course explores the foundations, methods, and applications of Explainable Artificial Intelligence (XAI). Students will study why explainability is essential in high-stakes domains, examine regulatory and ethical requirements, and learn state-of-the-art technical approaches to interpreting machine learning models. The course combines theoretical foundations, hands-on implementation, and human-centred evaluation of explanations.

Learning Outcomes

By the end of the course, students will be able to:

  1. Explain why interpretability and transparency are critical in AI systems.
  2. Distinguish between different types of explainability (global vs. local, intrinsic vs. post-hoc).
  3. Apply and critically evaluate XAI techniques, including LIME, SHAP, counterfactual methods, and saliency methods.
  4. Design interpretable ML models and Human-in-the-Loop systems.
  5. Assess explainability methods with respect to faithfulness, robustness, and usability.
  6. Understand regulatory and ethical frameworks governing explainable AI.

Prerequisites

  • Machine Learning Fundamentals
  • Python or Java programming
  • Basic knowledge of statistics and linear algebra

Syllabus

Topic 1: Introduction to Explainable AI and the Foundations of Interpretability

  • Motivation: trust, accountability, and risk
  • Black-box vs. interpretable models
  • Case studies in healthcare, justice, and finance
  • Definitions and taxonomies of interpretability
  • Global vs. local explanations
  • Model transparency vs. post-hoc explainability

Topic 2: Legal, Ethical, and Societal Perspectives

  • EU AI Act and regulatory requirements
  • Fairness, accountability, and non-discrimination
  • Human-centred and value-aligned AI
  • Inherently Interpretable Models
    • Linear models, decision trees, rule-based systems
    • Sparse and monotonic models
    • Accuracy–interpretability trade-offs

Topic 3: XAI Methods

  • Feature Attribution Methods
    • Saliency maps
    • Gradient-based methods
    • Integrated gradients and limitations
  • Post-hoc Explainability Techniques
    • LIME
    • SHAP
  • Counterfactual Explanations
    • What-if analysis
    • Actionable explanations
    • Optimisation-based counterfactual generation

Topic 4: Explainability for Deep Learning

  • CNN and transformer explainability
  • Attention mechanisms
  • Explainability in NLP and vision
  • Robustness and Faithfulness of Explanations
    • Stability and sensitivity of explanations
    • Adversarial attacks on XAI
    • Evaluation metrics for explainability

Topic 5: Human-in-the-Loop Learning

  • Interactive ML and explainability
  • User feedback and model refinement
  • Trust calibration and cognitive aspects

Topic 6: XAI in Practice

  • Toolkits and libraries (SHAP, Captum, InterpretML)
  • Deployment challenges
  • Explainability in MLOps pipelines
  • Evaluation and User Studies
    • Human-centred evaluation methods
    • Usability and acceptance testing
    • Designing explanation interfaces

Assessment Structure (To be confirmed)

  1. Assignments / Labs (40%)
    • Practical implementation and analysis of XAI methods
  2. Midterm Exam or Essay (20%)
    • Conceptual and critical understanding
  3. Final Project (40%)
    • Design and evaluation of an explainable AI system (with report and presentation)

Bibliography

[1] Molnar, C. “Interpretable Machine Learning” (https://leanpub.com/interpretable-machine-learning)
[2] Doshi-Velez & Kim (2017). “Towards a Rigorous Science of Interpretable ML” (https://arxiv.org/abs/1702.08608)
[3] Uday Kamath, John Liu “Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning” (https://link.springer.com/book/10.1007/978-3-030-83356-5)

Some papers by Vladimir Estivill-Castro on Machine Ethics, Explainable AI and Human-in-the-Loop Learning

  • Vladimir Estivill-Castro and Nuru Nabuuso: “Efficient Construction of Interpretable Oblique Decision Trees”. EXPLAINS: 2nd International Conference on Explainable AI for Neural and Symbolic Methods (2025)
  • Erdélyi, O., Erdélyi, G., Estivill-Castro, V. (2025). “Why Randomness Matters for Fairness”. In New Trends in Disruptive Technologies, Tech Ethics and Artificial Intelligence. DiTTET 2025. vol 1465. Springer
  • Georgios Angelopoulos, Vladimir Estivill-Castro: “Human-Robot Dialogue that Elicits the Alignment of Moral Principles For Driverless Vehicles”. ACM-HRI (2024: 196-200)
  • Vladimir Estivill-Castro, Eugene Gilmore, René Hexel: “Constructing Explainable Classifiers from the Start - Enabling Human-in-the Loop Machine Learning”. Information. 13(10): 464 (2022)
  • Vladimir Estivill-Castro, Eugene Gilmore, René Hexel: “Interpretable Decisions Trees via Human-in-the-Loop-Learning”. AusDM 2022: 115-130
  • Misbah Javaid, Vladimir Estivill-Castro: “Explanations from a Robotic Partner Build Trust on the Robot's Decisions for Collaborative Human-Humanoid Interaction”. Robotics 10(1): 51 (2021)
  • Vladimir Estivill-Castro: “Collaborative Knowledge Acquisition with a Genetic Algorithm”. ICTAI 1997: 270-277