Explainable Artificial Intelligence

Motivation

As artificial intelligence (AI) systems are increasingly deployed in high-stakes domains such as healthcare, justice, and finance, the need for transparency, interpretability, and human trust has become paramount. Despite their strong predictive performance, many state-of-the-art models—particularly black-box approaches such as deep neural networks—lack the level of clarity required for ethical, accountable, and trustworthy decision-making.
In recent years, the machine learning (ML) community has increasingly recognised the critical importance of model interpretability. The body of research on Explainable Artificial Intelligence (XAI) has grown substantially, driven by legal, ethical, social, and usability considerations that demand a deeper understanding of how ML systems operate and influence decisions. While several XAI techniques have reached a level of maturity—most notably post-hoc explanation methods such as LIME and SHAP—these approaches often struggle to provide explanations that are fully faithful, stable, and aligned with human understanding.
In parallel, alternative approaches have emerged, including inherently interpretable and sparse models, as well as Human-in-the-Loop Learning (HILL) techniques that integrate human knowledge and feedback into the learning process. This course aims to provide a comprehensive overview of the challenges and opportunities in XAI, addressing both the regulatory landscape—where transparency and value alignment are increasingly mandated—and the technical foundations of explainability. Through the study of algorithms, methodologies, and practical case studies, students will gain a solid understanding of this now well-established and rapidly evolving field.

Course Description

This course explores the foundations, methods, and applications of Explainable Artificial Intelligence (XAI). Students will study why explainability is essential in high-stakes domains, examine regulatory and ethical requirements, and learn state-of-the-art technical approaches to interpreting machine learning models. The course combines theoretical foundations, hands-on implementation, and human-centred evaluation of explanations.

Learning Outcomes

By the end of the course, students will be able to:

Explain why interpretability and transparency are critical in AI systems.
Distinguish between different types of explainability (global vs. local, intrinsic vs. post-hoc).
Apply and critically evaluate XAI techniques, including LIME, SHAP, counterfactual methods, and saliency methods.
Design interpretable ML models and Human-in-the-Loop systems.
Assess explainability methods with respect to faithfulness, robustness, and usability.
Understand regulatory and ethical frameworks governing explainable AI.

Prerequisites

Machine Learning Fundamentals
Python or Java programming
Basic knowledge of statistics and linear algebra

Syllabus

Topic 1: Introduction to Explainable AI and the Foundations of Interpretability

Motivation: trust, accountability, and risk
Black-box vs. interpretable models
Case studies in healthcare, justice, and finance
Definitions and taxonomies of interpretability
Global vs. local explanations
Model transparency vs. post-hoc explainability

Topic 2: Legal, Ethical, and Societal Perspectives

EU AI Act and regulatory requirements
Fairness, accountability, and non-discrimination
Human-centred and value-aligned AI
Inherently Interpretable Models
- Linear models, decision trees, rule-based systems
- Sparse and monotonic models
- Accuracy–interpretability trade-offs

Topic 3: XAI Methods

Feature Attribution Methods
- Saliency maps
- Gradient-based methods
- Integrated gradients and limitations
Post-hoc Explainability Techniques
- LIME
- SHAP
Counterfactual Explanations
- What-if analysis
- Actionable explanations
- Optimisation-based counterfactual generation

Topic 4: Explainability for Deep Learning

CNN and transformer explainability
Attention mechanisms
Explainability in NLP and vision
Robustness and Faithfulness of Explanations
- Stability and sensitivity of explanations
- Adversarial attacks on XAI
- Evaluation metrics for explainability

Topic 5: Human-in-the-Loop Learning

Interactive ML and explainability
User feedback and model refinement
Trust calibration and cognitive aspects

Topic 6: XAI in Practice

Toolkits and libraries (SHAP, Captum, InterpretML)
Deployment challenges
Explainability in MLOps pipelines
Evaluation and User Studies
- Human-centred evaluation methods
- Usability and acceptance testing
- Designing explanation interfaces

Assessment Structure (To be confirmed)

Assignments / Labs (40%)
- Practical implementation and analysis of XAI methods
Midterm Exam or Essay (20%)
- Conceptual and critical understanding
Final Project (40%)
- Design and evaluation of an explainable AI system (with report and presentation)

Bibliography

[1] Molnar, C. “Interpretable Machine Learning” (https://leanpub.com/interpretable-machine-learning)
[2] Doshi-Velez, F. & Kim, B. (2017). “Towards a Rigorous Science of Interpretable Machine Learning” (https://arxiv.org/abs/1702.08608)
[3] Kamath, U. & Liu, J. “Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning” (https://link.springer.com/book/10.1007/978-3-030-83356-5)

Some papers by Vladimir Estivill-Castro on Machine Ethics, Explainable AI, and Human-in-the-Loop Learning

Vladimir Estivill-Castro, Nuru Nabuuso: “Efficient Construction of Interpretable Oblique Decision Trees”. EXPLAINS: 2nd International Conference on Explainable AI for Neural and Symbolic Methods (2025)
Erdélyi, O., Erdélyi, G., Estivill-Castro, V. (2025). “Why Randomness Matters for Fairness”. In New Trends in Disruptive Technologies, Tech Ethics and Artificial Intelligence. DiTTET 2025. vol 1465. Springer
Georgios Angelopoulos, Vladimir Estivill-Castro: “Human-Robot Dialogue that Elicits the Alignment of Moral Principles For Driverless Vehicles”. ACM-HRI 2024: 196-200
Vladimir Estivill-Castro, Eugene Gilmore, René Hexel: “Constructing Explainable Classifiers from the Start - Enabling Human-in-the Loop Machine Learning”. Information. 13(10): 464 (2022)
Vladimir Estivill-Castro, Eugene Gilmore, René Hexel: “Interpretable Decisions Trees via Human-in-the-Loop-Learning”. AusDM 2022: 115-130
Misbah Javaid, Vladimir Estivill-Castro: “Explanations from a Robotic Partner Build Trust on the Robot's Decisions for Collaborative Human-Humanoid Interaction”. Robotics 10(1): 51 (2021)
Vladimir Estivill-Castro: “Collaborative Knowledge Acquisition with a Genetic Algorithm”. ICTAI 1997: 270-277