Title & Description

Supervisor(s)

Machine Learning algorithms to characterize users affected by health issues 

Social media platforms are often used by users to express how they feel. Specifically, users who are currently struggling with issues such as anorexia nervosa or suicidal ideation use these channels to constantly express themselves and update their followers on their current status. Getting to know these users would make it possible to develop intelligent algorithms capable of supporting them. For this reason, the goal of this thesis is to characterize users affected by one or more of these issues by developing Machine Learning algorithms that analyze the content they post. The first part of the work will be devoted to crawling large datasets from social media platforms (e.g., Reddit and Twitter), so that the algorithms can be trained on enough data. Requirements: strong motivation and good Python programming skills.
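
As an illustration of the kind of content-based model the thesis could start from, here is a minimal baseline sketch (not the thesis method): a TF-IDF plus logistic-regression classifier over posts, assuming a labelled dataset has already been crawled; the example posts and labels below are invented placeholders.

# Minimal baseline sketch (not the thesis method): classify posts with TF-IDF
# features and logistic regression, assuming a labelled dataset of posts
# crawled from Reddit/Twitter is already available. Posts/labels are placeholders.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

posts = ["I skipped every meal again today",
         "great run this morning, feeling good",
         "I can't see a way out anymore",
         "new season of my favourite show is out"]
labels = [1, 0, 1, 0]  # 1 = post by a user from an at-risk group, 0 = control

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(clf, posts, labels, cv=2, scoring="f1")
print("cross-validated F1:", scores.mean())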

Ana Freire

Price equity in recommender systems 

Recommender systems suggest items that users might like but with which they have not interacted yet. They have become one of the main sources through which we interact with items on the Web (e.g., the suggestions of videos on YouTube or products on Amazon). Nowadays, the price of the items plays a role in so-called profit-aware recommender systems. It is therefore of central importance to provide users with items that have equitable prices, thus avoiding treating them differently. In other words, if two users usually tend to buy cheap items, one should not systematically receive more expensive items than the other. This thesis aims to characterize how state-of-the-art recommendation algorithms perform in terms of price equity and to develop novel algorithms that guarantee this property. Requirements: strong motivation and good Python/Java programming skills. This work will be done in collaboration with the research center Eurecat.
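
As a rough illustration of how price equity could be quantified, the hypothetical sketch below compares each user's average recommended price with the average price of the items they interacted with; the metric, data and variable names are all assumptions, not an established definition.

# Hypothetical sketch of a price-equity check (not an established metric):
# compare each user's average recommended price to the average price of the
# items they interacted with, then look at how the gap is distributed.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 100, 500
item_price = rng.lognormal(mean=3.0, sigma=0.5, size=n_items)

# Placeholder histories and top-10 recommendations (item indices per user).
history = [rng.choice(n_items, size=20, replace=False) for _ in range(n_users)]
recs = [rng.choice(n_items, size=10, replace=False) for _ in range(n_users)]

gap = np.array([item_price[recs[u]].mean() - item_price[history[u]].mean()
                for u in range(n_users)])

# A price-equitable recommender would keep these gaps small and similar across
# users; large dispersion signals that some users are systematically pushed
# towards more expensive items than others.
print("mean gap:", gap.mean(), "std of gap across users:", gap.std())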

Eurecat

Ana Freire

Ludovico Boratto

Algorithmic fairness visualization and model comparison 

The state of the art in algorithmic fairness has proposed at least 21 different metrics of fairness for Automatic Decision Making (ADM) systems. In this context, machine learning practitioners require new sources of information to understand which definition would best apply in a given system and the potential implications of using that definition to remove potential unfairness from the system. It is known that the applicability of one definition or another to audit a given system depends on the point of view of the auditor (owners of the system vs. those affected by system decisions). The purpose of this thesis would be to understand what kinds of metrics are more suitable for different contexts, to understand what end-users or developers would consider fair, and/or to develop new visualization mechanisms to help ML practitioners better understand their system when using one or another definition of fairness.
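
For concreteness, the sketch below computes two of the commonly discussed definitions (demographic parity and equal opportunity differences) on synthetic placeholder data; it is only meant to illustrate how different metrics can be evaluated and compared on the same decisions.

# Illustrative sketch: two of the many fairness definitions, computed from a
# model's binary decisions. Data and group labels below are synthetic placeholders.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # ground truth
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])   # ADM system's decisions
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # protected attribute

def demographic_parity_diff(y_pred, group):
    # Difference in positive-decision rates between the two groups.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_diff(y_true, y_pred, group):
    # Difference in true-positive rates between the two groups.
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

print("demographic parity difference:", demographic_parity_diff(y_pred, group))
print("equal opportunity difference:", equal_opportunity_diff(y_true, y_pred, group))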

Carlos Castillo

Fairness and Gender Bias in University Admission Testing 

Standardized testing is a well-established methodology for determining access to colleges and universities in many countries. Students take a test, apply to some institutions, and are ranked according to their results on those tests. In general, we could see these tests as predictive models that attempt to predict whether the applicant is a good student (e.g., likely to finish his/her studies) or not. In this project, we would like to analyze potential biases of these tests. For instance, we would like to know whether their error rates differ for groups determined by gender, ethnicity, or income. We have detailed anonymized data for hundreds of thousands of students in one OECD country that could be mined for this problem. Additionally, we would like to study different affirmative action policies, to explore to what extent they may be beneficial or detrimental to certain groups, such as women.

Carlos Castillo

Risk-Prediction Instruments in Criminal Justice 

The Justice Department of Catalonia has for several years been using structured risk-prediction instruments, similar to COMPAS in the US or OASys in the UK, to predict the risk of recidivism and violent recidivism. Correctional officers, psychologists, and social workers apply a questionnaire to juvenile offenders and to people who have been convicted of a crime (these are different questionnaires); the questionnaire is then processed using a model trained on thousands of previous cases, and an output is produced, which is then interpreted by the person applying the instrument. Detailed anonymized data is available for studying to what extent different instruments can be biased against different sub-populations (e.g., immigrants or children of immigrants). Additionally, we would like to explore the potential for new risk-prediction instruments. The work would be done in collaboration with researchers in criminology at the University of Barcelona and at the Justice Department of Catalonia.

Carlos Castillo

Aspect-oriented sentiment analysis 

Sentiment analysis (often also referred to as “opinion mining”) deals with the identification of the sentiments of the author of a text (e.g., the review of a product) towards a topic, object, action, etc. Often, a unique sentiment (positive or negative) is assigned to the whole text, which is obviously a simplification, since an author can find some aspects of the considered topic/object/action/… positive and some negative. The thesis will address the development of a linguistically motivated deep learning algorithm for aspect-oriented sentiment analysis that will be able to recognize not only the polarity of the sentiment ('positive' vs. 'negative') towards an aspect, but also any emotion associated with this sentiment ('happy', 'ambivalent', 'sad', etc.) in a dataset related to the perception of the arts.

Leo Wanner

Alexander Shvets

Aspect-oriented offensive speech analysis 

The detection of offensive speech online, which intends to insult, humiliate or hurt individuals or groups of people, is an increasingly popular research topic. However, state-of-the-art solutions focus merely on the classification of a social media post or blog contribution in terms of a predefined typology (e.g., 'hate speech', 'sexist', 'racist', …); no analysis is made of which aspect(s) of the targeted individuals/groups the offence is directed at. In this thesis, models will be developed that address this challenge. Newspaper blogs and social media datasets will be used.

Leo Wanner

Juan Soler Company

Multilingual Neural Natural Language Text Generation 

Deep learning models have become common for natural language generation. However, so far, they focus predominantly on the generation of isolated sentences from shallow linguistic (syntactic) structures. In the current thesis, models for the generation of paragraph-long texts from generalized syntactic graphs will be explored. The models will, in particular, be able to capture co-references (as in, e.g., “Barcelona is a lovely city. I have been living here for more than 15 years”) and the coherent structure of the narrative. The developed models will be tested on available large datasets for a number of languages.

Leo Wanner

Deep reinforcement learning-driven dialogue management 

In order to ensure a flexible, coherent conversation between a human and a machine that goes beyond predefined information exchange patterns, advanced dialogue management strategies must be developed, which take the history and the goals of the conversation into account and which are able to handle interruptions, side sequences, grounding and other phenomena of a natural dialogue. Neural network-based Reinforcement Learning models have been shown to have the potential to cope with these challenges. In the current thesis, such a model will be explored on a dataset composed of job interviews. The compilation of the dataset forms part of the thesis. This thesis will be developed in collaboration with the German Research Center for Artificial Intelligence (DFKI).

Leo Wanner

Patrick Gebhard

Identification of Political Bias in Media Coverage 

At the very latest, the Fake News debate has revealed the crucial role of the media in shaping political tendencies in modern society. Unfortunately, even if we leave Fake News aside, mainstream media coverage of events with social, societal or political repercussions is not always objective, but, rather, follows a specific political agenda, the interpretation of the event in the light of a specific societal and/or political schema, or an existing (or self-imposed) educational mandate. The thesis will develop strategies for the automatic recognition and classification of political bias in selected media coverage. The strategies will ideally implement an incremental learning mechanism, which will allow for a continuous improvement of their performance.

Leo Wanner

Neural Graph-to-Graph Transduction 

The standard neural network-based models in Natural Language Processing (NLP) are sequence-to-sequence models. In other words, they require a linear sequence of entities as input. This is insufficient, since in many applications the input is constituted by hierarchical linguistic structures (trees or directed acyclic graphs). To address this challenge, hierarchical models such as Graph CNNs or Tree LSTMs have been proposed to project a hierarchical structure onto a sequence. In the current thesis, models will be explored that go one step further in that they will be able to project a given graph structure onto another graph structure of any required complexity (acyclic graph, tree or chain, i.e., linear sequence). The developed model will be tested on a number of different NLP applications. This thesis will be developed in collaboration with Google AI.

Leo Wanner

Bernd Bohnet

Bootstrapping a multilingual collocation dictionary 

Collocations, i.e., idiosyncratic (language-specific) expressions such as “take a walk” and “ask a question” (cf. in Spanish dar un paseo, lit. 'give a walk', and hacer una pregunta, lit. 'make a question', respectively), are a great challenge in both natural language processing and second language learning, partially because their meaning is not composed of the meanings of the isolated words that participate in the combination (thus, you don't 'take' or 'give' anything when you go for a walk). The goal of this thesis is to develop a deep learning (neural network)-based algorithm for bootstrapping a multilingual English–L2 dictionary of collocations from large corpora, assigning a meaning to the extracted collocations in accordance with a given typology. The work of the thesis will build upon existing example-based deep learning and word embedding implementations and multilingual corpora. This thesis will be developed in collaboration with Cardiff University.

Leo Wanner

Luis Espinosa Anke

Statistical Modeling of Online Discussions 

Online discussion is a core feature of numerous social media platforms and has attracted increasing attention for different and relevant reasons, e.g., the resolution of problems in collaborative editing, question answering and e-learning platforms, the response of online communities to news events, online political and civic participation, etc. This project aims to address, from a probabilistic modeling perspective, some existing challenges, both computational and social, that appear in platforms that enable online discussion: for example, how to deal with scalability issues, how to evaluate and improve the quality of online discussions, or how to generally improve social interaction in such platforms.
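
As a toy illustration of the generative-model perspective, the sketch below simulates a discussion thread with a preferential-attachment style reply rule; it is a simplified stand-in, not the specific model of the reference below, and all parameters are arbitrary.

# Toy generative model of a discussion thread (simplified sketch): each new
# comment replies to an existing node with probability proportional to
# (current replies + alpha), with an extra attractiveness bias beta for the root.
import numpy as np

def simulate_thread(n_comments, alpha=1.0, beta=3.0, seed=0):
    rng = np.random.default_rng(seed)
    parents = [None]          # node 0 is the original post (root)
    replies = [0]             # reply counts per node
    for _ in range(n_comments):
        weights = np.array(replies, dtype=float) + alpha
        weights[0] += beta    # extra attractiveness of the root post
        probs = weights / weights.sum()
        parent = rng.choice(len(parents), p=probs)
        parents.append(parent)
        replies[parent] += 1
        replies.append(0)
    return parents

parents = simulate_thread(50)
print("direct replies to the root:", parents.count(0))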

Generative models of online discussion threads: state of the art and research challenges

Vicenç Gómez

Distributed control for teams of autonomous UAVs 

The aim of this project is to derive controllers for teams of unmanned aerial vehicles (UAVs). The focus will be on extending the current centralized algorithm to a distributed setting. The simulator used for the centralized version is implemented in ROS. Required MIIS courses: machine learning, autonomous systems, mobile robotics.

Real-Time Stochastic Optimal Control for Multi-agent Quadrotor Systems

Vicenç Gómez

Modelling the cooperative behaviours in real-time environments using Reinforcement Learning 

In this project we will model real-time experimental game-theoretic tasks involving several agents using Reinforcement Learning techniques. The model will be based on Markov Decision Processes (MDPs). The aim is to be able to make predictions about modifications of the experiments and to add increasingly complex features to the model, including the prediction of other agents' behavior and identity. Emerging cooperative behaviors will be studied, for example in the presence of limited resources, as in the Tragedy of the Commons, where agents need to learn to consume resources in a controlled and coordinated way. Requirements: machine learning, autonomous systems.
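
A minimal sketch of the intended kind of model is given below: two independent tabular Q-learners in a toy common-pool resource game where over-harvesting depletes a shared stock. The payoffs, dynamics and discretization are illustrative placeholders, not the experimental tasks themselves.

# Minimal sketch (all payoffs and dynamics are illustrative placeholders):
# two independent tabular Q-learners share a common resource; harvesting pays
# more now but depletes the stock, so coordinated restraint must be learned.
import numpy as np

rng = np.random.default_rng(0)
n_levels = 6                      # discretized resource stock: 0..5
Q = np.zeros((2, n_levels, 2))    # Q[agent, stock, action]; 0 = restrain, 1 = harvest
alpha, gamma, eps = 0.1, 0.95, 0.1

def env_step(stock, actions):
    # Harvesting pays 1.0 while stock lasts, restraining pays 0.2; the stock
    # drops with the total harvest and regenerates by one unit per step.
    rewards = [(1.0 if a == 1 else 0.2) if stock > 0 else 0.0 for a in actions]
    new_stock = max(stock - sum(actions), 0)
    new_stock = min(new_stock + 1, n_levels - 1)
    return new_stock, rewards

stock = n_levels - 1
for t in range(50000):
    actions = [rng.integers(2) if rng.random() < eps else int(np.argmax(Q[i, stock]))
               for i in range(2)]
    new_stock, rewards = env_step(stock, actions)
    for i in range(2):  # independent Q-learning updates
        target = rewards[i] + gamma * Q[i, new_stock].max()
        Q[i, stock, actions[i]] += alpha * (target - Q[i, stock, actions[i]])
    stock = new_stock

print("greedy action per stock level (agent 0):", Q[0].argmax(axis=1))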

Vicenç Gómez

Martí Sanchez-Fibla

Greenhouse Gas Emissions: Are Countries Reporting Fabricated Data? 

The levels of Greenhouse Gas (GHG) emissions are reported by countries under the United Nations Framework Convention on Climate Change. This data is crucial for the future of humanity on Earth: economic decisions worth billions about global development and finance depend on it worldwide. The veracity of the reported figures has been questioned recently, after some records appeared to contradict real measurements coming from air monitoring stations. The uncertainty of the data has been highlighted for transition countries, but the most visible reasons for concern actually appeared in Europe and in the United States. Current scientific approaches in climate science and the human dimensions of global change lack appropriate tools to identify cases of actively reported false data, and there is not yet a conclusive model for identifying unreliable outliers in this type of data. This thesis will consist of the application and/or development of statistical and machine learning techniques for the automatic identification of false data being actively reported by countries. The project is a collaboration between the Climate Service Center (Germany) and UPF.
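
One plausible (hypothetical) starting point is sketched below: treat each record as a feature vector combining reported emissions with independent proxies and flag outliers with an off-the-shelf anomaly detector; all features and data are synthetic placeholders.

# Illustrative starting point, not the thesis method: flag country-year records
# whose reported emissions deviate from what independent proxies would suggest,
# using an off-the-shelf anomaly detector. All data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n = 300
energy_use = rng.normal(100, 20, n)                    # proxy: energy statistics
measured = 0.8 * energy_use + rng.normal(0, 3, n)      # proxy: atmospheric measurements
reported = 0.8 * energy_use + rng.normal(0, 3, n)      # reported GHG emissions
reported[:10] *= 0.6                                   # inject a few "fabricated" records

X = np.column_stack([energy_use, measured, reported])
flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
print("records flagged as suspicious:", np.where(flags == -1)[0])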

Vicenç Gómez

Roger Cremades

Generation and Detection of Deepfakes 

In recent years, methods for face swapping and manipulation, commonly known as "deepfakes", have been evolving rapidly. Deepfakes make use of state-of-the-art techniques from the deep learning and computer vision fields, making them increasingly hard to detect, even for human evaluators. While there are legitimate applications of these techniques for creating deepfake videos in the audiovisual industry, they also have the potential to be abused against individuals or to propagate fake news. The focus of this thesis is, first, to review the state of the art and develop techniques for creating deepfakes and, second, to develop new algorithms to detect deepfakes and manipulated media, participating in the "Deepfake Detection Challenge" if the results are satisfactory. The project will be carried out at Telefónica Research.

Vicenç Gómez

Ferran Diego

Carlos Segura

Automatic Semantic Representations of Web Documents 

The Web is a huge wealth of information, whose contents are typically accessed by means of search engines. However, search is typically performed by simply finding matching keywords. This approach ignores important aspects of the documents: their topic, degree of objectivity, content quality, readability, etc. The challenge is to (1) identify ways to automatically extract those attributes from web documents and (2) find a representation of the document that allows fast matching with user requests. The goal of this thesis is to develop an unsupervised vector representation method to capture and summarize semantic aspects of web documents, to be used in a full-scale web search engine. To this end we will explore/extend word and document embedding algorithms such as fastText, ELMo and BERT for web document retrieval. This project will be developed in collaboration with NTENT.
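
A minimal retrieval sketch is shown below, with TF-IDF vectors and cosine similarity standing in for the richer embeddings (fastText, ELMo, BERT) the project would actually explore; the documents and the query are placeholders.

# Minimal retrieval sketch: represent documents as vectors and rank them by
# cosine similarity to the query. TF-IDF stands in here for the richer
# embeddings (fastText, ELMo, BERT) the project would actually explore.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["how to train a neural network",          # placeholder web documents
        "restaurant reviews and ratings in Barcelona",
        "objective news coverage of climate policy"]
query = ["training neural networks"]

vectorizer = TfidfVectorizer().fit(docs)
doc_vecs = vectorizer.transform(docs)
query_vec = vectorizer.transform(query)

scores = cosine_similarity(query_vec, doc_vecs).ravel()
ranking = scores.argsort()[::-1]
print("ranked documents:", [docs[i] for i in ranking])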

NTENT

Andreas Kaltenbrunner

Language Games 

Join an international project (Atlantis - CHIST-ERA) to work with NAO robots on fundamental research in visual question answering, as related to the CLEVR dataset for compositional language and elementary visual reasoning. We will use an analytic approach based on precision language processing in order to have a gold standard for testing and extending the CLEVR dataset. The main goals are (i) to develop a Spanish grammar in Fluid Construction Grammar (similar to an English grammar already developed), (ii) to ground computational semantics in the vision system of the NAO, and (iii) to implement language games to prepare language evolution experiments for the same set-up. Location: IBE (PRBB building next to Hospital del Mar) / UPF Poblenou campus. Requirements: solid background in (symbolic) computing; basic familiarity with the foundations of AI (machine vision, computational linguistics, knowledge representation).

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Martí Sanchez-Fibla

Luc Steels (IBE, PRBB)

Learning and Planning in Simple Video-Games 

One of the big breakthroughs in AI during the last few years was the DQN algorithm, which learned to play Atari video games directly from the screen using a combination of deep learning and reinforcement learning techniques. Neither DQN nor the systems that follow it, however, learn to play these games as humans do, namely by understanding the video games in terms of objects and relations, and planning accordingly. The goal of the project is to make progress in that direction by 1) defining suitable high-level planning languages for modeling some of these games, 2) defining planning algorithms for deriving the actions to be done in such games when the model is known, and 3) learning parts of the model from observed traces when the model is not fully known. In principle, we will work with symbolic models, complete or incomplete, and not directly with the information available on the screen. That would be a follow-up step, which goes beyond the work that can be done in a Master project. Some games to consider: point-and-shoot games, Pong, Space Invaders, Pacman, etc. None of them lends itself to being modeled and solved by current planning languages and algorithms.

GVG-AI competition

VGDL: Video Game Description Language (used in GVG-AI)

OpenAI Gym

Planning with pixels in (almost) real time

Planning with simulators

Hector Geffner

Active Goal Recognition 

Goal recognition is the task of inferring the hidden goal of an agent given her observed behavior. The recognition problem can be approached as a planning problem where the costs of the plans compatible with the observations are used to infer a posterior distribution over the possible goals. Active goal recognition is a version of the problem where the observer needs to act in order to obtain the observations required to infer the agent's hidden goal. The Master project is about formulating the problem and solving it using current planning models and tools.
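
For reference, a common cost-based formulation from the plan-recognition-as-planning line of work cited below can be written as

\[
  P(G \mid O) \;\propto\; P(O \mid G)\, P(G), \qquad
  P(O \mid G) \;\approx\; \frac{\exp(-\beta\,\Delta(G,O))}{1+\exp(-\beta\,\Delta(G,O))}, \qquad
  \Delta(G,O) \;=\; c(G,O) - c(G,\overline{O}),
\]

where c(G,O) and c(G,\overline{O}) are the costs of the cheapest plans for goal G that comply with and avoid the observations O, respectively, both computable with off-the-shelf planners, and \beta > 0 controls how sharply cost differences translate into probabilities.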

Plan recognition as planning

Compact policies for fully observable non-deterministic problems as SAT

Hector Geffner

Ranking Attributes for Information Gain in Forests of Decision Trees (data science / machine learning related project) 

Decision trees are one of the most well-known techniques in machine learning, data science, analytics and data mining. They are transparent: they can be presented as rules for human understanding. With the availability of big data, algorithms for the construction of decision trees on a platform like MapReduce are a new challenge. This project consists of developing the parallelization algorithms and their implementation, perhaps for Hadoop, when we consider forests of decision trees. The implementation will be focused on applications of forests of decision trees to privacy-preserving data mining in online social networks. We can provide references to recent literature on how forests of decision trees are used to provide privacy alerts to users of online social networks. Another application is feature selection.
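
For reference, the per-attribute quantity to be ranked (information gain on a single table) can be sketched as follows; the toy data is a placeholder, and the actual project concerns its parallelization over forests and MapReduce.

# Per-attribute information gain on a single table (the quantity the project
# would parallelize across a forest / MapReduce). Data below is a toy placeholder.
import numpy as np
import pandas as pd

def entropy(labels):
    p = labels.value_counts(normalize=True).to_numpy()
    return -(p * np.log2(p)).sum()

def information_gain(df, attribute, target):
    base = entropy(df[target])
    cond = sum(len(g) / len(df) * entropy(g[target])
               for _, g in df.groupby(attribute))
    return base - cond

df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy":   [True, False, True, False, True, False],
    "play":    ["no", "yes", "no", "yes", "yes", "yes"],
})
ranking = sorted(["outlook", "windy"],
                 key=lambda a: information_gain(df, a, "play"), reverse=True)
print({a: round(information_gain(df, a, "play"), 3) for a in ranking})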

Vladimir Estivill-Castro

Manipulation with Nao/Pepper arm and fingers 

The project consists of developing the infrastructure and integrating motion planning and task planning algorithms for a Nao/Pepper robot to pick up tiles, like domino pieces. Currently, there is significant interest in the robotics community in combining these two types of planning to carry out tasks like cleaning a table with different objects. This kind of task is now common in the RoboCup@Home challenges. We can provide some primary literature on the topic. Infrastructure for this project exists at the Mipal lab (Nathan), but the project can also be conducted in collaboration with the Gold Coast Robotics lab.

Vladimir Estivill-Castro

Pepper robot is an expert on ethical decisions 

Should robots make decisions about humans' futures, lives, or behaviour? The challenge is to construct an intelligent and integrated system on a Pepper robot that can solve ethical dilemmas, and to evaluate the reaction of human observers to the competencies displayed by the robot. The types of dilemmas would be those faced by drivers when facing a choice between two evils. Some of these dilemmas will be inspired by the famous series of Trolley Problem ethical dilemmas; some will be adjusted to traffic situations and autonomous vehicles. Users should be able to configure a scenario for the robot to analyse and to pronounce a judgment on. The aim is to evaluate how convincing the robot is about its moral stance.

Vladimir Estivill-Castro

Combining reasoning with robotic localisation 

There are many algorithms for robotic localisation in a field. However, these algorithms are rarely integrated with qualitative reasoning approaches. The most famous example of qualitative reasoning is Allen's interval algebra (and its associated algorithms). The challenge is to investigate a spatial qualitative reasoning system on top of robotic localisation in a soccer field to obtain strategic or tactical decision making that shows improved performance on the soccer field for RoboCup soccer.

Vladimir Estivill-Castro

Robot localisation in a soccer field 

The project consists of developing vision algorithms (or using available software) to recognise landmarks of the RoboCup soccer SPL and using such landmarks to perform robot localisation. The benchmark for this project is to be able to generate the behaviour that places the robot at a position in the field within the 25 seconds of the SET state defined by the league's GameController. The research element is to potentially incorporate elements of learning landmarks or hierarchical localisation that finds useful information, although not the exact pose. Infrastructure for this project exists at the Mipal lab (Nathan), but the project can also be conducted in collaboration with the Gold Coast Robotics lab.

Vladimir Estivill-Castro

Game playing Nao 

The aim is to develop a demonstration of human-robot interaction in which a Nao plays, as naturally as possible, a simple game, like tic-tac-toe on a fixed space, perhaps with rope and special pieces. The robot applies some localisation and some game strategy. The entire software is to be developed using model-driven development and finite-state machines as much as possible for coordinating the control. The robot shall be flexible in its speech and use localisation within the game space, but not necessarily within the room. Sensor fusion to detect that the human has completed their move is the main research challenge.

Vladimir Estivill-Castro

Constrained optimization methods for computational optimal transport 

The framework of optimal transport addresses the problem of measuring distances between probability distributions. A rough definition of optimal transport distances can be given as follows: given two probability distributions P and Q over the sets X and Y, and a cost function measuring the transportation cost between elements of the two sets, the optimal transport distance between P and Q is the minimal total cost of transporting all mass from P to Q. This optimization problem can be formulated as a linear program called the Monge-Kantorovich optimal transport problem. This project follows up on recent progress on computationally efficient algorithms for solving this LP, and particularly investigates the possibility of employing techniques for constrained optimization and saddle-point optimization to improve on existing solutions. The project requires very strong mathematical skills, particularly in multivariate calculus and linear algebra. Knowledge of convex analysis and optimization is a plus, but not absolutely necessary at the current stage.
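
For discrete distributions, the rough definition above corresponds to the following linear program (whose entropic regularization underlies the Sinkhorn methods cited below):

\[
  W(P,Q) \;=\; \min_{\Pi \ge 0} \sum_{x \in X}\sum_{y \in Y} \Pi(x,y)\, c(x,y)
  \quad \text{s.t.} \quad \sum_{y \in Y} \Pi(x,y) = P(x) \;\;\forall x, \qquad
  \sum_{x \in X} \Pi(x,y) = Q(y) \;\;\forall y,
\]

where \Pi is the transport plan and c the cost function; adding an entropy penalty -\varepsilon H(\Pi) to the objective makes the problem strictly convex and solvable by Sinkhorn iterations.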

Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration

Gergely Neu

Thompson sampling for sequential prediction 

Thompson sampling is one of the most well-studied algorithms for a class of sequential decision-making problems known as stochastic multi-armed bandit problems. In a stochastic multi-armed bandit problem, a learner selects actions in a sequential fashion and receives a sequence of rewards corresponding to the chosen actions. A crucial assumption made in this problem is that the rewards associated with each action are random variables drawn independently from a fixed (but unknown) distribution. The goal of this project is to do away with this assumption and study Thompson sampling in non-stationary environments where the rewards may be generated by an arbitrary external process. More precisely, the project considers the framework of sequential prediction with expert advice, and aims to show theoretical performance guarantees for this algorithm and/or analyze its performance empirically. The project requires very strong mathematical skills, particularly in probability theory and multivariate calculus. Knowledge of convex analysis is a plus, but not absolutely necessary at the current stage.
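
For reference, the textbook Beta-Bernoulli form of Thompson sampling in the stationary setting (the baseline the project departs from) is sketched below; the arm means and horizon are placeholders.

# Textbook Thompson sampling for a stochastic Bernoulli bandit (the baseline
# setting; the project studies what happens when rewards are not i.i.d.).
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]            # unknown to the learner
K, T = len(true_means), 5000
alpha, beta = np.ones(K), np.ones(K)    # Beta(1, 1) priors per arm

total_reward = 0
for t in range(T):
    theta = rng.beta(alpha, beta)       # sample one plausible mean per arm
    arm = int(np.argmax(theta))         # play the arm that currently looks best
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward                # posterior update
    beta[arm] += 1 - reward
    total_reward += reward

print("average reward:", total_reward / T)
print("pulls per arm:", alpha + beta - 2)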

Learning to Optimize Via Posterior Sampling

An Information-Theoretic Analysis of Thompson Sampling

Online Linear Optimization via Smoothing

Thompson Sampling for Adversarial Bit Prediction

Gergely Neu

Better algorithms for online linear-quadratic control 

Linear-quadratic control is one of the most well-studied problem settings in optimal control theory: it considers control systems where the states follow a linear dynamics and the incurred costs are quadratic in the states and control inputs. In recent years, the problem of online learning in linear-quadratic control has received significant attention within the machine-learning community. One particularly interesting development is the formulation of the control problem as a semidefinite program (SDP), which allows the application of tools from online convex optimization. The present project aims to develop new algorithms for online linear-quadratic control based on this framework by exploring the possibility of using regularization functions that make better use of the SDP geometry than existing methods based on online gradient descent. The project requires very strong mathematical skills, particularly in multivariate calculus and linear algebra. Knowledge of convex analysis and optimization is a plus, but not absolutely necessary at the current stage.
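
For reference, the linear-quadratic setting referred to above can be written (in one common notation, which may differ from the cited papers) as

\[
  x_{t+1} = A x_t + B u_t + w_t, \qquad
  \text{cost} = \sum_{t=1}^{T} \left( x_t^{\top} Q\, x_t + u_t^{\top} R\, u_t \right),
\]

where x_t is the state, u_t the control input, w_t a noise term, and Q \succeq 0, R \succ 0; in the online version, the learner must pick u_t while some of these quantities are unknown or revealed only along the way.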

Online Linear Quadratic Control

Online PCA with Optimal Regret

Gergely Neu

Implicit regularization methods for high-dimensional optimization 

This project aims at studying the regularization properties of various incremental optimization methods such as gradient descent, averaged gradient descent, and exponentiated gradient descent. While recent work has successfully uncovered relations between averaging schemes for gradient descent and L2 regularization, these results remain specific to the classical problem of linear least-squares regression. One branch of this project is concerned with generalizing these results to more general convex optimization problems. Another direction the project aims to explore is the regularization effects of other gradient-descent variants, and particularly the sparsity-inducing properties of exponentiated gradient descent. The project requires very strong mathematical skills, particularly in multivariate calculus and linear algebra. Knowledge of convex analysis and optimization is a plus, but not absolutely necessary at the current stage.
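
A small numerical illustration of the known least-squares case is sketched below: an early-stopped gradient-descent iterate is compared with a ridge solution whose regularization strength is heuristically matched to the number of steps. The data, step size and matching rule are placeholders chosen only for the demonstration.

# Small numerical illustration of the known least-squares case: early-stopped
# gradient descent on 0.5*||Xw - y||^2 stays close to a ridge solution for a
# suitably matched regularization strength. All numbers are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

eta, steps = 0.01, 50
w = np.zeros(d)
for t in range(steps):
    w -= eta * X.T @ (X @ w - y)        # plain gradient descent

lam = 1.0 / (eta * steps)               # heuristic matching of t <-> 1/lambda
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("distance between GD iterate and matched ridge solution:",
      np.linalg.norm(w - w_ridge))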

Exponentiated Gradient versus Gradient Descent for Linear Predictors

Iterate averaging as regularization for stochastic gradient descent

A Continuous-Time View of Early Stopping for Least Squares

Connecting Optimization and Regularization Paths

Implicit Regularization for Optimal Sparse Recovery

Gergely Neu

Multilingual Lexical Simplification 

Lexical simplification is the task of replacing complex words or expressions with simpler synonyms in a context-aware fashion. It is useful for making texts more accessible to different types of users, such as people with cognitive impairments. In this project the candidate will investigate current methods in lexical simplification and implement techniques based on current continuous vector representations and neural network architectures. The project seeks to contribute to our current research on multilingual text simplification at TALN. The MSc candidate will have available both a dataset for experimentation and simplification software for several languages.
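
A toy sketch of the standard embedding-based pipeline (candidate generation from embedding neighbours, then ranking by a simplicity proxy such as corpus frequency) is given below; the tiny vectors and frequencies are invented placeholders, not real resources.

# Toy sketch of an embedding-based lexical simplifier: generate substitution
# candidates from embedding neighbours, then rank them by a simplicity proxy
# (here, corpus frequency). Vectors and frequencies below are invented placeholders.
import numpy as np

embeddings = {                      # placeholder 3-d "word embeddings"
    "purchase": np.array([0.9, 0.1, 0.0]),
    "buy":      np.array([0.85, 0.15, 0.05]),
    "acquire":  np.array([0.8, 0.2, 0.1]),
    "sell":     np.array([-0.7, 0.3, 0.1]),
}
frequency = {"purchase": 120, "buy": 900, "acquire": 80, "sell": 700}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def simplify(word, sim_threshold=0.95):
    candidates = [w for w in embeddings if w != word
                  and cosine(embeddings[w], embeddings[word]) > sim_threshold]
    simpler = [w for w in candidates if frequency[w] > frequency[word]]
    return max(simpler, key=frequency.get, default=word)

print("purchase ->", simplify("purchase"))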

Horacio Saggion

Beyond Abstracts: Generating Summaries of Scientific Texts  

Scientists worldwide face the problem of scientific information overload, since the pace at which scientific articles are published is increasing exponentially. In this scenario, the possibility of accessing a brief and complete overview of the contents of an article is essential to cope with the large number of papers to consider. Often the abstracts published together with scientific papers are too short and lack essential information for a complete assessment of the value of the research presented. New approaches to the creation of text summaries, which identify the fundamental contents of documents, constitute a useful instrument to create rich, structured and focused syntheses of the contents of a publication, thus providing new ways to deal with scientific information overload. This master project aims at developing techniques to produce summaries of scientific documents based on an analysis of their content with advanced linguistic tools. The project will investigate techniques to train summarization systems based on available annotated data. The MSc candidate will have available a dataset for experimentation as well as text processing and summarization libraries to carry out the project.

Horacio Saggion

Criticize or Praise? Citation Characterization in Scientific Papers 

Scientific texts do not stand in isolation: they are connected to each other by means of citations that identify the background on which a given scientific work stands. Citations are particularly important in assessing research output, mainly by means of reference counts (e.g., the h-index). Besides citation counting, in recent years citation semantics, concerning the characterization of the purpose of a citation in a text, has started to gain momentum. In order to fully take advantage of citations to assess a piece of work, it is particularly important to understand why it has been cited in a given context (to give credit, identify methods and tools, provide background, criticize, etc.). The characterization of the purpose of citations can have a significant impact on many activities related to the use and assessment of scientific literature, including scientific text summarization, scientific information retrieval, paper/author recommendation, etc. This master project aims at developing systems to automatically detect the semantics of a given citation in text. The work will be based on the use of supervised techniques to classify citations using a variety of information sources arising from the linguistic and semantic analysis of scientific documents. The MSc candidate will have available both a dataset for experimentation and text processing and summarization libraries to carry out the project.

Horacio Saggion

Extracting the Science from Research Articles 

This thesis will study and develop machine learning techniques (preferably Deep Learning techniques) to extract different types of information from research articles. The work will be based on the development of supervised techniques for the identification of the following types of information: problem, technique, results, advantages, disadvantages, among others. The student will have available data and software to carry out the work.

Horacio Saggion

How do you feel listening? 

This thesis will study and develop machine learning techniques (preferably Deep Learning techniques) to identify the sentiment and emotions of people listening to music, as expressed in social networks. The study will be based on the analysis of social media data collected before, during, and after concert performances. Contextual information will be used to improve the performance of current systems. The student will have available data and software to carry out the work.

Horacio Saggion

Summarizing Multimodal Content: the case of text and images 

This thesis will investigate the contribution of textual and non-textual information for the summarization of long articles which include multimodal information. Recent works have shown that classification systems can work better when information from multiple modalities is used. This has been little investigated for summarization.

Horacio Saggion

Mining Social Sciences 

The objective of the thesis is to investigate the application of text mining to the field of social sciences. The work will generate techniques for the analysis, visualization, and exploration of collections of documents in the social sciences for the purpose of semantic access, idea formulation, recommendation, etc.

Horacio Saggion

Improving the accessibility of Catalan texts in an e-mail client 

This TFM aims at developing text simplification technology for the Catalan language. More concretely, the work will consist of adapting an existing lexical simplifier with the use of word embeddings and integrating it into the e-mail client KOLUMBA, developed to make e-mail communication more accessible for people with disabilities. This TFM has a monetary incentive for the student of at least €600, since it is associated with a crowd-sourcing grant.

Horacio Saggion

Construction of interactive statistical atlases of human anatomy 

Universitat Pompeu Fabra is currently collaborating with Queen Mary University of London on a new initiative called the UK Biobank (www.ukbiobank.ac.uk), which is aimed at analysing a large amount of population data for an improved understanding of complex diseases. It includes rich biomedical data on 100,000 individuals, comprising data on various organs and tissues, including the heart, brain, bones and abdomen. Based on this big data, the goal of the project is to build an interactive statistical model of human anatomy, which will make it possible to study the biological variability found in given organs and tissues (e.g., heart or brain) and to link it to individual-specific characteristics of health and disease. A user-friendly interface will enable users to interactively select biomedical subgroups of interest (e.g., a disease class or female/male), and the system will intelligently compute and display the relevant shape/tissue variability in the organ of interest. The student working on this project should have an interest in machine learning and visualization techniques, and a good level of English to facilitate communication with our collaborators in London.

Karim Lekadir

Process mining to understand how teachers design learning activities 

Authoring tools and community platforms devoted to teachers (e.g., the Integrated Learning Design Environment) collect data about teachers' actions in the process of designing learning activities. In this context, the application of process mining techniques would shed light on how teachers design, i.e., which process they follow, from gathering inspiration by exploring designs created by others to the steps they follow in the authoring process. The TIDE group has developed several authoring tools and the ILDE community platform, which have been used by several teacher communities (two schools, teachers participating in professional development programs, etc.). This project will consist of applying process mining techniques to these datasets, extracting knowledge about how teachers design in each community, and comparing the communities.

Davinia Hernández-Leo

Ishari Amarasinghe

Intelligent Interactive Systems in Education 

Several topics related to the application of interactive and artificial intelligence techniques to the design of systems to support teaching and learning (includes adaptive and personalized learning, classroom orchestration, etc.).

http://www.upf.edu/web/tide

Davinia Hernández-Leo

Data Analysis in Education Technologies 

Several topics related to the application of data analytics techniques to the design of systems that support teaching and learning (includes community, teaching and learning analytics, interactive dashboards, etc.).

http://www.upf.edu/web/tide

Davinia Hernández-Leo

Conversational agents/chatbots for CSCL applications 

Conversational agents have been deployed in a variety of learning technology applications to enrich the interaction between humans and machines. Tutorial Dialog Systems that employ Conversational Agents (CAs) to deliver instructional content to learners in one-on-one tutoring settings have been shown to be effective. This project focuses on extending this technology to collaborative learning settings. The student will focus on the development, integration and evaluation of conversational agents (following an iterative design process) in a computer-supported collaborative learning (CSCL) tool called 'PyramidApp', which facilitates the easy deployment of collaborative learning activities in classroom and distance learning settings.

http://www.upf.edu/web/tide

Davinia Hernández-Leo

Learning analytics for learning redesign and orchestration 

First, the design and redesign of increasingly effective learning situations are currently not informed by indicators of the impact on learning of previous design realizations. Second, the orchestration of learning situations is a daunting task for teachers and learners, which involves the monitoring, awareness, (self-)regulation and assessment of learning activities. Both problems stem from the fact that obtaining the adequate information required to make decisions about the (re)design and orchestration of non-trivial learning situations is out of reach for teachers and learners, given the high number of participants or the diversity of devices that can be involved in the scenarios. Learning analytics can be considered a suitable approach to tackle both problems, as they deal with the analysis of data about learning with the aim of understanding and optimizing learning and the environments in which it occurs. In fact, the potential of learning analytics to improve the support of teachers and students in different settings has already been shown. However, according to a recent report of the European Commission's JRC, much research still needs to be done to tailor learning analytics to specific needs and contexts. The goal of this project is to investigate to what extent learning analytics and their visualization should differ depending on whether they are to support learning redesign or orchestration.

Davinia Hernández-Leo

Ishari Amarasinghe

Understanding participant behaviours in Citizen Science online learning activities 

Citizen Science (CS) involves the collection and analysis of data relevant to solving research questions by members of the general public, usually as part of a collaborative project with professional scientists. In this master thesis project, the student will select a CS activity that is supported by technology (e.g., a CS-devoted platform) to analyze how participants behave and interact with the technology to collaborate in the endeavour. The analysis can target organizational/operational characteristics, scientific outcomes, individual/group learning, other success or failure indicators, etc., as well as societal aspects related to the impact of those activities on society, such as gender, age, geographical and socio-economic differences.

Davinia Hernández-Leo

Ishari Amarasinghe

Study of the interrelation between MRI-derived structural and functional brain integrity measures in neurodegenerative diseases 

The proposed projects are focused on the interrelation between MRI-derived structural and functional brain integrity measures in neurodegenerative diseases such as multiple sclerosis, diabetes or epilepsy. The final goal is to obtain MRI-derived measures that can be applied in the clinical routine. Concretely, we would like to formalize the relation between the structural and functional components by using advanced statistical models; we would like to investigate which component plays a major role in class separation, and to identify possible new biomarkers by using machine-learning techniques. Finally, we would like to develop biomarkers of functional integrity based on current analysis techniques that can be incorporated into the clinical routine. The projects will be carried out at the MRI Neuroradiology Unit of the Vall d'Hebron University Hospital.

Gemma Piella

Deep radiologist: using deep convolutional networks to improve malignancy prediction in CT scans 

Lung cancer starts as small asymptomatic nodules in the lungs. Many times those nodules do not evolve into tumours, but it is essential to find them and follow up their progression over time to decide when to perform an excision or biopsy. The objective of this project is to improve current detection algorithms for CT images and test them on a large cohort of ex-smokers. In collaboration with Stanford, Hospital AZ Groeninge (Belgium) and Clínica Universidad de Navarra.

Mario Ceresa

Better world futures 

We live in an increasingly complex, confusing, and multilayered world in which we do not fully understand the consequences of our actions. Many times we have different opinions about how to solve important pressing problems such as terrorism, climate change, population growth, health epidemics, low economic growth and poverty. We propose to build on massive agent-based reinforcement learning methods to help compare the outcomes of different possible scenarios and choose the best course of action.

Better World Futures

Mario Ceresa

Prediction of trabecular micro-fractures 

Bone fracture is a very local event. The fractured tissue has peculiar morphometric characteristics; however, the prediction of this event is still an open question, which every year leads to thousands of patients with unpredicted fractures or inappropriate treatment. An ongoing study has allowed us to study the morphometric characteristics of trabecular fracture using image processing tools and micro-CT images. In this project we want to develop a classifier able to identify the weak regions within the trabecular framework and predict their likelihood of failure. The classifier will integrate information coming from the geometry and the mechanical behaviour of the trabecular structure.

Simone Tassani