Master Thesis
Evaluation Guidelines
The master thesis project is evaluated by a committee of three members: the thesis supervisor and two other members of the department. The evaluation considers the tutoring sessions, the oral presentation, and the written report.
Important dates
Early January: Agree with your supervisor on a topic.
31 March: First draft containing motivation, context, state of the art, and scientific questions.
Late June: Submission of the draft report to the panel committee.
July: Oral presentations and submission of the final report to the panel committee.
Oral presentation
Oral presentations of the thesis will be scheduled in mid-July. They will take 20 minutes, plus 10 minutes of questions from the evaluation committee.
Thesis report
The student has to submit a draft version of the thesis report to the evaluation committee before June 30th. The student and the supervisor should also agree on a schedule that leaves the supervisor enough time to read and assess the final thesis text before the presentation. The final version of the report must be submitted to the panel committee before the oral presentation.
Formatting of the report
As for the format of the written thesis (font size, line spacing, margins, section numbering, etc.), we propose that students follow the template provided by the Universitat Pompeu Fabra for master theses. For LaTeX you can use this one. The MIIS board has no preference regarding the word processor and will accept documents produced with any of them.
The report can be structured into Introduction, State of the Art, Methods, Results, Discussion, and Conclusion. However, this is not mandatory, and a different overall organization can be used if necessary. As a general guideline, a master thesis report is not expected to reach the comprehensiveness of a PhD thesis; an adequate length is between 30 and 50 pages (this number is a guideline, not a requirement).
Proposed thesis topics
Abstraction and Reasoning Challenge
This project is about addressing the Abstraction and Reasoning Challenge (ARC), a Kaggle competition consisting of a large corpus of visual tasks that require abstraction and reasoning skills to be solved. Each ARC task contains 3-5 pairs of training inputs and outputs, and a test input for which you need to predict the corresponding output using the pattern learned from the training examples. The objective of this project is to apply AI techniques developed in the AI/ML research group (and possibly suggest novel ones) that combine search and learning for automated program synthesis, in order to address the challenge or a part of it.
Motion coordination of a dual-arm mobile manipulator
The goal of the thesis is to create and implement a generic and modular layer that encapsulates and coordinates the entire movement of the individual robots that the Mobile Anthropomorphic Dual-Arm Robot platform is made of, so that it can be used in a task and motion planning research framework.
Visual Servo Control in a Collaborative Robot for WEEE (Waste Electrical and Electronic Equipment) dismantling
This project applies visual servo control to a collaborative robot, both in a simulation environment and on a real platform, for the dismantling of WEEE components.
Collaborative robot teaching using augmented/virtual reality
The objective of the thesis is to create and implement a teleoperation system, assisted by augmented/virtual reality devices, to command a collaborative robot with the purpose of improving the robot's performance and teaching it new skills.
Learning policy and reward function using active reward learning techniques minimizing human supervision
In robot learning contexts, defining a function that specifies when to reinforce the agent's actions can be very expensive. To address this challenge, algorithms will be developed that learn not only the robot's control policy but also the reinforcement function. To do this, supervisors (human or based on conventional task planners) will be used to evaluate the robot's performance. Strategies will be implemented to minimize the number of supervision calls and to maximize the information obtained in each call.
Development of safe learning strategies based on constraint manifolds
Reinforcement learning in robotic agents is particularly complex due to several challenges, including safety during the learning process and mechanical constraints. These challenges are not commonly addressed in the generic artificial intelligence literature, which does not usually deal with physical agents. In this project, common reinforcement learning strategies will be combined with constrained optimization in the constraint space to ensure safe learning.
Implementation of randomization strategies for learning conditions to minimize the Sim2Real gap
This project addresses the challenge of transferring learning conducted in a simulated environment to real environments. Mechanisms will be introduced to improve the generalization of the learned control policies, focusing on the randomization of the physical conditions of the simulation environment. These strategies will be applied transversally to the different learning methodologies, with the expectation of obtaining learning that is robust to variations in the conditions of the real environment.
Gender bias in online platforms
This project will analyze and study potential gender biases of online platforms. These may include job search engines, e-commerce websites, search engines, and platform economy sites for which some data could be scraped or otherwise obtained. The specific platform is to be decided with the master student. The work will involve data scraping/collection, measurement through fair ranking/recommendation algorithms, and analysis of potential biases found. It requires solid knowledge of Python and a desire to work within a framework of data feminism, which seeks to eliminate gender biases in data collection and data processing pipelines.
Risk-Prediction Instruments in Criminal Justice
The Justice Department of Catalonia has for several years been using structured risk-prediction instruments, similar to COMPAS in the US or OASIS in the UK, to predict the risk of recidivism and violent recidivism. Correctional officers, psychologists, and social workers apply a questionnaire to juvenile offenders and to people who have been convicted of a crime (these are different questionnaires); the questionnaire is then processed using a model trained on thousands of previous cases, and an output is produced, which is then interpreted by the person applying the instrument. Detailed anonymized data is available for studying the extent to which different instruments can be biased against different sub-populations (e.g., immigrants or children of immigrants). Additionally, we would like to explore the potential for new risk-prediction instruments. The work would be done in collaboration with researchers on criminology at the University of Barcelona and at the Justice Department of Catalonia.
Reinforcement Learning for Virus Mutation Modelling
RNA viruses use different mechanisms of genetic variation for their survival, which makes them among the most prevalent parasites of humans. One of the characteristics that helps viruses achieve this is their high mutation rate: their offspring differ on average by one or two mutations from their parent. This, combined with their short replication times, results in a complex virus swarm. Their ability to change their genome so easily enables them to appear in new hosts and to circumvent immunity obtained by vaccination. One promising technique to anticipate new Variants of Concern is to apply inverse Reinforcement Learning to mutation data, thereby modelling the process of virus evolution. In a previous collaboration with UPF, we jointly developed a method to apply Inverse Reinforcement Learning to evolutionary data of SARS-CoV-2 to predict which mutations could be more dangerous in the future. In the scope of this contract, we ask UPF to extend that previous model to the multiple-rewards setting with a proof-of-concept implementation applied to SARS-CoV-2 data in Python.
Inverse Optimal Control for Modeling Virus Mutations in SARS-CoV-2 (TFM Ilse Meijer)
Optimal strategies for future health emergencies in Europe
The COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), shows characteristics that are challenging for public health care systems. Nonpharmaceutical interventions such as population- and individual-based social distancing, testing, and contact tracing have so far been the principal public health measures. Mathematical models have been developed to assess the transmission dynamics of the virus, the severity of the disease, and the effectiveness of public health measures. In the aftermath of the COVID-19 pandemic, we would like to find optimal strategies for managing potential health crises. Most mathematical models are based on deterministic compartmental modelling, where the epidemiological subpopulations are classified into a susceptible category, several infected population groups, and groups of recovered individuals or fatalities. Dynamic transitions from one class to the next are based on transition rates, usually constant but sometimes variable over time. At the JRC we recently developed a novel method to understand and control the spread of epidemics [1]. In the scope of this collaboration, the JRC will provide an initial implementation of the model, and UPF will extend it and test it on new epidemiological data.
[1] A Distributed Optimal Control Model Applied to COVID-19 Pandemic. R-M. Kovacevic et al.
Encrypted Federated Learning for health data
Encrypted learning (e.g., homomorphic encryption, MC2, etc.) is a technique that permits users to perform computations without sharing the data in the clear. This is important for sensitive data, such as health care information, because it can enable new services by removing the privacy barriers inhibiting data sharing, or increase the security of existing services. In the scope of this collaboration, JRC.F7 will provide several models to classify bacteria from their DNA using a variety of ML techniques such as XGBoost, CNN, and LSTM neural networks, and we ask UPF to provide a deliverable comparing the performance of those models trained in a standard centralized fashion against encrypted federated learning.
Functional genomic pathways in the gut microbiome
Recent advances in the field of human gut microbiome research have revealed significant associations and potential mechanistic insights regarding a vast array of complex chronic diseases, including cancer, auto-immune disease, and metabolic syndrome. Recently, a paper [1] introduced the Gut Microbiome Health Index (GMHI), an index for evaluating health status (i.e., the degree of presence/absence of diagnosed disease) based on the species-level taxonomic profile of a stool shotgun metagenome (gut microbiome) sample. GMHI determines the likelihood of having a disease, independent of the clinical diagnosis; it does so by comparing the relative abundances of two sets of microbial species associated with good and adverse health conditions, which are identified from an integrated dataset of 4347 publicly available human stool metagenomes. However, genetic elements in bacteria are extremely "dynamic" and can easily move from one species to another through mechanisms like plasmid gene transfer, or conjugation. This could make taxonomic classification less relevant for this type of study, because what matters is that a "biological function" remains active, independently of the species in which it acts. For this reason, we hypothesize that the GMHI could become more robust if it relied not only on the taxonomic profile of the bacteria but also on their network of biological functions. In the scope of this contract, the JRC will provide the data and initial scripts to recalculate the GMHI index, and the student will replicate the study presented in [1] and study the possibility of making it more robust by explicitly incorporating genetic functional groups.
Relation-based identification and classification of idiosyncratic expressions in multilingual newspaper material
Correct interpretation of idiosyncratic expressions such as take a walk, meet a challenge, give up the advantage, launch a war, etc. (referred to as “collocations”) is of high relevance to many natural language processing applications and to second language learning alike: their meaning is not compositional. In other words, it is not composed of the meanings of the isolated words of the expression – you don’t “take” anything when you take a walk, and you don’t “meet” anyone when you meet a challenge. The goal of this thesis is to develop a neural network-based algorithm that takes the relation between the individual words of such expressions into account to identify them in texts and classify them in accordance with a given typology. The work of the thesis will build upon an existing algorithm for the identification of semantic relations (such as, e.g., between walk and movement or between war and fight).
Multilingual Neural Natural Language Text Generation
Deep learning models have become common for natural language generation. They usually start from raw text material (which they then transform into a more fluent narrative), syntactic trees, or semantic graphs. Experimental setups also cope with ontological structures of limited size. However, none of them so far combines different types of input structures. In this thesis, a model will be explored which, when prompted with a limited number of ontological predicate-argument statements (such as, e.g., ‘(007 IS literary_character)’, ‘(007 PROTAGONIST_IN “High_time_to_kill”)’, ‘(Benson AUTHOR “High_time_to_kill”)’), generates a short story (in this case, about James Bond, the Bond book High Time to Kill, and its author Raymond Benson), drawing upon supplementary textual material and selecting the relevant information from it.
Deep reinforcement learning-driven dialogue management
To ensure a flexible, coherent conversation between a human and the machine that goes beyond predefined information exchange patterns, advanced dialogue management strategies must be developed which take the history and the goals of the conversation into account, and which are able to handle interruptions, side sequences, grounding, and other phenomena of a natural dialogue. Neural network-based Reinforcement Learning models have been shown to have the potential to cope with these challenges. In the proposed thesis, such a model will be explored on a large dataset of movie dialogues. The focus will be on handling side sequences, i.e., interventions of one of the interaction parties that introduce a new topic into the conversation.
Identification of the communicative intent of the speaker in spontaneous dialogues
To react adequately to a statement of the user, a conversational agent must not only understand the content of this statement, but also correctly identify the intention of the user when they utter it. In the context of this thesis, a generic deep learning-based communicative intent identification (and classification) model will be developed. As the training and development corpus, one of the available large-scale dialogue corpora will be used. The model will be tested on out-of-domain dialogues.
Entity and entity relation extraction and linking in medical discourse
To obtain from textual material a comprehensive description of a specific entity (e.g., intervertebral disc) or process (e.g., intervertebral disc degeneration) in terms of its properties, relations, contexts, etc., this entity/process and its synonyms must be identified, disambiguated, and related to other relevant entities/processes. In this thesis, a deep neural model for entity and entity relation extraction and linking for English data will be developed and applied to the topic of intervertebral disc degeneration (protein and RNA classes), for which a specialized corpus is already available. The thesis will start from an existing implementation developed at TALN.
Machine vision learning in noisy environments with unreliable object labelling
In industrial processes, such as quality control of objects on moving conveyors, there are often issues with data quality, such as lighting conditions and pose variation, target object and feature variation, and significant errors in labelling/classifying the image features. Hence, it is interesting to evaluate techniques that mitigate one or more of these issues, such as unsupervised learning from the images, automatic labelling, and adversarial learning (e.g., two or more deep neural networks). In addition, white-box data modelling techniques can complement the black-box models, especially in the data exploration phase. In the thesis project, the student will apply various techniques to improve on our current results, hopefully coming close to the state of the art in the field. The student will have the opportunity for close collaboration with IRIS Technology Solutions on real deployment projects using state-of-the-art hardware and software, which may include EC Horizon projects. The MSc candidate will be provided with the available data and support from the end users and experienced technical specialists. Requirements: basic programming skills in C++ and Python.
You Only Look Once (YOLO): Unified, Real-Time Object Detection
‘Kinetic’ simulator using a multi-agent system
This project concerns a simulator based on "intelligent agent" software. An existing system, developed in Repast Simphony and ReLogo (a Java-type language), will be used as the starting point; it has basic rules programmed for agent interaction and movement in a 2D solution space. The work will involve adding greater intelligence to the agents (e.g., based on rules induced from the process data or by calling a trained machine learning model), and testing with real and simulated data. "Scalability" tests should also be done so that the simulation can run faster with a larger volume of data. The student will have the opportunity for close collaboration with IRIS Technology Solutions, in order to develop and implement the technical part and validate the system on real applications, which may be related to EC Horizon projects. Requirements: basic programming skills in Java, C++, and Python.
Complex adaptive systems modeling with Repast Simphony
An Introduction to Repast Simphony Modeling using a simple predator-prey example
Smart platform for project planning and resource assignment in complex environments
This platform will be dedicated to complex and dynamic project planning, which includes elements of resource planning, budget assignment, task assignment, and worker satisfaction. The system will embody different advanced AI components: a recommender system (case-based reasoning with a machine learning back-end); a multi-agent system to model the solution space for resources and projects; and game theory to evaluate scenarios with gain/penalty metrics and competitive as well as collaborative resource assignment. The student will have the opportunity for close collaboration with IRIS Technology Solutions for the deployment of this project. The MSc candidate will be provided with the available data and support from experienced technical specialists. Requirements: basic programming skills in C++, Java, and Python.
Cooperative Game Theoretic Framework for Joint Resource Management in Construction
Applications of game theory in project management: a structured review and analysis
Data Analysis in Education Technologies
Several topics related to the application of AI and data analytics techniques to the design of systems that support teaching and learning (this includes community, teaching, and learning analytics, interactive dashboards, etc.).
Urban data governance: beyond open data
How can data from the public sector, private companies, and citizens be managed and governed to extract more value while preserving privacy and securing data exploitation models? The EU's Data Governance Act seeks to promote data reuse, and the GAIA-X project is meant to provide an architecture for this ideal. Current debates revolve around Data Spaces and Data Trusts; some proposals include a federated infrastructure, while others trust DAOs (blockchain) to provide a solution with a democratic approach to data sharing. This TFM will explore the different opportunities, question the assumptions behind each model, and seek to understand what infrastructure is needed to accomplish the needs of different uses of data in the urban context. It will focus on a theoretical analysis of the different models and on technical assessments of current data technologies and architectures. In addition, different spatio-temporal datasets will be analysed to study how cities can integrate and benefit from this approach.
Intersectional Fairness
In recent years, Fairness, Accountability, and Transparency (FAccT) has been discussed as a way to reduce social harm and discrimination in the design and development of algorithms. However, most work on FAccT approaches addresses only discrimination towards limited categories along a single dimension (e.g., gender, race). Recently, voices from black feminism and social studies have called for a deeper approach to fair and just development. Intersectionality is an analytical framework for understanding how aspects of a person's social and political identities combine to create different modes of discrimination and privilege. It identifies multiple factors of advantage and disadvantage; examples of these factors include gender, caste, sex, race, class, sexuality, religion, disability, physical appearance, and height. That is, it goes beyond the classical gender/race binaries: looking at one dimension at a time doesn't always tell us the whole story. Although the call for an intersectional approach is growing within the Computer Science and Data Science communities, few works explore the real possibility of applying FAccT concepts to algorithms. The goal of this thesis is to explore diverse datasets and algorithms used in contexts of social interest (e.g., social welfare, recidivism, education) and to develop technical solutions that take an intersectional perspective into account. The thesis combines a highly theoretical conceptualization with technical development.
Fair housing algorithm
The housing market in Barcelona is under constant scrutiny due to high demand and the current economic crisis. Moreover, housing market prices are determined by the same longstanding valuations, without reflecting current societal and environmental needs (e.g., green areas, low pollution, and diversity). The goal of this thesis is to understand which factors related to climate change and pollution could affect the real estate market and housing policies in the near future. The project is meant to analyze different sources of data and to assess the current status of Barcelona's housing market from a data-driven approach. As a result, it is expected to develop an algorithm to re-calculate the cost of housing, not under market supply-demand dynamics, but under Sustainable Development Goals (SDGs) indicators, the impact of climate change, and social justice principles.
3D Herbert: delivering packages with drones
The project is to develop the control software for a drone so that it reproduces the behaviour of the famous Herbert prototype that illustrated the subsumption architecture for robotics. However, the idea involves several variants: first, the problem would be in 3D, and second, the application should allow races between the fully autonomous robot and a tele-operated drone piloted by a human being. The project should integrate vision to recognise objects, and task planning to anticipate an optimal set of future actions to efficiently complete a course of object collection and delivery. The project shall start with a demonstration on a simulator such as Gazebo or V-REP, and could potentially use ROS, but that is not necessary. An additional challenge is that the course should be easily configurable.
Ranking Attributes for Information Gain in Forests of Decision Trees (data science / machine learning related project)
Decision trees are one of the most well-known techniques in machine learning, data science, analytics, and data mining. They are transparent: they can be presented as rules for human understanding. With the availability of big data, algorithms for the construction of decision trees on a platform like MapReduce are a new challenge. This project consists of developing the parallelization algorithms and their implementation, perhaps for Hadoop, when we consider forests of decision trees. The implementation will be focused on applications of forests of decision trees to privacy-preserving data mining in on-line social networks. We can provide references to recent literature on how forests of decision trees are used to provide privacy alerts to users of OLSNs. Another application is feature selection.
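The per-attribute quantity that such a parallel implementation would rank, the information gain of a candidate split, can be sketched as follows. This is a hedged illustration only: the helper names and toy data are assumptions, not part of any provided codebase.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` on attribute `attr`.
    In a MapReduce setting, one such score per attribute could be computed
    in parallel and the attributes ranked by the result."""
    before = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    after = sum(len(part) / len(labels) * entropy(part)
                for part in partitions.values())
    return before - after

# Toy data: the attribute "wind" perfectly predicts the class label.
rows = [{"wind": "weak"}, {"wind": "strong"}, {"wind": "weak"}, {"wind": "strong"}]
labels = ["yes", "no", "yes", "no"]
gain = information_gain(rows, labels, "wind")   # 1.0 bit, the maximum here
```

Ranking attributes then amounts to sorting them by this score, which is exactly the step a forest-of-trees implementation repeats at every node.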
Manipulation with Pepper arm and fingers
The project consists of developing the infrastructure and integrating motion planning and task planning algorithms for a Pepper robot to pick up boxes, like small cereal boxes. Currently, there is significant interest in the robotics community in combining these two types of planning to carry out tasks like cleaning a table with different objects. This kind of task is now common in the RoboCup@Home challenges. The project should include elements of feedback between motion planning and vision. The idea is that vision may not recognise objects but may suggest new positions that would enable the recognition of objects. Simultaneously, motion planning may be refined by feedback from vision.
Combining reasoning with robotic localisation
There are many algorithms for robotic localisation in a field. However, these algorithms are rarely integrated with qualitative reasoning approaches. The most famous example of qualitative reasoning is Allen's interval algebra (and its associated algorithms). The challenge is to investigate a spatial qualitative reasoning system on top of robotic localisation in a soccer field, to obtain strategic or tactical decision making that shows improved performance in the soccer field for RoboCup soccer.
Game-playing Pepper
The aim is to develop a demonstration of human-robot interaction in which a Pepper robot plays as naturally as possible a simple game, like tic-tac-toe on a fixed space, perhaps with rope and special pieces. The robot applies some localisation and some game strategy. The entire software is to be developed using model-driven development, with finite-state machines used as much as possible for coordinating the control. The robot shall be flexible in its speech and use localisation within the game space, but not necessarily within the room. Sensor fusion to detect when humans have completed their move is the main research challenge.
Ethical Pepper
The aim is to develop a demonstration of human-robot interaction in which a Pepper robot learns ethical dilemmas and reasons over ethical challenges. The robot shall apply several techniques, including game theory and dialogue with humans, to argue for and explain ethical decisions. The goal is to investigate practical debates between a robot and humans regarding the alignment problem.
Statistical Modeling of Online Discussions
Online discussion is a core feature of numerous social media platforms and has attracted increasing attention for different and relevant reasons, e.g., the resolution of problems in collaborative editing, question answering, and e-learning platforms, the response of online communities to news events, online political and civic participation, etc. This project aims to address, from a probabilistic modeling perspective, some existing challenges, both computational and social, that appear in platforms that enable online discussion: for example, how to deal with scalability issues, how to evaluate and improve the quality of online discussions, or how to improve social interaction in such platforms in general.
Generative models of online discussion threads: state of the art and research challenges
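One family of generative models covered by the survey cited above grows a thread by attaching each new comment to an existing one with probability proportional to a popularity term, a root-attractiveness term, and a novelty term. The sketch below is a hedged illustration of that idea; the function name and parameter values are assumptions for illustration.

```python
import random

def grow_thread(n_comments, alpha, beta, tau, rng):
    """Simulate one discussion thread; returns parent[i] for each comment.
    Node 0 is the root post; comment t attaches to an earlier node with
    probability proportional to popularity + root bias + novelty."""
    parents = [None]   # the root post has no parent
    degrees = [0]
    for t in range(1, n_comments + 1):
        weights = []
        for i in range(len(parents)):
            w = alpha * degrees[i]          # popularity (preferential attachment)
            w += beta if i == 0 else 0.0    # attractiveness of the root post
            w += tau ** (t - i)             # novelty: older nodes decay
            weights.append(w)
        parent = rng.choices(range(len(parents)), weights=weights)[0]
        parents.append(parent)
        degrees[parent] += 1
        degrees.append(0)
    return parents

rng = random.Random(1)
parents = grow_thread(50, alpha=1.0, beta=5.0, tau=0.9, rng=rng)
```

Fitting the parameters alpha, beta, and tau to observed threads by maximum likelihood is one concrete way such a probabilistic model can be evaluated against real platform data.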
Modelling cooperative behaviours in real-time environments using Reinforcement Learning
In this project we will model real-time experimental game-theoretic tasks involving several agents using Reinforcement Learning techniques. The model will be based on Markov Decision Processes (MDPs). The aim is to be able to make predictions about modifications of the experiments and to add increasingly complex features to the model, including the prediction of other agents' behavior and identity. Cooperative emerging behaviours will be studied, for example in the presence of limited resources, as in the Tragedy of the Commons, where agents need to learn to consume resources in a controlled and coordinated way. Requirements: machine learning, autonomous systems.
Novel Algorithms for Centralized Multiagent Planning
Centralized multiagent planning is the problem of computing a joint plan for a team of agents that collaborate to achieve a goal. This problem has many applications in the real world, e.g., ride sharing, logistics planning, and manufacturing. The aim of this project is to develop novel algorithms for multiagent planning based on an existing approach that reduces centralized multiagent planning problems to exponentially smaller approximate AI planning problems, which can be efficiently solved using state-of-the-art planning methods. The novel algorithms will be tested in challenging scenarios from the Multi-Agent Programming Contest.
Graph Neural Networks for Large-scale Optimization
Large-scale constrained optimization problems are frequently encountered in real-world applications, e.g., traffic management, logistics planning, warehouse optimization, multiagent planning, and scheduling, to name a few. Though constrained optimization problems have been extensively studied, because of their computational complexity existing optimal solvers do not scale well to complex problems. Recently, graph neural networks (GNNs) have been successfully applied to approximate solutions to large optimization problems. The aim of the thesis is to study the potential of graph neural networks for large-scale optimization and to test it in realistic applications.
Linearly-solvable Markov decision processes with function approximation
A linearly-solvable Markov decision process (LMDP) is a special case of the more commonly used Markov decision process (MDP) in reinforcement learning. In LMDPs, the Bellman equations governing the value function are linear, which typically makes learning more efficient. In spite of being simpler than MDPs, LMDPs are surprisingly expressive and can be used to represent many reinforcement learning benchmarks; in particular, any MDP with deterministic actions can be converted into an LMDP. However, the large body of work on deep reinforcement learning largely ignores LMDPs as a representation. The goal of this project is to develop function approximation algorithms for LMDPs, e.g. using deep neural networks. One way to do so is to study existing algorithms for MDPs and simplify them for the LMDP setting.
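To illustrate why linearity helps: in terms of the desirability function z(s) = exp(-v(s)), the LMDP Bellman equation takes the form z = G P z, where G is a diagonal matrix of exponentiated (negated) state costs and P is the passive transition matrix, so it can be solved by plain power iteration rather than nonlinear dynamic programming. A minimal sketch; the toy first-exit problem, its goal state, and its costs are assumptions for illustration.

```python
import numpy as np

def lmdp_power_iteration(P, q, n_iters=1000):
    """Solve z = G P z for an LMDP by power iteration.

    P: (n, n) passive transition matrix, q: (n,) state costs.
    Returns the value function v(s) = -log z(s).
    """
    G = np.diag(np.exp(-q))        # state costs enter multiplicatively
    z = np.ones(len(q))
    for _ in range(n_iters):
        z = G @ P @ z              # linear Bellman operator
        z /= z.max()               # guard against numerical under/overflow
    return -np.log(z)

# Toy 3-state chain with an absorbing, zero-cost goal state 2.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 1.0, 0.0])
v = lmdp_power_iteration(P, q)     # v decreases monotonically towards the goal
```

Replacing the tabular z with a parametric function (e.g. a neural network) while preserving this linear structure is one way to frame the function approximation question the project asks.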
Entropy-regularized reinforcement learning algorithms
Entropy regularization is an extremely popular component of modern reinforcement learning methods. After several years of applying entropy regularization as a heuristic to drive exploration and improve the optimization properties of RL algorithms, there has been much recent progress in providing solid theoretical foundations justifying this technique. Most recently, a sophisticated regularization technique has led to the development of a principled RL algorithm, Q-REPS, that is entirely derived from first principles, yet can be implemented in large-scale environments essentially without any approximations to the theory. This project aims to advance research on this method in two parallel directions: 1) implement Q-REPS in large-scale environments and analyze its performance empirically, and 2) analyze various properties of the algorithm that remain poorly understood (some important questions being: understanding the role of the regularization functions and finding alternatives, or studying the properties of the action-value functions returned by the algorithm). The project requires strong mathematical skills, particularly in linear algebra and probability theory. Knowledge of convex analysis and optimization is a plus, but not absolutely necessary at the current stage.
A unified view of entropy-regularized Markov decision processes
Error averaging in approximate dynamic programming
Error-propagation analysis of approximate dynamic programming methods is a classic research topic with several implications for reinforcement learning algorithms. The most well-known results consider basic DP algorithms like value iteration and policy iteration, with several new results added in recent years regarding regularized counterparts of these methods. One notable discovery on this front was that certain regularization choices result in an "error-averaging" property that ensures a benign propagation of policy evaluation errors: roughly speaking, instead of accumulating the absolute errors linearly, these schemes allow the cancellation of positive and negative errors. This project sets out to understand this phenomenon in more detail by decoupling the averaging effect from the regularization effects as much as possible, thus potentially enabling the development and analysis of more effective error-averaging RL algorithms. The project requires strong mathematical skills, particularly in linear algebra and probability theory. Knowledge of convex analysis and optimization is a plus, but not absolutely necessary at the current stage.
Suggested readings: Approximate Modified Policy Iteration and its Application to the Game of Tetris; Leverage the Average: an Analysis of KL Regularization in RL
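The cancellation of positive and negative errors can be illustrated with a deliberately simplified experiment (hypothetical numbers, and only mimicking the averaging mechanism, not any particular regularized algorithm): if each evaluation step returns the true action values plus zero-mean noise, a scheme that acts on the running average of all past estimates sees its error shrink, while each individual estimate keeps a constant error level.

```python
import numpy as np

# Toy illustration of the "error-averaging" effect.
# q_star is a hypothetical true action-value vector; each "evaluation"
# returns q_star corrupted by zero-mean noise.
rng = np.random.default_rng(0)
q_star = np.array([1.0, 2.0, 3.0])
avg = np.zeros(3)
last_err, avg_err = [], []
for k in range(1, 2001):
    q_hat = q_star + rng.normal(0.0, 1.0, 3)  # noisy evaluation
    avg += (q_hat - avg) / k                  # running average of estimates
    last_err.append(np.abs(q_hat - q_star).max())  # error of one estimate
    avg_err.append(np.abs(avg - q_star).max())     # error of the average
```

The averaged error decays roughly like O(1/sqrt(k)) while the per-estimate error stays at the noise level; the project's question is how much of this benefit can be obtained, and analyzed, independently of the regularization that induces it.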
|
Thompson sampling for sequential prediction
Thompson sampling is one of the most well-studied algorithms for a class of sequential decision-making problems known as stochastic multi-armed bandit problems. In a stochastic multi-armed bandit problem, a learner selects actions in a sequential fashion, and receives a sequence of rewards corresponding to the chosen actions. A crucial assumption made in this problem is that the rewards associated with each action are random variables drawn independently from a fixed (but unknown) distribution. The goal of this project is to do away with this assumption and study Thompson sampling in non-stationary environments where the rewards may be generated by an arbitrary external process. More precisely, the project considers the framework of sequential prediction with expert advice, and aims to establish theoretical performance guarantees for this algorithm and/or analyze its performance empirically. The project requires strong mathematical skills, particularly in probability theory and multivariate calculus. Knowledge of convex analysis is a plus, but not absolutely necessary at the current stage.
Suggested readings: Learning to Optimize Via Posterior Sampling; An Information-Theoretic Analysis of Thompson Sampling
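In the stochastic setting that this project departs from, Thompson sampling has a very compact form. A minimal sketch for Bernoulli rewards with Beta priors (the arm means below are hypothetical): sample a plausible mean for each arm from its posterior, play the arm whose sample is largest, then update that arm's posterior.

```python
import numpy as np

# Beta-Bernoulli Thompson sampling on a hypothetical 3-armed bandit.
rng = np.random.default_rng(1)
means = np.array([0.3, 0.5, 0.7])       # true arm means, unknown to the learner
alpha = np.ones(3)                      # Beta(1, 1) prior for each arm
beta = np.ones(3)
for t in range(2000):
    theta = rng.beta(alpha, beta)       # one posterior sample per arm
    a = int(np.argmax(theta))           # act greedily w.r.t. the samples
    r = rng.random() < means[a]         # Bernoulli reward
    alpha[a] += r                       # conjugate posterior update
    beta[a] += 1 - r
```

The stochastic assumption enters precisely through the fixed `means` vector and the independent reward draws; the project asks what happens to this algorithm when an arbitrary external process generates the rewards instead.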
|
Better algorithms for online linear-quadratic control
Linear-quadratic control is one of the most well-studied problem settings in optimal control theory: it considers control systems where states follow a linear dynamics, and the incurred costs are quadratic in the states and control inputs. In recent years, the problem of online learning in linear-quadratic control problems has received significant attention within the machine-learning community. One particularly interesting development is the formulation of the control problem as a semidefinite program (SDP), which allows the application of tools from online convex optimization. The present project aims to develop new algorithms for online linear-quadratic control based on this framework by exploring the possibility of using regularization functions that make better use of the SDP geometry than existing methods based on online gradient descent. The project requires strong mathematical skills, particularly in multivariate calculus and linear algebra. Knowledge of control theory, convex analysis and optimization is a plus, but not absolutely necessary at the current stage.
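As background for the online setting, the classical offline solution is worth keeping in mind: with known dynamics, the optimal linear state-feedback gain comes from the discrete-time Riccati equation. A minimal sketch on a hypothetical double-integrator system (the matrices below are invented for illustration; the SDP formulation targeted by this project is an alternative route to the same object):

```python
import numpy as np

# Infinite-horizon discrete-time LQR via Riccati fixed-point iteration,
# for hypothetical dynamics x' = A x + B u and cost x'Qx + u'Ru.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])              # double integrator, dt = 0.1
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                           # quadratic state cost
R = np.array([[1.0]])                   # quadratic input cost
P = np.eye(2)
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # gain: u = -K x
    P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
```

The fixed point P encodes the quadratic value function; the SDP view of the same problem is what opens the door to online convex optimization tools and to the mirror-descent-style regularizers this project wants to explore.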
|
Deep Neural Multilingual Lexical Simplification
Lexical Simplification is a sub-task of Automatic Text Simplification that aims at replacing difficult words with easier-to-read (or easier-to-understand) synonyms while preserving the information and meaning of the original text. This is a key task to facilitate reading comprehension for different target readerships such as foreign language learners, native speakers with low literacy levels, or people with different reading impairments (e.g. dyslexic individuals). This master’s project aims at investigating novel techniques in Deep Learning and Natural Language Processing to improve current lexical simplification systems. The master student will have the opportunity to develop this research on an available multilingual dataset (English, Portuguese, Spanish) – the TSAR 2022 Shared Task Dataset – annotated by human informants. More specifically, the annotations for a given complex word in a sentence are a list of suitable substitutes which could be used instead of the complex word. The candidate is expected to develop cross-lingual or multilingual techniques using available pre-trained models. In the recent TSAR 2022 shared task challenge, several participating teams proposed deep learning techniques (e.g. neural language models) to solve the problem. The candidate is expected to have some knowledge of Natural Language Processing (NLP) and experience with current deep learning paradigms in NLP.
Suggested readings: ALEXSIS: A Dataset for Lexical Simplification in Spanish; Lexical simplification benchmarks for English, Portuguese, and Spanish
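A lexical simplification system is usually a pipeline: generate candidate substitutes (e.g. with a neural language model), then rank them by simplicity and contextual fit. As a deliberately simplified sketch of the ranking step only (the word list and frequency table below are hypothetical; real systems use corpus frequencies and language-model probabilities):

```python
# Toy substitute-ranking step of a lexical simplification pipeline.
# Hypothetical frequency table: higher corpus frequency = more familiar word.
freq = {"scrutinize": 3, "examine": 40, "inspect": 25, "check": 90}

def simplicity(word: str) -> float:
    # A crude simplicity score: frequent and short words rank higher.
    return freq.get(word, 0) / len(word)

# Candidate substitutes for the complex word "scrutinize" in some sentence,
# as a candidate-generation model might propose them.
candidates = ["examine", "inspect", "check"]
ranked = sorted(candidates, key=simplicity, reverse=True)
```

In the TSAR 2022 setting, the human-annotated substitute lists make it possible to evaluate both the generation and the ranking stage against gold substitutes.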
|
From Sign Language to Spoken Language: Experiments in Deep Multimodal Translation
In the era of mass communication and wide uptake of digital technologies amongst the general public, there still exist numerous communication barriers for the Deaf and Hard-of-Hearing (DHH) community. In spite of the recent advances in Machine Translation (MT) for spoken languages, automatic translation between spoken and Sign Languages or between Sign Languages remains a difficult problem. There is a great opportunity for research in MT to bridge the gap between written/spoken languages and Sign Languages. This master’s project aims at experimenting with current approaches in machine translation for sign languages, with emphasis on the Sign Language to Spoken Language direction. The student will first experiment with available well-known datasets (e.g. the German Sign Language PHOENIX14T dataset – video/text/glosses), exploring different models to extract patterns from video frames and decode them to produce spoken text. After that, the plan is to apply Transfer Learning to produce models for less-resourced sign languages (e.g. Spanish or Catalan Sign Language). The candidate is expected to have some knowledge of Natural Language Processing (NLP) and experience with current deep learning paradigms in NLP. Knowledge of image processing is highly desirable.
|
Audio-visual speech and singing voice separation
Source separation is the automatic estimation of the individual isolated sources that make up an audio mixture. The goal of this project is to separate a human voice from a mixture by using both the audio and video modalities. We are interested in both speech and singing voice signals. The most direct applications of speech separation are speaker identification and speech recognition (for example, to create automatic captioning of videos). Applications of singing voice separation include the automatic creation of karaoke tracks, music transcription, and music unmixing and remixing. Leveraging visual and motion information from the target person's face is particularly useful when there are different voices present in the mixture. Deep neural networks that extract features from the video sequence will be explored and used in conjunction with an audio network in order to improve the audio source separation task by incorporating visual cues.
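Most separation networks, audio-visual ones included, share the same output mechanism: they predict a time-frequency mask that is multiplied with the mixture spectrogram to recover the target source. A minimal sketch of that mechanism, using random arrays as hypothetical magnitude spectrograms and the "ideal ratio mask" computed from the ground truth for illustration (a real network would predict the mask from the mixture and the video features):

```python
import numpy as np

# Mask-based source separation in its simplest form.
rng = np.random.default_rng(0)
voice = rng.random((64, 100))            # hypothetical magnitude spectrograms
music = rng.random((64, 100))            # (frequency bins x time frames)
mix = voice + music                      # observed mixture
irm = voice / (voice + music + 1e-8)     # ideal ratio mask, values in [0, 1]
voice_est = irm * mix                    # masked mixture ~ target voice
```

In the audio-visual setting, the network that predicts `irm` would condition on features extracted from the speaker's face and lip motion, which is what disambiguates the target voice when several voices overlap in the mixture.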