Thesis
[PhD Thesis] Computational modeling of user activity in full-body interaction environments for ASC children: multimodal analysis of social interaction behaviors through psychophysiology, system activity, video coding, questionnaires, and body cues
Author: Batuhan Sayis
Supervisors: Narcís Parés, Rafael Ramírez
Full-body Interaction experiences based on Mixed Reality (MR) systems are already playing an important role in encouraging socialization behaviors in children with Autism Spectrum Condition (ASC), as shown in the state-of-the-art review of this thesis. However, the data from these systems are multimodal in nature and complex to analyze. Fusion and analysis of these data are crucial to achieve a complete understanding of how these resources interact with each other. In this PhD thesis, given the characteristics of full-body interaction, we developed new multimodal data gathering and evaluation techniques to better understand the effectiveness of the experience developed in our Full-body Interaction Lab (FuBIntLab), called Lands of Fog. This is a large-scale MR, full-body interaction environment that allows two children to play face-to-face and explore the physical and virtual worlds simultaneously. Specifically, we developed an experimental setup for comparing Lands of Fog with a control condition based on LEGO construction toys, which includes recording psychophysiological measures synchronized with other data sources such as observed overt behaviors and system logs of game events. To capture accurate psychophysiological data, we developed a wearable that is child-friendly and robust to movement artifacts in the context of ambulatory full-body interaction. To integrate observed overt behaviors with other data sources, we designed and developed a novel video coding protocol and an adapted coding grid conceived for Social Interaction Behaviors (SIBs) in ASC children. Using a repeated-measures design, we collected data from seventy-two children (36 ASC/non-ASC dyads) from the city of Barcelona, aged between 8 and 12 years (N = 12 female, N = 60 male). Data from these trials have been organized into a public database and processed with a semi-automatic software pipeline developed within this project.
Based on these data, we developed three different computational models of SIBs in children with ASC during Lands of Fog sessions, compared to LEGO sessions. The results of this research support the idea that full-body interaction MR environments can foster SIBs in children with ASC with success similar to the LEGO setting, with the added advantage of greater flexibility. The findings reported here shed new light on developing a tool that mediates, guides, and supports children's progress in practicing SIBs while providing structure and assistance to therapists.
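As a toy illustration of the kind of alignment such a multimodal pipeline must perform, the sketch below attaches to each physiological sample the most recent game event logged at or before it. The stream contents (EDA values, event labels) are invented for illustration and are not taken from the thesis database.

```python
from bisect import bisect_right

# Toy sketch (not the thesis pipeline): attach to each physiological
# sample the latest game event logged at or before its timestamp.

def align_events(samples, events):
    """samples: sorted list of (t, value); events: sorted list of (t, label).
    Returns a list of (t, value, label_or_None)."""
    ev_times = [t for t, _ in events]
    aligned = []
    for t, value in samples:
        i = bisect_right(ev_times, t) - 1      # index of last event at or before t
        label = events[i][1] if i >= 0 else None
        aligned.append((t, value, label))
    return aligned

# Hypothetical miniature streams: EDA samples at 4 Hz and sparse game events.
eda = [(0.00, 0.41), (0.25, 0.42), (0.50, 0.47), (0.75, 0.52)]
log = [(0.10, "creature_appears"), (0.60, "joint_catch")]
for row in align_events(eda, log):
    print(row)
```

The same nearest-preceding-timestamp idea extends to any pair of asynchronous streams once they share a time reference.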
Link to manuscript: http://hdl.handle.net/10803/671850
[PhD Thesis] Emancipation of the bitcoin outcasts: addressing overlooked elements of the bitcoin network for improving security and efficiency
Author: Federico Franzoni
Supervisor: Vanesa Daza
During the last decade, cryptocurrencies have revolutionized the financial industry. In these systems, participants communicate by means of a peer-to-peer protocol. Today, many such protocols take Bitcoin as a reference model, making its study particularly important. This thesis explores some important aspects of the Bitcoin network, related to its security and efficiency, that have received limited coverage in research. Firstly, properties of the Testnet network are explored, showing that they can be exploited for malicious activities. Secondly, security aspects of an open network topology are studied, arguing against the current obfuscated approach and designing a viable monitoring system. Then, unreachable nodes are considered, showing their relevance in the network and proposing changes to the protocol that improve efficiency and security. Finally, a new transaction relay protocol is proposed, which improves anonymity. The results obtained show that the aspects we analyze are not sufficiently covered in research and deserve deeper investigation.
Link to manuscript: http://hdl.handle.net/10803/671853
[PhD Thesis] Unsupervised learning for parametric optimization in wireless networks
Author: Rasoul Nikbakht Silab
Supervisor: Àngel Lozano Solsona
This thesis studies parametric optimization in cellular and cell-free networks, exploring data-based and expert-based paradigms. Power allocation and power control, which adjust the transmit power to meet different fairness criteria such as max-min or max-product, are crucial tasks in wireless communications that fall into the parametric optimization category. The state-of-the-art approaches for power control and power allocation often demand huge computational costs and are not suitable for real-time applications. To address this issue, we develop a general-purpose unsupervised-learning approach for solving parametric optimization problems, and extend the well-known fractional power control algorithm. In the data-based paradigm, we create an unsupervised learning framework that defines a custom neural network (NN), incorporating expert knowledge into the NN loss function to solve the power control and power allocation problems. In this approach, a feedforward NN is trained by repeatedly sampling the parameter space but, rather than solving the associated optimization problem completely, taking a single step along the gradient of the objective function. The resulting method is applicable to both convex and non-convex optimization problems. It offers a two-to-three orders of magnitude speedup in the power control and power allocation problems compared to a convex solver, whenever applicable. In the expert-driven paradigm, we investigate the extension of fractional power control to cell-free networks. The resulting closed-form solution can be evaluated effortlessly for both uplink and downlink, and reaches an (almost) optimal solution in the uplink case. In both paradigms, we place a particular focus on the large-scale gains, i.e., the attenuation experienced by the local-average received power.
The slowly varying nature of the large-scale gains relaxes the need for frequent updates of the solutions in both the data-driven and expert-driven paradigms, enabling real-time application of both methods.
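The core training idea, sampling the parameter space and taking a single gradient step on the objective itself rather than solving each optimization instance, can be sketched on a toy scalar power-control problem. Everything below (the utility function, the softplus policy, all constants) is an invented stand-in, not the thesis' network model.

```python
import math, random

# Toy illustration (not the thesis' model) of unsupervised training with a
# single gradient step per sampled parameter: learn a policy p(g) that
# maximizes the utility
#     U(p; g) = log(1 + g * p) - lam * p   (a rate term minus a power cost)
# with a tiny softplus policy p = softplus(w * g + b). The closed-form
# optimum p*(g) = max(0, 1/lam - 1/g) is used only for comparison.

lam = 0.5

def softplus(z):
    return math.log1p(math.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def utility(p, g):
    return math.log1p(g * p) - lam * p

def train(steps=4000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        g = rng.uniform(1.0, 4.0)          # sample a point of the parameter space
        z = w * g + b
        p = softplus(z)
        dU_dp = g / (1.0 + g * p) - lam    # gradient of the objective itself
        dp_dz = sigmoid(z)                 # chain rule through the softplus
        w += lr * dU_dp * dp_dz * g        # single ascent step; no inner solver
        b += lr * dU_dp * dp_dz
    return w, b

w, b = train()
p_learned = softplus(w * 2.0 + b)
p_opt = max(0.0, 1.0 / lam - 1.0 / 2.0)    # analytic optimum at g = 2
print(round(p_learned, 3), round(utility(p_learned, 2.0), 3),
      round(utility(p_opt, 2.0), 3))
```

Because the utility is evaluated directly in the loss, no labelled optimal solutions are ever needed, which is what makes the approach unsupervised.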
Link to manuscript: http://hdl.handle.net/10803/671246
[PhD Thesis] Audio-visual deep learning methods for musical instrument classification and separation
Author: Olga Slizovskaia
Supervisors: Emilia Gómez, Gloria Haro
In music perception, the information we receive from the visual and auditory systems is often complementary. Moreover, visual perception plays an important role in the overall experience of attending a music performance. This fact brings attention to machine learning methods that can combine audio and visual information for automatic music analysis. This thesis addresses two research problems: instrument classification and source separation in the context of music performance videos. A multimodal approach is developed for each task, using deep learning techniques to train an encoded representation for each modality. For source separation, we also study two approaches conditioned on instrument labels and examine the influence that two extra sources of information have on separation performance compared with a conventional model. Another important aspect of this work lies in the exploration of different fusion methods, which allow for better multimodal integration of information sources from associated domains.
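A minimal sketch of one common fusion strategy, late fusion by embedding concatenation, might look as follows; the encoders, dimensions and random weights here are placeholders, not the thesis architecture.

```python
import numpy as np

# Minimal late-fusion sketch (illustrative): each modality gets its own
# encoder; the embeddings are concatenated and fed to a shared head.
# All dimensions and weights are arbitrary placeholders.

rng = np.random.default_rng(0)

def encoder(x, W):
    return np.tanh(x @ W)                 # stand-in for a trained deep encoder

n_classes = 5
W_audio = rng.normal(size=(128, 32))      # audio-feature encoder weights
W_video = rng.normal(size=(256, 32))      # video-feature encoder weights
W_head  = rng.normal(size=(64, n_classes))

audio = rng.normal(size=(8, 128))         # batch of audio features
video = rng.normal(size=(8, 256))         # batch of video features

fused = np.concatenate([encoder(audio, W_audio),
                        encoder(video, W_video)], axis=1)
logits = fused @ W_head
print(fused.shape, logits.shape)          # (8, 64) (8, 5)
```

Other fusion points are possible (early fusion of raw features, or fusing at intermediate layers); this late-fusion variant is merely the simplest to state.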
Link to manuscript: http://hdl.handle.net/10803/669963
[PhD Thesis] Responsive spectrum management for wireless local area networks: from heuristic-based policies to model-free reinforcement learning
Author: Sergio Barrachina
Supervisor: Boris Bellalta
In this thesis, we focus on the joint spectrum management problem: the efficient allocation of primary and secondary channels in channel bonding wireless local area networks (WLANs). From IEEE 802.11n to more recent standards like 802.11ax and 802.11be, bonding channels together is permitted to increase the transmission bandwidth. While such an increase favors the potential network capacity and the activation of higher transmission rates, it comes at the price of reduced power per Hertz and accentuated issues of contention and interference with neighboring nodes. So, if WLANs were already complex deployments per se, they are becoming even more complicated due to the increasing node density and the new technical features required by novel, highly bandwidth-demanding applications. This dissertation provides an in-depth study of channel allocation and channel bonding in WLANs and discusses the suitability of solutions ranging from heuristic-based to reinforcement learning (RL)-based. To characterize channel bonding in saturated WLANs, we first propose an analytical model based on continuous-time Markov networks (CTMNs). This model relies on a novel, purpose-designed algorithm that generates CTMNs from spatially distributed scenarios, where nodes are not required to be within the carrier-sense range of each other. We identify the key factors affecting the throughput and fairness of different channel bonding policies and expose critical interrelations among nodes in the spatial domain. By extending the analytical model to support unsaturated regimes, we highlight the benefits of allocating channels as wide as possible, together with adaptive policies to cope with unfair situations. Apart from the analytical model, this thesis relies on simulations to generalize channel bonding in dense scenarios while avoiding costly, sometimes unfeasible, experimental testbeds. Unfortunately, existing wireless network simulators tend to be too simplistic or too computationally demanding.
That is why we develop the Komondor wireless network simulator, whose essential advantage over other well-known simulators lies in its high event processing rate. We then deviate from analytical models and simulations and tackle real measurements through the Wi-Fi All-Channel Analyzer (WACA), the first system specifically designed to simultaneously measure the energy in all 24 bondable Wi-Fi channels of the 5 GHz band. With WACA, we perform a first-of-its-kind spectrum measurement campaign in areas including urban hotspots, residential neighborhoods, universities, and even a football match in Futbol Club Barcelona's Camp Nou stadium. Our experimental findings reveal the underpinning factors controlling throughput gain, among which we highlight the inter-channel correlation. We show the significance of the gathered dataset for finding new insights, which would not be possible otherwise, given that simple channel-occupancy models severely underestimate the potential gains. As for solution proposals, we first cover heuristic-based approaches to find satisfactory configurations quickly. In this regard, we propose dynamic-wise (DyWi), a lightweight, decentralized, online primary channel selection algorithm for dynamic channel bonding. DyWi improves the expected WLAN throughput by considering not only the occupancy of the target primary channel but also the activity in the secondary channels. Even when assuming significant delays due to primary channel switching, simulations reveal important throughput and delay improvements. Finally, we identify machine learning (ML) approaches applicable to the spectrum management problem in WLANs and justify why model-free RL suits it the most.
In particular, we focus on the adequate performance of stateless variations of RL and anticipate multi-armed bandits as the right solution, since i) we need fast adaptability to suit user experience in dynamic Wi-Fi scenarios, and ii) the number of multichannel configurations a network can adopt is limited; thus, agents can fully explore the action space in a reasonable time.
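As a minimal illustration of why stateless bandits fit this setting, the sketch below runs an epsilon-greedy agent over a handful of hypothetical channel-bonding configurations with synthetic throughput statistics; the arm means and noise are invented, not measurements, and this is not the thesis' exact algorithm.

```python
import random

# Toy epsilon-greedy bandit: an agent picks among a few hypothetical
# channel-bonding configurations and learns from observed throughput.
# The mean throughputs below are synthetic, not measurements.

def run_bandit(means, rounds=5000, eps=0.1, seed=1):
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    est = [0.0] * n                                 # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < eps:
            a = rng.randrange(n)                    # explore a random arm
        else:
            a = max(range(n), key=lambda i: est[i]) # exploit the best arm so far
        r = means[a] + rng.gauss(0.0, 5.0)          # noisy throughput sample
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]          # incremental mean update
    return est, counts

# Hypothetical arms, e.g. 20/40/80/160 MHz configurations (mean Mbps)
means = [30.0, 48.0, 55.0, 41.0]
est, counts = run_bandit(means)
best = max(range(len(means)), key=lambda i: counts[i])
print(best, [round(e, 1) for e in est])
```

With only a few arms and stationary rewards, the agent settles on the best configuration after a modest number of trials, which matches the argument that small action spaces make full exploration feasible.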
Link to manuscript: http://hdl.handle.net/10803/670782
[PhD thesis] The Orchestration of computer-supported collaboration scripts with learning analytics
Author: Ishari Amarasinghe
Supervisors: Davinia Hernández-Leo, Anders Jonsson
Computer-supported collaborative learning (CSCL) creates avenues for productive collaboration between students. In CSCL, collaborative learning flow patterns (CLFPs) provide a pedagogical rationale and constraints for structuring the collaboration process. While structured collaboration facilitates the design of favourable learning conditions, orchestration of collaboration becomes an important factor, as learner participation and real-world constraints can create deviations in real time. On the one hand, limited research has examined the orchestration challenges related to collaborative learning situations scripted according to CLFPs in authentic educational contexts, at different scales of collaboration. On the other hand, learning analytics (LA) can be used to provide proper technological tooling, infrastructure and support to orchestrate collaboration. To this end, this dissertation addresses the following research question: How can LA support orchestration mechanisms for scripted CSCL? To address this question, this dissertation first focuses on studying the orchestration challenges associated with scripted CSCL situations at small scales (in the classroom learning context) and large scales (in the distance learning context, specifically in massive open online courses [MOOCs]). In the classroom learning context, lack of teacher access to activity regulation mechanisms constituted a key challenge. In MOOCs, sustaining student participation across the multiple phases of the script was a primary challenge. The dissertation also focuses on studying the design of LA interventions that might address the orchestration challenges under examination. The proposed LA interventions range from human-in-control to machine-in-control in nature, given the feasibility and regulation needs of the learning contexts under investigation.
Following a design-based research (DBR) methodology, evaluation studies were conducted in naturalistic classrooms and in MOOCs to evaluate the effects of the proposed LA interventions and to understand the conditions for their successful implementation. The results of the evaluation studies conducted in the classroom context shed light on how teachers interpret LA data and how they act on the resulting knowledge in authentic collaborative learning situations. In the distance learning context, the proposed interventions were critical in sustaining continuous flows of collaboration. The practical benefits and limitations of deploying LA solutions in real-world settings, as well as future research directions, are outlined.
Link to manuscript: http://hdl.handle.net/10803/670420
[PhD thesis] Machine learning and deep neural networks approach to modelling musical gestures
Author: David Cabrera Dalmazzo
Supervisor: Rafael Ramírez
Gestures can be defined as a form of non-verbal communication associated with an intention or the articulation of an emotional state. They are not only intrinsically part of human language, but also convey specific details of an embodied skill's execution. Gestures are studied not only in language research but also in dance, sports, rehabilitation, and music, where the term is understood as a “learned technique of the body”. In music education, gestures are therefore regarded as automatic motor abilities learned through repeated practice, by which performers self-teach and optimally fine-tune their motor actions. These gestures become part of the performer's technical repertoire, enabling fast actions and decisions on the fly; they are relevant not only for expressive musical capabilities but also for developing correct ‘energy-consumption’ habits that help avoid injuries. In this thesis, we applied state-of-the-art machine learning (ML) techniques to model violin bowing gestures in professional players. Concretely, we recorded a database of expert performers and students of different levels, and developed three strategies to classify and recognise those gestures in real time: a) First, we developed a multimodal synchronisation system to record audio, video and IMU sensor data with a unified time reference, programmed a custom C++ application to visualise the output of the ML models, and implemented a Hidden Markov Model to detect fingering disposition and bow-stroke gesture performance. b) Second, we built a system that extracts general time features from the gesture samples, creating a dataset of audio and motion data from expert performers, and applied deep neural networks to it; specifically, we implemented a hybrid CNN-LSTM architecture.
c) Third, we developed a mel-spectrogram-based analysis that reads and extracts patterns from audio data alone, opening the option of recognising relevant information from audio recordings, without external sensors, while achieving similar results. All of these techniques are complementary and have been incorporated into an educational application that acts as a computer assistant, enhancing music learners' practice by providing useful real-time feedback. The application will be tested in a professional education institution.
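A rough sketch of a mel-spectrogram front end of the kind mentioned in approach c) is shown below; a real system would use a dedicated library such as librosa, and the parameters here are generic defaults, not the thesis' configuration.

```python
import numpy as np

# Compact mel-spectrogram sketch in NumPy (illustrative only).

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, sr=22050, n_fft=1024, hop=512, n_mels=40):
    # Short-time Fourier transform (Hann window, no centering)
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, bins)
    # Triangular mel filterbank, equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        if c > lo:
            fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / (c - lo)
        if hi > c:
            fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / (hi - c)
    return power @ fb.T                                   # (frames, n_mels)

# One second of a 440 Hz tone as a stand-in for a violin recording
t = np.arange(22050) / 22050.0
S = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(S.shape)
```

Such a time-by-mel-band matrix is the typical input representation for audio-only pattern recognition of the kind described above.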
Link to manuscript: http://hdl.handle.net/10803/670399
[PhD thesis] Nonlinear signal analysis of micro and macro electroencephalographic recordings from epilepsy patients
Author: Cristina González Martínez
Supervisor: Ralph Andrzejak
The use of nonlinear signal analysis measures to characterize electroencephalographic (EEG) recordings can be key for a better understanding of the underlying brain dynamics. In neurological disorders such as epilepsy, these dynamics are altered as a result of disturbed coordination between neuronal populations. The aim of this thesis is to characterize the seizure-free interval of EEG recordings from epilepsy patients by means of nonlinear signal analysis techniques, to investigate whether this type of analysis can contribute to the localization of the seizure onset zone, the brain region from which initial seizure discharges can be recorded. For this purpose, we used a surrogate-corrected nonlinear predictability score and a surrogate-corrected nonlinear interdependence measure to analyze all-night EEG recordings from epilepsy patients implanted with hybrid depth electrodes equipped with macro contacts and micro wires. Our results show that the combined analysis of macro and micro EEG recordings may help to further increase the degree to which quantitative EEG analysis can contribute to diagnostics in epilepsy patients.
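The surrogate-correction idea can be sketched as follows: phase-randomized surrogates preserve a signal's power spectrum while destroying nonlinear structure, and the measure of interest is then expressed as a z-score against the surrogate ensemble. The toy statistic below is only a stand-in for the thesis' predictability and interdependence measures.

```python
import numpy as np

# Sketch of surrogate correction (illustrative). Phase-randomized
# surrogates keep the linear properties (power spectrum) of a signal;
# a nonlinear measure is reported relative to the surrogate null.

def phase_randomized(x, rng):
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=X.shape)
    phases[0] = 0.0                      # keep the DC bin real
    if len(x) % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

def surrogate_z(x, measure, n_surr=99, seed=0):
    rng = np.random.default_rng(seed)
    null = np.array([measure(phase_randomized(x, rng))
                     for _ in range(n_surr)])
    return (measure(x) - null.mean()) / null.std()

# Toy nonlinear statistic: third-order asymmetry of increments
def asym(x):
    return np.mean((x[1:] - x[:-1]) ** 3)

# Logistic-map series as a stand-in for an EEG channel
x = np.empty(2048)
x[0] = 0.4
for i in range(1, len(x)):
    x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])
print(round(float(surrogate_z(x, asym)), 2))
```

A score far from zero indicates structure in the signal that the linear surrogate null cannot explain.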
Link to manuscript: http://hdl.handle.net/10803/670397
[PhD thesis] Automatic generation of descriptive related work reports
Author: Ahmed Ghassan Tawfiq AbuRa'ed
Supervisor: Horacio Saggion
A related work report is a section in a research paper which integrates key information from a list of related scientific papers, providing context to the work being presented. Related work reports can be either descriptive or integrative. Integrative related work reports provide a high-level overview and critique of the scientific papers by comparing them with each other, providing fewer details of individual studies. Descriptive related work reports, instead, provide more in-depth information about each mentioned study, such as the methods and results of the cited works. In order to write a related work report, scientists have to identify, condense/summarize, and combine relevant information from different scientific papers. However, this task is complicated by the sheer volume of available scientific papers. In this context, the automatic generation of related work reports appears to be an important problem to tackle. It can be considered an instance of the multi-document summarization problem where, given a list of scientific papers, the main objective is to automatically summarize those papers and generate a related work report. In order to study the problem of related work generation, we have developed a manually annotated, machine-readable dataset of related work sections, cited papers (i.e., references) and sentences, together with an additional layer of papers citing those references. We have also investigated the relation between a citation context in a citing paper and the scientific paper it cites, so as to properly model cross-document relations and inform our summarization approach. Moreover, we have investigated the identification of explicit and implicit citations to a given scientific paper, an important task in several scientific text mining activities such as citation purpose identification, scientific opinion mining, and scientific summarization.
We present both extractive and abstractive methods to summarize a list of scientific papers by exploiting their citation network. The extractive approach follows three stages: scoring the sentences of the scientific papers based on their citation network, selecting sentences from each scientific paper to be mentioned in the related work report, and generating an organized related work report by grouping together the sentences of the scientific papers that belong to the same topic. The abstractive approach, on the other hand, attempts to generate citation sentences to be included in a related work report, taking advantage of current sequence-to-sequence neural architectures and of resources that we have created specifically for this task. The thesis also presents and discusses automatic and manual evaluations of the generated related work reports, showing the viability of the proposed approaches.
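A toy version of the first two extractive stages, scoring candidate sentences against the sentences that cite them and selecting the top-scoring sentence per paper, might look like this; the scoring function is a deliberately crude word-overlap stand-in for the citation-network-based scoring.

```python
# Toy extractive stages (illustrative): score each candidate sentence of a
# cited paper by word overlap with the sentences that cite it, then select
# the top-scoring sentence. The real scoring uses the full citation network.

def tokens(s):
    return {w.strip('.,;:') for w in s.lower().split()}

def score(sentence, citing_sentences):
    t = tokens(sentence)
    return sum(len(t & tokens(c)) for c in citing_sentences)

def select_top(paper_sentences, citing_sentences):
    return max(paper_sentences, key=lambda s: score(s, citing_sentences))

# Hypothetical cited paper and a sentence citing it
paper = [
    "We introduce a neural model for abstractive summarization.",
    "Experiments were run on a cluster of GPUs.",
]
citing = ["Their neural model improves abstractive summarization quality."]

print(select_top(paper, citing))
```

The third stage would then group the selected sentences by topic before assembling the report.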
Link to manuscript: http://hdl.handle.net/10803/669975
[PhD thesis] Towards spatial reuse in future wireless local area networks: a sequential learning approach
Author: Francesc Wilhelmi Roca
Supervisors: Boris Bellalta, Cristina Cano, Anders Jonsson
The Spatial Reuse (SR) operation is gaining momentum in the latest IEEE 802.11 family of standards due to the overwhelming requirements posed by next-generation wireless networks. In particular, the rising traffic requirements and the number of concurrent devices compromise the efficiency of increasingly crowded Wireless Local Area Networks (WLANs) and throw into question their decentralized nature. The SR operation, initially introduced by the IEEE 802.11ax-2021 amendment and further studied in IEEE 802.11be-2024, aims to increase the number of concurrent transmissions in an Overlapping Basic Service Set (OBSS) using sensitivity adjustment and transmit power control, thus improving spectral efficiency. Our analysis of the SR operation shows outstanding potential for increasing the number of concurrent transmissions in crowded deployments, which contributes to enabling low-latency next-generation applications. However, the potential gains of SR are currently limited by the rigidity of the mechanism introduced in 802.11ax and by the lack of coordination among the BSSs implementing it. The SR operation is evolving towards coordinated schemes where different BSSs cooperate. Nevertheless, coordination entails communication and synchronization overhead, whose impact on the performance of WLANs remains unknown. Moreover, the coordinated approach is incompatible with devices using previous IEEE 802.11 versions, potentially degrading the performance of legacy networks. For these reasons, in this thesis we start by assessing the viability of decentralized SR, and thoroughly examine the main impediments and shortcomings that may result from it. We aim to shed light on the future shape of WLANs with respect to SR optimization, and on whether their decentralized nature should be kept or it is preferable to evolve towards coordinated and centralized deployments.
To address the SR problem in a decentralized manner, we focus on Artificial Intelligence (AI) and propose using a class of sequential learning-based methods referred to as Multi-Armed Bandits (MABs). The MAB framework suits the SR problem because it addresses the uncertainty caused by the concurrent operation of multiple devices (i.e., the multi-player setting) and the lack of information in decentralized deployments. MABs can potentially overcome the complexity of the spatial interactions that result from devices modifying their sensitivity and transmit power. In this regard, our results indicate significant performance gains (up to 100% throughput improvement) in highly dense WLAN deployments. Nevertheless, the multi-agent setting raises several concerns that may compromise the performance of network devices (definition of joint goals, time-horizon convergence, scalability aspects, or non-stationarity). In addition, our analysis of multi-agent SR encompasses an in-depth study of infrastructure aspects for next-generation AI-enabled networking.
Link to manuscript: http://hdl.handle.net/10803/669970
[PhD thesis] Map-less inventory and location for an RFID-based robot
Author: Víctor Casamayor Pujol
Supervisor: Rafael Pous Andrés
This thesis presents a new paradigm for RFID-based inventory robots: map-less operation, which increases the operative autonomy of the robots as they no longer require a mapping step. The new paradigm is based on the concept of stigmergy. Additionally, it leads to a simplification of the robot design and enables cooperation among multiple robots, increasing the robustness and scalability of the system while reducing its cost. The stock-counting problem is defined and an algorithm based on stigmergy is proposed as a solution, which is tested first in simulation and later in real scenarios. The thesis details the design process and development of two robots that can take advantage of this new paradigm and that are tested in a real environment, the library of the university. Finally, the thesis also presents a new RFID group-location algorithm aligned with the main characteristics of the new paradigm: simplification and efficiency.
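The stigmergy idea can be sketched in a few lines: robots never talk to each other directly, they only read and write time-stamped marks in the cells they visit, and each robot greedily moves to the least recently visited neighboring cell. The grid size, robot count and movement rule below are illustrative, not the thesis' robots.

```python
import random

# Stigmergy sketch: robots coordinate only through time-stamped marks left
# in the environment (no map, no robot-to-robot messages). Each robot moves
# to the neighboring cell that was visited least recently.

def stigmergy_coverage(width=5, height=5, steps=2000, seed=0):
    rng = random.Random(seed)
    last_visit = {}                             # cell -> step of last visit (the mark)
    robots = [(0, 0), (width - 1, height - 1)]  # two robots, opposite corners
    for t in range(steps):
        moved = []
        for x, y in robots:
            last_visit[(x, y)] = t              # leave a mark in the current cell
            nbrs = [(x + dx, y + dy)
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= x + dx < width and 0 <= y + dy < height]
            rng.shuffle(nbrs)                   # random tie-breaking
            moved.append(min(nbrs, key=lambda c: last_visit.get(c, -1)))
        robots = moved
    return len(last_visit) / (width * height)   # fraction of cells counted

print(stigmergy_coverage())
```

Because unvisited cells carry no mark, they are always preferred, so the whole floor is eventually swept without any shared map.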
Link to manuscript: http://hdl.handle.net/10803/669969
[PhD thesis] Machine learning to support exploring and exploiting real-world clinical longitudinal data
Author: Mariana Nogueira
Supervisors: Bart Bijnens, Gemma Piella Fenoy, Mathieu de Craene
Following up on patient evolution by reacquiring the same measurements over time (longitudinal data) is a crucial component of clinical care, as it creates the opportunity for timely decision making to prevent adverse outcomes. It is thus important that clinicians have proper longitudinal analysis tools at their service. Nonetheless, most traditional longitudinal analysis tools have limited applicability if data are (1) not highly standardized or (2) very heterogeneous (e.g. images, signals, continuous and categorical variables) and/or high-dimensional. These limitations are extremely relevant, as both scenarios are prevalent in routine clinical practice. The aim of this thesis is the development of tools that facilitate the integration and interpretation of complex and nonstandardized longitudinal clinical data. Specifically, we explore approaches based on unsupervised dimensionality reduction, which allow the integration of complex longitudinal data and their representation as low-dimensional yet clinically interpretable trajectories. We showcase the potential of the proposed approach in the context of two specific clinical problems with different scopes and challenges: (1) nonstandardized stress echocardiography and (2) labour monitoring and decision making. In the first application, the proposed approach helped identify normal and abnormal patterns in the cardiac response to stress, and aided understanding of the underlying pathophysiological mechanisms, in a context of nonstandardized longitudinal data collection involving heterogeneous data streams. In the second application, we showed how the proposed approach could be used as the central concept of a personalized labour monitoring and decision support system, outperforming the current reference labour monitoring and decision support tool.
Overall, we believe that this thesis validates unsupervised dimensionality reduction as a promising approach to the analysis of complex and nonstandardized clinical longitudinal data.
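A minimal sketch of the underlying idea, reducing each patient's repeated measurements to a low-dimensional trajectory, is shown below using plain PCA as a stand-in for the unsupervised embeddings explored in the thesis; the data are synthetic.

```python
import numpy as np

# Synthetic longitudinal data: each patient's visits drift along a shared
# latent direction, plus noise. PCA (via SVD) stands in for the thesis'
# unsupervised dimensionality reduction.

rng = np.random.default_rng(0)
n_patients, n_visits, n_feats = 20, 5, 12
direction = rng.normal(size=n_feats)           # latent direction of evolution

visits = np.array([
    [t * direction + 0.3 * rng.normal(size=n_feats) for t in range(n_visits)]
    for _ in range(n_patients)
]).reshape(-1, n_feats)                        # (patients * visits, features)

X = visits - visits.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T                          # 2-D embedding of every visit
trajectories = coords.reshape(n_patients, n_visits, 2)

explained = S[0] ** 2 / (S ** 2).sum()         # variance share of component 1
print(trajectories.shape, round(float(explained), 3))
```

Each row of `trajectories` is one patient's path through the embedding, which is the object a clinician would inspect over successive visits.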
Link to manuscript: http://hdl.handle.net/10803/669968
Thesis carried out in the context of the CardioFunXion Marie Curie Industrial Network coordinated by DTIC-UPF, with the participation of Philips France, and additionally supported by the MdM program
[PhD thesis] Computational anatomy as a driver of understanding structural and functional cardiac remodeling
Author: Gabriel Bernardino
Supervisors: Bart Bijnens, Miguel Ángel González Ballester, Mathieu de Craene
We present a statistical shape analysis framework to identify cardiac shape remodelling while accounting for individuals' natural variability, and apply it in two clinical applications: comparing triathletes with controls, and comparing individuals born small for their gestational age (SGA) with controls. We were able to identify the shape remodelling due to the practice of endurance sport: it consisted of a dilation of the left ventricle and an increase of left ventricular myocardial mass. In the right ventricle (RV), the increase in volume was concentrated in the outflow tract. These changes in shape correlated with better performance during exercise. In SGA individuals, we found subtle differences in the RV that correlated with worse performance during exercise. These differences were larger when the SGA condition was combined with cardiovascular risk factors: smoking and overweight. Finally, we present a geometry processing technique for parcellating the RV cavity into three subvolumes for regional analysis without point-to-point correspondence.
Link to manuscript: http://hdl.handle.net/10803/668213
Thesis carried out in the context of the CardioFunXion Marie Curie Industrial Network coordinated by DTIC-UPF, with the participation of Philips France, and additionally supported by the MdM program
[PhD thesis] Towards the improvement of decision tree learning: a perspective on search and evaluation
Author: Cecilia Nunes
Supervisors: Óscar Cámara, Anders Jonsson
Data mining and machine learning (ML) are increasingly at the core of many aspects of modern life. With growing concerns about the impact of relying on predictions we cannot understand, there is widespread agreement on the need for reliable interpretable models. One of the areas where this is particularly important is clinical decision-making. Specifically, explainable models have the potential to facilitate the elaboration of clinical guidelines and related decision-support tools. The presented research focuses on the improvement of decision tree (DT) learning, one of the most popular interpretable model families, motivated by the challenges posed by clinical data. One limitation of interpretable DT algorithms is that they involve decisions based on strict thresholds, which can impair performance in the presence of noisy measurements. In this regard, we proposed a probabilistic method that takes a model of the noise into account in the distinct learning phases. When this model was considered during training, the method showed moderate improvements in accuracy compared to the standard approach, but significant reductions in the number of leaves. Standard DT algorithms follow a locally optimal approach which, despite providing good performance at a low computational cost, does not guarantee optimal DTs. The second direction of research therefore concerned the development of a non-greedy DT learning approach that employs Monte Carlo tree search (MCTS) to heuristically explore the space of DTs. Experiments revealed that the algorithm improved the trade-off between performance and model complexity compared to locally optimal learning. Moreover, dataset size and feature interactions played a role in the behavior of the method. Despite being used for their explainability, DTs are chiefly evaluated based on prediction performance.
The need to compare the structure of DT models arises frequently in practice, and is usually dealt with by manually assessing a small number of models. We attempted to fill this gap by proposing a similarity measure to compare the structure of DTs. An evaluation of the proposed distance on a hierarchical forest of DTs indicates that it was able to capture structural similarity. Overall, the reported algorithms take a step towards improving the performance of DT algorithms, particularly with respect to model complexity and a more useful evaluation of such models. The analyses help improve the understanding of the effect of several data properties on DT learning, and illustrate the potential role of DT learning as an asset for clinical research and decision-making.
Link to manuscript: http://hdl.handle.net/10803/667879
Thesis carried out in the context of the CardioFunXion Marie Curie Industrial Network coordinated by DTIC-UPF, with the participation of Philips France, and additionally supported by the MdM program
[PhD thesis] Characterizing online participation in civic technologies
Author: Pablo Aragón
Supervisors: Vicenç Gómez, Andreas Kaltenbrunner
This thesis constitutes one of the first investigations focused on characterizing online participation in civic technologies, a type of platform increasingly popular on the Internet that offers citizens new, larger-scale forms of political participation. Given the opportunities civic technologies present for democratic governance, it should be noted that their design, like that of any online platform, is not neutral. The ways in which information is presented or interaction between users is allowed can greatly alter the results of participation. For this reason, we analyze the impact of different interventions in civic technologies in relation to online conversation views, ordering criteria for ranking petitions, and deliberative interfaces. Since these interventions were carried out by the corresponding development teams, the analyses required the development of novel computational and statistical methods, while also extending generative models of discussion threads to better characterise the dynamics of online conversations. Results of the different case studies highlight the social and political impact of these interventions, suggesting new directions for future research and the need to develop a paradigm of citizen experimentation for democracy.
Link to manuscript: http://hdl.handle.net/10803/668042
Slides: https://elaragon.net/2019/11/27/characterizing-online-participatio-in-civic-technologies/
Pablo Aragón's blog: https://elaragon.net/
[PhD thesis] Deep Neural Networks for Music and Audio Tagging
Author: Jordi Pons
Supervisor: Xavier Serra
Automatic music and audio tagging can help increase the retrieval and re-use possibilities of many audio databases that remain poorly labeled. In this dissertation, we tackle the task of music and audio tagging from the deep learning perspective and, within that context, we address the following research questions:
- Which deep learning architectures are most appropriate for (music) audio signals?
- In which scenarios is waveform-based end-to-end learning feasible?
- How much data is required for carrying out competitive deep learning research?
In pursuit of answering research question (i), we propose musically motivated convolutional neural networks as a domain-knowledge-based alternative for designing deep learning models, and we evaluate several deep learning architectures for audio at a low computational cost with a novel methodology based on non-trained (randomly weighted) convolutional neural networks. Throughout our work, we find that employing music and audio domain knowledge during the model’s design can help improve the efficiency, interpretability, and performance of spectrogram-based deep learning models.
For research questions (ii) and (iii), we perform a study with the Sample CNN, a recently proposed end-to-end learning model, to assess its viability for music audio tagging when variable amounts of training data (ranging from 25k to 1.2M songs) are available. We compare the Sample CNN against a spectrogram-based architecture that is musically motivated and conclude that, given enough data, end-to-end learning models can achieve better results. Finally, throughout our quest for answering research question (iii), we also investigate whether a naive regularization of the solution space, prototypical networks, transfer learning, or their combination can foster deep learning models to better leverage a small number of training examples. Results indicate that transfer learning and prototypical networks are powerful strategies in such low-data regimes.
Usable outcome on Github: musicnn
Music tagging demo on Medium
Link to Jordi Pons' blog
[MSc thesis] A study on the development of maker activities with primary education teachers and students: from self-concept change to gender factors
Author: Judit Martínez Moreno
Supervisors: Davinia Hernández Leo, Patricia Santos Rodríguez
MSc program: Master in Cognitive Systems and Interactive Media
Schools have to prepare young people for the future workplace, and two factors have to be seriously considered to do so: the high demand for a qualified workforce in technology and research, and the importance of knowledge and skills as the engine of our economy. Educators have to develop 21st-century digital skills, and maker activities seem to be a good way to do so. Along this line, many projects are being created to help the school community follow the maker methodology, such as Makers a les Aules, a project developed in 10 public schools in Barcelona.
The aim of this thesis was to conduct exploratory research on the development of maker activities with primary education teachers and students. In relation to teachers, we analyzed which reasons drove them to develop maker activities, how their self-concept changed after following a maker methodology, and what they still need in order to make it easier to include these activities in the classroom. In relation to students, we analyzed their prior knowledge of maker activities, how their self-concept changed after participating in the project, and whether there were any differences with regard to gender. Some general assumptions were posed based on previous literature regarding these issues. Different research instruments were used to collect mainly qualitative data, such as pre- and post-questionnaires, case study observations, and interviews. The data was analyzed through qualitative analysis and statistical methods.
Some of the assumptions were confirmed. Few teachers and nearly all students had prior experience using maker tools. There were gender differences, since boys had more prior experience than girls using maker tools in specific contexts, but all of them reported the same level of enjoyment. Teachers participating in the project were willing to learn how to introduce this methodology in their classrooms to innovate in their lessons. They increased their perceived knowledge and ability to design and develop maker activities in the classroom. Students increased their interest and self-perceived efficacy in technology, and their level of autonomy in doing maker activities. Some limitations that teachers could face in developing maker activities are the lack of knowledge, access to the material, and time. Some actions should be carried out to overcome these limitations.
Thesis available OA in Zenodo: https://doi.org/10.5281/zenodo.3484507
[BSc thesis] Mining Zenodo: Data extraction and indexing of a research repository
Author: Sergi Pastor Rochina
Supervisor: Horacio Saggion
The output of scientific publications is increasing steadily each year. Due to this scientific literature overload, an exhaustive review of a given topic becomes overwhelming, and researchers cannot get a solid grasp of all this valuable knowledge.
Natural language processing and text mining have become essential to tackle this issue and provide a solution: a comprehensive, careful and accessible perspective on the knowledge contained in scientific publications. This work aims to take advantage of this opportunity and develop an application to extract and index part of the information contained in Zenodo, an open-access repository developed under the European OpenAIRE program. The work is based on the use of the Dr Inventor tool (a text mining framework that enables the automated analysis of scientific publications) to support the extraction of specific information types from Zenodo’s collection of research papers in order to allow semantic indexing, discourse classification and discovery. The application is meant to be a useful and helpful tool that eases the learning experience of researchers, students and others.
Keywords: Natural Language Processing; data mining; open-access repository; web application
[PhD thesis] Incorporating Prosody into Neural Speech Processing Pipelines. Applications on automatic speech transcription and spoken language machine translation
Author: Alp Öktem
Supervisors: Mireia Farrús and Antonio Bonafonte
In this dissertation, I study the inclusion of prosody into two applications that involve speech understanding: automatic speech transcription and spoken language translation. In the former case, I propose a method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. Results indicate an F1 score of 70.3% in terms of overall punctuation generation accuracy. In the latter case, I deal with enhancing spoken language translation with prosody. A neural machine translation system trained with movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU scores. Encoding pauses as an extra encoding feature gives an additional 1% increase to this number. The system is further extended to jointly predict pause features in order to be used as an input to a text-to-speech system.
Keywords: prosody, automatic speech transcription, punctuation restoration, spoken language machine translation, bilingual spoken corpus
Link at TDX: http://hdl.handle.net/10803/666222
Author's GitHub account and personal page: http://alpoktem.github.io/
Video of the defence
[PhD thesis] Towards virtualized network functions as a service
Author: Windhya Rankothge
Supervisor: Jorge Lobo
Abstract
Network Function Virtualization (NFV) is a promising technology that proposes to move packet processing from dedicated hardware middle-boxes to software running on commodity servers. As such, NFV brings the possibility of outsourcing enterprise Network Function (NF) processing to the cloud. However, for a Cloud Service Provider (CSP) to offer such services, several research problems still need to be addressed. When an enterprise outsources its NFs to a CSP, the CSP is responsible for deciding: (1) where initial Virtual NFs (VNFs) should be instantiated, (2) what, when and where additional VNFs should be instantiated to satisfy traffic changes (scaling), and (3) how to update the network configurations with minimum impact on network performance. This brings the requirement of a cloud management framework for VNFs and the related cloud infrastructure operations: provisioning, configuring, maintaining and scaling of the VNFs, as well as configuring and updating of the cloud network. In this thesis we explore three aspects of a cloud management framework for VNFs: (1) dynamic resource allocation, (2) VNF scaling methods and (3) dynamic load balancing. In the context of dynamic resource allocation for VNFs, we explore two resource allocation algorithms: (1) for the initial placement of VNFs, and (2) for the scaling of VNFs to support traffic changes. We propose two heuristic-based approximation approaches, (1) Iterated Local Search (ILS) and (2) Genetic Programming (GP), to implement the resource allocation algorithms. We compare these heuristic-based approaches with a traditional resource allocation approach: Integer Linear Programming (ILP). In the context of VNF scaling methods, we explored three different scaling approaches: (1) vertical scaling, (2) migration and (3) horizontal scaling. We analyse the three scaling methods in terms of their practical implementation aspects as well as the optimization aspects with respect to management.
In the context of dynamic load balancing, we explore load balancing approaches that maintain affinity and handle states and sessions of the traffic, so that the requirement of state migration is avoided. We propose a session-aware load balancing algorithm based on consistent hashing.
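The session-affinity idea behind consistent hashing can be sketched with a minimal hash ring. This is an illustrative sketch under our own assumptions (class and key names are ours, not the thesis implementation): each session key always maps to the same VNF instance, so per-flow state never has to migrate.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash so placement survives process restarts
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps session keys to VNF instances on a hash ring with virtual nodes."""

    def __init__(self, instances, vnodes=100):
        self._ring = sorted(
            (_hash(f"{inst}#{i}"), inst)
            for inst in instances
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def instance_for(self, session_key: str) -> str:
        # First ring position clockwise from the key's hash
        idx = bisect.bisect(self._keys, _hash(session_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["vnf-a", "vnf-b", "vnf-c"])
flow = "10.0.0.1:52101->192.0.2.7:443"
assert ring.instance_for(flow) == ring.instance_for(flow)  # affinity preserved
```

The appeal for scaling is that adding or removing an instance only remaps the keys that hashed to its ring segments, so most active sessions are undisturbed.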
Additional material:
Datasets
- Data Modelling for the Evaluation of Virtualized Network Functions Resource Allocation Algorithms. Updates in Github and version for phd thesis at UPF repository
Software
Modules available at Windhya Rankothge's Github account
[PhD thesis] Source Separation Methods for Orchestral Music: Timbre-Informed and Score-Informed Strategies
Author: Marius Miron
Supervisors: Emilia Gómez, Jordi Janer
Abstract
Humans are able to distinguish between various sound sources in their environment and selectively attend to specific ones. However, it is a difficult task to teach a computer to automatically separate the acoustic scene into sources and solely focus on specific elements. This signal processing task is commonly known as audio source separation and involves recovering the sources which are mixed together in a combined signal.
This thesis is concerned with source separation of Western classical music mixtures, namely orchestral music. Being able to separate the audio corresponding to the instruments allows for interesting applications such as focusing on a particular section in the orchestra or re-creating the experience of a concert in virtual reality. Additionally, the separated instrument tracks can be further analyzed by other music information research algorithms which perform better on these signals than on the audio signal of the mixture.
Music source separation improves if we know which instruments are present in the piece, and if we have the score, i.e. the notes played by each instrument. In fact, the more information we have about a music piece, the better the resulting separation. For orchestral music the instruments are known, and we train timbre models for each instrument, a case commonly known as timbre-informed source separation. In addition, since scores are commonly available for orchestral pieces, we leverage this information to further improve the separation. This scenario is known in the literature as score-informed source separation.
Towards an objective evaluation, in the second part of the thesis we propose an orchestral music dataset accompanied by score annotations and an evaluation methodology which assesses the influence of different parts of the separation framework.
In the third part of the thesis, our contributions are towards fixing context-specific problems encountered in score-informed source separation, like the errors in the alignment between a score and the associated renditions. Furthermore, while we work towards improving existing separation frameworks, in the fourth part of the thesis we propose a low-latency framework relying on deep learning. With respect to that, we aim at overcoming data scarcity in the case of supervised source separation approaches by taking advantage of the traits of this music tradition to generate better data to train neural networks. In addition, in the fifth part, we introduce a cloud-based source separation software architecture and the associated applications.
Most of this work follows research reproducibility principles, inasmuch as the datasets, code, software prototypes, published papers, and project reports are made available along with the necessary instructions.
Additional material:
Details at Marius Miron's web
Winner of the María de Maeztu Open Science Award, PhD Workshop 2017
- Datasets
We propose the PHENICX-Anechoic dataset which relies on the Aalto orchestral anechoic recordings. We denoised the original recordings and annotated each of the tracks corresponding to the instrument groups.
The Bach10 recordings synthesized with Sibelius can be found on the zenodo repository.
- Code
The note refinement code in part three of the thesis is on github. We do not have the rights to distribute the NMF framework and to some extent note refinement is integrated with this framework. However, multi-channel score-informed source separation can be computed using the Repovizz website. You need to create a datapack with datapack designer, upload it and select the checkbox which computes the source separation. The uploaded datapack will comprise the separated tracks.
Code and detailed instructions on how to reproduce experiments in part four (source separation using deep learning) can be found on the associated github repository. The separation results and the computed metrics with BSS Eval can be found on the zenodo page. Similarly, for the score-informed version, check the zenodo page.
- Videos
Below you can watch a demo from the official app, which uses the separated tracks, and the orchestra focus demo, where you can listen to specific instruments of an orchestra.
Full text available at the TDX repository
[MSc thesis] Audio Data Augmentation with respect to Musical Instrument Recognition
Author: Siddharth Bhardwaj
Supervisors: Olga Slizovskaia, Emilia Gómez and Gloria Haro
MSc program: Master in Sound and Music Computing
Identifying musical instruments in a polyphonic music recording is a difficult yet crucial problem in music information retrieval. It helps in auto-tagging a musical piece by instrument, consequently enabling searching music databases by instrument. Other useful applications of instrument recognition are source separation, genre recognition, music transcription, and instrument-specific equalization. We review the state-of-the-art methods for the task, including the recent Convolutional Neural Network (CNN) based approaches. These deep learning models require large quantities of annotated data, a problem which can be partly solved by synthetic data augmentation. We study different types of audio data transformations that can help in various audio-related tasks, publishing an augmentation library in the process. We investigate the effect of using augmented data during the training process of three state-of-the-art CNN-based models. We achieved a performance improvement of 2% over the best-performing model with almost half the number of trainable model parameters. We attained a 6% performance improvement for the single-layer CNN architecture, and 4% for the multi-layer architecture. We also study the influence of each type of audio augmentation on each instrument class individually.
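Waveform-level transformations of the kind studied here can be sketched with plain NumPy. This is a generic illustration, not the published augmentation library; the function names and parameter ranges are our own choices:

```python
import numpy as np

def add_noise(x, snr_db=20.0, rng=None):
    """Mix in white noise at a given signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def random_gain(x, low_db=-6.0, high_db=6.0, rng=None):
    """Scale the signal by a gain drawn uniformly in dB."""
    rng = rng or np.random.default_rng(0)
    return x * 10 ** (rng.uniform(low_db, high_db) / 20)

def time_shift(x, max_frac=0.1, rng=None):
    """Circularly shift the waveform by up to a fraction of its length."""
    rng = rng or np.random.default_rng(0)
    shift = int(rng.uniform(-max_frac, max_frac) * len(x))
    return np.roll(x, shift)

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of A4 at 16 kHz
augmented = [f(x) for f in (add_noise, random_gain, time_shift)]
```

Each transform preserves the label of the clip, so applying them during training multiplies the effective size of the annotated dataset.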
Additional material:
[PhD thesis] Knowledge Extraction and Representation Learning for Music Recommendation and Classification
Author: Sergio Oramas
Supervisor: Xavier Serra
In this thesis, we address the problems of classifying and recommending music present in large collections. We focus on the semantic enrichment of descriptions associated to musical items (e.g., artists biographies, album reviews, metadata), and the exploitation of multimodal data (e.g., text, audio, images). To this end, we first focus on the problem of linking music-related texts with online knowledge repositories and on the automated construction of music knowledge bases. Then, we show how modeling semantic information may impact musicological studies and helps to outperform purely text-based approaches in music similarity, classification, and recommendation. Next, we focus on learning new data representations from multimodal content using deep learning architectures, addressing the problems of cold-start music recommendation and multi-label music genre classification, combining audio, text, and images. We show how the semantic enrichment of texts and the combination of learned data representations improve the performance on both tasks.
Additional material:
- Thesis and compilation of results in Sergio Oramas' web
- Final version in Zenodo and at TDX
- Datasets
- ELMD Dataset of ∼13k documents and almost 150k annotated musical entities, which are linked to DBpedia and MusicBrainz. From this corpus, a gold standard dataset of 200 documents with manually annotated entities is also created. http://mtg.upf.edu/download/datasets/elmd
- MARD Large dataset of about 64k albums with customer reviews, acoustic features per track, metadata, and single-label genre annotations. http://mtg.upf.edu/download/datasets/mard
- SAS Two datasets of 188 and 2,336 artist biographies respectively, together with artist similarity ground truth data. http://mtg.upf.edu/download/datasets/semantic-similarity
- KG-Rec Two datasets of tags and text descriptions about musical items, together with user feedback information on those items. A dataset of sounds with ∼21k items and 20k users, and a dataset of songs with ∼8.5k items and ∼5k users. http://mtg.upf.edu/download/datasets/knowledge-graph-rec
- MSD-A Dataset of ∼24k artist biographies linked to the artists present in the Million Song Dataset. http://mtg.upf.edu/download/datasets/msd-a
- MuMu Large dataset of about ∼31k albums, with ∼450k customer reviews, ∼147k audio tracks, cover artworks, and multi-label genre annotations. https://www.upf.edu/web/mtg/mumu
- Knowledge bases
- KBSF Knowledge base of popular music extracted from a corpus of ∼32k documents with stories about songs. http://mtg.upf.edu/download/datasets/kbsf
- FlaBase Knowledge base of flamenco music, created by combining data from 7 different data sources, and enriched with information extracted from ∼1k artist biographies. http://mtg.upf.edu/download/datasets/flabase
- Software
- ELVIS System that integrates different entity linking tools, enriching their output and providing highly confident entity disambiguations. https://github.com/sergiooramas/elvis
- TARTARUS System to perform and evaluate deep learning experiments on classification and recommendation from different data modalities and their combination. https://github.com/sergiooramas/tartarus
- MEL API and demo website for a Music Entity Linking system that disambiguates musical entities to MusicBrainz. http://mel.mtg.upf.edu
[MSc thesis] Term extraction and document similarity in an Integrated Learning Design Environment
Author: Alberto Martínez Rodríguez
Supervisors: Davinia Hernández Leo, Horacio Saggion
MSc program: Master in Intelligent Interactive Systems
The Integrated Learning Design Environment (ILDE) is a social platform focused on supporting teachers in the computer-assisted design of learning activities. In this platform, teachers and course designers can contextualize, author and share their designs within their community. This social component of the ILDE would benefit from the application of Information Retrieval and Natural Language Processing techniques to help teachers and course designers find shared designs as quickly and efficiently as possible. In this work, we use Natural Language Processing to classify learning designs written in Catalan, get the content of the users, parse this content with Freeling and extract education domain-specific terminology from the documents. To extract the terminology, a combination of two methods is used. The first method uses the Multilingual Central Repository ontology to check whether a term belongs to any of four pedagogical fields. The second method computes the tf-idf of all the document terms using a non-domain-specific corpus, the Catalan Wikipedia. This work also discusses the potential of the proposed combination of methods to retrieve simple and complex terms from documents. The resulting combined method distributes the weight of each method in the extraction process to assign a score to each retrieved term. After this process of extracting education domain-specific terminology from different ILDE documents, a Document Similarity Application addressed to teachers and course designers was created. This application allows users to search for documents based on the similarity between these documents and another document of the same ILDE community. Besides, given a document, users can visualize the education terminology that belongs to that document. Finally, users can also search for certain documents using a terminology-based query to obtain a set of documents and their similarity with respect to that query.
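The weighted combination of the two methods can be sketched as follows. Everything here is illustrative: the ontology lookup is mocked as a small set, the reference corpus as a few token lists, and the weights are arbitrary, none of them the thesis' actual resources.

```python
import math
from collections import Counter

# Hypothetical stand-ins: a set mocking the MCR pedagogical-field lookup,
# and a tiny tokenized corpus mocking the Catalan Wikipedia reference.
PEDAGOGICAL_TERMS = {"avaluació", "aprenentatge", "rúbrica", "competència"}
reference_corpus = [["escola", "aprenentatge"], ["ciutat", "riu"], ["escola"]]

def tfidf(term, doc, corpus):
    """tf from the candidate document, idf from the reference corpus."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(term in d for d in corpus)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf

def term_score(term, doc, corpus, w_onto=0.5, w_tfidf=0.5):
    """Weighted combination of ontology membership and corpus tf-idf."""
    onto = 1.0 if term in PEDAGOGICAL_TERMS else 0.0
    return w_onto * onto + w_tfidf * tfidf(term, doc, corpus)

doc = ["aprenentatge", "rúbrica", "escola", "rúbrica"]
ranked = sorted(set(doc), key=lambda t: term_score(t, doc, reference_corpus),
                reverse=True)
# "rúbrica" ranks first: it is both in the ontology and rare in the reference corpus
```

Splitting the score this way lets ontology membership promote pedagogical terms even when they are common, while tf-idf surfaces domain terms the ontology misses.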
Additional material:
[PhD thesis] The Information Structure - Prosody Interface: On the role of hierarchical thematicity in an empirically-grounded model
Author: Mónica Domínguez Bajo
Supervisors: Mireia Farrús, Leo Wanner
Abstract and link in UPF e-repository to be added
Additional material:
- Final text and compilation including datasets, code and associated material at GitHub
- Open access version of the thesis at TDX repository
[PhD thesis] Effective planning with expressive languages
Author: Guillem Francès
Supervisor: Héctor Geffner
Classical planning is concerned with finding sequences of actions that achieve a certain goal from an initial state of the world, assuming that actions are deterministic, states are fully known, and both are described in some modeling language. This work develops effective means of dealing with expressive modeling languages for classical planning. First, we show that expressive languages not only allow simpler problem representations, but also capture additional problem structure that can be leveraged by heuristic solution methods. We develop heuristics that support functions and existential quantification in the problem definition, and show empirically that they can be more informed and cost-effective. Second, we develop a novel width-based algorithm that matches state-of-the-art performance without looking at the declarative representation of actions. This is a significant departure from previous research, and advances both the use of expressive modeling languages in planning and the scope and effectiveness of classical planners.
Additional material:
- Open access version available at TDX repository
- The planner is open-sourced under a GNU General Public License, version 3 (GPL-3.0), and available for download at https://github.com/aig-upf/fs
- General web page of the author, with additional material related to this thesis http://gfrances.github.io/
[MSc thesis] Computational comparative analysis of the Twitter networks of the 2015 and 2016 Spanish national elections
Author: Helena Gallego Gamo
Supervisors: Pablo Aragón, Vicenç Gómez, Andreas Kaltenbrunner
MSc program: Master in Intelligent Interactive Systems
In recent years, Spanish politics has transitioned from a two-party system to a multi-party system. This change led to an unstable situation which finally evolved into the rare scenario of two general elections within a period of six months. The two elections had one main difference: the two biggest left-wing parties formed a coalition in the second election, while they had run separately in the first one. In the second election, after merging, the coalition lost around one million votes, contradicting opinion polls. In this study, community analysis of the retweet networks of the two online campaigns is performed in order to assess whether activity on Twitter reflects the outcome, or parts of the outcomes, of both elections. The results show that the left-wing parties lost more online supporters than the other parties. Furthermore, an inspection of the Twitter activity of the supporters unveils a decrease in engagement especially marked for the smaller party in the coalition, in line with post-electoral traditional polls. The clusters obtained with the community detection method are also used to situate a set of Spanish media sources in the ideological spectrum and to understand their audiences and behavioral differences when replying to or retweeting them.
Additional material:
- Open Access version of the thesis available at UPF e-repository
- Publication associated with the master thesis (link) and poster (link)
[MSc thesis] Resolution of concurrent planning problems using classical planning
Author: Daniel Furelos Blanco
Supervisor: Anders Jonsson
MSc program: Master in Intelligent Interactive Systems
In this work, we present new approaches for solving multiagent planning and temporal planning problems. These planning forms are two types of concurrent planning, where actions occur in parallel. The methods we propose rely on a compilation to classical planning problems that can be solved using an off-the-shelf classical planner. The solutions can then be converted back into multiagent or temporal solutions. Our compilation for multiagent planning is able to generate concurrent actions that satisfy a set of concurrency constraints. Furthermore, it avoids the exponential blowup associated with concurrent actions, a problem that many multiagent planners face. Incorporating similar ideas in temporal planning enables us to generate temporal plans with simultaneous events, which most state-of-the-art temporal planners cannot do. In experiments, we compare our approaches to existing ones and show that the methods using transformations to classical planning are able to obtain better results than state-of-the-art approaches for complex problems. In contrast, we also highlight some of the drawbacks that this kind of method has for both multiagent and temporal planning. We also illustrate how these methods can be applied to real-world domains like the smart mobility domain. In this domain, a group of vehicles and passengers must self-adapt in order to reach their target positions. The adaptation process consists of running a concurrent planning algorithm. The behavior of the approach is then evaluated.
Additional material:
- Open Access version of the thesis available at UPF e-repository
- Software available for download
- The code for the Smart Carpooling Demo is available at https://github.com/aig-upf/smart-carpooling-demo
- The code for the universal PDDL parser is available at https://github.com/aig-upf/universal-pddl-parser-multiagent
- The code for algorithms for solving temporal planning problems is available at https://github.com/aig-upf/temporal-planning
[MSc thesis] Cross-Entropy method for Kullback-Leibler control in multi-agent systems
Author: Beatriz Cabrero Daniel
Supervisors: Mario Ceresa, Vicenç Gómez
MSc program: Master in Intelligent Interactive Systems
We consider the problem of computing optimal control policies in large-scale multiagent systems, for which the standard approach via the Bellman equation is intractable. Our formulation is based on the Kullback-Leibler control framework, also known as Linearly-Solvable Markov Decision Problems. In this setting, adaptive importance sampling methods have been derived that, when combined with function approximation, can be effective for high-dimensional systems. Our approach iteratively learns an importance sampler from which the optimal control can be extracted, and requires simulating and reweighting agents’ trajectories in the world multiple times. We illustrate our approach through a modified version of the popular stag-hunt game; in this scenario, there is a multiplicity of optimal policies depending on the “temperature” parameter of the environment. The system is built inside Pandora, a multi-agent-based modeling framework and toolbox for parallelization, freeing us from dealing with memory management when running multiple simulations. By using function approximation and assuming a particular factorization of the system dynamics, we are able to scale up our method to problems with M = 12 agents moving in two-dimensional grids of size N = 21×21, improving on existing methods that perform approximate inference on a temporal probabilistic graphical model.
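The simulate-reweight-refit loop can be illustrated on a one-dimensional toy problem. This sketch is entirely ours (the thesis uses the stag-hunt game with function approximation): trajectories are sampled from a parametric proposal, weighted by exp(-cost/λ) as in the KL-control formulation, and the proposal is refit by weighted maximum likelihood.

```python
import numpy as np

def kl_control_ce(n_iters=10, n_traj=500, horizon=20, lam=1.0, seed=0):
    """Cross-entropy style iteration for a toy 1-D KL-control problem:
    a walker starts at 0 and should end near the goal state +5.
    The proposal is a Bernoulli policy over steps {+1, -1}; each iteration
    reweights sampled trajectories by exp(-cost / lambda) and refits the
    step probability from the weighted samples."""
    rng = np.random.default_rng(seed)
    p_right = 0.5  # initial uninformed proposal
    for _ in range(n_iters):
        steps = rng.random((n_traj, horizon)) < p_right   # True = step +1
        paths = np.cumsum(np.where(steps, 1, -1), axis=1)
        cost = np.abs(paths[:, -1] - 5)                    # end-state cost
        w = np.exp(-cost / lam)
        w /= w.sum()
        # weighted maximum-likelihood refit of the proposal parameter
        p_right = float((w[:, None] * steps).sum() / horizon)
    return p_right

p = kl_control_ce()  # drifts above 0.5, toward trajectories ending near +5
```

The same skeleton scales up by replacing the single Bernoulli parameter with a function approximator over states, which is where the factorization assumptions of the thesis come in.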
Additional material:
- Open Access version of the thesis available at UPF e-repository
- Software and data available for download (link)
[MSc thesis] A generative model of user activity in the Integrated Learning Design Environment
Author: Joan Bas Serrano
Supervisor: Vicenç Gómez, Davinia Hernández-Leo
MSc program: Master in Intelligent Interactive Systems
The objective of this project has been to build a generative model of user activity in the Integrated Learning Design Environment (ILDE) that can describe the data of different communities and that can be used both to gain understanding of the data and to test hypothetical situations. The model we present, called the Hierarchical Multivariate Hawkes Model, works in a two-layer procedure that first draws the beginnings of working sessions and then fills these sessions with events of different kinds using a Multivariate Hawkes Model. In the project we first carry out a statistical temporal analysis of the data, to understand it and identify the important features to be modeled; we then introduce the model, validate it, and show some of its applications. Through these steps we show that the model reproduces satisfactorily the sequences of events produced by ILDE users, and that it can easily be used to tackle real problems that would be difficult to address with standard statistical tools.
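The second layer of the model, a multivariate Hawkes process that fills a session with mutually exciting events, can be simulated with Ogata's thinning algorithm. The sketch below is a generic exponential-kernel simulator, not the thesis's actual implementation; the baseline rates and excitation matrix are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def excitation(events, t, alpha, beta, K):
    """Per-type intensity contributed by past events (exponential kernels)."""
    lam = np.zeros(K)
    for s, k in events:
        lam += alpha[:, k] * np.exp(-beta * (t - s))
    return lam

def simulate_hawkes(mu, alpha, beta, t_end):
    """Ogata thinning: with decaying kernels, the intensity just after the
    last candidate time upper-bounds the intensity until the next one."""
    K = len(mu)
    events, t = [], 0.0
    while True:
        lam_bar = (mu + excitation(events, t, alpha, beta, K)).sum()
        t += rng.exponential(1.0 / lam_bar)
        if t > t_end:
            return events
        lam = mu + excitation(events, t, alpha, beta, K)
        if rng.uniform() * lam_bar <= lam.sum():  # accept thinned candidate
            events.append((t, rng.choice(K, p=lam / lam.sum())))

mu = np.array([0.5, 0.2])              # baseline rate per event type
alpha = np.array([[0.3, 0.1],
                  [0.1, 0.3]])         # type-to-type excitation strengths
session = simulate_hawkes(mu, alpha, beta=1.0, t_end=50.0)
```

In the hierarchical model, a first layer would draw the session windows and a simulator like this one would populate each window with typed events.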
Additional material:
- Open Access version of the thesis available at UPF e-repository
- Software and data available for download (link)
[PhD thesis] Image based analysis and modeling of the detailed cardiac ventricular anatomy
Author: Bruno Paun
Supervisor: Bart Bijnens, Constantine Butakoff
The role of trabeculations and their normal morphological expression in the human heart is still unclear. Clinical studies have shown that excessive trabeculation can cause heart failure due to diastolic and systolic dysfunction, thromboembolism and arrhythmias. Quantifying and modeling those structures could provide insights into their function, their influence on cardiac performance, and their connection with cardiomyopathies. The contributions of this thesis can be summarized as follows: 1) a simplified model of the trabeculated left ventricle (LV) to study the impact of trabeculations on stroke volume, strain and pump capacity of LVs of different geometries, 2) a simple as well as a more elaborate method for geometry-independent parametrization of the detailed cardiac left and right ventricular anatomy, 3) a framework for visualization and statistical analysis of the trabeculations, and 4) a longitudinal analysis of the cardiac trabeculations in a mouse embryo at different gestational stages.
[PhD thesis] Technology support for scalable and dynamic collaborative learning: a pyramid flow pattern approach
Author: Kalpani Manathunga
Supervisor: Davinia Hernández-Leo
Collaborative Learning is the pedagogical approach that considers social interactions as key means to trigger rich learning processes. Collaborative Learning Flow Patterns define best practices to orchestrate collaborative learning activity flow mechanisms (i.e., group formation, role or resource allocation, phase changes). Flow patterns have been tested and shown to be effective in small-scale settings for decades. Directly applying these pedagogical methods to large learning scenarios is challenging due to the burden that scale places on orchestration, and the difficulty of maintaining a dynamic, meaningful progression when flexible changes are required in a large classroom or in a MOOC. Some attempts have shown positive results, but research around scalable collaborative learning approaches, models and technologies for large classes is scattered. This dissertation conducts a systematic literature review of collaborative learning applications in large classes and analyses the social learning potential of diverse technology-supported spaces in massive courses. The dissertation then focuses on how collaborative learning could address key challenges (i.e., scalability and dynamism) identified in large collaborative learning contexts. Consequently, the thesis proposes a Pyramid flow pattern instantiation, composed of a model with a set of algorithmic rules for flow creation, flow control and flow awareness, as well as a PyramidApp authoring and enactment system implementing the model. Experimentation across diverse learning contexts shows that, on the one hand, the contributions support meaningful scalable and dynamic collaborative learning and, on the other hand, learners and educators perceive the experiences as engaging, valuable for learning, and effective from the perspective of orchestration.
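The pyramid flow idea, where learners first work alone and their groups are then repeatedly merged and asked to converge, can be illustrated with a minimal group-formation sketch. The function name and the adjacency-based merge rule below are illustrative, not PyramidApp's actual flow-creation algorithm.

```python
def pyramid_tiers(participants, group_size=2, levels=4):
    """Tier 0 holds singleton groups; each subsequent tier merges
    `group_size` adjacent groups from the tier below, up to the apex."""
    tiers = [[[p] for p in participants]]
    for _ in range(1, levels):
        prev = tiers[-1]
        tiers.append([sum(prev[i:i + group_size], [])
                      for i in range(0, len(prev), group_size)])
    return tiers

tiers = pyramid_tiers(list(range(8)))
# group counts per tier: 8 -> 4 -> 2 -> 1
```

Flow control then amounts to advancing a tier once its groups have submitted (or a timer expires), and flow awareness to reporting each tier's progress back to participants.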
Additional material:
- Open access version available at TDX repository
- Data: Manathunga, K., & Hernández-Leo, D. (2016b). PyramidApp configurations and participants behaviour dataset [Data set]. Zenodo. http://doi.org/10.5281/zenodo.375555
- Software in the Github account of the group
- PyramidApp web, including videos and user manual
[PhD thesis] Knowledge acquisition in the information age: the interplay between lexicography and natural language processing
Author: Luis Espinosa-Anke
Supervisor: Horacio Saggion
Natural Language Processing (NLP) is the branch of Artificial Intelligence aimed at understanding and generating language as closely as possible to the way humans do. Today, NLP benefits substantially from large amounts of unannotated corpora, from which it derives state-of-the-art resources for text understanding such as vectorial representations or knowledge graphs. In addition, NLP also leverages structured and semi-structured information in the form of ontologies, knowledge bases (KBs), encyclopedias or dictionaries. In this dissertation, we present several improvements in NLP tasks such as Definition and Hypernym Extraction, Hypernym Discovery, Taxonomy Learning, and KB construction and completion; in all of them we take advantage of knowledge repositories of various kinds, showing that these are essential enablers in text understanding. Conversely, we use NLP techniques to create, improve or extend existing repositories, and release them along with the associated code for the use of the community.
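To give a flavor of the hypernym-extraction task, the classic pattern-based baseline (Hearst patterns) can be sketched in a few lines. The thesis's own systems are learned models, so this regex sketch, restricted to single-word nouns, is only an illustrative stand-in.

```python
import re

# Two classic Hearst patterns over plain text; single-word matches only,
# so this is a toy baseline rather than a full extractor.
PATTERNS = [
    re.compile(r"(\w+) such as (\w+)"),
    re.compile(r"(\w+) including (\w+)"),
]

def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the patterns."""
    pairs = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text.lower()):
            pairs.append((m.group(2), m.group(1)))
    return pairs

pairs = extract_hypernym_pairs("Repositories such as dictionaries help NLP.")
```

Learned extractors replace these hand-written patterns with models trained on annotated corpora, which handle multi-word terms and paraphrased constructions.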
Additional material:
- Open access version available at TDR repository
- Datasets and software: https://bitbucket.org/luisespinosa/
[PhD thesis] From Heuristics-Based to Data-Driven Audio Melody Extraction
Author: Juan José Bosch
Supervisor: Emilia Gómez
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied and realistic set of data, and considering multiple definitions of melody. We first present an extensive overview of the state of the art, and perform an evaluation on a novel symphonic music melody extraction dataset. Results show that most approaches are not able to generalise well to the characteristics of such data, which feature a wide pitch range. A pitch salience function based on source-filter modelling is found to be especially useful in this context. We then propose its integration with melody tracking methods based on pitch contour characterisation, and evaluate them on a wide range of music genres. Firstly, this salience function is adapted for pitch contour creation by combining it with another one based on harmonic summation. This combination increases the salience of melody pitches and improves melody extraction accuracy over previous approaches, with two different contour-based melody tracking methods: pitch contour selection based on heuristic rules, and supervised pitch contour classification. Secondly, the latter approach is further improved by using novel timbre, tonal and spatial features, which are helpful to discriminate melodic from non-melodic pitch contours. Finally, we also propose a method for the estimation of multiple melodic lines based on pitch contour classification, which exploits continuity within melodic lines.
The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications.
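The harmonic-summation salience idea, scoring each candidate f0 by summing weighted spectral magnitude at its harmonics, can be sketched as follows. The geometric weighting scheme and all parameters below are illustrative, not the exact salience functions developed in the thesis.

```python
import numpy as np

def harmonic_salience(spectrum, freqs, f0_grid, n_harmonics=8, decay=0.8):
    """Harmonic summation: for each candidate f0, sum the spectral
    magnitude at its first harmonics, weighted by a geometric decay."""
    salience = np.zeros(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harmonics + 1):
            idx = int(np.argmin(np.abs(freqs - h * f0)))  # nearest bin
            salience[i] += decay ** (h - 1) * spectrum[idx]
    return salience

# Toy spectrum: five harmonics of a 220 Hz tone on a 1 Hz frequency grid.
freqs = np.arange(0, 4001, dtype=float)
spectrum = np.zeros_like(freqs)
for h in range(1, 6):
    spectrum[220 * h] = 1.0 / h
f0_grid = np.arange(100, 401, dtype=float)
estimate = f0_grid[int(np.argmax(harmonic_salience(spectrum, freqs, f0_grid)))]
```

The decaying weights are what keep the true f0 ahead of its sub-octave (110 Hz here), which also aligns with several of the tone's harmonics; contour creation then tracks such salience peaks over time.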
Additional material:
- Open access version available at TDR repository and Zenodo
- Datasets: The symphonic music dataset proposed in this thesis is available at:
http://mtg.upf.edu/download/datasets/orchset
- Code: Source code of the melody extraction algorithms proposed in this thesis is available at: