Publications
List of published results directly linked to the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).
List of publications acknowledging the funding in Scopus.
The record for each publication will include access to postprints (following the Open Access policy of the program), as well as the datasets and software used. Ongoing work with the UPF Library and Informatics will soon improve the interface and automate the retrieval of this information.
The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository.
Slizovskaia O, Haro G, Gómez E. Conditioned Source Separation for Music Instrument Performances
Separating different music instruments playing the same piece is a challenging task since the different audio sources are synchronized and playing in harmony. Moreover, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated.
This paper proposes a source separation method for multiple musical instruments sounding simultaneously and explores how much additional information apart from the audio stream can lift the quality of source separation. We explore conditioning techniques at different levels of a primary source separation network and utilize two extra modalities of data, namely presence or absence of instruments in the mixture, and the corresponding video stream data.
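One way to picture the conditioning the abstract describes is FiLM-style feature-wise modulation, where a conditioning vector scales and shifts the channels of an intermediate representation. The sketch below is purely illustrative: the shapes, the random weights, and the instrument-presence vector are assumptions for the example, not the paper's architecture.

```python
import numpy as np

def film(features, condition, w_gamma, w_beta):
    """Feature-wise linear modulation (FiLM-style conditioning).

    A conditioning vector (here, a binary instrument-presence vector)
    is mapped to a per-channel scale (gamma) and shift (beta), steering
    the separation network toward the requested sources.
    """
    gamma = condition @ w_gamma        # (channels,)
    beta = condition @ w_beta          # (channels,)
    # features: (channels, time); modulate every channel independently
    return gamma[:, None] * features + beta[:, None]

rng = np.random.default_rng(0)
n_instruments, channels, frames = 4, 8, 32
features = rng.standard_normal((channels, frames))
condition = np.array([1.0, 0.0, 1.0, 0.0])   # which instruments are present
w_gamma = rng.standard_normal((n_instruments, channels))
w_beta = rng.standard_normal((n_instruments, channels))
out = film(features, condition, w_gamma, w_beta)
print(out.shape)                             # (8, 32)
```

The same mechanism extends to other modalities, e.g. replacing the presence vector with an embedding of the video stream.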
Nikbakht R, Lozano A. Uplink Fractional Power Control for Cell-Free Wireless Networks. 2019 IEEE International Conference on Communications (ICC)
This paper proposes a power control policy for the uplink of cell-free wireless networks. Such policy, which generalizes the fractional power control used extensively in cellular networks, relies only on large-scale quantities, is fully distributed, and features a single control parameter. By adjusting this parameter, the SIR distribution experienced by the users can be compressed or expanded, effecting a tradeoff between average performance and fairness.
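The single-parameter idea can be stated in a couple of lines. Below is a minimal, generic sketch of fractional power control; the variable names, the normalization to a power budget, and the toy gains are illustrative assumptions, not the paper's exact policy.

```python
import numpy as np

def fractional_power_control(beta, theta, p_max=1.0):
    """Illustrative fractional power control.

    Each user transmits at a power inversely proportional to its
    large-scale channel gain raised to the exponent ``theta``:
    theta = 0 gives uniform full power, theta = 1 fully inverts the
    channel. Powers are normalized so no user exceeds p_max.
    """
    beta = np.asarray(beta, dtype=float)
    p = beta ** (-theta)           # weaker channel -> more power
    return p_max * p / p.max()     # keep every user within the budget

# Users with very different large-scale gains
beta = np.array([1e-6, 1e-8, 1e-10])
print(fractional_power_control(beta, theta=0.0))  # uniform power
print(fractional_power_control(beta, theta=0.5))  # partial channel inversion
```

Sweeping `theta` compresses or expands the resulting SIR distribution, which is the average-performance vs. fairness tradeoff the abstract refers to.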
Chiruzzo L, AbuRa’ed A, Bravo A, Saggion H. LaSTUS-TALN+INCO @ CL-SciSumm 2019. 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019)
In this paper we present several systems developed to participate in the 4th Computational Linguistics Scientific Document Summarization Shared Task, which addresses the problem of summarizing a scientific paper using information from its citation network (i.e., the papers that cite the given paper). Given a cluster of scientific documents where one is a reference paper (RP) and the remaining documents are papers citing the reference, two tasks are proposed: (i) to identify which sentences in the reference paper are being cited and why, and (ii) to produce a citation-based summary of the reference paper using the information in the cluster. Our systems are based on both supervised (LSTM and convolutional neural networks) and unsupervised techniques, using word embedding representations and features computed from the linguistic and semantic analysis of the documents.
Torrents-Barrena J, López-Velazco R, Piella G, Masoller N, Valenzuela-Alcaraz B, Gratacós E, Eixarch E, Ceresa M, González Ballester MA. TTTS-GPS: Patient-specific preoperative planning and simulation platform for twin-to-twin transfusion syndrome fetal surgery. Computer Methods and Programs in Biomedicine. https://doi.org/10.1016/j.cmpb.2019.104993
Twin-to-twin transfusion syndrome (TTTS) is a serious condition that may occur in pregnancies when two or more fetuses share the same placenta. It is characterized by abnormal vascular connections in the placenta that cause blood to flow unevenly between the babies. If left untreated, perinatal mortality occurs in 90% of cases, whilst neurological injuries are still present in TTTS survivors. Minimally invasive fetoscopic laser surgery is the standard and optimal treatment for this condition, but it is technically challenging and can lead to complications. Acquiring and maintaining the required surgical skills demands consistent practice and involves a steep learning curve. Accurate preoperative planning is thus vital for complex TTTS cases. To this end, we propose the first TTTS fetal surgery planning and simulation platform. The soft tissue of the mother, the uterus, the umbilical cords, the placenta and its vascular tree are segmented and registered automatically from magnetic resonance imaging and 3D ultrasound using computer vision and deep learning techniques. The proposed state-of-the-art technology is integrated into a flexible C++ and MITK-based application that provides a full exploration of the intrauterine environment by simulating the fetoscope camera as well as the laser ablation, determining the correct entry point, and letting doctors train their movements and trajectory ahead of the operation, which allows improving upon current practice. A comprehensive usability study is reported. Experienced surgeons rated our TTTS planner and simulator highly, making it a potential tool for real and complex TTTS surgeries.
Accuosto P, Saggion H. Transferring knowledge from discourse to arguments: A case study with scientific abstracts. 6th ACL Workshop on Argument Mining
In this work we propose to leverage resources available with discourse-level annotations to facilitate the identification of argumentative components and relations in scientific texts, which has been recognized as a particularly challenging task. In particular, we implement and evaluate a transfer learning approach in which contextualized representations learned from discourse parsing tasks are used as input of argument mining models. As a pilot application, we explore the feasibility of using automatically identified argumentative components and relations to predict the acceptance of papers in computer science venues. In order to conduct our experiments, we propose an annotation scheme for argumentative units and relations and use it to enrich an existing corpus with an argumentation layer.
Additional material:
Corpus with annotations http://scientmin.taln.upf.edu/argmin/scidtb_argmin_annotations.tgz
Best paper award at 6th ACL Workshop on Argument Mining
Aspandi D, Martinez O, Sukno F, Binefa X. Fully End-to-End Composite Recurrent Convolution Network for Deformable Facial Tracking In The Wild. 14th IEEE International Conference on Automatic Face & Gesture Recognition
Human facial tracking is an important task in computer vision, which has recently lost pace compared to other facial analysis tasks. The majority of currently available trackers suffer from two major limitations: their limited use of temporal information and their widespread reliance on handcrafted features, without taking full advantage of the large annotated datasets that have recently become available. In this paper we present a fully end-to-end facial tracking model based on current state-of-the-art deep model architectures that can be effectively trained from the available annotated facial landmark datasets. We build our model from the recently introduced general object tracker Re3, which allows modeling the short and long temporal dependency between frames by means of its internal Long Short-Term Memory (LSTM) layers. Facial tracking experiments on the challenging 300-VW dataset show that our model can produce state-of-the-art accuracy and far lower failure rates than competing approaches. We specifically compare the performance of our approach modified to work in tracking-by-detection mode and show that, as such, it can produce results comparable to state-of-the-art trackers. However, upon activation of our tracking mechanism, the results improve significantly, confirming the advantage of taking temporal dependencies into account.
doi: 10.1109/FG.2019.8756630
Aspandi D, Martinez O, Binefa X. Heatmap-Guided Balanced Deep Convolution Networks for Family Classification in the Wild. 14th IEEE International Conference on Automatic Face & Gesture Recognition
Automatic kinship recognition using computer vision, which aims to infer the blood relationship between individuals by comparing only their facial features, has started to gain attention recently. The introduction of large kinship datasets, such as Families In The Wild (FIW), has allowed large-scale dataset modeling using state-of-the-art deep learning models. Among kinship recognition tasks, family classification has seen little progress, owing to a difficulty that grows with family size. Furthermore, most current state-of-the-art approaches do not perform any data pre-processing (which could improve model accuracy) and are trained without a regularizer (leaving the models susceptible to overfitting). In this paper, we present the Deep Family Classifier (DFC), a deep learning model for family classification in the wild. We build our model by combining two sub-networks: an internal Image Feature Enhancer, which removes image noise and provides an additional facial heatmap layer, and a Family Class Estimator, trained with strong regularizers and a compound loss. We observe progressive improvement in accuracy during the validation phase, with state-of-the-art results of 16.89% for track 2 of the RFIW2019 challenge and 17.08% on the family classification task on the FIW dataset.
doi: 10.1109/FG.2019.8756557
Barrachina-Muñoz S, Wilhelmi F, Bellalta B. Online Primary Channel Selection for Dynamic Channel Bonding in High-Density WLANs. arXiv preprint
In order to dynamically adapt the transmission bandwidth in wireless local area networks (WLANs), dynamic channel bonding (DCB) was introduced in IEEE 802.11n. It has been extended since then, and it is expected to be a key element in IEEE 802.11ax and future amendments such as IEEE 802.11be. While DCB is proven to be a compelling mechanism by itself, its performance is deeply tied to the primary channel selection, especially in high-density (HD) deployments, where multiple nodes contend for the spectrum. Traditionally, this primary channel selection relied on picking the least occupied channel without any further consideration. In this paper, in contrast, we propose dynamic-wise (DyWi), a light-weight, decentralized, online primary channel selection algorithm for DCB that maximizes the expected WLAN throughput by considering not only the occupancy of the target primary channel but also the activity of the secondary channels. Even when assuming significant delay costs due to primary switching, simulation results show a notable improvement in both average delay and throughput.
Garcia-Canadilla P, de Vries T, Gonzalez-Tendero A, Bonnin A, Gratacos E, Crispi F, Bijnens B, Zhang C. Structural coronary artery remodelling in the rabbit fetus as a result of intrauterine growth restriction. PLoS ONE
Intrauterine growth restriction (IUGR) is a fetal condition that affects up to 10% of all pregnancies and is associated with cardiovascular structural and functional remodelling that persists postnatally. Some studies have reported an increase in myocardial coronary blood flow in severe IUGR fetuses, which has been directly associated with the dilatation of the coronary arteries. However, a direct measurement of the coronaries' lumen diameter in IUGR has not been reported before. The aim of this paper is to perform, for the first time, a quantitative analysis of the effects of IUGR on cardiac geometry and coronary vessel size in a well-known rabbit model of IUGR using synchrotron-based X-ray Phase Contrast Tomography Imaging (X-PCI). Eight rabbit fetal hearts were imaged non-destructively with X-PCI. 3D reconstructions of the coronary arterial tree were obtained after semi-automatic image segmentation. Different morphometric features, including the vessel lumen diameter of the three main coronaries, were automatically quantified. IUGR fetuses had more globular hearts and dilated coronary arteries as compared to controls. We have quantitatively shown that IUGR leads to structural coronary vascular tree remodelling and enlargement as an adaptation mechanism in response to an adverse environment of restricted oxygen and nutrients and increased perfusion pressure.
Open access article: https://doi.org/10.1371/journal.pone.0218192
Wilhelmi F, Barrachina-Muñoz S, Bellalta B. On the Performance of the Spatial Reuse Operation in IEEE 802.11ax WLANs. arXiv pre-print
The Spatial Reuse (SR) operation included in the IEEE 802.11ax-2020 (11ax) amendment aims at increasing the number of parallel transmissions in an Overlapping Basic Service Set (OBSS). However, many unknowns exist about the performance gains that can be achieved through SR. In this paper, we provide a brief introduction to the SR operation described in the IEEE 802.11ax (draft D4.0). Then, a simulation-based implementation is provided in order to explore the performance gains of the SR operation. Our results show the potential of using SR in different scenarios covering multiple network densities and traffic loads. In particular, we observe significant performance gains when a WLAN applies SR with respect to the default configuration. Interestingly, the highest improvements are observed in the most pessimistic situations in terms of network density and traffic load.
https://arxiv.org/abs/1906.08063
Dataset in Zenodo https://zenodo.org/record/3250080
Barceló P, Baumgartner A, Dalmau V, Kimelfeld B. Regularizing Conjunctive Features for Classification. 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and the goal is to find feature queries that allow for a linear separation between the two sets of examples. We focus on conjunctive feature queries, and explore two fundamental problems: (a) deciding whether separating feature queries exist (separability), and (b) generating such queries when they exist. In the approximate versions of these problems, we allow a predefined fraction of the examples to be misclassified. To restrict the complexity of the generated classifiers, we explore various ways of regularizing (i.e., imposing simplicity constraints on) them by limiting their dimension, the number of joins in feature queries, and their generalized hypertree width (ghw). Among other results, we show that the separability problem is tractable in the case of bounded ghw; yet, the generation problem is intractable, simply because the feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without ever explicitly generating the feature queries.
Juhl KA, Paulsen RR, Dahl AB, Dahl VA, De Backer O, Kofoed K, Camara O. Guiding 3D U-nets with signed distance fields for creating 3D models from images. Medical Imaging with Deep Learning (MIDL2019)
Morphological analysis of the left atrial appendage is an important tool to assess risk of ischemic stroke. Most deep learning approaches for 3D segmentation are guided by binary label maps, which result in voxelized segmentations unsuitable for morphological analysis. We propose to use signed distance fields to guide a deep network towards morphologically consistent 3D models. The proposed strategy is evaluated on a synthetic dataset of simple geometries, as well as a set of cardiac computed tomography images containing the left atrial appendage. The proposed method produces smooth surfaces with a closer resemblance to the true surface in terms of segmentation overlap and surface distance.
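The training target is easy to construct: a signed distance field can be derived directly from a binary label map, giving the network a smooth regression target whose zero level set is the object surface. Below is a brute-force numpy sketch for a tiny 2D mask; it is illustrative only (the paper's exact construction and any efficiency concerns are not reproduced, and both classes must be present in the mask).

```python
import numpy as np

def signed_distance_field(mask):
    """Brute-force signed distance field from a binary mask.

    Positive outside the object, negative inside; the zero crossing
    traces the object boundary. Assumes the mask contains both
    foreground and background voxels.
    """
    mask = np.asarray(mask, dtype=bool)
    inside = np.argwhere(mask)       # object voxels
    outside = np.argwhere(~mask)     # background voxels
    sdf = np.empty(mask.shape, dtype=float)
    for c in np.ndindex(mask.shape):
        # distance to the nearest voxel of the opposite class
        other = outside if mask[c] else inside
        d = np.sqrt(((other - np.array(c)) ** 2).sum(axis=1)).min()
        sdf[c] = -d if mask[c] else d
    return sdf

# 3x3 square object on a 5x5 grid
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
sdf = signed_distance_field(mask)
print(sdf[2, 2], sdf[0, 0])  # negative at the centre, positive in the corner
```

In practice one would use an efficient distance transform rather than this quadratic loop, but the resulting field is the same.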
Pons J, Serrà J, Serra X. Training neural audio classifiers with few data. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections. In particular, we study whether (i) a naive regularization of the solution space, (ii) prototypical networks, (iii) transfer learning, or (iv) their combination, can foster deep learning models to better leverage a small amount of training examples. To this end, we evaluate (i-iv) for the tasks of acoustic event recognition and acoustic scene classification, considering from 1 to 100 labeled examples per class. Results indicate that transfer learning is a powerful strategy in such scenarios, but prototypical networks show promising results when no external or validation data are available.
https://ieeexplore.ieee.org/document/8682591
https://arxiv.org/abs/1810.10274
Slides http://jordipons.me/media/TrainingFewData_full.pdf
Github https://github.com/jordipons/neural-classifiers-with-few-audio
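Of the strategies compared, prototypical networks reduce to a particularly simple rule at inference time: average the embeddings of the few labeled examples of each class into a prototype, then assign queries to the nearest prototype. A minimal numpy sketch with toy 2-D "embeddings" (all data, shapes, and names here are placeholders, not the paper's models):

```python
import numpy as np

def prototypes(embeddings, labels):
    """Class prototype = mean of the support embeddings of that class."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def classify(query, classes, protos):
    """Assign each query embedding to its nearest prototype (Euclidean)."""
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-D "embeddings": two classes, a few labeled examples per class
support = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
queries = np.array([[0.1, 0.2], [5.2, 4.9]])
print(classify(queries, classes, protos))  # -> [0 1]
```

During training the embedding network is optimized so that this nearest-prototype rule works well with very few support examples per class.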
Accuosto P, Saggion H. Discourse-Driven Argument Mining in Scientific Abstracts. Natural Language Processing and Information Systems. NLDB 2019
Argument mining consists in the automatic identification of argumentative structures in texts. In this work we address the open question of whether discourse-level annotations can contribute to facilitate the identification of argumentative components and relations in scientific literature. We conduct a pilot study by enriching a corpus of computational linguistics abstracts that contains discourse annotations with a new argumentative annotation level. The results obtained from preliminary experiments confirm the potential value of the proposed approach.
https://doi.org/10.1007/978-3-030-23281-8_15
Open access version at UPF e-repository: http://hdl.handle.net/10230/41907
Dejea H, Garcia-Canadilla P, Cook AC, Guasch E, Zamora M, Crispi F, Stampanoni M, Bijnens B, Bonnin A. Comprehensive Analysis of Animal Models of Cardiovascular Disease using Multiscale X-Ray Phase Contrast Tomography. Scientific Reports.
Cardiovascular diseases (CVDs) affect the myocardium and vasculature, inducing remodelling of the heart from the cellular to the whole-organ level. To assess their impact at the micro and macroscopic level, multi-resolution imaging techniques that provide high quality 3D images without sample alteration are necessary: requirements not fulfilled by most current methods. In this paper, we take advantage of the non-destructive, time-efficient 3D multiscale capabilities of synchrotron Propagation-based X-Ray Phase Contrast Imaging (PB-X-PCI) to study a wide range of cardiac tissue characteristics in one healthy and three different diseased rat models. With a dedicated image processing pipeline, PB-X-PCI images are analysed to show the technique's capability to assess different cardiac tissue components at both macroscopic and microscopic levels. The presented technique evaluates in detail the overall cardiac morphology, myocyte aggregate orientation, vasculature changes, fibrosis formation and nearly single-cell arrangement. Our results agree with conventional histology and the literature. This study demonstrates that synchrotron PB-X-PCI, combined with image processing tools, is a powerful technique for multi-resolution structural investigation of the heart ex vivo. Therefore, the proposed approach can improve the understanding of the multiscale remodelling processes occurring in CVDs, and the comprehensive and fast assessment of future interventional approaches.
https://doi.org/10.1038/s41598-019-43407-z (Open access article)
Manathunga K, Hernández-Leo D. Flexible CSCL orchestration technology: mechanisms for elasticity and dynamism in pyramid script flows. 13th International Conference on Computer Supported Collaborative Learning (CSCL)
Flow patterns (e.g., Pyramid or Snowball) formulate good practices for scripting collaborative learning scenarios and have been widely tested in small-scale settings. Applying flow patterns in large-scale contexts presents challenges to educators in terms of orchestration load. Orchestration technology can support educators in managing collaborative activities; yet existing technology does not address flexibility challenges such as accommodating growing numbers of students or tolerating dynamic conditions in learning settings. We define elasticity and dynamism as two key elements in the flexibility of a script. Elasticity is the capacity of an orchestration technology to accommodate varying participant counts. Dynamism is the capacity to maintain a pedagogically meaningful script progression in the presence of differing individual behaviors. In this paper we propose flow creation and flow control mechanisms to address elasticity and dynamism in orchestration technology for Pyramid flows. These mechanisms, implemented in the PyramidApp tool, have been evaluated across four scenarios ranging from small to large settings. The results show that rules enabling on-demand pyramid creation and the use of timers are useful for achieving elasticity and dynamism in pyramid formation and progression in an automatic manner.
Vujovic M, Hernández-Leo D. Shall we learn together in loud spaces? Towards understanding the effects of sound in collaborative learning environments. 13th International Conference on Computer Supported Collaborative Learning (CSCL)
In this paper we question the role of environmental sound in the process of collaborative learning (CL). We present a first pilot study in which we investigated the effects of environmental sound on the EDA and voice VA of the participants. The created visualization presents the dependence between the mentioned parameters and serves as an awareness tool for participants in CL. Preliminary results are provocative: the mentioned dependences seem to exist, and participants accept the proposed visualization as a useful tool to support self-regulation during CL.
Amarasinghe I, Hernández-Leo D, Jonsson A. Data-Informed Design Parameters for Adaptive Collaborative Scripting in Across-Spaces Learning Situations. User Modeling and User-Adapted Interaction
This study presents how predictive analytics can be used to inform the formulation of adaptive collaborative learning groups in the context of Computer Supported Collaborative Learning, considering across-spaces learning situations. During the study we collected data from different learning spaces capturing both individual and collaborative learning activity engagement of students in two different learning contexts (namely classroom learning and distance learning), and attempted to predict each student's future participation in a pyramid-based collaborative learning activity using supervised machine learning techniques. We conducted experimental case studies in classroom and distance learning settings, in which real-time predictions of students' future collaborative learning activity participation were used to formulate adaptive collaborative learner groups. Findings of the case studies showed that the data collected from across-spaces learning scenarios are informative when predicting students' future collaborative learning activity participation, hence facilitating the formulation of adaptive group configurations that adapt to students' activity participation differences in real time. Limitations of the proposed approach and future research directions are outlined.
https://doi.org/10.1007/s11257-019-09233-8
Open Access post-print at UPF e-repository: http://hdl.handle.net/10230/37277
Beardsley M, Santos P, Hernández-Leo D, Michos K. Ethics in educational technology research: informing participants in data sharing risks. British Journal of Educational Technology
Participants in educational technology research regularly share personal data, which carries risks. Informing participants of these data sharing risks is often done only through text contained within a consent form. However, conceptualizations of data sharing risks and knowledge of responsible data management practices among teachers and learners may be impoverished, limiting the effectiveness of a consent form in communicating such risks in a manner that adequately supports participants in making informed decisions about sharing their data. At two high schools participating in an educational research project involving the use of technology in the classroom, we investigate teacher and student conceptions of data sharing risks and knowledge of responsible data management practices, and introduce a communication approach that attempts to better inform educational technology research participants of such risks. Results of this study suggest that most teachers have not received formal training related to responsibly managing data, and both teachers and students see the need for such training as they come to realize that their understanding of responsible data management is underdeveloped. Thus, efforts beyond solely explaining data sharing risks in an informed consent form may be needed in educational technology research to facilitate ethical self-determination.
https://doi.org/10.1111/bjet.12781
Open Access post-print in UPF e-repository: http://hdl.handle.net/10230/41901
Rankothge W, Ramalhinho, Lobo J. On the Scaling of Virtualized Network Functions. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)
Offering Virtualized Network Functions (VNFs) as a service requires automation of cloud resource management to allocate cloud resources for the VNFs dynamically. Most of the existing solutions focus only on the initial resource allocation. However, the allocation of resources must adapt to dynamic traffic demands and support fast scaling mechanisms. There are three basic scaling models: vertical, where re-scaling is achieved by changing the resources assigned to the VNF in the host server; horizontal, where VNFs are replicated or removed to re-scale; and migration, where VNFs are moved to servers with more resources. In this paper, we present an Iterated Local Search (ILS) based framework for automating resource reallocation that supports the three scaling models. We then use the framework to run experiments and compare the different scaling approaches, specifically how the optimization is affected by the scaling approach and the optimization objectives.
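The ILS metaheuristic at the core of such a framework follows a standard skeleton: alternate a greedy local search with a perturbation of the best solution found so far, keeping improvements. The following is a generic sketch on a toy two-server placement problem; the cost function, neighborhood, and perturbation are illustrative assumptions, not the paper's model.

```python
import random

def iterated_local_search(cost, neighbors, perturb, x0, iters=100, seed=0):
    """Generic Iterated Local Search skeleton.

    Repeatedly: perturb the best-known solution, run a greedy local
    search from it, and keep the result if it improves the cost.
    """
    rng = random.Random(seed)

    def local_search(x):
        improved = True
        while improved:
            improved = False
            for y in neighbors(x):
                if cost(y) < cost(x):   # take the first improving move
                    x, improved = y, True
                    break
        return x

    best = local_search(x0)
    for _ in range(iters):
        candidate = local_search(perturb(best, rng))
        if cost(candidate) < cost(best):
            best = candidate
    return best

# Toy placement: assign "VNFs" to one of 2 servers to balance the load
loads = [5, 3, 9, 1]                      # resource demand of each VNF
cost = lambda assign: max(sum(l for l, a in zip(loads, assign) if a == s)
                          for s in range(2))      # makespan on 2 servers
neighbors = lambda a: [a[:i] + (1 - a[i],) + a[i+1:] for i in range(len(a))]
perturb = lambda a, rng: tuple(rng.randrange(2) for _ in a)
best = iterated_local_search(cost, neighbors, perturb, (0, 0, 0, 0))
print(best, cost(best))                   # a balanced split, makespan 9
```

The real problem adds the three scaling models as moves (resize, replicate/remove, migrate) and richer objectives, but the search loop keeps this shape.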
Ríssola EA, Ramírez-Cifuentes D, Freire A, Crestani F. Suicide Risk Assessment on Social Media: USI-UPF at the CLPsych 2019 Shared Task. Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology
This paper describes the participation of the USI-UPF team in the shared task of the 2019 Computational Linguistics and Clinical Psychology Workshop (CLPsych 2019). The goal is to assess the degree of suicide risk of social media users given a labelled dataset with their posts. An appropriate suicide risk assessment, with the usage of automated methods, can assist experts in the detection of people at risk and eventually contribute to preventing suicide. We propose a set of machine learning models with features based on lexicons, word embeddings, word level n-grams, and statistics extracted from users' posts. The results show that the most effective models for the tasks are obtained integrating lexicon-based features, a selected set of n-grams, and statistical measures.
Seda L, Altin M, Bravo Serrano A, Saggion H. LaSTUS/TALN at SemEval-2019 Task 6: Identification and Categorization of Offensive Language in Social Media with Attention-based Bi-LSTM model. Proceedings of the 13th International Workshop on Semantic Evaluation
We present a bidirectional Long Short-Term Memory network for identifying offensive language on Twitter. Our system was developed in the context of SemEval 2019 Task 6, which comprises three different sub-tasks, namely A: Offensive Language Detection, B: Categorization of Offensive Language, and C: Offensive Language Target Identification. We used pre-trained word embeddings trained on tweet data, including information about emojis and hashtags. Our approach achieves good performance in the three sub-tasks.
https://aclweb.org/anthology/papers/S/S19/S19-2120/
Albó L, Butera-Castelo R, Hernández-Leo D. Supporting the planning of hybrid-MOOCs learning designs. Proceedings of EMOOCs 2019. CEUR Workshop Proceedings
This paper presents a work-in-progress solution for planning hybrid Massive Open Online Courses (MOOCs). The use of MOOCs in brick-and-mortar courses presents several design challenges. One of them is to find the most suitable online course regarding the alignment with the face-to-face course structure, timeline and syllabus. Although there are different lesson planning and design tools which support educators in the design of their courses, there is a lack of solutions allowing to incorporate the use of MOOCs or MOOC resources in the planning process of blended courses. In this paper, we present a MOOC design module to be used in design authoring tools which aim to support the planning of blended courses that incorporate MOOCs (or MOOC resources). We discuss two different solutions for gathering information regarding existing MOOCs in the market: the creation of our own MOOC database versus the on-demand parsing of MOOC information from existing search engines. Our exploration leads us to discard the first solution, as maintaining the database is highly demanding. Thus, the final system uses existing MOOC search engines to extract the online course design information to be used later in the overall hybrid-course planning. As this is a work-in-progress article, we present and discuss our future steps for supporting educators in the design of hybrid MOOC scenarios.
Open Access version at UPF e-repository: http://hdl.handle.net/10230/37225
Albó L, Hernández-Leo D, Moreno-Oliver V. Smartphones or laptops in the collaborative classroom? A study of video-based learning in higher education. Behaviour & Information Technology
This paper explores how the use of smartphones vs. laptops influences students' engagement, behaviour and experience when watching academic videos in a collaborative classroom. Experiments were run in authentic teaching sessions with a total of 483 first-year higher education students. The methodology applied is a quasi-experimental, post-test-only design, with the device used to visualise the academic videos as the independent variable. Results indicate that the use of laptops provided better results in terms of students' engagement with the videos, their collaborative behaviour and their satisfaction with the device. Hence, the findings of this research suggest that the type of mobile device used in activities involving videos in a collaborative class needs to be carefully chosen to maximise students' comfort and, in consequence, their engagement with the video-based learning activity and their positive behaviour and experience within the collaborative context.
https://doi.org/10.1080/0144929X.2018.1549596
Post-print at UPF e-repository: http://hdl.handle.net/10230/36270
Won M, Chun S, Serra X. Toward Interpretable Music Tagging with Self-Attention. arXiv pre-print
Self-attention is an attention mechanism that learns a representation by relating different positions in a sequence. The Transformer, a sequence model based solely on self-attention, and its variants have achieved state-of-the-art results in many natural language processing tasks. Since the semantics of music are built from relations between components at distant positions, adopting the self-attention mechanism to solve music information retrieval (MIR) problems can be beneficial. Hence, we propose a self-attention-based deep sequence model for music tagging. The proposed architecture consists of shallow convolutional layers followed by stacked Transformer encoders. Compared to conventional approaches using fully convolutional or recurrent neural networks, our model is more interpretable while reporting competitive results. We validate the performance of our model with the MagnaTagATune and the Million Song Dataset. In addition, we demonstrate the interpretability of the proposed architecture with a heat map visualization.
arXiv pre-print: https://arxiv.org/abs/1906.04972
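As a rough illustration of the core operation (not the paper's implementation), scaled dot-product self-attention can be sketched in a few lines of NumPy; the shapes and random weights below are purely illustrative:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention: every position attends to every
    other position, so components at distant positions can be related in
    a single step."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                  # 8 time steps, 16-dim features
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
out, att = self_attention(x, w_q, w_k, w_v)
```

Each row of `att` is a distribution over input positions; visualizing these rows as a heat map is what makes the model's decisions inspectable.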
Rullo A, Serra E, Lobo J. Redundancy as a Measure of Fault-Tolerance for the Internet of Things: A Review. Policy-Based Autonomic Data Governance. Lecture Notes in Computer Science.
In this paper we review and analyze redundancy-based fault-tolerant techniques for the IoT as a paradigm to support two of the main goals of computer security: availability and integrity. We organize the presentation around the three main tasks performed by the nodes of an IoT network: sensing, routing, and control. We first discuss how implementing fault-tolerance in these three areas is essential for the correct operation of an entire system. We provide an overview of the different approaches that have been used to address failures in sensing and routing. Control devices typically implement state machines that take decisions based on sensor measurements and may also ask actuators to execute actions. Traditionally, state-machine replication for fault-tolerance is realized through consensus protocols, most of which were developed in the 1980s and 1990s. We review the properties of such protocols in detail and discuss their limitations for the IoT. Since 2008, consensus algorithms have taken a new direction with the introduction of the concept of blockchain. Standard blockchain-based protocols cannot be applied without modifications to support fault-tolerance in the IoT. We review some recent results in this new class of algorithms and show how they can provide the flexibility required to support fault-tolerance in control devices, thus overcoming some of the limitations of traditional consensus protocols.
https://link.springer.com/chapter/10.1007/978-3-030-17277-0_11
Amarasinghe I, Hernández-Leo D, Jonsson A. Data-Informed Design Parameters for Adaptive Collaborative Scripting in Across-Spaces Learning Situations. User Modeling and User-Adapted Interaction.
This study presents how predictive analytics can be used to inform the formation of adaptive collaborative learning groups in Computer Supported Collaborative Learning, considering across-spaces learning situations. We collected data from different learning spaces capturing both individual and collaborative learning activity engagement of students in two learning contexts (classroom learning and distance learning), and attempted to predict each student's future participation in a pyramid-based collaborative learning activity using supervised machine learning techniques. We conducted experimental case studies in classroom and distance learning settings, in which real-time predictions of students' future collaborative learning activity participation were used to form adaptive collaborative learner groups. Findings of the case studies showed that the data collected from across-spaces learning scenarios is informative when predicting students' future collaborative learning activity participation, thus facilitating group configurations that adapt in real time to differences in students' activity participation. Limitations of the proposed approach and future research directions are also discussed.
Adame T, Bel A, Bellalta B. Increasing LPWAN Scalability by Means of Concurrent Multiband IoT Technologies: An Industry 4.0 Use Case. IEEE Access
One of the most important challenges of the Internet of Things (IoT) in the coming years will be the smooth incorporation of millions of smart devices into its communications paradigm. The greater coverage area of sub-1-GHz low-power wide area networks (LPWANs) makes them a suitable technology to easily encompass hundreds of end devices under a single base station. However, the inherent simplicity of LPWANs negatively affects scalability, as these networks are not flexible enough to deal with a high number of nodes unless their traffic load is very low, which limits many potential use cases. This paper analyzes the scalability issue in LPWANs and proposes the INTER-HARE protocol: a solution based on the use of concurrent multiband IoT technologies, where an 868-MHz LPWAN acts as a transparent backhaul for a set of subnetworks working at 2.4 GHz. The implementation of the INTER-HARE protocol on a real IoT platform was assessed both in several laboratory testbeds and in a pilot deployed on the premises of an industrial company, proving its suitability for non-delay-sensitive monitoring applications with end devices scattered throughout the targeted area.
Aguado AM, Olivares AL, Yagüe C, Silva E, Nuñez-García M, Fernandez-Quilez A, Mill J, Genua I, Arzamendi D, De Potter T, Freixa X, Camara O. In silico Optimization of Left Atrial Appendage Occluder Implantation Using Interactive and Modeling Tools. Frontiers in Physiology.
According to clinical studies, around one third of patients with atrial fibrillation (AF) will suffer a stroke during their lifetime. Between 70 and 90% of these strokes are caused by a thrombus formed in the left atrial appendage (LAA). In patients with contraindications to oral anticoagulants, a left atrial appendage occluder (LAAO) is often implanted to prevent blood flow from entering the LAA. A limited range of LAAO devices is available, with different designs and sizes. Together with the heterogeneity of LAA morphology, these factors make LAAO success dependent on the clinician's experience. A sub-optimal LAAO implantation can generate thrombi outside the device, eventually leading to stroke if not treated. The aim of this study was to develop clinician-friendly tools based on biophysical models to optimize LAAO device therapies. A web-based 3D interactive virtual implantation platform, called VIDAA, was created to select the most appropriate LAAO configurations (type of device, size, landing zone) for a given patient-specific LAA morphology. An initial LAAO configuration is proposed in VIDAA, automatically computed from LAA shape features (centreline, diameters). The most promising LAAO settings and LAA geometries were exported from VIDAA to build volumetric meshes and run Computational Fluid Dynamics (CFD) simulations to assess blood flow patterns after implantation. The risk of thrombus formation was estimated from the simulated hemodynamics with an index combining information on blood flow velocity and complexity. The combination of the VIDAA platform with in silico indices made it possible to identify the LAAO configurations associated with a lower risk of thrombus formation; device positioning was key to the creation of regions with turbulent flows after implantation. Our results demonstrate the potential for optimizing LAAO therapy settings during pre-implant planning based on modeling tools and help reduce the risk of thrombus formation after treatment.
https://doi.org/10.3389/fphys.2019.00237
Additional material:
Simulation data, including solver configurations and results, are available upon request. A beta version of the VIDAA platform is available at the GitHub account of the BCN-Medtech research unit (https://github.com/bcnmedtech)
Torrents-Barrena J, Piella G, Masoller N, Gratacós E, Eixarch E, Ceresa M, González Ballester MA. Fully automatic 3D reconstruction of the placenta and its peripheral vasculature in intrauterine fetal MRI. Medical Image Analysis.
Recent advances in fetal magnetic resonance imaging (MRI) open the door to improved detection and characterization of fetal and placental abnormalities. Since interpreting MRI data can be complex and ambiguous, there is a need for robust computational methods able to quantify placental anatomy (including its vasculature) and function. In this work, we propose a novel fully automated method to segment the placenta and its peripheral blood vessels from fetal MRI. First, a super-resolution reconstruction of the uterus is generated by combining axial, sagittal and coronal views. The placenta is then segmented using 3D Gabor filters, texture features and Support Vector Machines. A uterus edge-based instance selection is proposed to identify the support vectors defining the placenta boundary. Subsequently, peripheral blood vessels are extracted through a curvature-based corner detector. Our approach is validated on a rich set of 44 control and pathological cases: singleton and (normal/monochorionic) twin pregnancies between 25 and 37 weeks of gestation. Dice coefficients of 0.82 ± 0.02 and 0.81 ± 0.08 are achieved for the segmentation of the placenta and its vasculature, respectively. A comparative analysis with state-of-the-art convolutional neural networks (CNNs), namely 3D U-Net, V-Net, DeepMedic, Holistic3D Net, HighRes3D Net and Dense V-Net, is also conducted for placenta localization, with our method outperforming all CNN approaches. Results suggest that our methodology can aid the diagnosis and surgical planning of severe fetal disorders.
Beardsley M, Santos P, Hernández-Leo D, Michos K. Ethics in educational technology research: Informing participants on data sharing risks. British Journal of Educational Technology.
Participants in educational technology research regularly share personal data, which carries risks. Informing participants of these data sharing risks is often done only through text contained within a consent form. However, teachers' and learners' conceptions of data sharing risks and their knowledge of responsible data management practices may be impoverished, limiting the effectiveness of a consent form in communicating such risks in a manner that adequately supports participants in making informed decisions about sharing their data. At two high schools participating in an educational research project involving the use of technology in the classroom, we investigate teacher and student conceptions of data sharing risks and knowledge of responsible data management practices, and introduce a communication approach that attempts to better inform educational technology research participants of such risks. Results of this study show that most teachers have not received formal training related to responsibly managing data, and that both teachers and students see the need for such training as they come to realize that their understanding of responsible data management is underdeveloped. Thus, efforts beyond solely explaining data sharing risks in an informed consent form may be needed in educational technology research to facilitate ethical self-determination.
Porcaro L, Saggion H. Recognizing Musical Entities in User-generated Content. International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) 2019
Recognizing musical entities is important for Music Information Retrieval (MIR) since it can improve the performance of several tasks such as music recommendation, genre classification or artist similarity. However, most entity recognition systems in the music domain have concentrated on formal texts (e.g. artists’ biographies, encyclopedic articles, etc.), ignoring rich and noisy user-generated content. In this work, we present a novel method to recognize musical entities in Twitter content generated by users following a classical music radio channel. Our approach takes advantage of both formal radio schedule and users’ tweets to improve entity recognition. We instantiate several machine learning algorithms to perform entity recognition combining task-specific and corpus-based features. We also show how to improve recognition results by jointly considering formal and user-generated content.
Additional material
- Dataset: 5,093 automatically generated tweets by the BBC Radio 3 Music Bot available at https://github.com/LPorcaro/musicner
Carrascosa M, Bellalta B. Decentralized AP selection using Multi-Armed Bandits: Opportunistic ε-Greedy with Stickiness. arXiv pre-print
WiFi densification leads to multiple overlapping coverage areas, which allows user stations (STAs) to choose between different Access Points (APs). The standard WiFi association method makes each STA select the AP with the strongest signal, which in many cases leads to the underutilization of some APs while overcrowding others. To mitigate this situation, Reinforcement Learning techniques such as Multi-Armed Bandits can be used to dynamically learn the optimal mapping between APs and STAs, and so redistribute the STAs among the available APs accordingly. This is an especially challenging problem, since the network response observed by a given STA depends on the behavior of the others, and so it is very difficult to predict without a global view of the network. In this paper, we focus on solving this problem in a decentralized way, where STAs independently explore the different APs inside their coverage range and select the one that best satisfies their needs. To this end, we propose a novel approach called Opportunistic ε-greedy with Stickiness that halts the exploration when a suitable AP is found; the STA then remains associated with that AP while it is satisfied, only resuming the exploration after several unsatisfactory association periods. With this approach, we significantly reduce the variability of the network response, improving the ability of the STAs to find a solution faster, as well as achieving a more efficient use of the network resources.
Keywords: IEEE 802.11, WLANs, Reinforcement Learning, Multi-Armed Bandits
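As a toy illustration of the policy (parameter names and values below are our assumptions, not the paper's), one STA's decision loop might look like:

```python
import random

def run_sta(ap_throughputs, target, steps=200, epsilon=0.2, patience=3, seed=1):
    """Sketch of one STA running opportunistic epsilon-greedy with
    stickiness. ap_throughputs: mean throughput offered by each AP;
    target: throughput the STA considers satisfactory."""
    rng = random.Random(seed)
    aps = list(ap_throughputs)
    mean = {a: 0.0 for a in aps}   # running estimate per AP
    pulls = {a: 0 for a in aps}
    current, unsat = None, 0
    for _ in range(steps):
        if current is None or unsat >= patience:
            # Exploration resumes only after `patience` bad periods:
            # epsilon-greedy over the estimated mean throughputs.
            current = (rng.choice(aps) if rng.random() < epsilon
                       else max(aps, key=mean.get))
            unsat = 0
        reward = rng.gauss(ap_throughputs[current], 1.0)
        pulls[current] += 1
        mean[current] += (reward - mean[current]) / pulls[current]
        if reward >= target:
            unsat = 0          # satisfied: stick to this AP, stop exploring
        else:
            unsat += 1         # several misses in a row will trigger exploration
    return current, pulls

ap, pulls = run_sta({"AP1": 5.0, "AP2": 20.0, "AP3": 8.0}, target=15.0)
```

Once an AP that meets the target is sampled, the STA stops paying the exploration cost, which is what reduces the response variability seen by the other STAs.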
Law M, Russo A, Bertino E, Lobo J, Broda K. Representing and Learning Grammars in Answer Set Programming. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)
In this paper we introduce an extension of context-free grammars called answer set grammars (ASGs). These grammars allow annotations on production rules, written in the language of Answer Set Programming (ASP), which can express context-sensitive constraints. We investigate the complexity of various classes of ASGs with respect to two decision problems: deciding whether a given string belongs to the language of an ASG and deciding whether the language of an ASG is non-empty. Specifically, we show that the complexity of these decision problems can be lowered by restricting the subset of the ASP language used in the annotations. To aid the applicability of these grammars to computational problems that require context-sensitive parsers for partially known languages, we propose a learning task for inducing the annotations of an ASG. We characterise the complexity of this task and present an algorithm for solving it. An evaluation of a (prototype) implementation is also discussed.
Ramírez-Cifuentes D, Mayans M, Freire A. Early Risk Detection of Anorexia on Social Media. International Conference on Internet Science. INSCI 2018.
This paper proposes an approach for the early detection of anorexia nervosa (AN) on social media. We present a machine learning approach that processes the texts written by social media users. The method relies on a set of features based on domain-specific vocabulary, topics, psychological processes, and linguistic information extracted from the users' writings. The approach penalizes the delay in detecting positive cases in order to classify users at risk as early as possible. Identifying anorexia early, together with an appropriate treatment, improves the speed of recovery and the likelihood of staying free of the illness. The results of this work showed that our proposal is suitable for the early detection of AN symptoms.
- DOI: https://doi.org/10.1007/978-3-030-01437-7_1
- Pre-print at UPF e-repository: http://hdl.handle.net/10230/36745
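A common way to formalise such a delay penalty in this line of work is the ERDE measure used in the eRisk evaluation campaigns; the sketch below (with illustrative cost values, not necessarily those of the paper) shows how a correct but late decision is penalised:

```python
import math

def erde(decision, truth, delay, o=50, c_fp=0.1296, c_fn=1.0, c_tp=1.0):
    """Early Risk Detection Error: unlike plain classification error,
    a true positive still pays a latency cost that grows with `delay`
    (the number of user writings seen before deciding)."""
    if decision and not truth:
        return c_fp                                  # false positive
    if not decision and truth:
        return c_fn                                  # missed case
    if decision and truth:
        latency_cost = 1.0 - 1.0 / (1.0 + math.exp(delay - o))
        return latency_cost * c_tp                   # late detections cost more
    return 0.0                                       # true negative
```

With `o=50`, flagging a positive user after 5 writings costs almost nothing, while flagging the same user after 100 writings costs nearly as much as missing them entirely.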
Dalmazzo D, Ramirez R. Bowing Gestures Classification in Violin Performance: A Machine Learning Approach. Frontiers in Psychology
Gestures in music are of paramount importance, partly because they are directly linked to musicians' sound and expressiveness. At the same time, current motion capture technologies are capable of detecting body motion and gesture details very accurately.
We present a machine learning approach to automatic violin bow gesture classification based on Hierarchical Hidden Markov Models (HHMM) and motion data. We recorded motion and audio data corresponding to seven representative bow techniques (Détaché, Martelé, Spiccato, Ricochet, Sautillé, Staccato and Bariolage) performed by a professional violin player. We used the commercial Myo device for recording inertial motion information from the right forearm and synchronized it with audio recordings. Data was uploaded into an online public repository.
After extracting features from both the motion and audio data, we trained an HHMM to identify the different bowing techniques automatically. Our model can determine the studied bowing techniques with over 94% accuracy. These results make it feasible to apply this work in a practical learning scenario, where violin students can benefit from the real-time feedback provided by the system.
DOI: 10.3389/fpsyg.2019.00344
Additional material:
The datasets generated and analysed for this study can be found at
https://github.com/Dazzid/DataToRepovizz/tree/myo_to_repovizz/myo_recordings
Barrachina-Muñoz S, Adame T, Bel A, Bellalta B. Towards Energy Efficient LPWANs Through Learning-based Multi-hop Routing. 2019 IEEE World Forum on Internet of Things (WF-IoT 2019)
Low-power wide area networks (LPWANs) have been identified as one of the top emerging wireless technologies due to their autonomy and wide range of applications. Yet, the limited energy resources of battery-powered sensor nodes are a top constraint, especially in single-hop topologies, where nodes located far from the base station must conduct uplink (UL) communications at high power levels. For this reason, multi-hop UL routing is starting to gain attention due to its capability of reducing energy consumption by enabling transmissions to closer hops. Nonetheless, identifying energy-efficient multi-hop routes a priori is not trivial due to the unpredictable factors affecting the communication links in large LPWAN areas. In this paper, we propose epsilon multi-hop (EMH), a simple reinforcement learning (RL) algorithm based on epsilon-greedy, to enable reliable and low-consumption LPWAN multi-hop topologies. Results from a real testbed show that multi-hop topologies based on EMH achieve significant energy savings with respect to the default single-hop approach, savings that become more pronounced as the network operation progresses.
Additional material:
- Datasets in GitHub
- arXiv pre-print
Germano F, Gómez V, Le Mens G. The few-get-richer: a surprising consequence of popularity-based rankings. WWW'19. The Web Conference
Ranking algorithms play a crucial role in online platforms, from search engines to recommender systems. In this paper, we identify a surprising consequence of popularity-based rankings: the fewer the items reporting a given signal, the higher the share of the overall traffic they collectively attract. This few-get-richer effect emerges in settings where there are few distinct classes of items (e.g., left-leaning news sources versus right-leaning news sources) and items are ranked based on their popularity. We demonstrate analytically that the few-get-richer effect emerges when people tend to click on top-ranked items and have heterogeneous preferences for the classes of items. Using simulations, we analyze how the strength of the effect changes with assumptions about the setting and human behavior. We also test our predictions in an online experiment with human participants. Our findings have important implications for understanding the spread of misinformation.
Additional material:
- arXiv pre-print
- Dataset
- UPF news
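The mechanism can be illustrated with a toy simulation (all parameters below, including the attention and class-agnostic click probabilities, are our own assumptions rather than the paper's model):

```python
import random

def simulate(n_few=2, n_many=8, users=5000, seed=0):
    """Items of two classes are continuously re-ranked by click count.
    Users scan the ranking top-down, examine each position with some
    probability, and click either an item of their preferred class or,
    occasionally, any examined item (class-agnostic clicks)."""
    rng = random.Random(seed)
    items = [("few", i) for i in range(n_few)] + [("many", i) for i in range(n_many)]
    clicks = {it: 0 for it in items}
    for _ in range(users):
        pref = "few" if rng.random() < 0.5 else "many"   # 50/50 preference split
        ranking = sorted(items, key=lambda it: -clicks[it])
        for it in ranking:
            if rng.random() > 0.5:                       # attention: examine with prob 1/2
                continue
            if it[0] == pref or rng.random() < 0.3:      # preferred, or agnostic click
                clicks[it] += 1
                break
    total = max(1, sum(clicks.values()))
    share_few = sum(c for it, c in clicks.items() if it[0] == "few") / total
    return share_few, clicks
```

Although the minority class holds only 2 of the 10 items (a 20% item share), its clicks concentrate on those few items, pushing them to the top ranks where they also capture the class-agnostic clicks, so its traffic share ends up well above its item share.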
Briceno R, Bulatov A, Dalmau V, Larose B. Long range actions, connectedness, and dismantlability in relational structures. arXiv pre-print
In this paper we study alternative characterizations of dismantlability properties of relational structures in terms of various connectedness and mixing notions. We relate these results with earlier work of Brightwell and Winkler, providing a generalization from the graph case to the general relational structure context. In addition, we develop properties related to what we call (presence or absence of) boundary long range actions and the study of valid extensions of a given partially defined homomorphism, an approach that turns out to be novel even in the graph case. Finally, we also establish connections between these results and spatial mixing properties of Gibbs measures, the topological strong spatial mixing condition introduced by Briceño, and a characterization of finite duality due to Larose, Loten, and Tardif.
Ma J, Rankothge W, Makaya C, Morales M, Le F, Lobo J. An Overview of A Load Balancer Architecture for VNF chains Horizontal Scaling. 14th International Conference on Network and Service Management (CNSM)
We present an architectural design and a reference implementation for the horizontal scaling of virtual network function chains. Our solution does not require any changes to network functions and is able to handle stateful network functions whose state may depend on both directions of the traffic. We use connection-aware traffic load balancers based on hashing functions to maintain mappings between connections and the dynamically changing network function chains. Our reference implementation uses OpenFlow switches to route traffic to the assigned network function instances according to the load balancer decisions. We conducted extensive simulations to test the feasibility of the architecture and evaluate the performance of our implementation.
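The connection-aware hashing idea can be sketched as follows (an illustration of the concept, not the paper's implementation; names and structure are our assumptions):

```python
import hashlib

class ConnectionAwareBalancer:
    """New connections are spread over the current network function (NF)
    instances by hashing the flow 5-tuple, while known connections stay
    pinned to their original instance even after the chain scales."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.pinned = {}                       # connection key -> NF instance

    @staticmethod
    def _key(src_ip, src_port, dst_ip, dst_port, proto):
        # Sort the endpoints so both directions of a connection map to the
        # same key: a stateful NF must see the full bidirectional flow.
        lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        return f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{proto}"

    def assign(self, *flow):
        key = self._key(*flow)
        if key not in self.pinned:             # first packet of the connection
            h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
            self.pinned[key] = self.instances[h % len(self.instances)]
        return self.pinned[key]

    def scale_out(self, instance):
        self.instances.append(instance)        # only new connections use it
```

For example, `assign("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")` and the reversed flow return the same instance, and that mapping survives a later `scale_out` call.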
Martins Dias G, Borges Margi C, de Oliveira FCP, Bellalta B. Cloud-Empowered, Self-Managing Wireless Sensor Networks: Interconnecting Management Operations at the Application Layer. IEEE Consumer Electronics Magazine
This article discusses the design and implementation of a scalable system architecture that integrates wireless sensor networks (WSNs) into the Internet of Things (IoT) and exploits cloud services to autonomously configure wireless sensor nodes to measure and transmit sensed data only at periods when the environment changes more often. The implementation relies on software-defined networking (SDN) features to simplify WSN management and exploits the power of existing cloud computing platforms to execute a reinforcement learning algorithm that makes decisions based on the environment's evolution.
Marín-Juarros, V., Asensio-Pérez, J.I., Hernández-Leo, D., Villagrá-Sobrino, S., García-Sastre, S. Supporting Online Collaborative Design for Teacher Professional Development. Technology, Pedagogy and Education
This paper describes a study on online collaborative design in the context of teacher professional development. Twenty-five teachers from different Spanish universities and disciplines participated in the study. The aim was to understand how to support teachers in interuniversity teams to collaborate fully online throughout the learning design process of a discipline-based situation that integrates ICT, a problem scarcely tackled in the literature. The described interpretive study, using mixed methods, explores the support for online co-design provided by a novel ICT community platform named ILDE (Integrated Learning Design Environment). Lessons drawn from the results can contribute to the improvement of online collaborative design processes in the context of teacher professional development.
DOI: https://doi.org/10.1080/1475939X.2018.1547787
Open Access: http://uvadoc.uva.es/handle/10324/31435
Barrachina-Muñoz S, Wilhelmi F, Selinis I, Bellalta B. Komondor: a Wireless Network Simulator for Next-Generation High-Density WLANs. arXiv pre-print.
Komondor is a wireless network simulator for next-generation wireless local area networks (WLANs). The simulator has been conceived as an accessible (ready-to-use) open-source tool for wireless networks research and academia. An important advantage of Komondor over other well-known wireless simulators lies in its high event processing rate, furnished by a simplified core operation. This allows it to outperform simulators such as ns-3 in execution time, thus supporting large-scale scenarios with a huge number of nodes. In this paper, we provide insights into the Komondor simulator and give an overview of its main features, development stages and use cases. The operation of Komondor is validated in a variety of scenarios against different tools: the ns-3 simulator and two analytical tools based on Continuous Time Markov Networks (CTMNs) and Bianchi's DCF model. Results show that Komondor captures the IEEE 802.11 operation very similarly to ns-3. Finally, we discuss the potential of Komondor for simulating complex environments, even with machine learning support, in next-generation WLANs by easily developing new user-defined code modules.
Dominguez M, Burga A, Farrús M, Wanner L. Towards expressive prosody generation in TTS for reading aloud applications. Proc. IberSPEECH 2018
Conversational interfaces involving text-to-speech (TTS) applications have improved expressiveness and overall naturalness to a reasonable extent in the last decades. Conversational features, such as speech acts, affective states and information structure, have been instrumental in deriving more expressive prosodic contours. However, synthetic speech is still perceived as monotonous when a text that lacks those conversational features is read aloud in the interface, i.e., when it is fed directly to the TTS application. In this paper, we propose a methodology for pre-processing raw texts before they arrive at the TTS application. The aim is to analyze syntactic and information (or communicative) structure, and then use the high-level linguistic features derived from the analysis to generate more expressive prosody in the synthesized speech. The proposed methodology encompasses a pipeline of four modules: (1) a tokenizer, (2) a syntactic parser, (3) a communicative parser, and (4) an SSML prosody tag converter. The implementation has been tested in an experimental setting for German, using web-retrieved articles. Perception tests show a considerable improvement in the expressiveness of the synthesized speech when prosody is enriched automatically taking into account the communicative structure.
DOI: 10.21437/IberSPEECH.2018-9
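As a rough sketch of what the final module of such a pipeline produces (the theme/rheme labels and the prosody values below are illustrative assumptions, not the paper's rules), a converter might map communicatively labelled spans to SSML like this:

```python
def to_ssml(spans):
    """Wrap communicatively labelled text spans in SSML <prosody> tags:
    rhematic (new) information gets a more prominent contour, thematic
    (given) information a flatter one."""
    parts = []
    for text, role in spans:
        if role == "rheme":
            parts.append(f'<prosody pitch="+15%" rate="95%">{text}</prosody>')
        elif role == "theme":
            parts.append(f'<prosody pitch="-5%">{text}</prosody>')
        else:
            parts.append(text)
    return f"<speak>{' '.join(parts)}</speak>"
```

For instance, `to_ssml([("The weather tomorrow", "theme"), ("will be sunny", "rheme")])` emphasises only the new information, which is what breaks the monotony of reading flat text aloud.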
Derkach D, Sukno FM. Automatic local shape spectrum analysis for 3D facial expression recognition. Image and Vision Computing.
We investigate the problem of Facial Expression Recognition (FER) using 3D data. Building on one of the most successful frameworks for facial analysis using exclusively 3D geometry, we extend the analysis from a curve-based representation into a spectral representation, which allows a complete description of the underlying surface that can be further tuned to the desired level of detail. Spectral representations are based on the decomposition of the geometry into its spatial frequency components, much like a Fourier transform, which are related to intrinsic characteristics of the surface. In this work, we propose the use of Graph Laplacian Features (GLFs), which result from the projection of local surface patches into a common basis obtained from the Graph Laplacian eigenspace. We extract patches around facial landmarks and include a state-of-the-art localization algorithm to allow for fully automatic operation. The proposed approach is tested on the three most popular databases for 3D FER (BU-3DFE, Bosphorus and BU-4DFE) in terms of expression and AU recognition. Our results show that the proposed GLFs consistently outperform the curve-based approach as well as the most popular alternative for spectral representation, Shape-DNA, which is based on the Laplace-Beltrami operator and cannot provide a stable basis that guarantees that the extracted signatures for the different patches are directly comparable. Interestingly, the accuracy improvement brought by GLFs also comes at a lower computational cost. Considering the extraction of patches as a step common to the three compared approaches, the curve-based framework requires a costly elastic deformation between corresponding curves (e.g. based on splines) and Shape-DNA requires computing an eigendecomposition of every new patch to be analyzed.
In contrast, GLFs only require the projection of the patch geometry into the Graph Laplacian eigenspace, which is common to all patches and can therefore be pre-computed off-line. We also show that 14 automatically detected landmarks are enough to achieve high FER and AU detection rates, only slightly below those obtained when using sets of manually annotated landmarks.
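The key idea, a shared eigenbasis precomputed once and reused for every patch, can be sketched with NumPy (the graph and signal here are toy assumptions, not the actual face meshes):

```python
import numpy as np

def graph_laplacian_basis(adj, k):
    """Eigenbasis of the combinatorial graph Laplacian L = D - A,
    computed once for the common patch graph and then reused."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    return vecs[:, :k]                     # k lowest-frequency modes

def glf(patch_signal, basis):
    """Project a patch's per-vertex geometry onto the shared basis.
    Because the basis is common to all patches, the resulting
    signatures are directly comparable."""
    return basis.T @ patch_signal

# Toy example: a 4-vertex path graph standing in for a patch mesh.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
basis = graph_laplacian_basis(adj, k=3)
signature = glf(np.ones(4), basis)         # a constant (flat) patch
```

A flat patch projects entirely onto the lowest-frequency (constant) mode, so its higher-frequency coefficients vanish; this is the sense in which the coefficients act as spatial-frequency signatures.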
Sanroma G, Benkarim OM, Piella G, Lekadir K, Hahner N, Eixarch E, González Ballester MA. Learning to combine complementary segmentation methods for fetal and 6-month infant brain MRI segmentation. Computerized Medical Imaging and Graphics.
Segmentation of brain structures during the pre-natal and early post-natal periods is the first step for subsequent analysis of brain development. Segmentation techniques can be roughly divided into two families. The first, which we denote registration-based techniques, rely on initial estimates derived by registration to one (or several) templates. The second family, denoted learning-based techniques, relate imaging (and spatial) features to their corresponding anatomical labels. Each approach has its own qualities and the two are complementary. In this paper, we explore two ensembling strategies, namely stacking and cascading, to combine the strengths of both families. We present experiments on segmentation of 6-month infant brains and a cohort of fetuses with isolated non-severe ventriculomegaly (INSVM). INSVM is diagnosed when ventricles are mildly enlarged and no other anomalies are apparent. Prognosis is difficult based solely on the degree of ventricular enlargement. In order to find markers for a more reliable prognosis, we use the resulting segmentations to find abnormalities in the cortical folding of INSVM fetuses. Segmentation results show that either combination strategy outperforms all of the individual methods, thus demonstrating the capability of learning systematic combinations that lead to an overall improvement. In particular, the cascading strategy outperforms the stacking one, the former obtaining top 5, 7 and 13 results (out of 21 teams) in the segmentation of white matter, gray matter and cerebro-spinal fluid in the iSeg2017 MICCAI Segmentation Challenge. The resulting segmentations reveal that INSVM fetuses have a less convoluted cortex. This points to cortical folding abnormalities as potential markers of later neurodevelopmental outcomes.
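The two combination strategies can be sketched abstractly (the per-voxel probabilities, the fixed fusion weight and the stand-in second-stage model below are illustrative assumptions, not the paper's learned models):

```python
import numpy as np

def stacked(prob_registration, prob_learning, w=0.5):
    """Stacking sketch: combine the two families' per-voxel label
    probabilities (here with a fixed weight; a real stacker would learn
    the combination) and take the most likely label."""
    fused = w * prob_registration + (1 - w) * prob_learning
    return fused.argmax(axis=-1)

def cascaded(prob_registration, second_stage):
    """Cascading sketch: the first family's probability map becomes an
    input of the second-stage model, which outputs refined probabilities."""
    return second_stage(prob_registration).argmax(axis=-1)

# Toy probability maps over 2 voxels and 2 labels.
p_reg = np.array([[0.9, 0.1], [0.4, 0.6]])
p_learn = np.array([[0.2, 0.8], [0.3, 0.7]])
labels_stack = stacked(p_reg, p_learn)
# A trivial stand-in second stage that merely sharpens its input.
labels_casc = cascaded(p_reg, lambda p: p ** 2 / (p ** 2).sum(axis=-1, keepdims=True))
```

The structural difference is that stacking combines two finished predictions, while cascading feeds one prediction into the next model as a feature, which is why the two strategies can behave differently on the same inputs.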
López-Guimet J, Peña-Pérez L, Bradley RS, García-Canadilla P, Disney C, Geng H, Bodey AJ, Withers PJ, Bijnens B, Sherratt MJ, Egea G. MicroCT imaging reveals differential 3D micro-scale remodelling of the murine aorta in ageing and Marfan syndrome. Theranostics
Aortic wall remodelling is a key feature of both ageing and genetic connective tissue diseases, which are associated with vasculopathies such as Marfan syndrome (MFS). Although the aorta is a 3D structure, little attention has been paid to volumetric assessment, primarily due to the limitations of conventional imaging techniques. Phase-contrast microCT is an emerging imaging technique, which is able to resolve the 3D micro-scale structure of large samples without the need for staining or sectioning.
Methods:Here, we have used synchrotron-based phase-contrast microCT to image aortae of wild type (WT) and MFSFbn1C1039G/+mice aged 3, 6 and 9 months old (n=5). We have also developed a new computational approach to automatically measure key histological parameters.
Results: This analysis revealed that WT mice undergo age-dependent aortic remodelling characterised by increases in ascending aorta diameter, tunica media thickness and cross-sectional area. The MFS aortic wall was subject to comparable remodelling, but the magnitudes of the changes were significantly exacerbated, particularly in 9-month-old MFS mice with ascending aorta wall dilations. Moreover, this morphological remodelling in the MFS aorta included internal elastic lamina surface breaks that extended throughout the MFS ascending aorta and were already evident in animals that had not yet developed aneurysms.
Conclusions: Our 3D microCT study of the sub-micron wall structure of whole, intact aorta reveals that histological remodelling of the tunica media in MFS could be viewed as an accelerated ageing process, and that phase-contrast microCT combined with computational image analysis allows the visualisation and quantification of 3D morphological remodelling in large volumes of unstained vascular tissues.
doi: 10.7150/thno.26598
Slizovskaia O, Kim L, Haro G, Gomez E. End-to-End Sound Source Separation Conditioned On Instrument Labels. arXiv pre-print
Can we perform an end-to-end sound source separation (SSS) with a variable number of sources using a deep learning model? This paper presents an extension of the Wave-U-Net model which allows end-to-end monaural source separation with a non-fixed number of sources. Furthermore, we propose multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net and show its effect on the separation results. This approach can be further extended to other types of conditioning such as audio-visual SSS and score-informed SSS.
Code and datasets: https://github.com/Veleslavia/vimss
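The multiplicative conditioning at the bottleneck described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name, shapes and the idea of deriving one gain per channel from the instrument labels are assumptions for the example.

```python
# Hypothetical sketch of multiplicative conditioning: a per-channel gain,
# derived from the instrument-label vector, scales the bottleneck activations.
def condition_bottleneck(features, gains):
    """features: one list of activations per channel;
    gains: one scalar per channel (e.g. an embedding of the label vector)."""
    assert len(features) == len(gains)
    return [[x * g for x in channel] for channel, g in zip(features, gains)]
```

With one-hot-style gains, channels associated with absent instruments are simply zeroed out, which conveys the intuition behind label conditioning.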
Pons J, Serrà J, Serra X. Training neural audio classifiers with few data. arXiv pre-print.
We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections. In particular, we study whether (i) a naive regularization of the solution space, (ii) prototypical networks, (iii) transfer learning, or (iv) their combination, can foster deep learning models to better leverage a small amount of training examples. To this end, we evaluate (i-iv) for the tasks of acoustic event recognition and acoustic scene classification, considering from 1 to 100 labeled examples per class. Results indicate that transfer learning is a powerful strategy in such scenarios, but prototypical networks show promising results when external or validation data are not available.
Code: https://github.com/jordipons/neural-classifiers-with-few-audio
Entry at the author’s blog: http://www.jordipons.me/arxiv-article-training-neural-audio-classifiers-with-few-data/
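The prototypical-network idea studied in this paper can be sketched generically: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The function names and toy embeddings below are illustrative, not taken from the paper's code.

```python
# Compute class prototypes as the mean of the support embeddings per class.
def prototypes(support):
    """support: dict mapping class label -> list of embedding vectors."""
    return {y: [sum(dim) / len(xs) for dim in zip(*xs)]
            for y, xs in support.items()}

# Classify a query embedding by squared Euclidean distance to prototypes.
def classify(query, protos):
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(protos, key=lambda y: d2(query, protos[y]))
```

In the few-shot setting the abstract describes, the support set would hold the 1 to 100 labeled examples per class.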
Lluís F, Pons J, Serra X. End-to-end music source separation: is it possible in the waveform domain? arXiv pre-print.
Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase. In order to avoid omitting potentially useful information, we study the viability of using end-to-end models for music source separation. By operating directly over the waveform, these models take into account all the information available in the raw audio signal, including the phase. Our results show that waveform-based models can outperform a recent spectrogram-based deep learning model. Namely, a novel Wavenet-based model we propose and Wave-U-Net can outperform DeepConvSep, a spectrogram-based deep learning model. This suggests that end-to-end learning has great potential for the problem of music source separation.
Code and demo: http://jordipons.me/apps/end-to-end-music-source-separation/
Kim J, Won M, Liem CSM, Hanjalic A. Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filtering with Playlist Title Information. Proceedings of the ACM Recommender Systems Challenge 2018
In this paper, we propose a hybrid Neural Collaborative Filtering (NCF) model trained with a multi-objective function to achieve a music playlist generation system. The proposed approach focuses particularly on the cold-start problem (playlists with no seed tracks) and uses a text encoder employing a Recurrent Neural Network (RNN) to exploit the textual information given by the playlist title. To accelerate the training, we first apply Weighted Regularized Matrix Factorization (WRMF) as the basic recommendation model to pre-learn latent factors of playlists and tracks. These factors then feed into the proposed multi-objective optimization that also involves embeddings of playlist titles. The experimental study indicates that the proposed approach can effectively suggest suitable music tracks for a given playlist title, compensating for the poor recommendations that the WRMF model originally makes on empty playlists.
Slides: https://www.slideshare.net/JaeHooonKim/towards-seed-free-music-playlist-generation
Blog post by the author about this article. Winner of the WWW 2018 Challenge: Learning to Recognize Musical Genre
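The WRMF objective used to pre-learn the latent factors can be written down compactly. The sketch below is a generic confidence-weighted formulation; the hyperparameter values (alpha, reg) are illustrative defaults, not the paper's settings.

```python
# Illustrative WRMF objective: binary playlist-track interactions R are
# reweighted by a confidence c = 1 + alpha * r, and latent factors
# P (playlists) and Q (tracks) are L2-regularized.
def wrmf_loss(R, P, Q, alpha=40.0, reg=0.1):
    loss = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            pref = 1.0 if r > 0 else 0.0   # binarized preference
            conf = 1.0 + alpha * r          # confidence weight
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            loss += conf * (pref - pred) ** 2
    loss += reg * (sum(x * x for p in P for x in p) +
                   sum(x * x for q in Q for x in q))
    return loss
```

Minimizing this loss (e.g. by alternating least squares) yields the playlist and track factors that the proposed multi-objective optimization then refines together with the title embeddings.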
Cikes M, Sanchez-Martinez S, Claggett B, Duchateau N, Piella G, Butakoff C, Pouleur AC, Knappe D, Biering-Sørensen T, Kutyifa V, Moss A, Stein K, Solomon SD, Bijnens B. Machine learning-based phenogrouping in heart failure to identify responders to cardiac resynchronization therapy. European Journal of Heart Failure
Aims
We tested the hypothesis that a machine learning (ML) algorithm utilizing both complex echocardiographic data and clinical parameters could be used to phenogroup a heart failure (HF) cohort and identify patients with beneficial response to cardiac resynchronization therapy (CRT).
Methods and results
We studied 1106 HF patients from the Multicenter Automatic Defibrillator Implantation Trial with Cardiac Resynchronization Therapy (MADIT-CRT) (left ventricular ejection fraction ≤ 30%, QRS ≥ 130 ms, New York Heart Association class ≤ II) randomized to CRT with a defibrillator (CRT-D, n = 677) or an implantable cardioverter defibrillator (ICD, n = 429). An unsupervised ML algorithm (Multiple Kernel Learning and K-means clustering) was used to categorize subjects by similarities in clinical parameters, and left ventricular volume and deformation traces at baseline, into mutually exclusive groups. The treatment effect of CRT-D on the primary outcome (all-cause death or HF event) and on volume response was compared among these groups. Our analysis identified four phenogroups, significantly different in the majority of baseline clinical characteristics, biomarker values, measures of left and right ventricular structure and function, and the primary outcome occurrence. Two phenogroups included a higher proportion of known clinical characteristics predictive of CRT response, and were associated with a substantially better treatment effect of CRT-D on the primary outcome [hazard ratio (HR) 0.35; 95% confidence interval (CI) 0.19–0.64; P = 0.0005 and HR 0.36; 95% CI 0.19–0.68; P = 0.001] than observed in the other groups (interaction P = 0.02).
Conclusions
Our results serve as a proof‐of‐concept that, by integrating clinical parameters and full heart cycle imaging data, unsupervised ML can provide a clinically meaningful classification of a phenotypically heterogeneous HF cohort and might aid in optimizing the rate of responders to specific therapies.
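The clustering step of such a pipeline, K-means applied to a representation that combines clinical and imaging features, can be sketched in plain Python. This is a generic toy illustration; the study itself uses Multiple Kernel Learning to build the embedding that is clustered.

```python
# Toy K-means: points and centers are tuples of features. In the study's
# setting, the coordinates would come from the multiple-kernel embedding.
def kmeans(points, centers, iters=10):
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: d2(p, centers[j]))
            clusters[k].append(p)
        # Recompute each center as the mean of its assigned points.
        centers = [tuple(sum(dim) / len(pts) for dim in zip(*pts))
                   if pts else ctr
                   for pts, ctr in zip(clusters, centers)]
    return centers
```

Each resulting cluster would correspond to one phenogroup, within which the treatment effect can then be compared.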
Torrents-Barrena J, Piella G, Masoller N, Gratacós E, Eixarch E, Ceresa M, González Ballester MA. Segmentation and classification in MRI and US fetal imaging: Recent trends and future prospects. Medical Image Analysis.
Fetal imaging is a burgeoning topic. New advancements in both magnetic resonance imaging and (3D) ultrasound currently allow doctors to diagnose fetal structural abnormalities such as those involved in twin-to-twin transfusion syndrome, gestational diabetes mellitus, pulmonary sequestration and hypoplasia, congenital heart disease, diaphragmatic hernia, ventriculomegaly, etc. Considering the continued breakthroughs in in utero image analysis and (3D) reconstruction models, it is now possible to gain more insight into the ongoing development of the fetus. The best prenatal diagnosis performance relies on clinicians' thorough knowledge of fetal anatomy. Therefore, fetal imaging will likely expand and increase in prevalence in the forthcoming years. This review covers, for the first time, state-of-the-art segmentation and classification methodologies for the whole fetus and, more specifically, the fetal brain, lungs, liver, heart and placenta in magnetic resonance imaging and (3D) ultrasound. Potential applications of the aforementioned methods in clinical settings are also inspected. Finally, improvements to existing approaches, as well as the most promising avenues to new areas of research, are briefly outlined.
Garcia-Canadilla P, Dejea H, Bonnin A, Balicevic V, Loncaric S, Zhang C, Butakoff C, Aguado-Sierra J, Vázquez M, Jackson L H, Stuckey D J, Rau C, Stampanoni M, Bijnens B, Cook. Complex Congenital Heart Disease Associated With Disordered Myocardial Architecture in a Midtrimester Human Fetus. Circulation: Cardiovascular Imaging.
Background:
In the era of increasingly successful corrective interventions in patients with congenital heart disease (CHD), global and regional myocardial remodeling are emerging as important sources of long-term morbidity/mortality. Changes in organization of the myocardium in CHD, and in its mechanical properties, conduction, and blood supply, result in altered myocardial function both before and after surgery. To gain a better understanding and develop appropriate and individualized treatment strategies, the microscopic organization of cardiomyocytes, and their integration at a macroscopic level, needs to be completely understood. The aim of this study is to describe, for the first time, in 3 dimensions and nondestructively the detailed remodeling of cardiac microstructure present in a human fetal heart with complex CHD.
Methods and Results:
Synchrotron X-ray phase-contrast imaging was used to image an archival midgestation formalin-fixed fetal heart with right isomerism and complex CHD and compare with a control fetal heart. Analysis of myocyte aggregates, at detail not accessible with other techniques, was performed. Macroanatomic and conduction system changes specific to the disease were clearly observable, together with disordered myocyte organization in the morphologically right ventricle myocardium. Electrical activation simulations suggested altered synchronicity of the morphologically right ventricle.
Conclusions:
We have shown the potential of X-ray phase-contrast imaging for studying cardiac microstructure in the developing human fetal heart at high resolution providing novel insight while preserving valuable archival material for future study. This is the first study to show myocardial alterations occur in complex CHD as early as midgestation.
Data supplement: https://www.ahajournals.org/doi/suppl/10.1161/CIRCIMAGING.118.007753
Torrents-Barrena J, López-Velazco R, Masoller N, Valenzuela-Alcaraz B, Gratacós E, Eixarch E, Ceresa M, González Ballester MA. Preoperative Planning and Simulation Framework for Twin-to-Twin Transfusion Syndrome Fetal Surgery. CARE 2018, CLIP 2018, OR 2.0 2018, ISIC 2018: OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis
Twin-to-twin transfusion syndrome (TTTS) is a complication of monochorionic twin pregnancies in which arteriovenous vascular communications in the shared placenta lead to blood transfer between the fetuses. Selective fetoscopic laser photocoagulation of abnormal blood vessel connections has become the most effective treatment. Preoperative planning is thus an essential prerequisite to increase survival rates for severe TTTS. In this work, we present the very first TTTS fetal surgery planning and simulation framework. The placenta is segmented in both magnetic resonance imaging (MRI) and 3D ultrasound (US) via novel 3D convolutional neural networks. Likewise, the umbilical cord is extracted in MRI using 3D convolutional long short-term memory units. The detection of the placenta vascular tree is carried out through a curvature-based corner detector in MRI, and the Modified Spatial Kernelized Fuzzy C-Means with a Markov random field refinement in 3D US. The proposed TTTS planning software integrates all aforementioned algorithms to explore the intrauterine environment by simulating the fetoscope camera, determine the correct entry point, train doctors' movements ahead of surgery, and consequently, improve the success rate and reduce the operation time. The promising results indicate the potential of our TTTS planner and simulator for further assessment in real clinical surgeries.
Dalmazzo D, Tassani S, Ramírez R. A Machine Learning Approach to Violin Bow Technique Classification: a Comparison Between IMU and MOCAP systems. iWOAR '18 Proceedings of the 5th international Workshop on Sensor-based Activity Recognition and Interaction
Motion Capture (MOCAP) Systems have been used to analyze body motion and postures in biomedicine, sports, rehabilitation, and music. With the aim to compare the precision of low-cost devices for motion tracking (e.g. Myo) with the precision of MOCAP systems in the context of music performance, we recorded MOCAP and Myo data of a top professional violinist executing four fundamental bowing techniques (i.e. Détaché, Martelé, Spiccato and Ricochet). Using the recorded data we applied machine learning techniques to train models to classify the four bowing techniques. Despite intrinsic differences between the MOCAP and low-cost data, the Myo-based classifier resulted in slightly higher accuracy than the MOCAP-based classifier. This result shows that it is possible to develop music-gesture learning applications based on low-cost technology which can be used in home environments for self-learning practitioners.
Derkach D, Ruiz A, Sukno F. 3D Head Pose Estimation Using Tensor Decomposition and Non-linear Manifold Modeling. 2018 International Conference on 3D Vision (3DV)
Head pose estimation is a challenging computer vision problem with important applications in different scenarios such as human-computer interaction or face recognition. In this paper, we present an algorithm for 3D head pose estimation using only depth information from Kinect sensors. A key feature of the proposed approach is that it allows modeling the underlying 3D manifold that results from the combination of pitch, yaw and roll variations. To do so, we use tensor decomposition to generate separate subspaces for each variation factor and show that each of them has a clear structure that can be modeled with cosine functions from a unique shared parameter per angle. Such representation provides a deep understanding of data behavior and angle estimations can be performed by optimizing combination of these cosine functions. We evaluate our approach on two publicly available databases, and achieve top state-of-the-art performance.
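The paper's key observation, that each subspace coordinate follows a cosine of a single shared angle, can be illustrated with a toy estimator. The assumed model below (samples y_k ≈ cos(theta_k − phi) at known angles theta_k) and the function are ours, a simplified stand-in for the paper's optimization over combinations of cosine functions.

```python
import math

# Recover a shared angle phi from samples y_k ~ cos(theta_k - phi) taken at
# known angles theta_k, by projecting onto the cosine/sine basis functions.
def estimate_angle(thetas, values):
    c = sum(v * math.cos(t) for t, v in zip(thetas, values))
    s = sum(v * math.sin(t) for t, v in zip(thetas, values))
    return math.atan2(s, c)
```

For equally spaced sample angles, the projections isolate cos(phi) and sin(phi) exactly, so the angle is recovered in closed form; pose estimation then amounts to reading the shared parameter off the subspace coordinates.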
Slizovskaia O, Gómez E, Haro G. A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features. The 2018 Joint Workshop on Machine Learning for Music. Joint workshop program of ICML, IJCAI/ECAI, and AAMAS
This work presents a method for analysis of the activations of audio convolutional neural networks by use of hand-crafted audio features. We analyse activations from three CNN architectures trained on different datasets and compare shallow-level activation maps with harmonic-percussive source separation and chromagrams, and deep-level activations with loudness and onset rate.
arXiv postprint: https://arxiv.org/abs/1907.01813
Workshop website: https://sites.google.com/site/faimmusic2018/home
Amarasinghe I, Hernández-Leo D, Jonsson A, Manatunga K. Sustaining Continuous Collaborative Learning Flows in MOOCs: Orchestration Agent Approach. J.UCS Journal of Universal Computer Science, 24(8), 1034-1051
Collaborative learning spaces deployed in Massive Open Online Courses (MOOCs) provide productive social learning opportunities. However, sustaining collaboration in these spaces is challenging. This paper provides a classification of MOOC participants based on their behavior in a structured collaborative learning space. This analysis leads to requirements for new technological interventions to orchestrate collaborative learning flows in MOOCs. The paper proposes the design of an intelligent agent to address these requirements and reports a study showing that the intervention of the proposed orchestration agent in a MOOC helps maintain continuous yet meaningful collaborative learning flows.
Albó L, Hernández-Leo D. Co-creation process and challenges in the conceptualization and development of the edCrumble learning design tool. Joint Proceedings of the 1st Co-Creation in the Design, Development and Implementation of Technology-Enhanced Learning workshop (CC-TEL 2018) and Systems of Assessments for Computational Thinking Learning workshop (TACKLE 2018) co-located with 13th European Conference on Technology Enhanced Learning (ECTEL 2018).
This paper presents the co-creation process followed during the conceptualization, development and evaluation of edCrumble: a learning design (LD) tool which provides an innovative visual representation of LDs, characterized by data analytics, with the aim of facilitating the planning, visualization, understanding and reuse of complex LDs. Researchers drew on several participant sources and profiles, different methods (including paper and web-based prototyping, questionnaires, interviews, focus groups, role-play games and sharing activities) and workshop types (isolated vs. long-term). Participatory design workshops and activities are described, as well as the challenges encountered during the co-design process, with the aim of informing other researchers who are considering co-creation. These challenges include the recruitment and motivation of participants, the management of their expectations and the prioritization of diverse feedback, together with a short evaluation of the methods used.
Open Access at UPF e-repository: http://hdl.handle.net/10230/35405
Ruiz Wills C, Foata B, González Ballester MA, Karppinen J, Noailly J. Theoretical Explorations Generate New Hypotheses About the Role of the Cartilage Endplate in Early Intervertebral Disk Degeneration. Frontiers in Physiology
Altered cell nutrition in the intervertebral disk (IVD) is considered a main cause of disk degeneration (DD). The cartilage endplate (CEP) provides a major path for the diffusion of nutrients from the peripheral vasculature to the IVD nucleus pulposus (NP). In DD, sclerosis of the adjacent bony endplate is suggested to be responsible for decreased diffusion and disk cell nutrition. Yet, experimental evidence does not support this hypothesis. Hence, we evaluated how moderate CEP composition changes related to tissue degeneration can affect disk nutrition and cell viability. A novel composition-based permeability formulation was developed for the CEP, calibrated, validated, and used in a mechano-transport finite element IVD model. Fixed solute concentrations were applied at the outer surface of the annulus and the CEP, and three cycles of daily mechanical load were simulated. The CEP model indicated that CEP permeability increases with the degeneration/aging of the tissue, in accordance with recent measurements reported in the literature. Additionally, our results showed that CEP degeneration might be responsible for mechanical load-induced NP dehydration, which locally affects oxygen and lactate levels, and reduced glucose concentration by 16% in the NP-annulus transition zone. Remarkably, CEP degeneration was a sine qua non condition for provoking cell starvation and death when simulating the effect of extracellular matrix depletion in DD. This theoretical study casts doubt on the paradigm that CEP calcification is needed to provoke cell starvation, and suggests an alternative path for DD whereby the early degradation of the CEP plays a key role.
Ramírez-Cifuentes D, Mayans M, Freire A. Early Risk Detection of Anorexia on Social Media. INSCI 2018. Lecture Notes in Computer Science
This paper proposes an approach for the early detection of anorexia nervosa (AN) on social media. We present a machine learning approach that processes the texts written by social media users. This method relies on a set of features based on domain-specific vocabulary, topics, psychological processes, and linguistic information extracted from the users’ writings. This approach penalizes the delay in detecting positive cases in order to classify the users in risk as early as possible. Identifying anorexia early, along with an appropriate treatment, improves the speed of recovery and the likelihood of staying free of the illness. The results of this work showed that our proposal is suitable for the early detection of AN symptoms.
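The delay penalty can be illustrated with an ERDE-style cost from the early-risk-detection literature. The constants, the function name, and the sigmoid latency factor below are illustrative assumptions for the sketch, not the paper's exact evaluation metric.

```python
import math

# ERDE-style cost: false positives and false negatives pay fixed costs,
# while a true positive pays a cost that grows with the number of writings
# read before the decision (the delay k), via a sigmoid centred at o.
def early_detection_cost(flagged, truth, k, o=5.0, c_fp=0.1, c_fn=1.0, c_tp=1.0):
    if flagged and not truth:
        return c_fp
    if not flagged and truth:
        return c_fn
    if flagged and truth:
        return c_tp * (1.0 - 1.0 / (1.0 + math.exp(k - o)))
    return 0.0
```

Minimizing this kind of cost pushes a classifier to flag at-risk users as early as the evidence allows, which is the behavior the paper's approach optimizes for.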
Michos, K., Hernández-Leo, D., & Albó, L. (2018). Teacher-led inquiry in technology-supported school communities. British Journal of Educational Technology 49(6), 1077-1095.
Learning design is a research field which studies how to best support teachers as designers of Technology Enhanced Learning (TEL) situations. Although substantial work has been done in the articulation of the learning design process, little is known about how learning designs are experienced by students and teachers, especially in the context of schools. This paper empirically examines whether a teacher inquiry model, as a tool for systematic research by teachers into their own practice, facilitates the connection between the design of TEL interventions and data-informed reflection on them in two school communities. High school teachers participated in a learning design professional development program supported by a web-based community platform integrating a teacher inquiry tool (TILE). A multiple case study was conducted aimed at understanding: (a) current teacher practice and (b) teacher involvement in inquiry cycles of design and classroom implementations with technologies. Multiple data sources were used over a one-year period, including focus group transcripts, teacher interview protocols, digital artifacts, and questionnaires. Sharing teacher-led inquiries together with learning analytics was perceived as being useful for connecting pedagogical intentions with the evaluation of their enactment with learners, and this differed from current practice. Teachers' reflections about their designs focused on the time management of learning activities and their familiarity with the enactment and analytics tools. Results inform how technology can support teacher-led inquiry and collective reflective practice in schools.
Dataset: https://zenodo.org/record/1183247#.XC-awFxKg2w
Lobo J. Relationship-based access control: More than a social network access control model. WIREs Data Mining Knowledge Discovery.
In a computer system, access control refers to the mechanisms the system uses to decide whether to grant or reject access to its resources. Access control decisions in social media services, such as Facebook, Twitter, ResearchGate, or LinkedIn, are determined in large part by policies that can be described in terms of the relationships among the individuals potentially affected by the decision. The premise behind the growing interest in Relationship-based Access Control (ReBAC) is that, beyond social media services, social and other forms of relationships can be an effective abstraction for describing and implementing access control policies. The aim of this paper is to present an overview of ReBAC from the point of view of the types of policies that have motivated the access control research community to develop different ReBAC systems. We also review and reflect on what it would take to implement and administer a ReBAC system.
Garreta-Domingo, M., Sloep, P., Hernández-Leo, D. (2018) Human-centred design to empower ‘teachers as designers’. British Journal of Educational Technology 49(6), 1113-1130.
Educators of all sectors are learning designers, often unwittingly. To succeed as designers, they need to adopt a design mindset and acquire the skills needed to address the design challenges they encounter in their everyday practice. Human‐centred design (HCD) provides professional designers with the methods needed to address complex problems. It emphasizes the human perspective throughout the design lifecycle and provides a practice‐oriented approach, which naturally fits educators’ realities. This research reports the experiences of educators who used HCD to design ICT‐based learning activities. A mixed‐methods approach was used to gauge how participating educators experienced the design tasks. The perceived level of difficulty and value of the various methods varied, revealing significant differences between educators according to their level of knowledge of pedagogy frameworks. We discuss our findings from the vantage point of educators’ pedagogical beliefs and how experience shapes these. The results support the idea that HCD is a valuable framework for educators, one that may inform ongoing international efforts to shape a science and practice of learning design for teaching.
DOI: https://doi.org/10.1111/bjet.12682
Open Access: http://hdl.handle.net/10230/35462
Štajner S, Saggion H, Ponzetto SP. Improving Lexical Coverage of Text Simplification Systems for Spanish. Expert Systems with Applications
The current bottleneck of all data-driven lexical simplification (LS) systems is the scarcity and small size of the parallel corpora (original sentences and their manually simplified versions) used for training. This is especially pronounced for languages other than English. We address this problem, taking Spanish as an example of such a language, by building new simplification-specific datasets of synonyms and paraphrases using freely available resources. We test their usefulness in the LS task by adding them, in various combinations, to the existing text simplification (TS) training dataset in a phrase-based statistical machine translation (PBSMT) approach. Our best systems significantly outperform the state-of-the-art LS systems for Spanish in the number of transformations performed and the grammaticality, simplicity and meaning preservation of the output sentences. The results of a detailed manual analysis show that some of the newly built TS resources, although they have good lexical coverage and lead to a high number of transformations, often change the original meaning and do not generate simpler output when used in this PBSMT setup. Good combinations of these additional resources with the TS training dataset and a good choice of language model, in contrast, improve the lexical coverage and produce sentences which are grammatical, simpler than the original, and preserve the original meaning well.
Paun B, Bijnens B, Cook AC, Mohun TJ, Butakoff C. Quantification of the detailed cardiac left ventricular trabecular morphogenesis in the mouse embryo. Medical Image Analysis
During embryogenesis, a mammalian heart develops from a simple tubular shape into a complex 4-chamber organ, going through four distinct phases: early primitive tubular heart, emergence of trabeculations, trabecular remodeling and development of the compact myocardium. In this paper we propose a framework for standardized and subject-independent 3D regional myocardial complexity analysis, applied to analysis of the development of the mouse left ventricle. We propose a standardized subdivision of the myocardium into 3D overlapping regions (in our case 361) and a novel visualization of myocardial complexity, whereupon we: 1) extend the fractal dimension, commonly applied to image slices, to 3D and 2) use volume occupied by the trabeculations in each region together with their surface area, in order to quantify myocardial complexity. The latter provides an intuitive characterization of the complexity, given that compact myocardium will tend to occupy a larger volume with little surface area while high surface area with low volume will correspond to highly trabeculated areas.
Using 50 mouse embryo images at 5 different gestational ages (10 subjects per gestational age), we demonstrate how the proposed representation and complexity measures describe the development of LV myocardial complexity. The mouse embryo data was acquired using high resolution episcopic microscopy. The complexity analysis per region was carried out using: 3D fractal dimension, myocardial volume, myocardial surface area and ratio between the two. The analysis of gestational ages was performed on embryos of 14.5, 15.5, 16.5, 17.5 and 18.5 embryonic days, and demonstrated that the regional complexity of the trabeculations increases longitudinally from the base to the apex, with a maximum around the middle. The overall complexity decreases with gestational age, being most complex at 14.5. Circumferentially, at ages 14.5, 15.5 and 16.5, the trabeculations show similar complexity everywhere except for the anteroseptal and inferolateral area of the wall, where it is smaller. At 17.5 days, the regions of high complexity become more localized towards the inferoseptal and anterolateral parts of the wall. At 18.5 days, the high complexity area exhibits further localization at the inferoseptal and anterior part of the wall.
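A 3D fractal dimension can be estimated generically by box counting: count the occupied boxes N(s) at several box sizes s and fit the slope of log N(s) against log(1/s). The sketch below is a standard estimator for illustration, not the authors' exact 3D extension.

```python
import math

# Box-counting estimate of the fractal dimension of a 3D voxel set.
def box_counting_dimension(voxels, sizes):
    """voxels: set of integer (x, y, z) occupied coordinates;
    sizes: box edge lengths to test. Returns the least-squares slope of
    log N(s) versus log(1/s)."""
    xs, ys = [], []
    for s in sizes:
        boxes = {(x // s, y // s, z // s) for x, y, z in voxels}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys)) /
            sum((a - mx) ** 2 for a in xs))
```

A solid region recovers a dimension near 3, while highly trabeculated myocardium would sit between 2 and 3, which is why the measure discriminates compact from trabeculated regions.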
Fernandez-Lopez A, Sukno FM. Survey on automatic lip-reading in the era of deep learning. Image and Vision Computing
In the last few years, there has been an increasing interest in developing systems for Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods based on Deep Learning (DL) have become very popular and have substantially pushed forward the achievable performance. In this survey, we review ALR research during the last decade, highlighting the progression from approaches previous to DL (which we refer to as traditional) toward end-to-end DL architectures. We provide a comprehensive list of the audio-visual databases available for lip-reading, describing what tasks they can be used for, their popularity and their most important characteristics, such as the number of speakers, vocabulary size, recording settings and total duration. In correspondence with the shift toward DL, we show that there is a clear tendency toward large-scale datasets targeting realistic application settings and large numbers of samples per class. On the other hand, we summarize, discuss and compare the different ALR systems proposed in the last decade, separately considering traditional and DL approaches. We address a quantitative analysis of the different systems by organizing them in terms of the task that they target (e.g. recognition of letters or digits and words or sentences) and comparing their reported performance in the most commonly used datasets. As a result, we find that DL architectures perform similarly to traditional ones for simpler tasks but report significant improvements in more complex tasks, such as word or sentence recognition, with up to 40% improvement in word recognition rates. Hence, we provide a detailed description of the available ALR systems based on end-to-end DL architectures and identify a tendency to focus on the modeling of temporal context as the key to advance the field. Such modeling is dominated by recurrent neural networks due to their ability to retain context at multiple scales (e.g. short- and long-term information). In this sense, current efforts tend toward techniques that allow a more comprehensive modeling and interpretability of the retained context.
Bif Goularte F, Modesto Nassar S, Fileto R, Saggion H. A text summarization method based on fuzzy rules and applicable to automated assessment. Expert Systems with Applications
In the last two decades, the text summarization task has gained much importance because of the large amount of online data and its potential to extract useful information and knowledge in a way that could be easily handled by humans and used for a myriad of purposes, including expert systems for text assessment. This paper presents an automatic process for text assessment that relies on fuzzy rules over a variety of extracted features to find the most important information in the assessed texts. The automatically produced summaries of these texts are compared with reference summaries created by domain experts. Differently from other proposals in the literature, our method summarizes text by investigating correlated features to reduce dimensionality, and consequently the number of fuzzy rules used for text summarization. Thus, the proposed approach for text summarization with a relatively small number of fuzzy rules can benefit the development and use of future expert systems able to automatically assess writing. The proposed summarization method has been trained and tested in experiments on a dataset of Brazilian Portuguese texts written by students in response to tasks assigned to them in a Virtual Learning Environment (VLE). The proposed approach was compared with other methods, including a naive baseline and the Score, Model and Sentence methods, using ROUGE measures. The results show that our proposal provides a better f-measure (with 95% CI) than the aforementioned methods.
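As an illustration of the general idea (not the paper's actual rule base), the sketch below scores sentences with a couple of hand-made fuzzy rules over two illustrative features, sentence position and keyword density, and keeps the top-scoring sentence:

```python
# Minimal sketch of fuzzy-rule sentence scoring for extractive summarization.
# The features, membership functions and rules are illustrative only.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def high(x):
    # Membership of "high" for a feature on [0, 1].
    return max(0.0, min(1.0, (x - 0.3) / 0.5))

def score_sentence(position, keyword_density):
    # Rule 1: IF position is early AND keyword density is high THEN important
    early = tri(position, -0.5, 0.0, 0.5)      # earlier sentences peak at 0
    r1 = min(early, high(keyword_density))     # fuzzy AND = min
    # Rule 2: IF keyword density is high THEN somewhat important
    r2 = 0.5 * high(keyword_density)
    return max(r1, r2)                         # fuzzy OR (rule aggregation) = max

def summarize(sentences, k=1):
    """Keep the k highest-scoring (text, keyword_density) sentences, in order."""
    n = len(sentences)
    scored = [(score_sentence(i / max(n - 1, 1), kd), i)
              for i, (_, kd) in enumerate(sentences)]
    keep = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [sentences[i][0] for _, i in keep]

docs = [("Fuzzy rules rank sentences by features.", 0.9),
        ("This filler sentence has few keywords.", 0.1),
        ("Correlated features are pruned first.", 0.6)]
print(summarize(docs, k=1))
```

Reducing correlated features, as the paper proposes, would shrink the antecedents of such rules and hence the size of the rule base.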
Molkaraie M, Gómez V. Monte Carlo Methods for the Ferromagnetic Potts Model Using Factor Graph Duality. IEEE Transactions on Information Theory. doi: 10.1109/TIT.2018.2857565
Normal factor graph duality offers new possibilities for Monte Carlo algorithms in graphical models. Specifically, we consider the problem of estimating the partition function of the ferromagnetic Ising and Potts models by Monte Carlo methods, which are known to work well at high temperatures but to fail at low temperatures. We propose Monte Carlo methods (uniform sampling and importance sampling) in the dual normal factor graph and demonstrate that they behave differently: they work particularly well at low temperatures. By comparing the relative error in estimating the partition function, we show that the proposed importance sampling algorithm significantly outperforms state-of-the-art deterministic and Monte Carlo methods. For the ferromagnetic Ising model in an external field, we show the equivalence between the valid configurations in the dual normal factor graph and the terms that appear in the high-temperature series expansion of the partition function. Following this result, we discuss connections with Jerrum–Sinclair's polynomial randomized approximation scheme (the subgraphs-world process) for evaluating the partition function of ferromagnetic Ising models.
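For readers unfamiliar with the primal-domain baseline the paper improves upon, the sketch below estimates the Ising partition function on a tiny 2x2 grid by uniform sampling and checks it against exact enumeration. The grid, inverse temperature and sample count are illustrative choices, not the paper's experimental setup, and the dual-domain samplers themselves are not reproduced here:

```python
# Uniform-sampling estimate of the Ising partition function Z on a 2x2 grid
# (4 spins, 4 edges), compared against exact enumeration. This is the
# primal-domain Monte Carlo baseline, effective at high temperature (small beta).
import itertools
import math
import random

EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]  # 2x2 grid, no wraparound

def weight(spins, beta):
    """Boltzmann weight exp(beta * sum of neighboring spin products)."""
    return math.exp(beta * sum(spins[i] * spins[j] for i, j in EDGES))

def exact_Z(beta):
    """Brute-force partition function over all 2^4 spin configurations."""
    return sum(weight(s, beta) for s in itertools.product((-1, 1), repeat=4))

def uniform_mc_Z(beta, n_samples, seed=0):
    """Estimate Z as |state space| times the mean weight of uniform samples."""
    rng = random.Random(seed)
    total = sum(weight([rng.choice((-1, 1)) for _ in range(4)], beta)
                for _ in range(n_samples))
    return (2 ** 4) * total / n_samples

beta = 0.3  # high temperature, where uniform sampling behaves well
print(exact_Z(beta), uniform_mc_Z(beta, 50000))
```

At low temperatures (large beta) the weights become sharply peaked and this estimator degrades, which is precisely the regime where the paper's dual-domain samplers shine.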
Ramírez-Cifuentes D, Freire A. UPF’s Participation at the CLEF eRisk 2018: Early Risk Prediction on the Internet. Conference and Labs of the Evaluation Forum (CLEF2018), ERISK Workshop
This paper describes the participation of the Web Science and Social Computing Research Group from the Universitat Pompeu Fabra, Barcelona (UPF) at the CLEF 2018 eRisk Lab. Its main goal, divided into two tasks, is to detect with enough anticipation cases of depression (T1) and anorexia (T2), given a labeled dataset with texts written by social media users. Identifying depressed and anorexic individuals by means of automatic early detection methods can provide experts with a tool for further research on these conditions and help people living with them. Our proposal presents several machine learning models that rely on features based on linguistic information, domain-specific vocabulary and psychological processes. In terms of F-Score, our best models rank among the top 5 approaches for both tasks.
Albó L, Hernández-Leo D. edCrumble: designing for learning with data analytics. Proceedings of the 13th European Conference on Technology Enhanced Learning (EC-TEL 2018)
This demonstration introduces ILDE2/edCrumble, an online learning design platform that allows teachers to create learning designs (LDs) with the support of data analytics. ILDE2/edCrumble is built on top of the LdShake platform, which provides social features enabling the sharing and co-editing of LDs. The tool provides an innovative visual representation of LDs combining face-to-face and online learning in different places (in-class and out-of-class) and times (synchronous and asynchronous). Decision making during the LD process is supported by two types of analytics: analytics resulting from the design of the activities sequenced in a timeline (LD analytics), and aggregated meta-data extracted from several grouped LDs (community analytics). Preliminary results from an iterative design-based research process show that the tool is perceived as easy to use and useful. During the demo we will show a use case of how LD and community analytics can help balance the workload and design across the different courses that make up a curriculum.
https://doi.org/10.1007/978-3-319-98572-5_55
Albó L, Hernández-Leo D. Identifying design principles for learning design tools: the case of edCrumble. Proceedings of the 13th European Conference on Technology Enhanced Learning (EC-TEL 2018)
Despite the existing variety of learning design tools, there is a gap in their understanding and adoption by educators in their everyday practices. Sharing is one of the main pillars of learning design, but it is sometimes not a sufficient reason to convince teachers to adopt the habit of documenting their practices so they can be shared. This study presents the design principles of edCrumble, an online learning design platform that allows teachers to create and share blended learning designs with the support of data analytics. The design principles have been learned and extracted from a participatory design process with teachers during the conceptualization and ongoing development of the tool. Several workshops including interviews were carried out as part of a design-based research iteration process. A subsequent analysis was conducted to extract and highlight design principles that aim to inform the development of learning design tools towards better learning design adoption.
https://doi.org/10.1007/978-3-319-98572-5_31
Hernández‐Leo D, Martinez‐Maldonado R, Pardo A, Muñoz‐Cristóbal JA, Rodríguez‐Triana MJ. Analytics for learning design: A layered framework and tools. British Journal of Educational Technology
The field of learning design studies how to support teachers in devising suitable activities for their students to learn. The field of learning analytics explores how data about students' interactions can be used to increase the understanding of learning experiences. Despite its clear synergy, there is only limited and fragmented work exploring the active role that data analytics can play in supporting design for learning. This paper builds on previous research to propose a framework (analytics layers for learning design) that articulates three layers of data analytics—learning analytics, design analytics and community analytics—to support informed decision‐making in learning design. Additionally, a set of tools and experiences are described to illustrate how the different data analytics perspectives proposed by the framework can support learning design processes.
Abura'ed A,Bravo A,Chiruzzo L,Saggion H. LaSTUS/TALN+INCO @ CL-SciSumm 2018 - Using Regression and Convolutions for Cross-document Semantic Linking and Summarization of Scholarly Literature. Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018)
In this paper we present several systems developed to participate in the 3rd Computational Linguistics Scientific Document Summarization Shared challenge, which addresses the problem of summarizing a scientific paper by taking advantage of its citation network (i.e., the papers that cite the given paper). Given a cluster of scientific documents where one is a reference paper (RP) and the remaining documents are papers citing the reference, two tasks are proposed: (i) to identify which sentences in the reference paper are being cited and why they are cited, and (ii) to produce a citation-based summary of the reference paper using the information in the cluster. Our systems are based on both supervised (Convolutional Neural Networks) and unsupervised techniques, taking advantage of word embedding representations and features computed from the linguistic and semantic analysis of the documents.
Giraldo S, Ortega A, Perez A, Ramirez R, Waddell G, Williamon A. Automatic assessment of violin performance using dynamic time warping classification. 26th Signal Processing and Communications Applications Conference (SIU)
The automatic assessment of music performance has become an area of special interest due to the increasing number of technology-enhanced music learning systems. However, in most of these systems the assessment of the musical performance is based on the accuracy of onsets and pitch, paying little attention to other relevant aspects of performance. In this paper we present a preliminary study to assess the quality of violin performance using machine learning techniques. We collect recorded examples of selected violin exercises ranging from expert to amateur performances. We process the audio signal to extract features and train models using clustering based on the Dynamic Time Warping distance. The quality of new performances is evaluated based on their level of match/mismatch with each of the recorded training examples.
DOI: http://dx.doi.org/10.1109/SIU.2018.8404556
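As context for the method description, a minimal dynamic time warping (DTW) distance is sketched below; the input sequences are made-up 1-D feature values, since the paper's audio feature extraction is omitted:

```python
# Compact dynamic time warping, the dissimilarity measure underlying
# distance-based clustering of performances.

def dtw(a, b):
    """Classic O(len(a)*len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A new performance can be scored by its DTW distance to each training example.
expert  = [0.0, 1.0, 2.0, 1.0, 0.0]
student = [0.0, 0.9, 2.1, 1.0, 0.0]
print(dtw(expert, student))   # small distance: close to the expert rendition
print(dtw(expert, expert))    # identical sequences: distance 0
```

Because DTW tolerates local tempo deviations, two renditions of the same exercise played at slightly different speeds still compare meaningfully.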
Nikbakht R, Jonsson A, Lozano A. Dual-Kernel Online Reconstruction of Power Maps. IEEE Global Communications Conference (GLOBECOM'18)
Details to come soon
Aragón P, Bermejo Y, Gómez V, Kaltenbrunner A. Interactive Discovery System for Direct Democracy. The 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018)
Decide Madrid is the civic technology of Madrid City Council which allows users to create and support online petitions. Despite its initial success, the platform is encountering problems with the growth of petition signing, as petitions fall far short of the minimum number of supporting votes they must gather. Previous analyses have suggested that this problem is caused by the interface: a paginated list of petitions that applies a non-optimal ranking algorithm. For this reason, we present an interactive system for the discovery of topics and petitions. This approach leads us to reflect on the usefulness of data visualization techniques for addressing relevant societal challenges.
Oramas S, Espinosa-Anke L, Gómez F, Serra X. Natural language processing for music knowledge discovery. Journal of New Music Research
Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago. In this work, we present different Natural Language Processing (NLP) approaches to harness the potential of these text collections for automatic music knowledge discovery, covering different phases in a prototypical NLP pipeline, namely corpus compilation, text-mining, information extraction, knowledge graph generation, and sentiment analysis. Each of these approaches is presented alongside different use cases (i.e. flamenco, Renaissance and popular music) where large collections of documents are processed, and conclusions stemming from data-driven analyses are presented and discussed.
Additional material:
Lykousas N, Gomez V, Patsakis C. Adult content in Social Live Streaming Services: Characterizing deviant users and relationships. The 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018)
Social Live Stream Services (SLSS) exploit a new level of social interaction. One of the main challenges in these services is how to detect and prevent deviant behaviors that violate community guidelines. In this work, we focus on adult content production and consumption in two widely used SLSS, namely Live.me and Loops Live, which have millions of users producing massive amounts of video content on a daily basis. We use a pre-trained deep learning model to identify broadcasters of adult content. Our results indicate that moderation systems in place are highly ineffective in suspending the accounts of such users. We create two large datasets by crawling the social graphs of these platforms, which we analyze to identify characterizing traits of adult content producers and consumers, and discover interesting patterns of relationships among them, evident in both networks.
Barbieri F, Marujo L, Karuturi P, Brendel W. Multi-task Emoji Learning. 1st International Workshop on Emoji Understanding and Applications in Social Media. Co-located with The 12th International AAAI Conference on Web and Social Media (ICWSM-18)
Emojis are very common in social media and understanding their underlying semantics is of great interest from a Natural Language Processing point of view. In this work, we investigate emoji prediction in short text messages using a multi-task pipeline that simultaneously predicts emojis, their categories and sub-categories. The categories are either manually predefined in the Unicode standard or automatically obtained by clustering over word embeddings. We show that using this categorical information adds meaningful signal, thus improving the performance of the emoji prediction task. We systematically analyze the performance of emoji prediction by varying the number of training samples, and also carry out a qualitative analysis using the attention weights of the prediction task.
http://knoesis.org/resources/Emoji2018/Emoji2018_Papers/Paper6_Emoji2018.pdf
Dominguez M, Farrus M, Wanner L. Thematicity-based Prosody Enrichment for Text-to-Speech Applications. 9th International Conference on Speech Prosody 2018
Theoretical studies on the information structure–prosody interface argue that the content packaged in terms of theme and rheme correlates with the intonation of the corresponding sentence with regard to rising and falling patterns (L*+H LH% and H* LL%, respectively). When such a correspondence is used to derive prosody in text-to-speech applications, ToBI labels are often statically mapped to acoustic parameters. Such an approach is insufficient to solve the problem of monotonous synthetic voices for two reasons: it is repetitive with respect to prosody enrichment, and a binary flat theme-rheme representation does not properly describe long complex sentences. In this paper, we introduce a methodology for a more versatile thematicity-based prosody enrichment based on: (i) a hierarchical tripartite thematicity model as proposed in the Meaning–Text Theory, and (ii) a corpus-based approach for the automatic extraction of acoustic parameters (fundamental frequency, breaks and speech rate) that are mapped to a varied range of prosody control tags in the synthesized speech. This prosody enrichment has been shown to yield better results in a perception test when implemented in a TTS system.
Version at UPF e-repository: http://hdl.handle.net/10230/34905
GitHub account of the author: https://github.com/monikaUPF
Morales A, Piella G, Martínez O, Sukno FM. A quantitative comparison of methods for 3D face reconstruction from 2D images. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition
In the past years, many studies have highlighted the relation between deviations from normal facial morphology (dysmorphology) and some genetic and mental disorders. Recent advances in methods for reconstructing the 3D geometry of the face from 2D images open new possibilities for dysmorphology research without the need for specialized 3D imaging equipment. However, it is unclear whether these methods can reconstruct the facial geometry with the required accuracy. In this paper we present a comparative study of some of the most relevant approaches for 3D face reconstruction from 2D images, including photometric stereo, deep learning and 3D Morphable Model fitting. We address the comparison in qualitative and quantitative terms using a public database consisting of 2D images and 3D scans from 100 people. Interestingly, we find that some methods produce quite noisy reconstructions that do not seem realistic, whereas others look more natural. However, the latter do not seem to adequately capture the geometric variability that exists between different subjects and produce reconstructions that always look very similar across individuals, thus questioning their fidelity.
Barbieri F, Camacho-Collados J. How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter. Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics
In this paper we analyze the use of emojis in social media with respect to gender and skin tone. By gathering a dataset of over twenty-two million tweets from the United States, some findings are clearly highlighted by a simple frequency-based analysis. Moreover, we carry out a semantic analysis of the usage of emojis and their modifiers (e.g. gender and skin tone) by embedding all words, emojis and modifiers into the same vector space. Our analyses reveal that some stereotypes related to skin color and gender seem to be reflected in the use of these modifiers. For example, emojis representing hand gestures are more widely used with lighter skin tones, and the usage across skin tones differs significantly. At the same time, the vector corresponding to the male modifier tends to be semantically close to emojis related to business or technology, whereas their female counterparts appear closer to emojis about love or makeup.
Additional material:
- Code and SW2V embeddings are available at https://github.com/fvancesco/emoji_modifiers
- Postprint at UPF e-repository
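A toy sketch of the kind of analysis described above: words, emojis and modifiers placed in one vector space and compared by cosine similarity. The 3-D vectors are invented for illustration; the actual study uses SW2V embeddings trained on tweets:

```python
# Toy nearest-neighbor analysis in a shared embedding space for emojis
# and modifiers. All vectors below are made up for illustration.
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

space = {
    "male_modifier":   [0.9, 0.1, 0.0],
    "female_modifier": [0.1, 0.9, 0.0],
    "briefcase_emoji": [0.8, 0.2, 0.1],
    "lipstick_emoji":  [0.2, 0.8, 0.1],
}

# For each modifier, report its semantically closest emoji.
for modifier in ("male_modifier", "female_modifier"):
    nearest = max((e for e in space if e.endswith("_emoji")),
                  key=lambda e: cosine(space[modifier], space[e]))
    print(modifier, "->", nearest)
```

With real embeddings, systematic proximities of this kind are what reveal the gender- and skin-tone-related patterns discussed in the abstract.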
Camacho-Collados J, Delli Bovi C, Espinosa-Anke L, Oramas S, Pasini T, Santus E, Shwartz V, Navigli R, Saggion H. SemEval-2018 Task 9: Hypernym Discovery. Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval 2018)
This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows: given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, a total of 11 teams participated, with a total of 39 different systems submitted through all subtasks. Data, results and further information about the task can be found at https://competitions.codalab.org/competitions/17119.
Barbieri F, Camacho-Collados J, Ronzano F, Espinosa-Anke L, Ballesteros M, Basile V, Patti V, Saggion H. SemEval 2018 Task 2: Multilingual Emoji Prediction. Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval 2018)
This paper describes the results of the first shared task on Multilingual Emoji Prediction, organized as part of SemEval 2018. Given the text of a tweet, the task consists of predicting the most likely emoji to be used along such tweet. Two subtasks were proposed, one for English and one for Spanish, and participants were allowed to submit a system run to one or both subtasks. In total, 49 teams participated in the English subtask and 22 teams submitted a system run to the Spanish subtask. Evaluation was carried out emoji-wise, and the final ranking was based on macro F-Score. Data and further information about this task can be found at https://competitions.codalab.org/competitions/17344
Additional material:
Participants were provided with a Java-based crawler (https://github.com/fra82/twitter-crawler) to ease the download of the textual content of tweets from the ID list
AbuRa’ed A, Saggion H. LaSTUS/TALN at Complex Word Identification (CWI) 2018 Shared Task. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
This paper presents the participation of the LaSTUS/TALN team in the Complex Word Identification (CWI) Shared Task 2018, English monolingual track. The purpose of the task was to determine whether a word in a given sentence can be judged as complex or not by a certain target audience. For the English track, the task organizers provided training and development datasets of 27,299 and 3,328 words respectively, together with the sentence in which each word occurs. The words were judged as complex or not by 20 human evaluators, ten of whom were native speakers. We submitted two systems: one modeled each word as a numeric vector populated with a set of lexical, semantic and contextual features, while the other relied on a word embedding representation and a distance metric. We trained two separate classifiers to automatically decide whether each word is complex or not, and submitted six runs, two for each of the three subsets of the English monolingual CWI track.
Additional material:
Accuosto P, Saggion H. Improving the accessibility of biomedical texts by semantic enrichment and definition expansion. XXXIV Congreso Internacional de la Sociedad Española para el Procesamiento del Lenguaje Natural; 2018
We present work aimed at facilitating the comprehension of health-related English-Spanish parallel texts by means of the semantic annotation of biomedical concepts and the automatic expansion of their definitions. In order to overcome the limitations posed by the scarcity of resources available for Spanish, we propose to exploit existing tools targeted at English and then transfer the produced annotations. The evaluations performed show the feasibility of this approach. An enriched set of texts is made available, which can be retrieved, visualized and downloaded through a web interface.
Additional material:
The linguistic resources generated in the context of this work, including the annotated documents in JSON format, are made available for download from the AsisTerm web site (http://scientmin.taln.upf.edu/scielo/). AsisTerm provides an online interface to search and visualize biomedical abstracts from the SciELO parallel corpus (Neves, Jimeno-Yepes, and Névéol, 2016) in English and Spanish annotated with UMLS concepts
The full evaluations are made available online (http://scientmin.taln.upf.edu/scielo/evaluations/Evaluations_AsisTerm.pdf)
Wilhelmi F, Barrachina-Muñoz S, Bellalta B, Cano C, Jonsson A, Neu G. Potential and Pitfalls of Multi-Armed Bandits for Decentralized Spatial Reuse in WLANs. arXiv pre-print
Spatial Reuse (SR) has recently gained attention for performance maximization in IEEE 802.11 Wireless Local Area Networks (WLANs). Decentralized mechanisms are expected to be key in the development of solutions for next-generation WLANs, since many deployments are uncoordinated by nature. However, the potential of decentralized mechanisms is limited by the significant lack of knowledge of the overall wireless environment. To shed some light on this subject, we discuss the main considerations and possibilities of applying online learning in uncoordinated WLANs for the SR problem. In particular, we provide a solution based on Multi-Armed Bandits (MABs) whereby independent WLANs dynamically adjust their frequency channel, transmit power and sensitivity threshold. To that purpose, we provide two different strategies, selfish and environment-aware learning. While the former stands for purely individual behavior, the latter takes into account the performance experienced by surrounding nodes, and thus the impact of individual actions on the environment. Through these two strategies, we delve into practical issues of enabling MAB usage in wireless networks, such as ensuring convergence or handling adversarial effects. In addition, our simulation results illustrate the potential of the proposed solutions for enabling SR in future WLANs, showing that substantial improvements in network performance are achieved in terms of throughput and fairness.
https://arxiv.org/abs/1805.11083
Additional material:
All of the source code used in this work is open under the GNU General Public License v3.0, encouraging sharing of algorithms among potential contributors.
Francesc Wilhelmi. Potentials and pitfalls of decentralized spatial reuse. https://github.com/fwilhelmi/potential_pitfalls_mabs_spatial_reuse, commit e761e6e, 2018
The source code of 11axHDWLANSim is open under the GNU General Public License v3.0 and can be found at https://github.com/wn-upf/Komondo
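The selfish learning strategy above can be illustrated with a minimal epsilon-greedy bandit, where each (channel, power, sensitivity) configuration is an arm and the WLAN learns from its own observed throughput. The reward function, arm count and parameters below are made-up stand-ins; the paper's simulator and action space are not reproduced here:

```python
# Minimal epsilon-greedy multi-armed bandit with an illustrative
# stand-in reward modeling noisy per-configuration throughput.
import random

def epsilon_greedy(reward_fn, n_arms, n_rounds, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=means.__getitem__)  # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]         # incremental mean
    return means, counts

# Stand-in reward: arm 2 is the best configuration on average.
def noisy_throughput(arm, rng):
    return [0.3, 0.5, 0.8, 0.4][arm] + rng.gauss(0.0, 0.05)

means, counts = epsilon_greedy(noisy_throughput, n_arms=4, n_rounds=5000)
print("most-played arm:", counts.index(max(counts)))
```

The environment-aware variant studied in the paper would instead feed back a reward that also reflects the performance of neighboring nodes.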
AbuRa’ed A, Chiruzzo L, Saggion H. Experiments in detection of implicit citations. WOSP 2018: 7th International Workshop on Mining Scientific Publications
The identification of explicit and implicit citations to a given reference paper is important for numerous scientific text mining activities such as citation purpose identification, scientific opinion mining, and scientific summarization. This paper presents experiments on the identification of implicit citations in scientific papers. As in previous work, and relying on an annotated dataset of explicit and implicit citation sentences, we cast the problem as classification, evaluating several machine learning algorithms trained on a set of task-motivated features. We compare our work with the state of the art on the annotated dataset, obtaining improved performance. We also annotate a new dataset, which we make publicly available, to validate our approach. The results on the new dataset confirm that our set of features outperforms previously published approaches.
OA version at UPF e-repository: http://hdl.handle.net/10230/35180
Hernández-Leo D, Asensio-Pérez JI, Derntl M, Pozzi F, Chacón J, Prieto LP, Persico D. An Integrated Environment for Learning Design. Frontiers in ICT.
Learning design tools aim at supporting practitioners in their task of creating more innovative and effective computer-supported learning situations. Despite there being a myriad of proposed tools, their use presents challenges that recent studies link with practitioners' varied pedagogical approaches and context restrictions, as well as with barriers to practical application derived from the fact that most tools only cover limited functionality and do not support cooperation between practitioners. In this paper we investigate whether it is possible to provide a flexible community system that supports multiple learning design tasks. We propose an Integrated Learning Design Environment (ILDE), which is a networked system integrating collaboration functions, design editors and middleware that enables deployment of the designed learning situations into Virtual Learning Environments. We describe the iterative user-centered process adopted in the design of ILDE as well as its architecture. The architecture is implemented to show its feasibility and that it is capable of providing the targeted functionality. We also present the results of its use in training workshops with 148 practitioners from five different institutions in vocational training, higher and adult education. Some of the learning designs were deployed in VLEs and enacted with students in real learning situations.
Additional material:
- Integrated Learning Design Environment: ILDE https://ilde.upf.edu/
Oramas S, Barbieri F, Nieto O, Serra X. Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval
Music genre labels are useful to organize songs, albums, and artists into broader groups that share similar musical characteristics. In this work, an approach to learn and combine multimodal data representations for music genre classification is proposed. Intermediate representations of deep neural networks are learned from audio tracks, text reviews, and cover art images, and further combined for classification. Experiments on single and multi-label genre classification are then carried out, evaluating the effect of the different learned representations and their combinations. Results on both experiments show how the aggregation of learned representations from different modalities improves the accuracy of the classification, suggesting that different modalities embed complementary information. In addition, the learning of a multimodal feature space increases the performance of pure audio representations, which may be especially relevant when the other modalities are available for training, but not at prediction time. Moreover, a proposed approach for dimensionality reduction of target labels yields major improvements in multi-label classification not only in terms of accuracy, but also in terms of the diversity of the predicted genres, which implies a more fine-grained categorization. Finally, a qualitative analysis of the results sheds some light on the behavior of the different modalities in the classification task.
Additional material:
- Both datasets used in the experiments are released as MSD-I and MuMu. The released data includes mappings between data sources, genre annotations, splits, texts, and links to images.
- The source code to reproduce the audio, text, and multimodal experiments (Tartarus) and the visual experiments is also available.
Beardsley M, Hernández‐Leo D, Ramirez‐Melendez R. Seeking reproducibility: Assessing a multimodal study of the testing effect. Journal of Computer Assisted Learning
Low-cost devices have widened the use of multimodal data in experiments, providing a more complete picture of behavioural effects. However, the accurate collection and combination of multimodal and behavioural data in a manner that enables reproducibility is challenging and often requires researchers to refine their approaches. This paper presents a direct replication of a multimodal wordlist experiment. Specifically, we use a low-cost Emotiv EPOC® to acquire electrophysiological measures of brain activity to investigate whether retrieval during learning facilitates the encoding of subsequent learning, as measured by performance on recall tests and reflected by changes in alpha wave oscillations. Behavioural results of the wordlist experiment were replicated, but physiological results were not. We conclude the paper by highlighting the challenges faced in replicating the previous work and in attempting to facilitate the reproducibility of our own experiment.
Calvanese D, Montali M, Lobo J. Verification of Fixed-Topology Declarative Distributed Systems with External Data. Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management
Logic-based languages, such as Datalog and Answer Set Programming, have been recently put forward as a data-centric model to specify and implement network services and protocols. This approach provides the basis for declarative distributed computing, where a distributed system consists of a network of computational nodes, each evolving an internal database and exchanging data with the other nodes. Verifying these systems against temporal dynamic specifications is of crucial importance. In this paper, we attack this problem by considering the case where the network is a fixed connected graph, and nodes can incorporate fresh data from the external environment. As a verification formalism, we consider branching-time, first-order temporal logics. We study the problem from different angles, delineating the decidability frontier and providing tight complexity bounds for the decidable cases.
Furelos-Blanco D, Jonsson A. Solving Concurrent Multiagent Planning using Classical Planning. 6th Workshop on Distributed and Multi-Agent Planning (DMAP 2018)
In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions.
Additional material
- The code of the compilation and the domains are available at the GitHub account of the AI-ML research group at UPF
- Open access version at UPF e-repository
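The compilation idea above can be illustrated with a toy sketch (hypothetical action names and synchronization markers, not the paper's actual encoding): a flat classical plan is read back as a concurrent plan by grouping actions between markers, and simple concurrency constraints are checked on each joint step.

```python
# Toy illustration (not the paper's actual compilation): a sequential
# classical plan is interpreted as a concurrent plan by grouping the
# actions that fall between hypothetical 'sync-start'/'sync-end' markers.

def to_concurrent_plan(classical_plan):
    """Group a flat action sequence into joint (concurrent) steps."""
    joint_steps, current = [], None
    for action in classical_plan:
        if action == "sync-start":      # open a new joint step
            current = []
        elif action == "sync-end":      # close it: grouped actions run in parallel
            joint_steps.append(current)
            current = None
        else:
            if current is None:         # an action outside markers runs alone
                joint_steps.append([action])
            else:
                current.append(action)
    return joint_steps

def satisfies_constraints(joint_steps, forbidden_pairs):
    """Check simple concurrency constraints: some action pairs may not co-occur."""
    for step in joint_steps:
        for a, b in forbidden_pairs:
            if a in step and b in step:
                return False
    return True

plan = ["sync-start", "agent1-lift-left", "agent2-lift-right", "sync-end",
        "agent1-move"]
steps = to_concurrent_plan(plan)
print(steps)  # [['agent1-lift-left', 'agent2-lift-right'], ['agent1-move']]
```

The grouping avoids enumerating joint actions explicitly, which is where the exponential blowup mentioned in the abstract would otherwise appear.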
Furelos-Blanco D, Bucchiarone A, Jonsson A. CARPooL: Collective Adaptation using concuRrent PLanning. 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018)
In this paper we present the CARPooL demonstrator, an implementation of a Collective Adaptation Engine (CAE) that addresses the challenge of collective adaptation in the smart mobility domain. CARPooL resolves adaptation issues via concurrent planning techniques. It also allows users to interact with the provided solutions by adding new issues or analyzing the actions performed by each agent.
Additional material:
The software is available at the GitHub account of the AI-ML research group at UPF
Version at UPF e-repository
Bucchiarone A, Khandokar F, Furelos D, Jonsson A, Mourshed M. Collective Adaptation through Concurrent Planning: the Case of Sustainable Urban Mobility. 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018)
In this paper we address the challenges that impede collective adaptation in smart mobility systems by proposing a notion of ensembles. Ensembles enable systems with collective adaptability to be built as emergent aggregations of autonomous and self-adaptive agents. Adaptation in these systems is triggered by a run-time occurrence, which is known as an issue. The novel aspect of our approach is that it allows agents affected by an issue in the context of a smart mobility scenario to adapt collaboratively with minimal impact on their own preferences, through an issue resolution process based on concurrent planning algorithms.
Additional material
The software is available at the GitHub account of the AI-ML research group at UPF
Version at UPF e-repository
Furelos-Blanco D, Jonsson A, Palacios H, Jiménez S. Forward-Search Temporal Planning with Simultaneous Events. ICAPS'18 Workshop on Constraint Satisfaction Techniques for Planning and Scheduling Problems (COPLAS 2018)
In this paper we describe STP, a novel algorithm for temporal planning. Similar to several existing temporal planners, STP relies on a transformation from temporal planning to classical planning, and constructs a temporal plan by finding a sequence of classical actions that solve the problem while satisfying a given set of temporal constraints. Our main contribution is that STP can solve temporal planning problems that require simultaneous events, i.e. the temporal actions have to be scheduled in such a way that two or more of their effects take place concurrently. To do so, STP separates each event into three phases: one phase in which temporal actions are scheduled to end, one phase in which simultaneous effects take place, and one phase in which temporal actions are scheduled to start. Experimental results show that STP significantly outperforms state-of-the-art temporal planners in a domain requiring simultaneous events.
Additional material
- The code of the compilation and the domains are available at the GitHub account of the AI-ML research group at UPF
- Postprint version at UPF e-repository
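The three-phase treatment of events can be sketched with a toy timeline builder (made-up data layout, not STP's internal representation): at each timestamp, actions ending are processed first, then simultaneous effects apply, then actions starting are scheduled.

```python
# Toy sketch of the three-phase event idea: for each timestamp, collect the
# temporal actions that end there and those that start there, so simultaneous
# effects at that instant are explicit.

from collections import defaultdict

def phased_events(temporal_actions):
    """temporal_actions: list of (name, start_time, duration)."""
    timeline = defaultdict(lambda: {"end": [], "start": []})
    for name, start, duration in temporal_actions:
        timeline[start]["start"].append(name)
        timeline[start + duration]["end"].append(name)
    # For each timestamp, emit the phases in order: ends first, then starts.
    return [(t, timeline[t]["end"], timeline[t]["start"])
            for t in sorted(timeline)]

# 'pour' must start exactly when 'heat' ends: a simultaneous event at t=2.
actions = [("heat", 0, 2), ("pour", 2, 1)]
for t, ending, starting in phased_events(actions):
    print(t, "end:", ending, "start:", starting)
```

At t=2 the sketch shows 'heat' ending and 'pour' starting within the same event, which is the kind of simultaneity the paper's domains require.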
Michos K, Hernández-Leo D. Supporting awareness in communities of learning design practice. Computers in Human Behavior
The field of learning design has extensively studied the use of technology for the authoring of learning activities. However, the social dimension of the learning design process is still underexplored. In this paper, we investigate communities of teachers who used a social learning design platform (ILDE). We seek to understand how community awareness facilitates the learning design activity of teachers in different educational contexts. Following a design-based research methodology, we developed a community awareness dashboard (inILDE) based on the Cultural Historical Activity Theory (CHAT) framework. The dashboard displays the activity of teachers in ILDE, such as their interactions with learning designs, other members, and with supporting learning design tools. Evaluations of the inILDE dashboard were carried out in four educational communities – two secondary schools, a master programme for pre-service teachers, and in a Massive Open Online Course (MOOC) for teachers. The dashboard was perceived to be useful in summarizing the activity of the community and in identifying content and members' roles. Further, the use of the dashboard increased participants' interactions such as profile views and teachers showed a willingness to build on the contributions of others. As conclusions of the study, we propose five design principles for supporting awareness in learning design communities, namely community context, practice-related insights, visualizations and representations, tasks and community interests.
Garreta-Domingo M, Hernández-Leo D, Sloep P. Evaluation to support learning design: Lessons learned in a teacher training MOOC. Australasian Journal of Educational Technology
The design of learning opportunities is an integral part of the work of all educators. However, educators often lack the design skills and knowledge that professional designers have. One of these basic skills relates to the evaluation of produced artefacts, an essential step in all design frameworks. As a result, the evaluation of learning activities falters due to this lack of knowledge and mindset. The present study analyses how in-service educators perceived and accomplished an evaluation activity aimed at promoting assessment prior to enactment. Heuristic evaluation is an inspection method considered one of the easiest to learn, yet it is efficient as well as time and cost effective. These characteristics make it suitable for the design work of educators, which is, above all, practice-driven and practice-oriented. Results show that educators grasped the value of such an activity (solving possible second-order barriers) but struggled to understand how to carry out the specifics of the task (first-order barrier), which was to define their own set of educational heuristics. Based on the lessons learned, the paper concludes with a proposal for a design task that includes evaluation to support learning design, empowering educators to assess existing learning activities and ICT tools as well as their own designs.
Additional material:
- Handson MOOC – teacher training course on how to design ICT-based learning activities following a design learning studio approach http://www.handsonict.eu/
- Integrated Learning Design Environment (ILDE) https://ilde.upf.edu/
Pons J, Serra X. Randomly weighted CNNs for (music) audio classification. arXiv preprint
The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors. Following this idea, we study how non-trained (randomly weighted) convolutional neural networks perform as feature extractors for (music) audio classification tasks. We use features extracted from the embeddings of deep architectures as input to a classifier - with the goal to compare classification accuracies when using different randomly weighted architectures. By following this methodology, we run a comprehensive evaluation of the current deep architectures for audio classification, and provide evidence that the architectures alone are an important piece for resolving (music) audio problems using deep neural networks.
Additional material:
- Software and datasets in GitHub
- Slides from presentation made at QMU
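The core observation translates into a minimal pure-Python sketch, where a single random untrained layer with ReLU stands in for the CNN; the data, dimensions and nearest-centroid classifier are all illustrative assumptions, not the paper's setup.

```python
# Sketch: features from a *non-trained* randomly weighted projection can still
# support a simple classifier on top of the embedding. Toy data and dimensions.

import random

random.seed(0)
DIM_IN, DIM_FEAT = 4, 16
W = [[random.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_FEAT)]

def random_features(x):
    """One random untrained layer with ReLU, used as a fixed feature extractor."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def nearest_centroid_fit(samples, labels):
    """The 'classifier on top': per-class centroids in the random feature space."""
    centroids = {}
    for label in set(labels):
        feats = [random_features(x) for x, l in zip(samples, labels) if l == label]
        centroids[label] = [sum(col) / len(feats) for col in zip(*feats)]
    return centroids

def predict(centroids, x):
    f = random_features(x)
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(f, centroids[l])))

# Two toy "classes" that differ in which input dimensions are active.
X = [[1, 1, 0, 0], [0.9, 1.1, 0, 0], [0, 0, 1, 1], [0, 0, 1.1, 0.9]]
y = ["a", "a", "b", "b"]
c = nearest_centroid_fit(X, y)
print(predict(c, [1, 0.8, 0, 0]))  # near the class-a inputs
```

The random projection is never trained; only the architecture (here trivially, its width) shapes the embedding, which is the point the paper probes at scale with real audio architectures.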
Ceresa M, Olivares AL, Noailly J, González Ballester MA. Coupled Immunological and Biomechanical Model of Emphysema Progression. Frontiers in Physiology
Chronic Obstructive Pulmonary Disease (COPD) is a disabling respiratory pathology, with a high prevalence and a significant economic and social cost. It is characterized by different clinical phenotypes with different risk profiles. Detecting the correct phenotype, especially for the emphysema subtype, and predicting the risk of major exacerbations are key elements in order to deliver more effective treatments. However, emphysema onset and progression are influenced by a complex interaction between the immune system and the mechanical properties of biological tissue. The former causes chronic inflammation and tissue remodeling. The latter influences the effective resistance or appropriate mechanical response of the lung tissue to repeated breathing cycles. In this work we present a multi-scale model of both aspects, coupling Finite Element (FE) and Agent Based (AB) techniques that we would like to use to predict the onset and progression of emphysema in patients. The AB part is based on existing biological models of inflammation and immunological response as a set of coupled non-linear differential equations. The FE part simulates the biomechanical effects of repeated strain on the biological tissue. We devise a strategy to couple the discrete biological model at the molecular/cellular level and the biomechanical finite element simulations at the tissue level. We tested our implementation on a public emphysema image database and found that it can indeed simulate the evolution of clinical image biomarkers during disease progression.
Additional material:
- UPF news
- Postprint at UPF e-repository
- Link: https://www.frontiersin.org/articles/10.3389/fphys.2018.00388
Barbieri F, Marujo L, Karuturi P, Brendel W, Saggion H. Exploring Emoji Usage and Prediction Through a Temporal Variation Lens. arXiv preprint.
The frequent use of Emojis on social media platforms has created a new form of multimodal social interaction. Developing methods for the study and representation of emoji semantics helps to improve future multimodal communication systems. In this paper, we explore the usage and semantics of emojis over time. We compare emoji embeddings trained on a corpus of different seasons and show that some emojis are used differently depending on the time of the year. Moreover, we propose a method to take into account the time information for emoji prediction systems, outperforming state-of-the-art systems. We show that, using the time information, the accuracy of some emojis can be significantly improved.
Nurchis M, Bellalta B. Target Wake Time: Scheduled access in IEEE 802.11ax WLANs. arXiv preprint.
The increasing interest in ubiquitous networking, and the tremendous popularity gained by IEEE 802.11 Wireless Local Area Networks (WLANs) in recent years, is leading to very dense deployments where high levels of channel contention may prevent meeting increasing user demands. To mitigate the negative effects of channel contention, the Target Wake Time (TWT) mechanism included in the IEEE 802.11ax amendment can play a significant role, as it provides an extremely simple but effective mechanism to schedule transmissions in time. Moreover, in addition to reducing contention between stations, the use of TWT may also help take full advantage of other novel mechanisms in the IEEE 802.11 universe, such as multiuser transmissions, multi-AP cooperation, spatial reuse and coexistence in high-density WLAN scenarios. Overall, we believe TWT may be a first step towards practical collision-free and deterministic access in future WLANs.
Freire A, Ruiz-Garcia A, Moreno Oliver V. Wisibilízalas: promoting the role of women in ICT among secondary school students. International Conference on Gender Research
Although technology is an important part of our daily life regardless of gender, women are underrepresented in the Information and Communications Technology (ICT) sector as a whole. Among several reasons, the recurring stereotype that male students are more suitable for technical studies stands out. In this context, and with the aim of breaking this stereotype, we introduce Wisibilízalas, a contest for high-school students that presents inspirational women to young students, gives global visibility to female ICT professionals, and puts the students in touch with technology. Participants have to create a website containing profiles of female ICT professionals. As a requirement, the chosen profiles must correspond to Spanish women who are currently developing their professional career. This way, we promote the active contribution of female engineers to our current economy and society. 05 students in 15 different teams across Spain took part, creating 50 highly diverse profiles of women in academia, industry or entrepreneurship, with different levels of seniority. The contest achieved external impact in the media and international recognition. Through questionnaires, we were able to evaluate the positive impact of the contest on both students and teachers. The sample is equitable in terms of gender, which makes both girls and boys (and female and male teachers) talk about women working in the ICT domain. The results show the potential interest and relevance of the initiative for the educational system and, subsequently, a second edition has been launched.
Additional material
- Postprint at the UPF e-repository
- Wisibilízalas contest https://www.upf.edu/web/wisibilizalas
Mangado N, Pons-Prats J, Coma M, Mistrik P, Piella G, Ceresa M, González Ballester MA. Computational evaluation of cochlear implant surgery outcomes accounting for uncertainty and parameter variability. Frontiers in Physiology – Computational Physiology and Medicine
Cochlear implantation (CI) is a complex surgical procedure that restores hearing in patients with severe deafness. The successful outcome of the implanted device relies on a group of factors, some of them unpredictable or difficult to control. Uncertainties on the electrode array position and the electrical properties of the bone make it difficult to accurately compute the current propagation delivered by the implant and the resulting neural activation. In this context, we use uncertainty quantification methods to explore how these uncertainties propagate through all the stages of CI computational simulations. To this end, we employ an automatic framework, encompassing from the finite element generation of CI models to the assessment of the neural response induced by the implant stimulation. To estimate the confidence intervals of the simulated neural response, we propose two approaches. First, we encode the variability of the cochlear morphology among the population through a statistical shape model. This allows us to generate a population of virtual patients using Monte Carlo sampling and to assign to each of them a set of parameter values according to a statistical distribution. The framework is implemented and parallelized in a High Throughput Computing environment that enables us to maximize the available computing resources. Second, we perform a patient-specific study to evaluate the computed neural response to seek the optimal post-implantation stimulus levels. Considering a single cochlear morphology, the uncertainty in tissue electrical resistivity and surgical insertion parameters is propagated using the Probabilistic Collocation method, which reduces the number of samples to evaluate. Results show that bone resistivity has the highest influence on CI outcomes. In conjunction with the variability of the cochlear length, worst outcomes are obtained for small cochleae with high resistivity values.
However, the effect of the surgical insertion length on the CI outcomes could not be clearly observed, since its impact may be concealed by the other considered parameters. Whereas the Monte Carlo approach implies a high computational cost, Probabilistic Collocation presents a suitable trade-off between precision and computational time. Results suggest that the proposed framework has a great potential to help in both surgical planning decisions and in the audiological setting process.
Version at UPF e-repository: http://hdl.handle.net/10230/35406
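The Monte Carlo side of the framework reduces to a generic recipe, sketched here with a toy stand-in model; the distribution, model and numbers are illustrative assumptions, not values from the paper.

```python
# Generic Monte Carlo uncertainty propagation: sample an uncertain parameter,
# push each sample through a model, and report an empirical confidence interval
# on the output. A cheap toy function stands in for the expensive FE simulation.

import random
import statistics

random.seed(42)

def toy_model(resistivity):
    """Stand-in for the simulation: the output falls as resistivity rises."""
    return 100.0 / (1.0 + resistivity)

# Uncertain input: resistivity drawn from an assumed normal distribution,
# clipped to stay physically positive.
samples = [toy_model(max(0.1, random.gauss(5.0, 1.0))) for _ in range(10000)]
samples.sort()
mean = statistics.fmean(samples)
lo, hi = samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))]
print(f"mean={mean:.2f}, 95% interval=[{lo:.2f}, {hi:.2f}]")
```

Probabilistic Collocation replaces the brute-force sampling loop with a small set of carefully chosen quadrature points, which is why the paper reports it as the better precision/cost trade-off.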
López-Raventós A, Wilhelmi F, Barrachina-Muñoz S, Bellalta B. Machine Learning and Software Defined Networks for High-Density WLANs. arXiv pre-print.
The next generation of wireless local area networks (WLANs) will operate in dense and highly dynamic scenarios. In addition, chaotic deployments may cause an important degradation in terms of user experience due to uncontrolled high interference levels. Flexible network architectures, such as the software-defined networking (SDN) paradigm, will provide WLANs with new capabilities to deal with users' demands, while achieving greater levels of efficiency and flexibility in those dynamic and complex scenarios. Moreover, the use of machine learning (ML) techniques will improve network resource usage and management by identifying feasible configurations through learning. ML techniques can drive WLANs to reach optimal working points by means of parameter adjustment, in order to cope with different network requirements and policies, as well as with dynamic conditions. In this paper we overview the work done in SDN for WLANs, as well as the pioneering works considering ML for WLAN optimization. Moreover, we validate the suitability of such an approach by developing a use-case in which we show the benefits of using ML, and how those techniques can be combined with SDN.
Kim J, Won M, Serra X, Liem C.C.S. Transfer Learning of Artist Group Factors to Musical Genre Classification. Proceedings of the 2018 World Wide Web Conference
The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information which will be used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which will allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the given FMA dataset, by experimenting with various kinds of transfer methods, including single-task transfer, multi-task transfer and finally multi-task learning.
Additional material:
- The experiments were run on GPU-accelerated hardware and software environments. We used Lasagne (see in Zenodo), Theano (see in arXiv) and Keras (see in GitHub) as the main experimental frameworks. The main software for the experiments can be found on GitHub
- FMA dataset
- Post-print at UPF e-repository
Ferrés D, Saggion H, Ronzano F, Bravo À. PDFdigest: an adaptable layout-aware PDF-to-XML textual content extractor for scientific articles. Language Resources and Evaluation Conference (LREC 2018)
The availability of automated approaches and tools to extract structured textual content from PDF articles is essential to enable scientific text mining. This paper describes and evaluates the PDFdigest tool, a PDF-to-XML textual content extraction system specially designed to extract scientific articles’ headings and logical structure (title, authors, abstract,...) and its textual content. The extractor deals with both text-based and image-based PDF articles using custom rule-based algorithms implemented with existing state-of-the-art open-source tools for both PDF-to-HTML conversion and image-based PDF Optical Character Recognition.
Additional material:
- Postprint at UPF e-repository
- Access to PDFdigest tool
Muñoz-Cristobal J.A., Hernández-Leo D, Martinez-Maldonado R, Thompson K, Wardak D, Goodyear P. 4FAD: A framework for mapping the evolution of artefacts in the learning design process. Australasian Journal of Educational Technology
A number of researchers have explored the role and nature of design in education, proposing a diverse array of life cycle models. Design plays subtly different roles in each of these models. The learning design research community is shifting its attention from the representation of pedagogical plans to considering design as an ongoing process. As a result, the study of the artefacts generated and used by educational designers is also changing: from a focus on the final designed artefact (the product of the design process) to the many artefacts generated and used by designers at different stages of the design process (e.g., sketches, reflections, drawings, or pictures). However, there is still a dearth of studies exploring the evolution of such artefacts throughout the learning design life cycle. A deeper understanding of these evolutionary processes is needed – to help smooth the transitions between stages in the life cycle. In this paper, we introduce the four-dimensional framework for artefacts in design (4FAD) to generate understanding and facilitate the mapping of the evolution of learning design artefacts. We illustrate the value of the framework by applying it in the analysis of an authentic design case.
Barrachina-Munoz S, Wilhelmi F, Bellalta B. To overlap or not to overlap: Enabling Channel Bonding in High Density WLANs
Wireless local area networks (WLANs) are the most popular kind of wireless Internet connection. However, the number of devices accessing the Internet through WLANs, such as laptops, smartphones, or wearables, is increasing drastically at the same time that applications' throughput requirements do. To cope with the latter challenge, channel bonding (CB) techniques are used for enabling higher data rates by transmitting in wider channels. Nonetheless, some important issues such as higher potential co-channel interference arise when bonding channels. In this paper we address this point at issue: is it convenient for high-density WLANs to use wider channels and potentially overlap in spectrum? We show that, while the performance of static CB is really poor, spectrum overlapping is highly convenient when adapting to the medium through dynamic channel bonding (DCB), especially for low to moderate traffic loads. Contradicting much current thinking, the presented results suggest that future wireless networks should be allowed to use all available spectrum, and locally adapt to desirable configurations.
Additional material:
Software: Komondor https://github.com/wn-upf/Komondor/releases/tag/v1.2
arXiv preprint https://arxiv.org/abs/1803.09112
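A toy dynamic channel bonding policy along these lines might look as follows, using a simplified 802.11-style aligned channelization; this is an illustrative sketch, not the paper's simulator logic.

```python
# Toy DCB policy: given which 20 MHz basic channels are busy, pick the widest
# aligned power-of-two block containing the primary channel that is fully idle.

def widest_bond(busy, primary, num_channels=8):
    """busy: set of busy channel indices; returns the chosen channel block."""
    best = [primary]
    width = 1
    while width * 2 <= num_channels:
        width *= 2
        start = (primary // width) * width   # bonded blocks are aligned, as in 802.11
        block = list(range(start, start + width))
        if any(ch in busy for ch in block):
            break                            # a busy basic channel caps the bond
        best = block
    return best

# Primary channel 2; channels 5 and 6 are busy at a neighbouring WLAN.
print(widest_bond(busy={5, 6}, primary=2))   # [0, 1, 2, 3]: 160 MHz would hit ch 5
```

Static CB would transmit on a fixed wide block regardless of `busy`; the adaptivity shown here, re-evaluated at every transmission opportunity, is what makes overlapping spectrum worthwhile in the paper's dense scenarios.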
Aragón P, Sáez-Trumper D, Redi M, Hale SA, Gómez V, Kaltenbrunner A. Online Petitioning Through Data Exploration, and What We Found There: A Dataset of Petitions from Avaaz.org. International AAAI Conference on Web and Social Media (ICWSM 2018)
The Internet has become a fundamental resource for activism as it facilitates political mobilization at a global scale. Petition platforms are a clear example of how thousands of people around the world can contribute to social change. Avaaz.org, with a presence in over 200 countries, is one of the most popular platforms of this type. However, little research has focused on this platform, probably due to a lack of available data. In this work we retrieved more than 350K petitions, standardized their field values, and added new information using language detection and named-entity recognition. To motivate future research with this unique repository of global protest, we present a first exploration of the dataset. In particular, we examine how social media campaigning is related to the success of petitions, as well as some geographic and linguistic findings about the worldwide community of Avaaz.org. We conclude with example research questions that could be addressed with our dataset.
Barbieri F, Ballesteros M, Ronzano F, Saggion H. Multimodal Emoji Prediction. Proceedings of the North American Chapter of ACL (NAACL)
Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication that automatic systems are not accustomed to dealing with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts which sometimes include emojis. We show that these emojis can be predicted by using the text, but also using the picture. Our main finding is that incorporating the two synergistic modalities in a combined model improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.
Barbieri F, Camacho-Collados J, Ronzano F, Espinosa-Anke L, Ballesteros M, Basile V, Patti V, Saggion H. SemEval-2018 Task 2: Multilingual Emoji Prediction. Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018)
Paper describing the SemEval-2018 Task 2, Multilingual Emoji Prediction (under review)
Raote I, Ortega-Bellido M, Santos A, Foresti O, Zhang C, Garcia-Parajo MF, Campelo F, Malhotra V. TANGO1 builds a machine for collagen export by recruiting and spatially organizing COPII, tethers and membranes. eLife 2018
Collagen export from the endoplasmic reticulum (ER) requires TANGO1, COPII coats, and retrograde fusion of ERGIC membranes. How do these components come together to produce a transport carrier commensurate with the bulky cargo collagen? TANGO1 is known to form a ring that corrals COPII coats and we show here how this ring or fence is assembled. Our data reveal that a TANGO1 ring is organized by its radial interaction with COPII, and lateral interactions with cTAGE5, TANGO1-short or itself. Of particular interest is the finding that TANGO1 recruits ERGIC membranes for collagen export via the NRZ (NBAS/RINT1/ZW10) tether complex. Therefore, TANGO1 couples retrograde membrane flow to anterograde cargo transport. Without the NRZ complex, the TANGO1 ring does not assemble, suggesting its role in nucleating or stabilising this process. Thus, coordinated capture of COPII coats, cTAGE5, TANGO1-short, and tethers by TANGO1 assembles a collagen export machine at the ER.
Rethage D, Pons J, Serra X. A Wavenet for Speech Denoising. 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
Currently, most speech processing techniques use magnitude spectrograms as front-end and are therefore by default discarding part of the signal: the phase. In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet. The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature. Specifically, the model makes use of non-causal, dilated convolutions and predicts target fields instead of a single target sample. The discriminative adaptation of the model we propose, learns in a supervised fashion via minimizing a regression loss. These modifications make the model highly parallelizable during both training and inference. Both computational and perceptual evaluations indicate that the proposed method is preferred to Wiener filtering, a common method based on processing the magnitude spectrogram.
Funke J, Zhang C, Tobias P, Gonzalez Ballester MA, Saalfeld S. The Candidate Multi-Cut for Cell Segmentation. IEEE International Symposium on Biomedical Imaging (ISBI'18)
Two successful approaches for the segmentation of biomedical images are (1) the selection of segment candidates from a merge-tree, and (2) the clustering of small superpixels by solving a Multi-Cut problem. In this paper, we introduce a model that unifies both approaches. Our model, the Candidate Multi-Cut (CMC), allows joint selection and clustering of segment candidates from a merge-tree. This way, we overcome the respective limitations of the individual methods: (1) the space of possible segmentations is not constrained to candidates of a merge-tree, and (2) the decision for clustering can be made on candidates larger than superpixels, using features over larger contexts. We solve the optimization problem of selecting and clustering of candidates using an integer linear program. On datasets of 2D light microscopy of cell populations and 3D electron microscopy of neurons, we show that our method generalizes well and generates more accurate segmentations than merge-tree or Multi-Cut methods alone.
Barrachina-Munoz S, Wilhelmi F, Bellalta B. Performance Analysis of Dynamic Channel Bonding in Spatially Distributed High Density WLANs. arXiv preprint.
In this paper we discuss the effects on throughput and fairness of dynamic channel bonding (DCB) in spatially distributed high density (HD) wireless local area networks (WLANs). First, we present an analytical framework based on continuous time Markov networks (CTMNs) for depicting the phenomena given when applying different DCB policies in spatially distributed scenarios, where nodes are not required to be within the carrier sense of each other. Then, we assess the performance of DCB in HD IEEE 802.11ax WLANs by means of simulations. Regarding spatial distribution, we show that there may be critical interrelations among nodes – even if they are located outside the carrier sense range of each other – in a chain reaction manner. Results also show that, while always selecting the widest available channel normally maximizes the individual long-term throughput, it often generates unfair scenarios where other WLANs starve. Moreover, we show that there are scenarios where DCB with stochastic channel width selection improves the latter approach both in terms of individual throughput and fairness. It follows that there is not a unique DCB policy that is optimal for every case. Instead, smarter bandwidth adaptation is required in the challenging scenarios of next-generation WLANs.
Additional material:
- Spatial-Flexible Continuous Time Markov Network (SFCTMN), an analytical framework based on Continuous Time Markov Networks (CTMNs). https://github.com/sergiobarra/SFCTMN
- Komondor, a wireless networks simulator built on top of the COST library. https://github.com/wn-upf/Komondor
- arXiv pre-print: https://arxiv.org/abs/1801.00594
- DOI: 10.1109/TMC.2019.2899835
Domínguez M, Burga A, Farrús M, Wanner L. Compilation of Corpora for the Study of the Information Structure-Prosody Interface. Language Resources and Evaluation Conference (LREC 2018)
Theoretical studies on the information structure–prosody interface argue that the content packaged in terms of theme and rheme correlates with the intonation of the corresponding sentence. However, there are few empirical studies that support this argument and even fewer resources that promote reproducibility and scalability of experiments. In this paper, we introduce a methodology for the compilation of annotated corpora to study the correspondence between information structure and prosody. The application of this methodology is exemplified on a corpus of read speech in English annotated with hierarchical thematicity and automatically annotated prosodic parameters.
Additional material:
- Data and code available in GitHub
- Postprint version at UPF e-repository
Dalmazzo D, Ramirez R. Air violin: a machine learning approach to fingering gesture recognition. MIE 2017- Proceedings of the 1st ACM SIGCHI International Workshop on Multimodal Interaction for Education
We train and evaluate two machine learning models for predicting fingering in violin performances using motion and EMG sensors integrated in the Myo device. Our aim is twofold: first, provide a fingering recognition model in the context of a gamification virtual violin application where we measure both right hand (i.e. bow) and left hand (i.e. fingering) gestures, and second, implement a tracking system for a computer assisted pedagogical tool for self-regulated learners in high-level music education. Our approach is based on the principle of mapping-by-demonstration in which the model is trained by the performer. We evaluated a model based on Decision Trees and compared it with a Hidden Markov Model.
Version in Zenodo: http://doi.org/10.5281/zenodo.1193758
Barbieri F, Ballesteros M, Saggion H. Are emojis predictable? 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017)
Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) on this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis.
Additional material:
- Postprint at UPF e-repository and arXiv
Oramas S, Ferraro A, Correya A, Serra X. MEL: A music entity linking system. 18th International Society for Music Information Retrieval Conference (ISMIR17)
In this work, we present MEL, the first Music Entity Linking system. MEL is able to identify mentions of musical entities (e.g., albums, songs, and artists) in free text, and disambiguate them to a music knowledge base, i.e., MusicBrainz. MEL combines different state-of-the-art libraries and SimpleBrainz, an RDF knowledge base created from MusicBrainz after a simplification process. MEL is released as a REST API and as an online Web demo.
Additional material:
- Online demo http://mel.mtg.upf.edu/static/index.html
- Graph database SimpleBrainz https://github.com/andrebola/simplebrainz
Slizovskaia O, Gomez E, Haro G. Correspondence between audio and visual deep models for musical instrument detection in video recordings. 18th International Society for Music Information Retrieval Conference (ISMIR17)
This work aims at investigating cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address in this work the understanding of the representations learned by convolutional neural networks (CNNs) and we study feature correspondence between audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate existing cross-correlations between neurons from the audio and video CNN which activate the same instrument category. We analyse two training schemes for multimodal applications and perform a comparative analysis and visualisation of model predictions.
Pons J, Nieto O, Prockup M, Schmidt EM, Ehmann AF, Serra X. End-to-end learning for music audio tagging at scale. Workshop on Machine Learning for Audio Signal Processing (ML4Audio), NIPS.
The lack of data tends to limit the outcomes of deep learning research - especially, when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to unrestrictedly explore different front-end paradigms: from assumption-free models - using waveforms as input with very small convolutional filters; to models that rely on domain knowledge - log-mel spectrograms with a convolutional neural network designed to learn temporal and timbral features. Results suggest that while spectrogram-based models surpass their waveform-based counterparts, the difference in performance shrinks as more data are employed.
Gong R, Pons J, Serra X. Audio to Score Matching by Combining Phonetic and Duration Information. The 18th International Society for Music Information Retrieval Conference (ISMIR17)
We approach the singing phrase audio to score matching problem by using phonetic and duration information – with a focus on studying the jingju a cappella singing case. We argue that, due to the existence of a basic melodic contour for each mode in jingju music, only using melodic information (such as pitch contour) will result in an ambiguous matching. This leads us to propose a matching approach based on the use of phonetic and duration information. Phonetic information is extracted with an acoustic model shaped with our data, and duration information is considered with the hidden Markov model (HMM) variants we investigate. We build a model for each lyric path in our scores and we achieve the matching by ranking the posterior probabilities of the decoded most likely state sequences. Three acoustic models are investigated: (i) convolutional neural networks (CNNs), (ii) deep neural networks (DNNs) and (iii) Gaussian mixture models (GMMs). Also, two duration models are compared: (i) hidden semi-Markov model (HSMM) and (ii) post-processor duration model. Results show that CNNs perform better in our (small) audio dataset and also that HSMM outperforms the post-processor duration model.
Additional material:
- Code in GitHub
- Data in Zenodo
- arXiv
- ISMIR
- Complete registry at the web of the Music Technology Group
Pons J, Gong R, Serra X. Score-informed syllable segmentation for a capella singing voice with convolutional neural networks. In 18th International Society for Music Information Retrieval Conference (ISMIR2017)
This paper introduces a new score-informed method for the segmentation of jingju a cappella singing phrases into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and its score. Throughout the paper, we first examine the jingju syllable structure and propose a definition of the term "syllable onset". Then, we identify the challenges that jingju a cappella singing poses. Further, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that allows us to efficiently capture different time-frequency scales for estimating syllable onsets. In addition, we propose using a score-informed Viterbi algorithm – instead of thresholding the onset function – because the available musical knowledge we have (the score) can be used to inform the Viterbi algorithm in order to overcome the identified challenges. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions.
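The score-informed decoding idea can be illustrated with a toy dynamic program: boundaries are rewarded for sitting on ODF peaks and penalized (quadratically, with width `sigma`) for deviating from the durations the score predicts. This is our own simplification of the approach, with invented names and data, not the paper's code.

```python
def score_informed_boundaries(odf, score_durs, sigma=2.0):
    """Place one boundary per syllable onset so that boundaries sit on
    peaks of the onset detection function (odf) while successive gaps
    stay close to the durations read from the score (score_durs, in
    frames). score_durs[0] is the expected offset of the first onset
    from frame 0. Returns the decoded boundary frame indices."""
    n, k = len(odf), len(score_durs)
    NEG = float("-inf")
    best = [[NEG] * n for _ in range(k)]
    back = [[-1] * n for _ in range(k)]
    for t in range(n):
        best[0][t] = odf[t] - ((t - score_durs[0]) / sigma) ** 2
    for i in range(1, k):
        for t in range(i, n):
            for s in range(t):
                cand = (best[i - 1][s] + odf[t]
                        - ((t - s - score_durs[i]) / sigma) ** 2)
                if cand > best[i][t]:
                    best[i][t], back[i][t] = cand, s
    # Backtrack from the best-scoring final boundary.
    t = max(range(n), key=lambda u: best[k - 1][u])
    bounds = [t]
    for i in range(k - 1, 0, -1):
        t = back[i][t]
        bounds.append(t)
    return bounds[::-1]

# Synthetic ODF with peaks at frames 2, 7 and 12; the score expects
# gaps of 2, 5 and 5 frames.
odf = [0.0] * 16
for p in (2, 7, 12):
    odf[p] = 1.0
```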
Paun B, Bijnens B, Butakoff C. Relationship between the left ventricular size and the amount of trabeculations. International Journal for Numerical Methods in Biomedical Engineering
Contemporary imaging modalities offer non-invasive quantification of myocardial deformation; however, they make gross assumptions about the internal structure of the cardiac walls. Our aim is to study the possible impact of the trabeculations on the stroke volume, strain and capacity of differently sized ventricles. The cardiac left ventricle is represented by an ellipsoid and the trabeculations by a tissue occupying a fixed volume. The ventricular contraction is modelled by scaling the ellipsoid whereupon the measurements of longitudinal strain, end-diastolic, end-systolic and stroke volume are derived and compared. When the trabeculated and non-trabeculated ventricles, having the same geometry and deformation pattern, contain the same amount of blood and contract with the same strain, we observed an increased stroke volume in our model of the trabeculated ventricle. When these ventricles contain and eject the same amount of blood, we observed a reduced strain in the trabeculated case. We identified that a trade-off between the strain and the amount of trabeculations could be reached with a 0.35-0.41 cm dense trabeculated layer, without blood filled recesses (for a ventricle with end-diastolic volume of about 150 ml). A trabeculated ventricle can work at lower strains compared to a non-trabeculated ventricle to produce the same stroke volume, which could be a possible explanation of why athletes and pregnant women develop reversible signs of left ventricular non-compaction, since the trabeculations could help generate extra cardiac output. This knowledge might help to assess heart failure patients with dilated cardiomyopathies who often show signs of non-compaction.
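The geometric argument can be reproduced with a back-of-the-envelope model. The sketch below is our own simplification (a uniformly scaled ellipsoidal cavity containing an incompressible trabecular volume; the 30 ml trabecular volume is an illustrative value), not the authors' implementation.

```python
from math import pi

def ellipsoid_volume(a, b, c):
    """Volume of an ellipsoid with semi-axes a, b, c (semi-axes in cm give ml)."""
    return 4.0 / 3.0 * pi * a * b * c

def stroke_volume(edv_blood, trab_volume, scale):
    """Blood ejected by a cavity that contracts by a uniform linear
    scale factor, with an incompressible trabecular volume inside it.
    Cavity at end-diastole = blood + trabeculations; at end-systole the
    cavity shrinks by scale**3 while the trabecular tissue keeps its
    volume, so only blood is displaced."""
    cavity_ed = edv_blood + trab_volume
    cavity_es = cavity_ed * scale ** 3
    esv_blood = cavity_es - trab_volume
    return edv_blood - esv_blood   # = (1 - scale**3) * cavity_ed

# Same 150 ml of blood, same 15% linear contraction (scale = 0.85):
sv_smooth = stroke_volume(150.0, 0.0, 0.85)
sv_trab = stroke_volume(150.0, 30.0, 0.85)
```

Because the trabeculated cavity is larger for the same blood volume, the same contraction displaces more blood, matching the increased stroke volume reported in the abstract.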
Additional material:
- Open access version at UPF e-repository
- Articles in press
Dzhambazov G, Miron M, Serra X. Lyrics to audio alignment for karaoke in pop music. Music Information Retrieval Evaluation eXchange (MIREX 2017)
In this paper we describe an algorithm for automatic lyrics-to-audio alignment. Its goal is the automatic detection of word boundaries in multi-instrumental English pop songs. We rely on a phonetic recognizer based on hidden Markov models: a widely-used method for tracking phonemes in speech processing problems. Tracking lyrics in music audio is harder than tracking text in speech because, unlike speech, the singing voice is mixed with multiple instruments. To address this obstacle we apply a convolutional neural network-based method for singing voice separation. We present a prototype of a practical application based on the alignment method: the highlighting of lyrics in a karaoke-like fashion.
Additional material:
- Software https://github.com/georgid
Dzhambazov G, Miron M, Serra X. Lyrics to audio alignment in polyphonic audio. Music Information Retrieval Evaluation eXchange (MIREX 2017)
In this paper we describe the two algorithms we submitted for the MIREX 2017 task of Automatic Lyrics-to-Audio Alignment. The task has as a goal the automatic detection of word boundaries in multi-instrumental English pop music. We rely on a phonetic recognizer based on hidden Markov models (HMM): a widely-used method for tracking phonemes in speech processing problems. Tracking lyrics in music audio is harder than tracking text in speech because, unlike speech, the singing voice is mixed with multiple instruments. To address this obstacle we propose the application of two separate methods for segregating the singing voice from the multi-instrumental mix. One of them is based on the detection of vocal harmonic partials, whereas the other extracts the vocal content by means of source separation.
Additional material:
- Software https://github.com/georgid
Speck R, Röder M, Oramas S, Espinosa-Anke L, Ngonga Ngomo AC. Open Knowledge Extraction Challenge 2017. Semantic Web Challenges. SemWebEval 2017. Communications in Computer and Information Science
The Open Knowledge Extraction Challenge invites researchers and practitioners from academia as well as industry to compete with the aim of pushing further the state of the art of knowledge extraction from text for the Semantic Web. The challenge has the ambition to provide a reference framework for research in this field by redefining a number of tasks typically from information and knowledge extraction by taking into account Semantic Web requirements and has the goal to test the performance of knowledge extraction systems. This year, the challenge is in its third round and consists of three tasks which include named entity identification, typing and disambiguation by linking to a knowledge base, depending on the task. The challenge makes use of small gold standard datasets that consist of manually curated documents and large silver standard datasets that consist of automatically generated synthetic documents. The performance measure of a participating system is twofold, based on (1) Precision, Recall, F1-measure and on (2) Precision, Recall, F1-measure with respect to the runtime of the system.
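For reference, the first part of that twofold measure reduces to standard micro-averaged scores over the extracted (document, mention, link) tuples; a minimal sketch with invented example data (the tuple format and KB identifiers are ours):

```python
def precision_recall_f1(predicted, gold):
    """Micro-averaged entity-extraction scores: predicted and gold are
    sets of (doc_id, mention, kb_link) tuples; only exact matches count
    as true positives."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One correct link, one spurious prediction, one missed gold entity.
gold = {("d1", "Barcelona", "dbr:Barcelona"),
        ("d1", "Gaudí", "dbr:Antoni_Gaudí")}
pred = {("d1", "Barcelona", "dbr:Barcelona"),
        ("d1", "Sagrada", "dbr:Sagrada_Família")}
p, r, f = precision_recall_f1(pred, gold)
```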
Additional material:
- Link of the challenge https://project-hobbit.eu/challenges/oke2017-challenge-eswc-2017/
Pons J, Nieto O, Prockup M, Schmidt EM, Ehmann AF, Serra X. End-to-end learning for music audio tagging at scale. The 18th International Society for Music Information Retrieval Conference (ISMIR17)
The lack of data tends to limit the outcomes of deep learning research – especially, when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to unrestrictedly explore different front-end paradigms: from assumption-free models – using waveforms as input with very small convolutional filters; to models that rely on domain knowledge – log-MEL spectrograms with a convolutional neural network designed to learn temporal and timbral features. Results suggest that, while spectrogram-based models surpass their waveform-based counterparts, the difference in performance shrinks as more data is employed.
Additional material:
- Software (GitHub)
- Preprint at arXiv https://arxiv.org/abs/1711.02520
Wilhelmi F, Cano C, Neu G, Bellalta B, Jonsson A, Barrachina-Muñoz S. Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits. Ad Hoc Networks
Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To solve this, we envision the utilization of completely decentralized mechanisms that enhance Spatial Reuse (SR). In particular, we concentrate on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to modify both their transmission power and channel based on their experienced throughput. In this work, we study the exploration-exploitation trade-off by means of the ε-greedy, EXP3, UCB and Thompson sampling action-selection strategies. Our results show that optimal proportional fairness can be achieved, even if no information about neighboring networks is available to the learners and WNs operate selfishly. However, there is high temporal variability in the throughput experienced by the individual networks, especially for ε-greedy and EXP3. We identify the cause of this variability to be the adversarial setting of our setup, in which the set of most played actions provides intermittent good/poor performance depending on the neighboring decisions. We also show that this variability is reduced using UCB and Thompson sampling, which are parameter-free policies that perform exploration according to the reward distribution of each action.
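As an illustration of the action-selection strategies compared in the paper, here is a minimal ε-greedy bandit loop. It is a generic sketch, not the authors' implementation; the arms, reward values and parameters are invented for the example (an arm would stand for a power/channel configuration, the reward for observed throughput).

```python
import random

def epsilon_greedy_select(q_values, epsilon, rng):
    """Explore a uniformly random arm with probability epsilon,
    otherwise exploit the arm with the highest estimated reward."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def run_bandit(rewards_fn, n_arms, n_steps, epsilon, seed=0):
    """Sample-average ε-greedy loop; rewards_fn(arm) returns the
    reward observed after playing that arm."""
    rng = random.Random(seed)
    q = [0.0] * n_arms      # estimated mean reward per arm
    counts = [0] * n_arms   # number of pulls per arm
    for _ in range(n_steps):
        arm = epsilon_greedy_select(q, epsilon, rng)
        r = rewards_fn(arm)
        counts[arm] += 1
        q[arm] += (r - q[arm]) / counts[arm]  # incremental mean update
    return q, counts

# Toy stationary environment: arm 2 has the highest mean reward.
means = [0.2, 0.5, 0.8]
rng_env = random.Random(42)
q, counts = run_bandit(lambda a: means[a] + rng_env.uniform(-0.1, 0.1),
                       n_arms=3, n_steps=2000, epsilon=0.1)
```

In the paper's setting the environment is non-stationary, since the reward of an action depends on the neighbours' concurrent choices, which is exactly what causes the reported throughput variability.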
https://doi.org/10.1016/j.adhoc.2019.01.006
Additional material:
- Article in arXiv: https://arxiv.org/abs/1710.11403
- Software in GitHub: https://github.com/wn-upf/Collaborative_SR_in_WNs_via_Selfish_MABs
- Dataset with results in Zenodo: https://doi.org/10.5281/zenodo.1036737
Fonseca E, Gong R, Bogdanov D, Slizovskaia O, Gomez E, Serra X. Acoustic scene classification by ensembling gradient boosting machine and convolutional neural networks. Workshop on Detection and Classification of Acoustic Scenes and Events
This work describes our contribution to the acoustic scene classification task of the DCASE 2017 challenge. We propose a system that consists of the ensemble of two methods of different nature: a feature engineering approach, where a collection of hand-crafted features is input to a Gradient Boosting Machine, and another approach based on learning representations from data, where log-scaled mel-spectrograms are input to a Convolutional Neural Network. This CNN is designed with multiple filter shapes in the first layer. We use a simple late fusion strategy to combine both methods. We report classification accuracy of each method alone and the ensemble system on the provided cross-validation setup of TUT Acoustic Scenes 2017 dataset. The proposed system outperforms each of its component methods and improves the provided baseline system by 8.2%.
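The late fusion step itself is simply a weighted average of the per-class probabilities produced by the two subsystems, followed by an argmax; a minimal sketch with invented class probabilities (the weight and numbers are ours, not the system's tuned values):

```python
def late_fusion(p_gbm, p_cnn, w=0.5):
    """Combine per-class probabilities of the two subsystems
    (Gradient Boosting Machine and CNN) by weighted average."""
    return [w * a + (1 - w) * b for a, b in zip(p_gbm, p_cnn)]

def predict(p_gbm, p_cnn, w=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(p_gbm, p_cnn, w)
    return max(range(len(fused)), key=fused.__getitem__)

# The two subsystems disagree; fusion resolves in favour of class 1.
p_gbm = [0.6, 0.3, 0.1]
p_cnn = [0.1, 0.6, 0.3]
fused = late_fusion(p_gbm, p_cnn)
```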
Additional material:
- Support repository https://edufonseca.github.io/DCASE2017-Task1-ASC/
- Post-print available at UPF e-repository
Fonseca E, Pons J, Favory X, Font F, Bogdanov D, Ferraro A, Oramas S, Porter A, Serra X. Freesound Datasets: A Platform for the Creation of Open Audio Datasets. 18th International Society for Music Information Retrieval Conference (ISMIR17)
Openly available datasets are a key factor in the advancement of data-driven research approaches, including many of the ones used in sound and music computing. In the last few years, quite a number of new audio datasets have been made available, but many of them still have major shortcomings that limit their research impact. Among the common shortcomings are the lack of transparency in their creation and the difficulty of making them completely open and sharable. They often do not include clear mechanisms to amend errors and many times they are not large enough for current machine learning needs. This paper introduces Freesound Datasets, an online platform for the collaborative creation of open audio datasets based on principles of transparency, openness, dynamic character, and sustainability. As a proof-of-concept, we present an early snapshot of a large-scale audio dataset built using this platform. It consists of audio samples from Freesound organised in a hierarchy based on the AudioSet Ontology. We believe that building and maintaining datasets following the outlined principles and using open tools and collaborative approaches like the ones presented here will have a significant impact in our research community.
Additional material:
- Freesound datasets web https://datasets.freesound.org/
- GitHub https://github.com/MTG/freesound-datasets/
- Open access version at UPF e-repository
Aragón P, Gómez V, García D, Kaltenbrunner A. Generative models of online discussion threads: state of the art and research challenges. Journal of Internet Services and Applications
Online discussion in form of written comments is a core component of many social media platforms. It has attracted increasing attention from academia, mainly because theories from social sciences can be explored at an unprecedented scale. This interest has led to the development of statistical models which are able to characterize the dynamics of threaded online conversations.
In this paper, we review research on statistical modeling of online discussions, in particular, we describe current generative models of the structure and growth of discussion threads. These are parametrized network formation models that are able to generate synthetic discussion threads that reproduce certain features of the real discussions present in different online platforms. We aim to provide a clear overview of the state of the art and to motivate future work in this relevant research field.
Hernández-Leo D, Rodriguez Triana MJ, Inventado PS, Mor Y. Connecting Learning Design and Learning Analytics. Interaction Design and Architecture(s) Journal
Learning Design (LD) and Learning Analytics (LA) are both domains of research and action that aim to improve learning effectiveness. This common aim raises the potential for synergies between them. Indeed, there has been a growing interest and some initial effort in bringing them together. For example, learning design offers a representation of the learning activity, providing a contextualization framework for LA. Learning analytics, in turn, supports the study of designs by gathering evidence during the learning process, and may inform future design decisions based on previous data. Despite their common goal, the collaboration between LD and LA is still limited; making these links operational and coherent remains an open challenge. This special issue brings together eight papers proposing approaches and discussing concerns that illustrate the benefits and challenges of the alignment between learning design and analytics.
Pezzatini D, Yagüe C, Rudenick P, Blat J, Bijnens B, Camara O. A Web-Based Tool for Cardiac Dyssynchrony Assessment on Ultrasound Data. Eurographics Workshop on Visual Computing for Biology and Medicine
Cardiac resynchronization therapy (CRT) is a broadly used therapy in patients who suffer from heart failure (HF). The positive outcome of CRT depends strongly on the criteria used to select patients, and a lot of research has been done to introduce new and more reliable parameters. In this paper we propose an interactive tool to perform visual assessment and measurements on cardiac ultrasound images of patients with cardiac dyssynchrony. The tool is developed as a web application, allowing doctors to remotely access images and measurements.
Manathunga K, Hernández-Leo D. Authoring and enactment of mobile pyramid-based collaborative learning activities. British Journal of Educational Technology
Collaborative learning flow patterns (CLFPs) formulate best practices for the orchestration of activity sequences and collaboration mechanisms that can elicit fruitful social interactions. Mobile technology features offer opportunities to support interaction mediation and content accessibility. However, existing mobile collaborative learning research has mostly focussed on simple activity orchestrations from the perspective of collaborative flow orchestration and flexibility requirements, predominantly in face-to-face pre-university educational contexts. This paper proposes a particularisation of the Pyramid CLFP to support flexible face-to-face and distance mobile learning scenarios in which learners interact in increasingly larger groups along a sequence of activities (Pyramid levels). PyramidApp implements this Pyramid particularisation that provides both a web-based authoring tool and an enactment tool accessible through web or mobile devices. The authoring tool was evaluated in workshops where teachers appreciated its design and applicability to their educational contexts. PyramidApp flows were enacted in three higher education settings. Learners enjoyed the activities but usage and satisfaction varied depending on several design and contextual factors like the epistemic tasks given, the education level and application mode (face-to-face or distance).
DOI: 10.1111/bjet.12588
Additional material:
- PyramidApp is available as an open source project in GitHub (Manathunga, Abenia, & Hernández-Leo, 2017), link in GitHub. Due to privacy issues, experiment data will not be made publicly available. However, an electronic version of anonymized data will be made available and shared with interested researchers under an agreement for data access (contact: [email protected]).
- Open access version at UPF e-repository
Abura’ed A, Chiruzzo L, Saggion H, Accuosto P, Bravo A. LaSTUS/TALN @ CLSciSumm-17: Cross-document Sentence Matching and Scientific Text Summarization Systems. Proceedings of the Second Joint Workshop on Bibliometric Enhanced Information Retrieval and Natural Language Processing for Digital Libraries
In recent years there has been an increasing interest in approaches to scientific summarization that take advantage of the citations a research paper has received in order to extract its main contributions. In this context, the CL-SciSumm 2017 Shared Task has been proposed to address citation-based information extraction and summarization. In this paper we present several systems to address three of the CL-SciSumm tasks. Notably, unsupervised systems to match citing and cited sentences (Task 1A), a supervised approach to identify the type of information being cited (Task 1B), and a supervised citation-based summarizer (Task 2).
Additional material:
- Poster
- Publication available in open access at the web of the conference
- Postprint at UPF e-repository
Gerber N, Reyes M, Barazzetti L, Kjer HM, Vera S, Stauber M, Mistrik P, Ceresa M, Mangado N, Wimmer S, Stark T, Paulsen RR, Weber S, Caversaccio M, González Ballester MA. A multiscale imaging and modelling dataset of the human inner ear. Nature Scientific Data.
Understanding the human inner ear anatomy and its internal structures is paramount to advance hearing implant technology. While the emergence of imaging devices has allowed researchers to improve understanding of intracochlear structures, the difficulty of collecting appropriate data has resulted in studies conducted with few samples. To assist the cochlear research community, a large collection of human temporal bone images is being made available. This data descriptor, therefore, describes a rich set of image volumes acquired using cone beam computed tomography and micro-CT modalities, accompanied by manual delineations of the cochlea and sub-compartments, a statistical shape model encoding its anatomical variability, and data for electrode insertion and electrical simulations. These data constitute an important asset for future studies in need of high-resolution data and related statistical data objects of the cochlea used to leverage scientific hypotheses. They are of relevance to anatomists, audiologists, computer scientists in the different domains of image analysis, computer simulations and image formation, and to biomedical engineers designing new strategies for cochlear implantations, electrode design, and others.
Barbieri F, Espinosa-Anke L, Ballesteros M, Soler J, Saggion H. Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes. Proceedings of the 3rd Workshop on Noisy User-generated Text, ACL
Videogame streaming platforms have become a paramount example of noisy user-generated text. These are websites where gaming is broadcast and which allow interaction with viewers via integrated chatrooms. Probably the best known platform of this kind is Twitch, which has more than 100 million monthly viewers. Despite these numbers, and unlike other platforms featuring short messages (e.g. Twitter), Twitch has not received much attention from the Natural Language Processing community. In this paper we aim at bridging this gap by proposing two important tasks specific to the Twitch platform, namely (1) emote prediction and (2) trolling detection. In our experiments, we evaluate three models: a BOW baseline, a supervised logistic classifier based on word embeddings, and a bidirectional long short-term memory recurrent neural network (LSTM). Our results show that the LSTM model outperforms the other two models, in which explicit features with proven effectiveness for similar tasks were encoded.
Additional material:
- Best Paper Award, W-NUT 2017
- UPF press release
- Slides
- Open access version at UPF e-repository
Rullo A, Serra E, Bertino E, Lobo J. Shortfall-Based Optimal Placement of Security Resources for Mobile IoT Scenarios. Computer Security – ESORICS 2017. Lecture Notes in Computer Science, vol 10493.
We present a method for computing the best provisioning of security resources for Internet of Things (IoT) scenarios characterized by a high degree of mobility. The security infrastructure is specified by a security resource allocation plan computed as the solution of an optimization problem that minimizes the risk of having IoT devices not monitored by any resource. Due to the mobile nature of IoT devices, a probabilistic framework for modeling such scenarios is adopted. We adapt the concept of shortfall from economics as a risk measure and show how to compute and evaluate the quality of an allocation plan. The proposed approach fits well with applications such as vehicular networks, mobile ad-hoc networks, smart cities, or any IoT environment characterized by mobile devices that need a monitoring infrastructure.
Additional material:
- Dataset: Syslog, SNMP, and tcpdump data for 5 years or more from the wireless network at Dartmouth College
Dias GM, Bellalta B, Oechsner S. The impact of dual prediction schemes on the reduction of the number of transmissions in sensor networks. Computer Communications.
Future Internet of Things (IoT) applications will require that billions of wireless devices transmit data to the cloud frequently. However, wireless medium access is regarded as a problem for the next generations of wireless networks; hence, the number of data transmissions in Wireless Sensor Networks (WSNs) can quickly become a bottleneck, disrupting the exponential growth in the number of interconnected devices, sensors, and amount of produced data. Therefore, keeping a low number of data transmissions is critical to incorporate new sensor nodes and measure a great variety of parameters in future generations of WSNs. Thanks to the high accuracy and low complexity of state-of-the-art forecasting algorithms, Dual Prediction Schemes (DPSs) are potential candidates to optimize the data transmissions in WSNs at the finest level because they allow sensor nodes to avoid unnecessary transmissions without affecting the quality of their measurements. In this work, we present a sensor network model that uses statistical theorems to describe the expected impact of DPSs and data aggregation in WSNs. We aim to provide a foundation for future works by characterizing the theoretical gains of processing data in sensors and conditioning their transmission on the predictions' accuracy. Our simulation results show that the number of transmissions can be reduced by almost 98% in the sensor nodes with the highest workload. We also detail the impact of predicting and aggregating transmissions according to the parameters that can be observed in common scenarios, such as sensor nodes' transmission ranges, the correlation between measurements of different sensors, and the period between two consecutive measurements in a sensor.
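The core DPS mechanism can be sketched with a last-value predictor: the sensor transmits a correction only when the shared prediction drifts more than a threshold from the true measurement, and both sides stay consistent because the predictor uses only values the sink has received. This is our illustrative reduction with an invented signal, not the statistical model from the paper.

```python
def dual_prediction_transmissions(measurements, threshold):
    """Dual Prediction Scheme with a last-value predictor: sensor and
    sink both predict "next sample == last value the sink knows"; the
    sensor transmits only when the true measurement deviates from that
    prediction by more than `threshold`. Returns the samples actually
    sent and the series the sink reconstructs."""
    transmitted = []       # (time, value) pairs sent over the air
    reconstructed = []     # series as seen at the sink
    prediction = None      # None -> nothing known yet, must transmit
    for t, x in enumerate(measurements):
        if prediction is None or abs(x - prediction) > threshold:
            transmitted.append((t, x))
            prediction = x  # both sides resync on the real value
        reconstructed.append(prediction)
    return transmitted, reconstructed

# Slowly drifting signal: most samples stay within the threshold and
# are suppressed, bounding the sink's error by the threshold.
signal = [20.0, 20.1, 20.2, 20.4, 21.5, 21.6, 21.7, 25.0, 25.1]
sent, recon = dual_prediction_transmissions(signal, threshold=1.0)
```

Here only 3 of 9 samples are transmitted while the reconstruction error never exceeds the threshold, which is the trade-off the paper quantifies at network scale.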
Dominguez M, Farrus M, Wanner L. A Thematicity-based Prosody Enrichment Tool for CTS. Interspeech 2017.
This paper presents a demonstration of a stochastic prosody tool for enrichment of synthesized speech using SSML prosody tags applied over hierarchical thematicity spans in the context of a CTS application. The motivation for using hierarchical thematicity is exemplified, together with the capabilities of the module to generate a variety of prosody control tags within a controlled range of values depending on the input thematicity label.
Mangado N, Ceresa M, Benav H, Mistrik P, Piella G, González Ballester MA. Towards a Complete In Silico Assessment of the Outcome of Cochlear Implantation Surgery. Mol Neurobiol (2017)
Cochlear implantation (CI) surgery is a very successful technique, performed on more than 300,000 people worldwide. However, the main challenge resides in obtaining an accurate surgical planning, and computational models are considered to provide such accurate tools. They allow us to plan and simulate surgical procedures beforehand in order to maximally optimize surgery outcomes, and consequently provide valuable information to guide pre-operative decisions. The aim of this work is to develop and validate computational tools to completely assess the patient-specific functional outcome of CI surgery. A complete automatic framework was developed to create and computationally assess CI models, focusing on the neural response of the auditory nerve fibers (ANF) induced by the electrical stimulation of the implant. The framework was applied to evaluate the effects of ANF degeneration and electrode intra-cochlear position on nerve activation. Results indicate that the intra-cochlear positioning of the electrode has a strong effect on the global performance of the CI. Lateral insertion provides better neural responses in case of peripheral process degeneration, and it is recommended, together with optimized intensity levels, in order to preserve the internal structures. Overall, the developed automatic framework provides an insight into the global performance of the implant in a patient-specific way. This makes it possible to further optimize the functional performance and helps to select the best CI configuration and treatment strategy for a given patient.
Additional material:
Pons J, Slizovskaia O, Gong R, Gómez E, Serra X. Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. 25th European Signal Processing Conference (EUSIPCO)
The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms. We first review the trends in designing CNN architectures. Through this literature overview we discuss the crucial points to consider for efficiently learning timbre representations with CNNs. From this discussion we propose a design strategy meant to capture the relevant time-frequency contexts for learning timbre, which permits using domain knowledge for designing architectures. In addition, one of our main goals is to design efficient CNN architectures, which reduces the risk of over-fitting since the number of parameters is minimized. Several architectures based on the design principles we propose are successfully assessed for different research tasks related to timbre: singing voice phoneme classification, musical instrument recognition and music auto-tagging.
Additional material:
- Postprint in arXiv
- Code. The code to reproduce each of the experiments is available online:
- Phoneme classification of jingju singing: github.com/ronggong/EUSIPCO2017
- Musical instrument recognition: github.com/Veleslavia/EUSIPCO2017
- Music auto-tagging: github.com/jordipons/EUSIPCO2017
- Datasets. This work was possible because several benchmarks/datasets are available for research purposes:
- Jingju a cappella singing dataset: github.com/MTG/jingjuPhonemeAnnotation
- IRMAS, a dataset for instrument recognition in musical audio signals: mtg.upf.edu/download/datasets/irmas
- MagnaTagATune dataset: mirg.city.ac.uk/codeapps/the-magnatagatune-dataset and github.com/keunwoochoi/magnatagatune-list
- Presentation slides in Zenodo
Oramas, S,Nieto O,Barbieri F, Serra X. Multi-label Music Genre Classification from Audio, Text and Images Using Deep Features. 18th International Society for Music Information Retrieval Conference (ISMIR 2017)
Music genres allow us to categorize musical items that share common characteristics. Although these categories are not mutually exclusive, most related research has traditionally focused on classifying tracks into a single class. Furthermore, these categories (e.g., Pop, Rock) tend to be too broad for certain applications. In this work we aim to expand this task by categorizing musical items into multiple and fine-grained labels, using three different data modalities: audio, text, and images. To this end we present MuMu, a new dataset of more than 31k albums classified into 250 genre classes. For every album we have collected the cover image, text reviews, and audio tracks. Additionally, we propose an approach for multi-label genre classification based on the combination of feature embeddings learned with state-of-the-art deep learning methodologies. Experiments show major differences between modalities, which not only introduce new baselines for multi-label genre classification, but also suggest that combining them yields improved results.
Additional material:
MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.
To map the information from both datasets we use MusicBrainz.
Tartarus is a Python module for deep learning experiments on audio, text, and their combination.
Zinemanas P, Arias P, Haro G, Gómez E. Visual music transcription of clarinet video recordings trained with audio-based labelled data. ICCV 2017 Workshop on Computer Vision for Audio-Visual Media (CVAVM)
Automatic transcription is a well-known task in the music information retrieval (MIR) domain, and consists of computing a symbolic music representation (e.g. MIDI) from an audio recording. In this work, we address the automatic transcription of video recordings when the audio modality is missing or of insufficient quality, and thus analyze the visual information. We focus on the clarinet, which is played by opening/closing a set of holes and keys. We propose a method for automatic visual note estimation by detecting the fingertips of the player and measuring their displacement with respect to the holes and keys of the clarinet. To this aim, we track the clarinet and determine its position on every frame. The relative positions of the fingertips are used as features of a machine learning algorithm trained for note pitch classification. For that purpose, a dataset is built in a semi-automatic way by estimating pitch information from audio signals in an existing collection of 4.5 hours of video recordings of six different songs performed by nine different players. Our results confirm the difficulty of visual versus audio automatic transcription, mainly due to motion blur and occlusions that cannot be solved with a single view.
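The paper trains a classifier on relative fingertip-position features. As a rough illustration of that last step only (not the authors' actual model or features), a nearest-centroid classifier over hypothetical fingertip feature vectors could look like this:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def train_centroids(samples):
    """samples: list of (feature_vector, note_label) pairs, where the
    features stand in for relative fingertip positions. Returns one
    centroid (mean feature vector) per note."""
    sums, counts = {}, {}
    for feats, note in samples:
        acc = sums.setdefault(note, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[note] = counts.get(note, 0) + 1
    return {note: [v / counts[note] for v in acc] for note, acc in sums.items()}

def classify(centroids, feats):
    """Predict the note whose centroid is nearest to the feature vector."""
    return min(centroids, key=lambda note: dist(centroids[note], feats))
```

Training labels would come from the audio-based pitch estimates described above; the note names and feature values in any example are placeholders.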
Additional material:
- Postprint in Zenodo
- Final version of the video (corrects typographical errors in the video shown below)
Leiva V, Freire A. Towards suicide prevention: early detection of depression on social media. 4th International Conference on Internet Science (INSCI 2017)
Statistics from the World Health Organization indicate that 90% of suicides can be attributed to mental illnesses in high-income countries. Moreover, previous studies concluded that people with mental illnesses tend to reveal their mental condition on social media as a way of relief. Thus, the main objective of this work is to analyze the messages that a user posts online, sequentially through a time period, and detect as soon as possible whether this user is at risk of depression. This paper is a preliminary attempt to minimize measures that penalize the delay in detecting positive cases. Our experiments underline the importance of an exhaustive sentiment analysis and a combination of learning algorithms to detect early symptoms of depression.
Additional material:
Asensio-Pérez, J. I., Dimitriadis, Y., Pozzi, F., Hernández-Leo, D., Prieto, L. P., Persico, D., & Villagrá-Sobrino, S. L. Towards teaching as design: Exploring the interplay between full-lifecycle learning design tooling and Teacher Professional Development. Computers & Education.
Recent research suggests that training teachers as learning designers helps promote technology-enhanced educational innovations. However, little attention has been paid so far to the interplay between the effectiveness of Teacher Professional Development (TPD) instructional models promoting the role of teachers as designers and the capabilities (and pitfalls) of the heterogeneous landscape of available Learning Design (LD) tooling employed to support such TPD. This paper describes a mixed-method study that explores the use of a novel Integrated Learning Design Environment (ILDE) for supporting a TPD program on Information and Communication Technologies (ICT) and Collaborative Learning (CL). 36 Adult Education (AE) and Higher Education (HE) in-service teachers, with little experience in both CL and ICT integration, participated in a study encompassing training workshops and follow-up full-lifecycle learning design processes (from initial conceptualization to implementation with a total of 176 students). The findings from our interpretive study showcase the benefits (and required effort) derived from the use of an integrated platform that guides teachers along the main phases of the learning design process, and that automates certain technological setup tasks needed for classroom enactment. The study also highlights the need to adapt the TPD instructional model to the learning curve associated with the LD tooling, and explores its impact on the attitude of teachers towards future adoption of LD practices.
DOI: https://doi.org/10.1016/j.compedu.2017.06.011
Additional material:
Aragón P, Gómez V, Kaltenbrunner A. Detecting Platform Effects in Online Discussions. Policy & Internet, 2017
Online discussions are the essence of many social platforms on the Internet. These platforms are receiving increasing interest because of their potential to become deliberative spaces. Many studies have proposed approaches to measure online deliberation and to evaluate which design principles are best for deliberative online platforms. However, little research has focused on how deliberation is affected by the arrival of events like the emergence of new topics or the modification of platform features. In this article we present a methodology to detect events that affect deliberation in online discussions. Our results on Menéame, the most popular Spanish social news site, show that a change in how discussions are shown to the user, from a linear to a hierarchical conversation view, significantly enhanced deliberation. In particular, we observe that this type of interface induced argumentative structures in online discussions.
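The linear-to-hierarchical shift can be probed with simple structural indicators. As an illustrative sketch only (the paper's actual methodology is statistical event detection), reply-chain depth distinguishes a flat comment listing from nested discussion cascades; the `parents` mapping below is a hypothetical data shape:

```python
def thread_depths(parents):
    """parents: dict mapping comment id -> parent comment id
    (None for a top-level comment). Returns the depth of each
    comment, where top-level comments have depth 0."""
    memo = {}
    def depth(c):
        if c not in memo:
            p = parents[c]
            memo[c] = 0 if p is None else depth(p) + 1
        return memo[c]
    return {c: depth(c) for c in parents}

def max_depth(parents):
    """Deeper reply chains suggest more back-and-forth discussion."""
    return max(thread_depths(parents).values(), default=0)
```

Under a linear view every comment effectively replies to the post (depth 0 everywhere); a hierarchical view makes replies-to-replies, and hence deeper chains, visible in the data.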
Additional material:
- General description at Pablo Aragón's blog
Amarasinghe I, Hernández-Leo D, Jonsson A. Intelligent group formation in computer supported collaborative learning scripts. IEEE International Conference on Advanced Learning Technologies (ICALT2017)
Well-structured collaborative learning groups scripted based on Collaborative Learning Flow Patterns (CLFPs) often result in successful collaborative learning outcomes. Formulating such learner groups based on instructor-defined criteria promises potentially effective performance by the participating students. However, forming student groups manually based on multiple criteria often fails due to its complexity and the time limitations of practitioners. Hence, an intelligent assistant is presented that supports adaptive collaboration scripting based on instructor-defined criteria while adhering to CLFPs. Constraint optimization techniques have been used for learner group formation, and preliminary tests revealed that the proposed approach could be utilized to formulate student groups while satisfying team formation criteria.
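The paper casts group formation as constraint optimization. As a toy illustration of the idea (not the authors' solver), an exhaustive search over partitions that minimizes an instructor-defined penalty can be sketched in a few lines; the `score` function and skill values in any example are hypothetical:

```python
from itertools import combinations

def cost(groups, score):
    """Total penalty of a partition under an instructor-defined criterion."""
    return sum(score(g) for g in groups)

def form_groups(students, group_size, score):
    """Exhaustively partition `students` into groups of `group_size`,
    returning the partition that minimises the summed penalty.
    Exponential search: only viable for small classes."""
    students = tuple(students)
    assert len(students) % group_size == 0, "class size must divide evenly"
    if not students:
        return []
    first, best = students[0], None
    for rest in combinations(students[1:], group_size - 1):
        group = (first,) + rest
        remaining = [s for s in students if s not in group]
        candidate = [group] + form_groups(remaining, group_size, score)
        if best is None or cost(candidate, score) < cost(best, score):
            best = candidate
    return best
```

A real system would use a constraint-optimization solver rather than enumeration, but the objective, penalizing violations of the instructor's team formation criteria, has the same shape.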
Additional material:
AbuRa'ed A, Saggion H, Chiruzzo L. What Sentence are you Referring to and Why? Identifying Cited Sentences in Scientific Literature. Recent Advances in Natural Language Processing (RANLP 2017)
Details to be added
Ferrés D, Saggion H, Gómez Guinovart X. An Adaptable Lexical Simplification Architecture for Major Ibero-Romance Languages. Proceedings of the Building Linguistically Generalizable NLP Systems Workshop of EMNLP 2017 (accepted)
Lexical Simplification is the task of reducing the lexical complexity of textual documents by replacing difficult words with easier to read (or understand) expressions while preserving the original meaning. The development of robust pipelined multilingual architectures able to adapt to new languages is of paramount importance in lexical simplification. This paper describes and evaluates a modular hybrid linguistic-statistical Lexical Simplifier that deals with the four major Ibero-Romance languages: Spanish, Portuguese, Catalan, and Galician. The architecture of the system is the same for the four languages addressed; only the language resources used during simplification are language specific.
Additional material:
- Open Access postprint at UPF e-repository
- Links to datasets and software available in the publication
Amarasinghe I, Hernandez-Leo D, Jonsson A. Learner Group Formation Using Constraint Optimization. ACM Celebration of Women in Computing (womENcourage2017)
Forming learner groups manually based on multiple criteria often fails due to its complexity and the time limitations of practitioners. This paper presents a novel binary integer programming approach that models the learner group formation problem based on the Jigsaw Collaborative Learning Flow Pattern (CLFP) using constraint optimization techniques. The suggested approach supports adaptive collaboration and provides flexibility when reformulating learner groups.
Miron M, Janer J, Gomez E. Monaural score-informed source separation for classical music using convolutional neural networks. ISMIR 2017
Score information has been shown to improve music source separation when included into non-negative matrix factorization (NMF) frameworks. Recently, deep learning approaches have outperformed NMF methods in terms of separation quality and processing time, and there is scope to extend them with score information. In this paper, we propose a score-informed separation system for classical music that is based on deep learning. We propose a method to derive training features from audio files and the corresponding coarsely aligned scores for a set of classical music pieces. Additionally, we introduce a convolutional neural network architecture (CNN) with the goal of estimating time-frequency masks for source separation. Our system is trained with synthetic renditions derived from the original scores and can be used to separate real-life performances based on the same scores, provided a coarse audio-to-score alignment. The proposed system achieves better performance (SDR and SIR) and is less computationally intensive than a score-informed NMF system on a dataset comprising Bach chorales.
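The CNN in the paper estimates time-frequency masks for each source. To make the masking step concrete, here is a minimal hand-computed soft ratio mask, assuming per-source magnitude estimates are available and that magnitudes add approximately in the mixture (an illustration of masking, not the paper's architecture):

```python
def ratio_masks(est_mags):
    """est_mags: one magnitude 'spectrogram' per source, as nested
    lists (time x frequency). Returns a soft mask per source: each
    bin gets that source's share of the summed estimated magnitudes."""
    eps = 1e-12  # avoid division by zero in silent bins
    n = len(est_mags)
    return [
        [[est_mags[s][t][f] / (sum(est_mags[k][t][f] for k in range(n)) + eps)
          for f in range(len(est_mags[s][t]))]
         for t in range(len(est_mags[s]))]
        for s in range(n)
    ]

def apply_mask(mix, mask):
    """Element-wise masking of the mixture magnitude spectrogram."""
    return [[m * w for m, w in zip(row, mrow)] for row, mrow in zip(mix, mask)]
```

In the paper the per-source estimates come from the score-informed CNN; here they are just given numbers, and the tiny 1-frame, 2-bin "spectrograms" in any example are placeholders.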
Additional material:
- The code is available through a GitHub repository.
- We test our method with a well-known classical music dataset, Bach10, which can be found online.
- The training data is generated from the audio samples in the RWC instrument samples dataset with the code in the GitHub repository.
- The separated tracks, the CNN trained model and the .mat files corresponding to the results in terms of SDR, SIR and SAR can be found at the Zenodo repository.
Miron M, Janer J, Gomez E. Generating data to train convolutional neural networks for low latency classical music source separation. Proceedings of the 14th Sound and Music Computing Conference
Deep learning approaches have become increasingly popular in estimating time-frequency masks for audio source separation. However, training neural networks usually requires a considerable amount of data. Music data is scarce, particularly for the task of classical music source separation, where we need multi-track recordings with isolated instruments. In this work, we depart from the assumption that all the renditions of a piece are based on the same musical score, and we can generate multiple renditions of the score by synthesizing it with different performance properties, e.g. tempo, dynamics, timbre and local timing variations. We then use this data to train a convolutional neural network (CNN) which can separate with low latency all the renditions of a score or a set of scores. The trained model is tested on real life recordings and is able to effectively separate the corresponding sources. This work follows the principle of research reproducibility, providing related data and code, and can be extended to separate other pieces.
Additional material:
- Post-print at UPF repository
- Slides
- Code available on the source separation github repository DeepConvSep
Francès, G., Ramírez, M., Lipovetzky, N. and Geffner, H. Purely Declarative Action Representations are Overrated: Classical Planning with Simulators. In Proc. of the 26th Int. Joint Conf. on Artificial Intelligence (IJCAI 2017)
Classical planning is concerned with problems where a goal needs to be reached from a known initial state by doing actions with deterministic, known effects. Classical planners, however, deal only with classical problems that can be expressed in declarative planning languages such as STRIPS or PDDL. This prevents their use on problems that are not easy to model declaratively or whose dynamics are given via simulations. Simulators do not provide a declarative representation of actions, but simply return successor states. The question we address in this paper is: can a planner that has access to the structure of states and goals only, approach the performance of planners that also have access to the structure of actions expressed in PDDL? To answer this, we develop domain-independent, black box planning algorithms that completely ignore action structure, and show that they match the performance of state-of-the-art classical planners on the standard planning benchmarks. Effective black box algorithms open up new possibilities for modeling and for expressing control knowledge, which we also illustrate.
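A black-box planner of the kind described needs only an initial state, a goal test, and a simulator that returns successor states. The paper develops far more sophisticated width-based algorithms; the breadth-first sketch below just shows the simulator interface, with no declarative action model anywhere:

```python
from collections import deque

def blackbox_plan(init, goal, successors):
    """Breadth-first search over simulator states.
    `successors(state)` yields (action_label, next_state) pairs;
    states must be hashable. Returns a shortest list of action
    labels reaching a goal state, or None if none is found."""
    if goal(init):
        return []
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, plan = frontier.popleft()
        for action, nxt in successors(state):
            if nxt in seen:
                continue
            if goal(nxt):
                return plan + [action]
            seen.add(nxt)
            frontier.append((nxt, plan + [action]))
    return None
```

The planner inspects only the structure of states and goals, exactly the restriction studied in the paper; the counter world in any example is a made-up toy domain.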
Additional material:
- Problem encodings and results at Github
- OA publication at the author's web
Saggion H, Ronzano F, Accuosto P, Ferrés D. MultiScien: a Bi-Lingual Natural Language Processing System for Mining and Enrichment of Scientific Collections. 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017), at SIGIR 2017
In the current online Open Science context, scientific datasets and tools for deep text analysis, visualization and exploitation play a major role. We present a system for deep analysis and annotation of scientific text collections. We also introduce the first version of the SEPLN Anthology, a bi-lingual (Spanish and English) fully annotated text resource in the field of natural language processing that we created with our system. Moreover, a faceted-search and visualization system to explore the created resource is introduced. All resources created for this paper will be available to the research community.
Additional material:
H. Gallego, D. Laniado, A. Kaltenbrunner, V. Gomez, and P. Aragon (2017). Lost in re-election: a tale of two Spanish online campaigns. SocInfo ’17 – The 9th International Conference on Social Informatics, Oxford, United Kingdom.
In the 2010 decade, Spanish politics transitioned from bipartidism to multipartidism. This change produced an unstable situation that eventually led to the rare scenario of two general elections within six months. The two elections had a major difference: two important left-wing parties formed a coalition in the second election, while they had run separately in the first one. In the second election, after merging, the coalition lost around 1M votes, contradicting opinion polls. In this study, we perform community analysis of the retweet networks of the online campaigns to assess whether activity on Twitter reflects the outcome of both elections. The results show that the left-wing parties lost more online supporters than the other parties. Furthermore, we find that the Twitter activity of the supporters unveils a decrease in engagement that is especially marked for the smaller party in the coalition, in line with post-electoral traditional polls.
Keywords: Twitter, Politics, Political Parties, Spanish Elections, Online Campaigning, Political Coalition, Engagement, Political Participation
Additional material:
Accuosto P, Ronzano F, Ferrés D, Saggion H. Multi-level mining and visualization of scientific text collections. Proceedings of the 6th International Workshop on Mining Scientific Publications, Joint Conference on Digital Libraries (JCDL'17), Toronto, Canada, June 2017.
We present a system to mine and visualize collections of scientific documents by semantically browsing information extracted from single publications or aggregated throughout corpora of articles. The text mining tool performs deep analysis of document collections, allowing the extraction and interpretation of research papers' contents. In addition to the extraction and enrichment of documents with metadata (titles, authors, affiliations, etc.), the deep analysis performed comprises semantic interpretation, rhetorical analysis of sentences, triple-based information extraction, and text summarization. The visualization components allow geography-based exploration of collections, topic-evolution interpretation, and collaborative network analysis, among others. The paper presents a case study of a bilingual collection in the field of Natural Language Processing (NLP).
Additional material:
Segovia Aguas J, Jiménez S, Jonsson A. Generating context-free grammars using classical planning. IJCAI International Joint Conference on Artificial Intelligence; 2017 Aug 19-25; Melbourne, Australia.
This paper presents a novel approach for generating Context-Free Grammars (CFGs) from small sets of input strings (a single input string in some cases). Our approach is to compile this task into a classical planning problem whose solutions are sequences of actions that build and validate a CFG compliant with the input strings. In addition, we show that our compilation is suitable for implementing the two canonical tasks for CFGs: string production and string recognition.
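For the string-recognition side, a textbook CYK recognizer (not the paper's planning compilation) shows concretely what "a CFG compliant with the input strings" has to satisfy; the grammar in any example, in Chomsky normal form, is a made-up instance accepting strings of the form a^n b^n:

```python
def cyk_recognize(grammar, start, s):
    """CYK recognition for a CFG in Chomsky normal form.
    grammar: dict NonTerminal -> list of rules, each either a
    1-tuple (terminal,) or a 2-tuple (NT, NT).
    Returns True iff `start` derives the string `s`."""
    n = len(s)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals deriving s[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        for nt, rules in grammar.items():
            if (ch,) in rules:
                table[i][0].add(nt)
    for span in range(2, n + 1):          # substring length
        for i in range(n - span + 1):     # start position
            for split in range(1, span):  # split point
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for nt, rules in grammar.items():
                    if any((a, b) in rules for a in left for b in right):
                        table[i][span - 1].add(nt)
    return start in table[0][n - 1]
```

A generated grammar is validated exactly when such a recognizer accepts every input string; the paper performs the analogous check via planning actions.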
Additional material:
- Post-print at UPF repository
Segovia-Aguas, J, Jiménez, S, Jonsson, A. Unsupervised Classification of Planning Instances. Proceedings of the 27th International Conference on Automated Planning and Scheduling (ICAPS'17)
In this paper we introduce a novel approach for unsupervised classification of planning instances based on the recent formalism of planning programs. Our approach is inspired by structured prediction in machine learning, which aims at predicting structured information about a given input rather than a scalar value. In our case, each input is an unlabelled classical planning instance, and the associated structured information is the planning program that solves the instance. We describe a method that takes as input a set of planning instances and outputs a set of planning programs, classifying each instance according to the program that solves it. Our results show that automated planning can be successfully used to solve structured unsupervised classification tasks, and invite further exploration of the connection between automated planning and structured prediction.
Additional material:
- Post-print at UPF e-repository
- Final version in Open Access here
- The source code and benchmarks are available at https://github.com/aig-upf/automated-programming-framework
Espinosa-Anke, L., Oramas S., Saggion H., & Serra X. ELMDist: A vector space model with words and MusicBrainz entities. Workshop on Semantic Deep Learning (SemDeep), collocated with ESWC 2017
Music consumption habits as well as the music market have changed dramatically due to the increasing popularity of digital audio and streaming services. Today, users are closer than ever to a vast number of songs, albums, artists and bands. However, the challenge remains in how to make sense of all the data available in the music domain, and how the current state of the art in Natural Language Processing and semantic technologies can contribute to Music Information Retrieval areas such as music recommendation, artist similarity or automatic playlist generation. In this paper, we present and evaluate a distributional sense-based embeddings model in the music domain, which can be easily used for these tasks, as well as a device for improving artist or album clustering. The model is trained on a disambiguated corpus linked to the MusicBrainz musical Knowledge Base with an estimated precision above 0.9, and following current knowledge-based approaches to sense-level embeddings, entity-related vectors are provided à la WordNet, concatenating the id of the entity and its mention (in WordNet lingo, the entity's synset and sense). The model is evaluated both intrinsically and extrinsically in a supervised entity typing task, and released for the use and scrutiny of the community.
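The entity-and-mention concatenation described above can be illustrated with a small preprocessing step that rewrites annotated mentions into single sense tokens before embedding training; the `||` separator and the entity ids below are hypothetical placeholders, not ELMDist's actual format:

```python
def sense_tokens(tokens, entity_spans):
    """Rewrite annotated entity mentions as single 'mention||entity_id'
    tokens, so the embedding model later learns one vector per
    disambiguated entity rather than per ambiguous surface form.
    entity_spans: list of (start, end, entity_id) over token positions,
    end exclusive."""
    out = list(tokens)
    # process right-to-left so earlier span indices stay valid
    for start, end, ent in sorted(entity_spans, reverse=True):
        mention = "_".join(tokens[start:end])
        out[start:end] = [f"{mention}||{ent}"]
    return out
```

The rewritten corpus can then be fed to any word-embedding trainer (the publication mentions a word2vec model via train_word2vec.py); each sense token gets its own vector.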
Additional material:
ELMDist is a sense-level embeddings model in the music domain, trained on a music-specific corpus of artist biographies, where musical entities have been automatically annotated with high precision against the musical KB MusicBrainz (MB).
The pretrained sense-level word2vec model against MusicBrainz can be downloaded from: http://mtg.upf.edu/system/files/projectsweb/elmdist_vectors.zip
If you want to retrain the vectors, ELMD 2.0 can be downloaded from here:
http://mtg.upf.edu/download/datasets/elmd
And you can train the model by running train_word2vec.py
- Link to the workshop, including open access to the publication
Oramas S., Nieto O., Sordo M., & Serra X. (2017) A Deep Multimodal Approach for Cold-start Music Recommendation.
An increasing amount of digital music is being published daily. Music streaming services often ingest all available music, but this poses a challenge: how to recommend new artists for which prior knowledge is scarce? In this work we aim to address this so-called cold-start problem by combining text and audio information with user feedback data using deep network architectures. Our method is divided into three steps. First, artist embeddings are learned from biographies by combining semantics, text features, and aggregated usage data. Second, track embeddings are learned from the audio signal and available feedback data. Finally, artist and track embeddings are combined in a multimodal network. Results suggest that both splitting the recommendation problem between feature levels (i.e., artist metadata and audio track) and merging feature embeddings in a multimodal approach improve the accuracy of the recommendations.
Additional material:
- arXiv pre-print
- Source code at GitHub
- MSD-A dataset (dataset, data splits, feature embeddings and models)
The MSD-A is a dataset related to the Million Song Dataset (MSD). It is a collection of artist tags and biographies gathered from Last.fm for all the artists that have songs in the MSD.
Downloads: MSD-Taste triplets for artists and track ids
Tartarus: Library for deep learning experiments https://github.com/sergiooramas/tartarus
Aragón P,. Kaltenbrunner A., Calleja-López A., Pereira A., Monterde A., Barandiaran, X. & Gómez V. (2017). Deliberative Platform Design: The case study of online discussions in Decidim Barcelona. SocInfo ’17 – The 9th International Conference on Social Informatics, Oxford, United Kingdom.
With the rise of ICTs and the crisis of political representation, many online platforms have been developed with the aim of improving participatory democratic processes. However, regarding platforms for online petitioning, previous research has not found examples of how to effectively introduce discussions, a crucial feature to promote deliberation. In this study we focus on the case of Decidim Barcelona, the online participatory-democracy platform launched by the City Council of Barcelona, in which proposals can be discussed with an interface that combines threaded discussions and comment alignment with the proposal. This innovative approach allows us to examine whether neutral, positive or negative comments are more likely to generate discussion cascades. The results reveal that, with this interface, comments marked as negatively aligned with the proposal were more likely to engage users in online discussions and, therefore, helped to promote deliberative decision making.
Additional material:
- Pre-print in arXiv: https://arxiv.org/abs/1707.06526
- Dataset https://github.com/elaragon/metadecidim
Toumanidou T, Noailly J, Ceresa M, Zhang C, López-Linares K, Macía I, González Ballester M.A. Patient-specific modeling of unruptured human abdominal aortic aneurysms using deformable hexahedral meshes. International Journal of Computer Assisted Radiology and Surgery, vol. 12, Suppl. 1 (CARS 2017)
Abdominal aortic aneurysm (AAA) disease is a pathological dilation of the aorta involving degeneration of the wall, and can lead to aneurysmal rupture with a 90% mortality rate. Although the maximum transverse AAA diameter (DMAX) is the most commonly used predictor of rupture risk and warrants surgery for DMAX > 5.5 cm, the reported rupture rate for smaller aneurysms is up to 23%. Instead, wall stresses obtained through finite element (FE) models that consider the heterogeneous and anisotropic behavior of the wall layers were suggested as a more accurate rupture predictor. Our goal is the generation of patient-specific volumetric FE meshes of unruptured AAA via open-source and automatic workflows, aiming to address the following challenges:
- Structured hexahedral meshes are required for mesh convergence in the radial direction
- Thrombus and outer wall segmentation is challenging because of lack of contrast and fuzzy borders
- The workflow should involve few if any manual operations
- Ideally, open-source libraries would allow for adaptation of the workflow to any geometry
Manathunga K, Hernández-Leo D. Towards scalable collaborative learning flow pattern orchestration technologies. Paper presented at: 9th annual International Conference on Education and New Learning Technologies EDULEARN17; 2017 July 3-5; Barcelona, Spain.
Collaborative Learning Flow Patterns (CLFPs) structure learning flows to shape desired social interactions among learners, leading to fruitful learning gains. It is worthwhile to study possible CLFP extensions applicable to large-class contexts and to Massive Open Online Courses (MOOCs), considering their dynamic, unpredictable nature. This study considers the most commonly used patterns and their adaptability to such contexts from different dimensions, such as pedagogical interest, scalability and other related perspectives. As a result of the analysis, a collection of use cases is elaborated illustrating potential collaborative learning opportunities, design requirements, initial screen designs of such activities and expected functionality descriptions for novel CSCL orchestration technologies. One of these use cases is implemented in the PyramidApp tool.
Additional material:
Hernández-Leo D, Agostinho S, Beardsley M, Bennett S, Lockyer L. Helping teachers to think about their design problem: a pilot study to stimulate design thinking. 9th annual International Conference on Education and New Learning Technologies EDULEARN17, Barcelona, Spain.
Designing learning experiences for students is a key responsibility of teachers. This involves designing stimulating and engaging tasks, selecting and creating appropriate resources, and deciding how best to support students to successfully complete the tasks. This is a complex process in which many factors need to be considered. Learning design research and tooling is focused on how to support this teacher design work. Existing learning design tools support the authoring and sharing of learning activities, which - if represented computationally - can also be enacted in virtual learning environments. An important part of the learning design process is thinking about what it is that students are to learn. This then informs the design of the learning activities. However, research on how to support this early phase of the learning design process is scarce. Indeed, an emerging finding from research investigating teacher design practices is that teachers' design work exhibits some characteristics synonymous with the broader field of design. Specifically, teachers formulate and work with a design problem. But teachers generally don't consider their work in terms of design. Thus there is scope to encourage and support design thinking in teachers along the whole learning design process, including in the initial phase of identifying a design problem. This paper reports on a pilot study where a learning design Problem Generation Tool was created, in the form of 20 stimulus questions, to generate deeper thinking about the design problem. The stimulus questions are based on three foci, which are to be considered in an iterative way to think about and generate the problem:
- Understand the nature of the design problem and your goals (e.g., What kind of problem is this? Why is this design being done?)
- Map your context (e.g., Who are the students? How will the course be taught? Who will teach in this course?)
- Plan your design approach (e.g., What preparation do you have to do? What is your initial plan or steps you will follow for your design process?)
The tool was incorporated in the Integrated Learning Design Environment (ILDE), a community platform that integrates a number of learning design tools supporting conceptualization, authoring and implementation of learning activities. The Problem Generation Tool integrated in ILDE was used with eight participants, who were already familiar with ILDE, in a workshop setting in a postgraduate program at a local university in Barcelona, Spain. Participants had between one and five or more years of teaching experience. Results showed that participants found the Problem Generation Tool helpful. The level of perceived usefulness by question varied across participants, while a few questions were not sufficiently clear and need to be revised. Overall, there was evident elaboration of the participants' design problems, thus suggesting design thinking was stimulated and identification of the design problems scaffolded.
Pasarella E, Lobo J. A Datalog Framework for Modeling Relationship-based Access Control Policies. 22nd ACM on Symposium on Access Control Models and Technologies
Relationships like friendship to limit access to resources have been part of social network applications since their beginnings. Describing access control policies in terms of relationships is not particular to social networks and arises naturally in many situations. Hence, we have recently seen several proposals formalizing different Relationship-based Access Control (ReBAC) models. In this paper, we introduce a class of Datalog programs suitable for modeling ReBAC and argue that this class of programs, which we call ReBAC Datalog policies, provides a very general framework to specify and implement ReBAC policies. To support our claim, we first formalize the merging of two recent proposals for modeling ReBAC, one based on hybrid logic and the other on path regular expressions. We present extensions to handle negative authorizations and temporal policies. We describe mechanisms for policy analysis, and then discuss the feasibility of using Datalog-based systems as implementations.
Wilhelmi F, Bellalta B, Cano C, Jonsson A. Implications of Decentralized Q-learning Resource Allocation in Wireless Networks. arXiv pre-print.
Reinforcement Learning is gaining attention from the wireless networking community due to its potential to learn well-performing configurations only from the observed results. In this work we propose a stateless variation of Q-learning, which we apply to exploit spatial reuse in a wireless network. In particular, we allow networks to modify both their transmission power and the channel used solely based on the experienced throughput. We concentrate on a completely decentralized scenario in which no information about neighbouring nodes is available to the learners. Our results show that although the algorithm is able to find the best-performing actions to enhance aggregate throughput, there is high variability in the throughput experienced by the individual networks. We identify the cause of this variability as the adversarial setting of our setup, in which the most played actions provide intermittent good/poor performance depending on the neighbouring decisions. We also evaluate the effect of the intrinsic learning parameters of the algorithm on this variability.
Additional material:
- Code for simulation (GitHub, commit: eb4042a1830c8ea30b7eae3d72a51afe765a8d86)
- Open access version at UPF repository and arXiv pre-print
Barrachina-Muñoz S, Bellalta B. Learning Optimal Routing for the Uplink in LPWANs Using Similarity-enhanced epsilon-greedy. 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC)
Despite being a relatively new communication technology, Low-Power Wide Area Networks (LPWANs) have shown their suitability to empower a major part of Internet of Things applications. Nonetheless, most LPWAN solutions are built on star topology (or single-hop) networks, often causing lifetime shortening in stations located far from the gateway. In this respect, recent studies show that multi-hop routing for uplink communications can reduce LPWANs' energy consumption significantly. However, it is a troublesome task to identify such energetically optimal routings through trial-and-error brute-force approaches because of time and, especially, energy consumption constraints. In this work we show the benefits of facing this exploration/exploitation problem by running centralized variations of the multi-armed bandit epsilon-greedy method, a well-known online decision-making approach that combines best-known-action selection and knowledge expansion. Important energy savings are achieved when proper randomness parameters are set, and these are often improved by conveniently applying similarity, a concept introduced in this work that allows harnessing the gathered knowledge by sporadically selecting unexplored routing combinations akin to the best known one.
https://doi.org/10.1109/PIMRC.2017.8292373
Additional material:
- arXiv preprint
- DRESG framework for LPWANs (GitHub)
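The similarity idea above biases exploration towards untried routing combinations that resemble the best one found so far. A minimal sketch under assumptions (toy arms, a Hamming-distance notion of similarity, a synthetic energy-saving reward); this is not the DRESG code:

```python
import random

def hamming(a, b):
    """Number of per-hop routing choices on which two combinations differ."""
    return sum(x != y for x, y in zip(a, b))

def similarity_epsilon_greedy(arms, reward_fn, pulls=300, epsilon=0.2, seed=1):
    """Epsilon-greedy over routing combinations; exploration prefers
    unexplored arms most similar (Hamming-wise) to the best known arm."""
    rng = random.Random(seed)
    est = {a: 0.0 for a in arms}
    count = {a: 0 for a in arms}
    for _ in range(pulls):
        best = max(est, key=est.get)
        if rng.random() < epsilon:
            unexplored = [a for a in arms if count[a] == 0]
            if unexplored:
                a = min(unexplored, key=lambda x: hamming(x, best))  # similarity bias
            else:
                a = rng.choice(arms)
        else:
            a = best
        r = reward_fn(a)
        count[a] += 1
        est[a] += (r - est[a]) / count[a]  # incremental sample mean
    return max(est, key=est.get)

# Toy example: each arm is a tuple of per-hop routing choices;
# the reward is higher the closer the arm is to an (unknown) optimum.
arms = [(i, j) for i in range(3) for j in range(3)]
saving = {a: 1.0 - 0.2 * hamming(a, (2, 2)) for a in arms}
best = similarity_epsilon_greedy(arms, lambda a: saving[a])
```

Because similar combinations here yield similar rewards, the biased exploration walks towards the optimum much faster than uniform random exploration would.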
Albó L, Gelpí C. From a FutureLearn MOOC to a blended SPOC: the experience of a Catalan Sign Language course. HybridEd Workshop. Innovations in blended learning with MOOCs; 2017.
This paper presents a case study of transforming an existing MOOC into a SPOC to be used in a campus course with a blended learning approach, with the aim of reflecting on the experience and reporting the challenges of the hybridization process. Results point out that blended learning with MOOCs can be a sustainable model for universities as well as a trigger for the change from teacher-centred to student-centred learning.
Albó L, Hernández-Leo D. Breaking the walls of a campus summer course for high school students with two MOOCs. HybridEd Workshop. Innovations in blended learning with MOOCs; 2017.
This paper presents a case study of integrating two external MOOCs in a face-to-face (f2f) summer course for high school students. The aim of the study is to explore the design challenges that emerged from this blended learning approach, the students’ learning outcomes and satisfaction with the course content, as well as to investigate the students’ behavior with the MOOCs once the f2f course ended. Results indicate that students learned through the course and were satisfied with the learning design. Moreover, some of them took advantage of the MOOCs once the campus course finished.
Additional material:
- Postprint at UPF e-repository
- Slides
- Workshop details at GTI group web
Derkach D, Sukno FM. Local Shape Spectrum Analysis for 3D Facial Expression Recognition. Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, in press
We investigate the problem of facial expression recognition using 3D data. Building from one of the most successful frameworks for facial analysis using exclusively 3D geometry, we extend the analysis from a curve-based representation into a spectral representation, which allows a complete description of the underlying surface that can be further tuned to the desired level of detail. Spectral representations are based on the decomposition of the geometry in its spatial frequency components, much like a Fourier transform, which are related to intrinsic characteristics of the surface. In this work, we propose the use of Graph Laplacian Features (GLF), which result from the projection of local surface patches into a common basis obtained from the Graph Laplacian eigenspace. We test the proposed approach on the BU-3DFE database in terms of expressions and Action Units recognition. Our results confirm that the proposed GLF produces consistently higher recognition rates than the curves-based approach, thanks to a more complete description of the surface, while requiring a lower computational complexity. We also show that the GLF outperforms the most popular alternative approach for spectral representation, Shape-DNA, which is based on the Laplace-Beltrami operator and cannot provide a stable basis that guarantees that the extracted signatures for the different patches are directly comparable.
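The core operation behind Graph Laplacian Features is projecting a per-vertex signal of a surface patch onto the eigenbasis of the graph Laplacian, so that low eigenvalues capture low spatial frequencies. A toy sketch of that projection on a hand-made 4-vertex patch (illustrative only, not the paper's pipeline):

```python
import numpy as np

def graph_laplacian(edges, n):
    """Unnormalized graph Laplacian L = D - A of an undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_features(edges, signal, k=3):
    """Project a per-vertex signal onto the first k Laplacian eigenvectors.
    Small eigenvalues correspond to low spatial frequencies on the graph."""
    L = graph_laplacian(edges, len(signal))
    eigvals, eigvecs = np.linalg.eigh(L)          # eigh: L is symmetric
    return eigvecs[:, :k].T @ np.asarray(signal)  # spectral coefficients

# Toy patch: a 4-cycle graph carrying an alternating per-vertex "depth" signal
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
coeffs = spectral_features(edges, [0.0, 1.0, 0.0, 1.0], k=3)
```

The first coefficient (eigenvalue 0, constant eigenvector) encodes the signal's mean component; in the paper, such coefficients computed on a common basis make patches directly comparable.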
Fernandez-Lopez A, Martinez O, Sukno FM. Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database. Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, in press.
Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information that is complementary to the audio. Exploiting the visual information, however, has proven challenging. On one hand, researchers have reported that the mapping between phonemes and visemes (visual units) is one-to-many because there are phonemes which are visually similar and indistinguishable between them. On the other hand, it is known that some people are very good lip-readers (e.g., deaf people). We study the limit of visual-only speech recognition in controlled conditions. With this goal, we designed a new database in which the speakers are aware of being lip-read and aim to facilitate lip-reading. In the literature, there are discrepancies on whether hearing-impaired people are better lip-readers than normal-hearing people. Then, we analyze whether there are differences between the lip-reading abilities of 9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are compared with the performance of a visual automatic speech recognition system. In our tests, hearing-impaired participants outperformed the normal-hearing participants but without reaching statistical significance. Human observers were able to decode 44% of the spoken message. In contrast, the visual-only automatic system achieved a 20% word recognition rate. However, if we repeat the comparison in terms of phonemes, both obtained very similar recognition rates, just above 50%. This suggests that the gap between human lip-reading and automatic speech-reading might be more related to the use of context than to the ability to interpret mouth appearance.
Fernandez-Lopez A, Sukno FM. Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading. 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017)
Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the key challenges is the definition of the visual elementary units (the visemes) and their vocabulary. Many researchers have analyzed the importance of the phoneme-to-viseme mapping and have proposed viseme vocabularies with lengths between 11 and 15 visemes. These viseme vocabularies have usually been manually defined by their linguistic properties and in some cases using decision trees or clustering techniques. In this work, we focus on the automatic construction of an optimal viseme vocabulary based on the association of phonemes with similar appearance. To this end, we construct an automatic system that uses local appearance descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. To compare the performance of the system, different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a Spanish corpus of continuous speech. Our results indicate that we are able to recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the words in a continuous speech scenario, and that the optimal viseme vocabulary for Spanish is composed of 20 visemes.
Additional material:
- Postprint at UPF e-repository and arXiv
- Datasets
- AV@CAR: free multichannel multimodal database for automatic audio-visual speech recognition, including both studio and in-car recordings
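Building a viseme vocabulary by associating visually similar phonemes can be sketched as clustering phoneme appearance vectors until the desired vocabulary size is reached. The sketch below uses a simple agglomerative merge on toy 2-D appearance vectors; the paper's actual construction relies on appearance descriptors and HMMs, so this is only an illustration of the grouping step.

```python
import numpy as np

def build_viseme_vocabulary(phoneme_feats, n_visemes):
    """Greedy agglomerative grouping: repeatedly merge the two clusters
    whose mean appearance vectors are closest, until n_visemes remain."""
    names = [[p] for p in phoneme_feats]
    means = [np.asarray(v, dtype=float) for v in phoneme_feats.values()]
    while len(names) > n_visemes:
        best = None
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                d = np.linalg.norm(means[i] - means[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        names[i] += names[j]
        means[i] = (means[i] + means[j]) / 2  # simple mean merge
        del names[j], means[j]
    return [sorted(group) for group in names]

# Toy appearance vectors: /p/ /b/ /m/ look alike (closed lips), /f/ /v/ alike.
feats = {"p": [0.0, 0.1], "b": [0.1, 0.0], "m": [0.05, 0.05],
         "f": [1.0, 1.0], "v": [0.9, 1.1]}
visemes = build_viseme_vocabulary(feats, n_visemes=2)
```

On these toy vectors the bilabials collapse into one viseme and the labiodentals into another, mirroring the one-to-many phoneme-to-viseme mapping discussed above.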
Slizovskaia O, Gómez E, Haro G. Musical Instrument Recognition in User-generated Videos using a Multimodal Convolutional Neural Network Architecture. ACM International Conference on Multimedia Retrieval (ICMR 2017)
This paper presents a method for recognising musical instruments in user-generated videos. Musical instrument recognition from music signals is a well-known task in the music information retrieval (MIR) field, where current approaches rely on the analysis of good-quality audio material. This work addresses a real-world scenario with several research challenges, i.e. the analysis of user-generated videos that are varied in terms of recording conditions and quality and may contain multiple instruments sounding simultaneously and background noise. Our approach does not focus solely on the analysis of audio information; we exploit the multimodal information embedded in the audio and visual domains. In order to do so, we develop a Convolutional Neural Network (CNN) architecture which combines learned representations from both modalities at a late fusion stage.
Our approach is trained and evaluated on two large-scale video datasets: YouTube-8M and FCVID. The proposed architectures demonstrate state-of-the-art results in audio and video object recognition, provide additional robustness to missing modalities, and remain computationally cheap to train.
Additional material:
- Postprint in Zenodo
- Code, extracted features, pre-trained models and experimental results (GitHub)
- Datasets:
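Late fusion, as used above, means each modality is encoded independently and the embeddings are only combined at the classification head. A minimal NumPy sketch under assumptions (random stand-in weights instead of trained CNN branches, 10 hypothetical instrument classes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in weights; a real system learns these end to end.
W_audio = rng.normal(size=(128, 32))
W_video = rng.normal(size=(256, 32))
W_head = rng.normal(size=(64, 10))   # 10 hypothetical instrument classes

def audio_branch(x):
    """Stand-in for the audio CNN: maps raw audio features to an embedding."""
    return np.tanh(x @ W_audio)

def video_branch(x):
    """Stand-in for the video CNN: maps raw video features to an embedding."""
    return np.tanh(x @ W_video)

def late_fusion_predict(audio_x, video_x):
    """Late fusion: run each modality separately, concatenate the embeddings,
    then classify on the joint representation with a softmax head."""
    z = np.concatenate([audio_branch(audio_x), video_branch(video_x)])
    logits = z @ W_head
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

p = late_fusion_predict(rng.normal(size=128), rng.normal(size=256))
```

One practical property of this design, noted in the abstract, is robustness to missing modalities: a branch can be zeroed or replaced without retraining the other branch's encoder.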
Dalmazzo D, Ramirez R. A Virtual Violin System Based on Gesture Capture using EMG and Position Sensors. 1st International Workshop on Motor Learning for Music Performance.
We apply sensor technology to the automatic detection and analysis of gestures during music performances. Taking the violin as a case study, we measure both right-hand (i.e. bow) and left-hand (i.e. fingering) gestures in violin performances. Based on these data, we create the first model of an interactive music instrument as a virtual violin that gives novice users musical feedback on their gestural performance, by producing sound output directly mapped from motion data, as well as learning from user gestures to adapt the sound manipulation.
http://www.infomus.org/MOTION2017/MOTION2017_david_and_ramirez.pdf
Manathunga K, Hernández-Leo D, Sharples M. A Social learning space grid for MOOCs: exploring a FutureLearn case. EMOOCs 2017 Fifth European MOOCs Stakeholders Summit; 2017 May 22 - 26; Madrid, Spain. [10 p.]
Collaborative and social engagement promote active learning through knowledge-intensive interactions. Massive Open Online Courses (MOOCs) are dynamic and diversified learning spaces with varying factors, such as flexible time frames, student count and demographics, requiring higher engagement and motivation for students to continue learning and for designers to implement novel pedagogies, including collaborative learning activities. This paper looks into available and potential collaborative and social learning spaces within MOOCs and proposes a social learning space grid that can aid MOOC designers in implementing such spaces, considering the related requirements. Furthermore, it describes a MOOC case study incorporating three collaborative and social learning spaces and discusses the challenges faced. Interesting lessons learned from the case give insight into which spaces to implement and the scenarios and factors to be considered.
Michos K, Hernández-Leo D, Jiménez M. (2017). How educators value data analytics about their MOOCs. CEUR Proceedings of Work in Progress Papers of the Experience and Research Tracks and Position Papers of the Policy Track at EMOOCs 2017, co-located with the EMOOCs 2017 Conference (Vol-1841), Madrid, Spain, 77-82.
A range of data analytics is provided to educators about the profile, behavior and satisfaction of students participating in a Massive Open Online Course (MOOC). However, limited research has been conducted on how this informs the redesign of next MOOC editions. This work-in-progress paper presents a study of 4 MOOC educators from Universitat Pompeu Fabra regarding 3 MOOCs offered on the FutureLearn platform. The objective was to evaluate the usefulness and understandability of different types of data analytics of the courses they have offered with respect to specific monitoring goals. Preliminary results show that educators perceived the same information sources and data visualizations differently; satisfaction surveys and comments in the forum were among the most useful information, but it was difficult to associate data analytics with the monitoring goals. Further studies on the alignment of educators’ monitoring needs for redesign purposes and the development of appropriate support tools are suggested.
Wang X, Liu Y, Wu Z, Zhou M, González Ballester MA, Zhang C. Automatic labeling of vascular structures with topological constraints via HMM. MICCAI 2017 (accepted)
Identification of anatomical branches of vascular structures is a prerequisite task for diagnosis, treatment and inter-subject comparison. We propose a novel graph labeling approach to anatomically label vascular structures of interest. Our method first extracts bifurcations of interest from the centerlines of vessel tree structures, where a set of geometric features are also calculated. Then the probability distribution of these bifurcations is learned using an XGBoost classifier. Finally, a Hidden Markov Model with a restricted transition strategy is constructed in order to find the most likely labeling configuration of the whole structure, while also enforcing topological consistency. In this paper, the proposed algorithm has been evaluated through leave-one-out cross validation on 50 subjects of centerline models obtained from MRA images of healthy volunteers’ Circle of Willis. Results demonstrate that our method can achieve higher accuracy and specificity than the best performing state-of-the-art methods, while obtaining similar precision and recall. It is also worth noting that our algorithm can handle different topologies, like circle, chain and tree. By using scale- and coordinate-independent geometrical features, our method does not require global alignment as a preprocessing step.
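Finding the most likely labeling of a chain of bifurcations under an HMM with restricted transitions is a Viterbi decoding where forbidden transitions get (effectively) minus-infinite log-probability. A small illustrative sketch, with toy probabilities and two hypothetical anatomical labels (not the paper's learned model):

```python
import numpy as np

def viterbi(obs_logprob, trans_logprob, init_logprob):
    """Most likely label sequence for a chain of observations.
    Restricted (disallowed) transitions carry a huge negative log-probability."""
    T, S = obs_logprob.shape
    score = init_logprob + obs_logprob[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans_logprob  # candidate: prev state x next state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + obs_logprob[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 2 labels, 3 bifurcations; the transition 1 -> 0 is forbidden,
# which enforces a topological ordering on the labels.
NEG = -1e9  # stands in for -inf in log space
obs = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]))
trans = np.array([[np.log(0.7), np.log(0.3)], [NEG, 0.0]])
init = np.log(np.array([0.5, 0.5]))
path = viterbi(obs, trans, init)
```

The restricted transition matrix is what enforces topological consistency: once the decoder enters label 1 it can never return to label 0, regardless of the per-bifurcation evidence.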
Amarasinghe I, Hernández-Leo D, Jonsson A. Towards Data-Informed Group Formation Support Across Learning Spaces. International Workshop on Learning Analytics Across Spaces (Cross-LAK), 7th International Conference on Learning Analytics & Knowledge (LAK'17)
Collaborative learning has gained much success over the past few decades given its learning benefits. Group composition has been seen as a relevant design element that contributes to the potential effectiveness of collaborative learning. To support practitioners in this context, this paper addresses the problem of automatic group formation, implementing policies related to well-known collaboration techniques and considering personal attributes in across-spaces contexts where multiple activities, places and tools are involved in a learning situation. Analytics of contextual and progress-in-activity information about learners, presented as a summary, would help practitioners obtain comprehensive knowledge about them and subsequently facilitate the formation of effective collaborative groups for forthcoming activities. The paper discusses a work-in-progress web-based architecture of a group formation service that computes groupings and also assists in recommending grouping constraints via learning analytics, facilitating practitioners in the adaptive set-up of the group formation design element across spaces.
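One common group-formation policy of the kind the service above would implement is heterogeneous grouping on a learner attribute: rank the students and deal them round-robin so each group mixes high and low values. The sketch below is illustrative only, with invented student data, and is not the architecture described in the paper.

```python
def heterogeneous_groups(students, attribute, group_size):
    """Form heterogeneous groups by sorting on an attribute and dealing
    students round-robin, so each group mixes high and low values."""
    ranked = sorted(students, key=lambda s: s[attribute], reverse=True)
    n_groups = (len(ranked) + group_size - 1) // group_size
    groups = [[] for _ in range(n_groups)]
    for idx, student in enumerate(ranked):
        groups[idx % n_groups].append(student["name"])
    return groups

# Invented example data: six students with a prior-knowledge score.
students = [{"name": n, "score": s} for n, s in
            [("ana", 9), ("ben", 3), ("carla", 7), ("dan", 5), ("eva", 8), ("fred", 2)]]
groups = heterogeneous_groups(students, "score", group_size=2)
```

A homogeneous policy would instead slice the ranked list into consecutive chunks; the choice between the two is exactly the kind of constraint the paper proposes recommending from learning analytics.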
Saggion H, Ferres D, Sevens L, Schuurman I, Ripolles M, Rodriguez O. Able to Read My Mail: An Accessible e-Mail Client with Assistive Technology. Web For All (W4A) 2017 – The Future of Accessible Work. Best Communication Paper Award
The Able to Include project aims at improving the living conditions of people with intellectual or developmental disabilities (IDD) in key areas of society. One of its focus points concerns improving the integration of people with IDD in the workplace by introducing accessible Web-based tools. This paper describes one of the tools developed as result of the project: an e-mail client with text simplification and other assistive technologies which makes information transmitted over the Internet more understandable to people with IDD therefore facilitating their labor integration. The accessible Web e-mail client has been developed following a User-Centered Design and tested with people with IDD. The results so far are encouraging.
Rankothge W, Le F, Russo A, Lobo J. Optimizing Resource Allocation for Virtualized Network Functions in a Cloud Center using Genetic Algorithms. IEEE Transactions on Network and Service Management (Volume PP, Issue 99)
With the introduction of Network Function Virtualization (NFV) technology, migrating entire enterprise data centers into the cloud has become a possibility. However, for a Cloud Service Provider (CSP) to offer such services, several research problems still need to be addressed. In previous work, we introduced a platform, called Network Function Center (NFC), to study research issues related to Virtualized Network Functions (VNFs). In an NFC, we assume VNFs to be implemented on virtual machines that can be deployed on any server in the CSP network. We have proposed a resource allocation algorithm for VNFs based on Genetic Algorithms (GAs). In this paper, we present a comprehensive analysis of two GA-based resource allocation algorithms for: (1) the initial placement of VNFs, and (2) the scaling of VNFs to support traffic changes. We compare the performance of the proposed algorithms with a traditional Integer Linear Programming resource allocation technique. We then combine data from previous empirical analyses to generate realistic VNF chains and traffic patterns, and evaluate the resource allocation decision-making algorithms. We assume different architectures for the data center, implement different fitness functions with GA, and compare their performance when scaling over time.
Additional material:
- Datasets and software available here
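A GA-based VNF placement encodes each candidate solution as a chromosome assigning every VNF to a server, and evolves the population under a fitness function. The tiny sketch below is a generic GA under assumptions (a toy cost that co-locates chained VNFs, fixed mutation rate); the paper's actual fitness functions and data-center models are richer.

```python
import random

def genetic_placement(n_vnfs, n_servers, cost_fn, pop=30, gens=60, seed=2):
    """Tiny GA: a chromosome assigns each VNF to a server; lower cost is fitter."""
    rng = random.Random(seed)
    population = [[rng.randrange(n_servers) for _ in range(n_vnfs)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost_fn)
        survivors = population[: pop // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_vnfs)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                  # point mutation
                child[rng.randrange(n_vnfs)] = rng.randrange(n_servers)
            children.append(child)
        population = survivors + children
    return min(population, key=cost_fn)

# Toy cost: count inter-server hops along a chain of 6 VNFs
# (keeping chained VNFs co-located minimises traffic between servers).
def hops(assign):
    return sum(assign[i] != assign[i + 1] for i in range(len(assign) - 1))

best = genetic_placement(n_vnfs=6, n_servers=3, cost_fn=hops)
```

Swapping `cost_fn` is how different fitness functions (link load, server utilisation, scaling penalties) are compared, which mirrors the experimental setup described in the abstract.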
D. Derkach, A. Ruiz and F.M. Sukno. Head Pose Estimation Based on 3-D Facial Landmarks Localization and Regression. FG 2017 Workshop on Dominant and Complementary Emotion Recognition Using Micro Emotion Features and Head-Pose Estimation, Washington DC, USA, in press, 2017.
In this paper we present a system that is able to estimate head pose using only depth information from consumer RGB-D cameras such as Kinect 2. In contrast to most approaches addressing this problem, we do not rely on tracking and produce pose estimation in terms of pitch, yaw and roll angles using single depth frames as input. Our system combines three different methods for pose estimation: two of them are based on state-of-the-art landmark detection and the third one is a dictionary-based approach that is able to work in especially challenging scans where landmarks or mesh correspondences are too difficult to obtain. We evaluated our system on the SASE database, which consists of ∼ 30K frames from 50 subjects. We obtained average pose estimation errors between 5 and 8 degrees per angle, achieving the best performance in the FG2017 Head Pose Estimation Challenge. Full code of the developed system is available on-line.
Raote I, Ortega Bellido M, Pirozzi M, Zhang C, Melville D, Parashuraman S, Zimmermann T, Malhotra V. TANGO1 assembles into rings around COPII coats at ER exit sites. Journal of Cell Biology, Mar 2017, jcb.201608080; DOI: 10.1083/jcb.201608080
TANGO1 (transport and Golgi organization 1) interacts with CTAGE5 and COPII components Sec23/Sec24 and recruits ERGIC-53 (endoplasmic reticulum [ER]–Golgi intermediate compartment 53)–containing membranes to generate a mega-transport carrier for export of collagens and apolipoproteins from the ER. We now show that TANGO1, at the ER, assembles in a ring that encircles COPII components. The C-terminal, proline-rich domains of TANGO1 molecules in the ring are initially tilted onto COPII coats but appear to be pushed apart as the carrier grows. These findings lend support to our suggestion that growth of transport carriers for exporting bulky cargoes requires addition of membranes and not simply COPII-mediated accretion of a larger surface of ER. TANGO1 remains at the neck of the newly forming transport carrier, which grows in size by addition of ERGIC-53–containing membranes to generate a transport intermediate for the export of bulky collagens.
Cárdenes R, Zhang C, Klementieva O, Werner S, Guttmann P, Pratsch C, Cladera J, Bijnens B. 3D Membrane Segmentation and Quantification of Intact Thick Cells using Cryo Soft X-ray Transmission Microscopy: A Pilot Study. PLoS ONE 12(4): e0174324
doi:10.1371/journal.pone.0174324
Structural analysis of biological membranes is important for understanding cell and sub-cellular organelle function as well as their interaction with the surrounding environment. Imaging of whole cells in three dimensions at high spatial resolution remains a significant challenge, particularly for thick cells. Cryo-transmission soft X-ray microscopy (cryo-TXM) has recently gained popularity for imaging, in 3D, intact thick cells (∼10μm) with details of sub-cellular architecture and organization in near-native state. This paper reports a new tool to segment and quantify structural changes of biological membranes in 3D from cryo-TXM images by tracking an initial 2D contour along the third axis of the microscope, through multi-scale ridge detection followed by an active contours-based model, with a subsequent refinement along the other two axes. A quantitative metric that assesses the grayscale profiles perpendicular to the membrane surfaces is introduced and shown to be linearly related to the membrane thickness. Our methodology has been validated on synthetic phantoms using realistic microscope properties and structure dimensions, as well as on real cryo-TXM data. Results demonstrate the validity of our algorithms for cryo-TXM data analysis.
Additional material:
- Evaluation dataset: https://zenodo.org/record/259703
Ferrés D, Abura'ed A, Saggion H. Spanish Morphological Generation with Wide-Coverage Lexicons and Decision Trees. Procesamiento del Lenguaje Natural, v. 58, p. 109-116, Mar 2017. ISSN 1989-7553
Morphological Generation is the task of producing the appropriate inflected form of a lemma in a given textual context and according to some morphological features. This paper describes and evaluates wide-coverage morphological lexicons and a Decision Tree algorithm that perform Morphological Generation in Spanish at state-of-the-art level. The Freeling, Leffe and Apertium Spanish lexicons, the J48 Decision Tree algorithm, and the combination of J48 with the Freeling and Leffe lexicons have been evaluated with the following datasets for Spanish: i) the CoNLL-2009 Shared Task dataset, ii) the Durrett and DeNero dataset of Spanish verbs (DDN), and iii) the SIGMORPHON 2016 Shared Task (Task 1) dataset. The results show that: i) the Freeling and Leffe lexicons achieve high coverage and precision over the DDN and SIGMORPHON 2016 datasets, ii) the J48 algorithm achieves state-of-the-art results in all three datasets, and iii) the combination of Freeling, Leffe and the J48 algorithm outperformed our other approaches in the three evaluation datasets, slightly improved the CoNLL-2009 and SIGMORPHON 2016 results reported in the state-of-the-art literature, and achieved results comparable to those reported in the state-of-the-art literature on the DDN dataset evaluation.
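The lexicon-plus-learner combination above can be pictured as: look the (lemma, features) pair up in a wide-coverage lexicon, and back off to a learned suffix-rewriting model when the pair is missing. In the sketch below the J48 decision tree is approximated by a longest-matching-suffix rule table for brevity, and all the Spanish examples and feature names are illustrative, not the paper's data.

```python
def edit(lemma, form):
    """Shortest suffix edit (strip, add) turning lemma into form."""
    i = 0
    while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
        i += 1
    return lemma[i:], form[i:]

def train(pairs):
    """Map (features, lemma suffix) -> suffix edit, most specific suffix kept.
    A stdlib stand-in for the paper's J48 decision tree learner."""
    rules = {}
    for lemma, feats, form in pairs:
        strip, add = edit(lemma, form)
        for k in range(max(len(strip), 1), len(lemma) + 1):
            rules.setdefault((feats, lemma[len(lemma) - k:]), (strip, add))
    return rules

def generate(lemma, feats, rules, lexicon):
    """Lexicon lookup first; back off to the longest matching suffix rule."""
    if (lemma, feats) in lexicon:
        return lexicon[(lemma, feats)]
    for k in range(len(lemma), 0, -1):
        rule = rules.get((feats, lemma[-k:]))
        if rule:
            strip, add = rule
            return lemma[: len(lemma) - len(strip)] + add
    return lemma  # no applicable rule: return the lemma unchanged

# Illustrative training pairs: regular Spanish -ar verbs, 3rd person singular present.
pairs = [("cantar", "3sg.pres", "canta"), ("hablar", "3sg.pres", "habla")]
rules = train(pairs)
out = generate("bailar", "3sg.pres", rules, lexicon={})
```

The lexicon handles irregular forms exactly while the learned rules generalise to unseen lemmas, which is the intuition behind the combined approach outperforming either component alone.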
Morenza-Cinos M, Casamayor-Pujol V, Soler-Busquets J, Sanz JL, Guzman R, Pous R (2017). Development of an RFID Inventory Robot (AdvanRobot). In: Koubaa A (Ed.), Robot Operating System: The Complete Reference (Volume 2), 1st ed., Springer.
AdvanRobot proposes a new robot for inventorying and locating all the products inside a retail store without the need of installing any fixed infrastructure. The patent pending robot combines a laser-guided autonomous robotic base with a Radio Frequency Identification (RFID) payload composed of several RFID readers and antennas, as well as a 3D camera.
AdvanRobot is able not only to replace human operators, but also to dramatically increase the efficiency and accuracy of inventory taking, while adding the capacity to produce store maps and product locations. Some important benefits of the inventory capabilities of AdvanRobot are the reduction of stock-outs, which can cause a drop in sales and are the most important source of frustration for customers; the reduction of the number of items per reference, maximizing the number of references per square meter; and the reduction of the cost of capital due to over-stocking. Another important economic benefit expected from the inventorying and location capabilities of the robot is the ability to efficiently prepare on-line orders from the store closest to the customer, allowing retailers to compete with the likes of Amazon (a.k.a. omnichannel retail). Additionally, the robot makes it possible to produce a 3D model of the store, detect misplaced items, and assist customers and staff in finding products (wayfinding).
Keywords:
Professional service robotics applications with ROS, Logistic robots, Service robotics, Autonomous robots, Mobile robots, RB1-Base, RFID
A. Ruiz, O. Martinez, X. Binefa and F.M. Sukno. Fusion of Valence and Arousal Annotations through Dynamic Subjective Ordinal Modelling. In Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, in press, 2017.
An essential issue when training and validating computer vision systems for affect analysis is how to obtain reliable ground-truth labels from a pool of subjective annotations. In this paper, we address this problem when labels are given in an ordinal scale and annotated items are structured as temporal sequences. This problem is of special importance in affective computing, where collected data is typically formed by videos of human interactions annotated according to the Valence and Arousal (V-A) dimensions. Moreover, recent works have shown that inter-observer agreement of V-A annotations can be considerably improved if these are given in a discrete ordinal scale. In this context, we propose a novel framework which explicitly introduces ordinal constraints to model the subjective perception of annotators. We also incorporate dynamic information to take into account temporal correlations between ground-truth labels. In our experiments over synthetic and real data with V-A annotations, we show that the proposed method outperforms alternative approaches which do not take into account either the ordinal structure of labels or their temporal correlation.
A. Fernandez-Lopez, O. Martinez and F.M. Sukno. Towards estimating the upper bound of visual-speech recognition: The Visual Lip-Reading Feasibility Database. Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, 2017.
Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information that is complementary to the audio. Exploiting the visual information, however, has proven challenging. On one hand, researchers have reported that the mapping between phonemes and visemes (visual units) is one-to-many because there are phonemes which are visually similar and indistinguishable between them. On the other hand, it is known that some people are very good lip-readers (e.g: deaf people). We study the limit of visual only speech recognition in controlled conditions. With this goal, we designed a new database in which the speakers are aware of being read and aim to facilitate lip-reading. In the literature, there are discrepancies on whether hearing-impaired people are better lip-readers than normal-hearing people. Then, we analyze if there are differences between the lip-reading abilities of 9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are compared with the performance of a visual automatic speech recognition system.
In our tests, hearing-impaired participants outperformed the normal-hearing participants, but without reaching statistical significance. Human observers were able to decode 44% of the spoken message. In contrast, the visual-only automatic system achieved a 20% word recognition rate. However, when the comparison is repeated in terms of phonemes, both obtained very similar recognition rates, just above 50%. This suggests that the gap between human lip-reading and automatic speech-reading might be more related to the use of context than to the ability to interpret mouth appearance.
Additional material:
Aragón P, Gómez V, Kaltenbrunner A. To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion. 11th International AAAI Conference on Web and Social Media
Online discussion is essential for the communication and collaboration of online communities. The reciprocal exchange of messages between users that characterizes online discussion can be represented in many different ways. While some platforms display messages chronologically using a simple linear interface, others use a hierarchical (threaded) interface to represent more explicitly the structure of the discussion. Although the type of representation has been shown to affect communication, to the best of our knowledge, the impact of using either one or the other has not yet been investigated in a large and mature online community.
In this work we analyze Menéame, a popular Spanish social news platform which recently transitioned from a linear to a hierarchical interface, becoming an ideal research opportunity for this purpose. Using interrupted time series analysis and regression discontinuity design, we observe an abrupt and significant increase in social reciprocity after the adoption of a threaded interface. We furthermore extend state-of-the-art generative models of discussion threads by including reciprocity, a fundamental feature to explain better the structure of the discussions, both before and after the change in the interface.
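The abrupt level change around the interface switch can be estimated with a standard segmented (interrupted time-series) regression. The sketch below is a generic illustration of that technique, not the paper's actual analysis, and the function name `its_level_change` is ours:

```python
import numpy as np

def its_level_change(y, intervention_idx):
    """Segmented regression for an interrupted time series:
    y = b0 + b1*t + b2*post + b3*(t - t0)*post, fit by least squares,
    where post = 1 from the intervention t0 onwards.
    b2 estimates the abrupt level change at the intervention."""
    t = np.arange(len(y), dtype=float)
    post = (t >= intervention_idx).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - intervention_idx) * post])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef  # [intercept, pre-slope, level change, slope change]
```

On a series with a pre-trend and a sudden jump, the third coefficient recovers the size of the jump.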
Additional material:
Martins Dias G, Bellalta B, Oechsner S. Using data prediction techniques to reduce data transmissions in the IoT. 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT)
Part of the data in the Internet of Things (IoT) will be generated by wireless sensor nodes organized in Wireless Sensor Networks (WSNs). However, these nodes are mainly designed to have low costs, which implies constrained memory and energy supplies and does not permit high data transfer rates. Despite that, modern applications rely on the knowledge acquired by WSNs to trigger other systems, and sensed data has become critical to avoid economic and human losses. Therefore, it is important to optimize data transmissions in WSNs to support more wireless sensor nodes and a higher diversity of sensed parameters. The thesis “Prediction-Based Strategies for Reducing Data Transmissions in the IoT” extends a paradigm that exploits WSNs to the utmost: data that can be predicted does not have to be transmitted. We simulated and tested a self-managing WSN architecture in the cloud to adjust the sensor nodes' sampling intervals. For future generations of WSNs, we designed a theoretical model for estimating the efficiency of prediction-based data reduction methods in WSNs. It permits strict control over the quality of the reported data even as the number of sensor nodes grows, thereby contributing to the IoT's growth.
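The paradigm "data that can be predicted does not have to be transmitted" is commonly realized as a dual-prediction scheme: node and sink run the same predictor, and the node transmits only when the real reading drifts from the prediction by more than a tolerance. A minimal sketch with a last-value predictor (our own illustration, not one of the thesis's models):

```python
def simulate_dual_prediction(readings, epsilon=0.5):
    """Dual prediction with a last-value predictor: the sink assumes the
    last transmitted value still holds; the node sends a new reading only
    when the deviation exceeds epsilon. Returns the series as seen by the
    sink and the number of transmissions used."""
    prediction = readings[0]
    reconstructed = [prediction]
    transmissions = 1  # the initial value must always be sent
    for value in readings[1:]:
        if abs(value - prediction) > epsilon:
            prediction = value  # node transmits the real value
            transmissions += 1
        reconstructed.append(prediction)
    return reconstructed, transmissions
```

By construction, the sink's reconstruction never deviates from the true readings by more than `epsilon`, while most transmissions are suppressed when the signal is stable.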
Keywords: artificial intelligence, wireless sensor networks, data science, internet of things
Grimaldi A, Kane D, Bertalmío M. Natural image statistics as a function of dynamic range. Vision Sciences Society Annual Meeting 2017
The statistics of real-world images have been extensively investigated, in virtually all cases using low dynamic range (LDR) image databases. The few studies that have considered high dynamic range (HDR) images have performed statistical analysis over illumination maps with HDR from different sets (Dror et al. 2001) or have examined the difference between images captured with HDR techniques and those taken with single-exposure LDR photography (Pouli et al. 2010). In contrast, in this study we investigate the impact of dynamic range upon the statistics of natural images that were all captured in the same manner. To do so we consider the HDR database SYNS (Adams et al. 2016). For the distribution of intensity, we observe that the standard deviation of the luminance histograms increases noticeably with dynamic range. Concerning the power spectrum, and in accordance with previous findings (Dror et al. 2001), we observe that as the dynamic range increases the 1/f power-law rule becomes substantially inaccurate, meaning that HDR images are not scale invariant. We show that a second-order polynomial model is a better fit than a linear model for the power spectrum in log-log axes. A model of the point-spread function of the eye (considering light scattering, pupil size, etc.) was applied to the datasets, reducing the dynamic range, but the statistical differences between HDR and LDR images persist, and further study is needed on this subject. Future avenues of research include utilizing computer-generated images, which give access to the exact reflectance and illumination distributions and make it possible to generate very large databases with ease, helping to perform more significant statistical analyses.
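The comparison between a linear and a second-order polynomial model of the power spectrum in log-log axes amounts to an ordinary least-squares polynomial fit; the sketch below (function name ours) fits either model, with degree 1 recovering the classic 1/f-type power law:

```python
import numpy as np

def fit_log_power_spectrum(freqs, power, degree=2):
    """Fit log10(power) as a polynomial in log10(frequency).
    degree=1 corresponds to the 1/f^a power-law model; degree=2 adds the
    curvature reported for HDR images. Returns the coefficients
    (highest degree first) and the RMS residual in log units."""
    logf, logp = np.log10(freqs), np.log10(power)
    coeffs = np.polyfit(logf, logp, degree)
    residuals = logp - np.polyval(coeffs, logf)
    return coeffs, float(np.sqrt(np.mean(residuals ** 2)))
```

Comparing the RMS residuals of the degree-1 and degree-2 fits on a measured spectrum reproduces the kind of model comparison described above.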
Data Modelling for the Evaluation of Virtualized Network Functions Resource Allocation Algorithms
Link: http://exemple.com
Martinez-Maldonado, R., Goodyear, P., Carvalho, L., Thompson, K., Hernandez-Leo, D., Dimitriadis, Y., Prieto, L. P., and Wardak, D. (2017). Supporting Collaborative Design Activity in a Multi-User Digital Design Ecology. Computers in Human Behavior, CHB, 71(June 2017), 327-342.
Across a broad range of design professions, there has been extensive research on design practices and considerable progress in creating new computer-based systems that support design work. Our research is focused on educational/instructional design for students' learning. In this sub-field, progress has been more limited. In particular, neither research nor systems development has paid much attention to the fact that design is becoming a more collaborative endeavor. This paper reports the latest research outcomes from R&D in the Educational Design Studio (EDS), a facility developed iteratively over four years to support and understand collaborative, real-time, co-present design work. The EDS serves to (i) enhance our scientific understanding of design processes and design cognition and (ii) provide insights into how designers' work can be improved through appropriate technological support. In the study presented here, we introduced a complex, multi-user, digital design tool into the existing ecology of tools and resources available in the EDS. We analysed the activity of four pairs of 'teacher-designers' during a design task. We identified differences in how the pairs reconfigured the task, in their working methods, and in their use of the toolset. Our data provide new insights about the affordances of the different digital and analogue design surfaces used in the Studio.
Additional material:
Soto-Iglesias D, Duchateau N, Butakoff C, Andreu D, Fernández-Armenta J, Bijnens B, Berruezo A, Sitges M, Cámara. Quantitative Analysis of Electro-Anatomical Maps: Application to an Experimental Model of Left Bundle Branch Block/Cardiac Resynchronization Therapy. IEEE Journal of Translational Engineering in Health and Medicine.
Electro-anatomical maps (EAMs) are commonly acquired in clinical routine for guiding ablation therapies. They provide voltage and activation time information on a 3-D anatomical mesh representation, making them useful for analyzing the electrical activation patterns in specific pathologies. However, the variability between the different acquisitions and anatomies hampers the comparison between different maps.
This paper presents two contributions for the analysis of electrical patterns in EAM data from biventricular surfaces of cardiac chambers. The first contribution is an integrated automatic 2-D disk representation (2-D bull's eye plot) of the left ventricle (LV) and right ventricle (RV), obtained with a quasi-conformal mapping from the 3-D EAM meshes, that allows analysis of cardiac resynchronization therapy (CRT) lead positioning and interpretation of global indices (total activation time) and local indices (local activation time (LAT), surrogates of conduction velocity, inter-ventricular and transmural delays) that characterize changes in the electrical activation pattern. The second contribution is a set of indices derived from the electrical activation: speed maps, computed from LAT values, to study the electrical wave propagation, and histograms of isochrones to analyze regional electrical heterogeneities in the ventricles. We have applied the proposed methods to look for the underlying physiological mechanisms of left bundle branch block (LBBB) and CRT, with the goal of optimizing the therapy by improving CRT response. To better illustrate the benefits of the proposed tools, we created a set of synthetically generated and fully controlled activation patterns, on which the proposed representation and indices were validated. The proposed analysis tools were then used to analyze EAM data from an experimental swine model of induced LBBB with an implanted CRT device. We analyzed and compared the electrical activation patterns at baseline, LBBB, and CRT stages in four animals: two without any structural disease and two with an induced infarction. By relating the CRT lead location with electrical dyssynchrony, we evaluated current hypotheses about lead placement in CRT and showed that optimal pacing sites should target the RV lead close to the apex and the LV one distant from it.
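As a rough illustration of how speed maps can be derived from LAT values, the apparent conduction speed along a mesh edge can be taken as the edge length divided by the LAT difference at its endpoints. The sketch below (function and variable names ours) shows only this basic idea, not the paper's actual implementation:

```python
import math

def edge_speeds(points, edges, lat):
    """Apparent conduction speed per mesh edge: Euclidean edge length
    divided by the absolute difference in local activation time (LAT)
    at the edge's endpoints. points: list of (x, y, z); edges: list of
    (i, j) vertex-index pairs; lat: activation time per vertex."""
    speeds = {}
    for (i, j) in edges:
        dist = math.dist(points[i], points[j])
        dt = abs(lat[i] - lat[j])
        # simultaneous activation gives an unbounded apparent speed
        speeds[(i, j)] = dist / dt if dt > 0 else float("inf")
    return speeds
```

Slow regions (e.g., scar) then show up as edges with low apparent speed in the resulting map.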
Additional material:
- Post-print (although note that this is an Open Access article) and supplementary material
- Software for flattening to be available in Open Source soon
Barbieri F, Espinosa-Anke L, Saggion H. Revealing Patterns of Twitter Emoji Usage in Barcelona and Madrid. CCIA 2016
Emojis are small sized images which are naturally combined with free text to visually complement or condense the meaning of a message. The set of available emojis is fixed, irrespective of a user’s location. However, their interpretation and the way they are used may vary. In this paper, we compare the meaning and usage of emojis across two Spanish cities: Barcelona and Madrid. Our results suggest that the overall semantics of the subset of emojis we studied is preserved over these cities. However, some of them are interpreted differently, which suggests there may exist cultural differences between inhabitants of Barcelona and Madrid, and that these are reflected in how they communicate in social networks.
Keywords: Emojis, Distributional Semantics, Barcelona, Madrid
Mangado N., Piella G., Noailly J., Pons-Prats J., González Ballester M.A. Analysis of uncertainty and variability in finite element computational models for biomedical engineering: characterization and propagation. Frontiers in Bioengineering and Biotechnology, vol. 4(85), 2016. (doi:10.3389/fbioe.2016.00085)
Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering.
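The simplest non-intrusive propagation method reviewed above is plain Monte Carlo sampling: draw the uncertain inputs from their distributions, run the unmodified model as a black box on each sample, and summarize the output distribution. A minimal sketch (function names ours, with a toy model standing in for a finite element solver):

```python
import random
import statistics

def propagate_uncertainty(model, param_dists, n=10000, seed=0):
    """Non-intrusive Monte Carlo uncertainty propagation: sample each
    uncertain parameter from its distribution, evaluate the model on
    every sample, and return the mean and standard deviation of the
    output. param_dists maps parameter names to callables that draw
    one sample from a random.Random instance."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        sample = {name: draw(rng) for name, draw in param_dists.items()}
        outputs.append(model(**sample))
    return statistics.mean(outputs), statistics.stdev(outputs)
```

Because the model is called as a black box, the same scheme applies unchanged to an external FE simulation, which is precisely what makes the method non-intrusive.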
Keywords: uncertainty quantification, finite element models, random variables, intrusive and non-intrusive methods, sampling techniques, computational modeling
Thalmeier D, Gómez V, Kappen HJ. Action selection in growing state spaces: Control of Network Structure Growth. Journal of Physics A: Mathematical and Theoretical
The dynamical processes taking place on a network depend on its topology. Influencing the growth process of a network therefore has important implications on such dynamical processes. We formulate the problem of influencing the growth of a network as a stochastic optimal control problem in which a structural cost function penalizes undesired topologies. We approximate this control problem with a restricted class of control problems that can be solved using probabilistic inference methods. To deal with the increasing problem dimensionality, we introduce an adaptive importance sampling method for approximating the optimal control. We illustrate this methodology in the context of formation of information cascades, considering the task of influencing the structure of a growing conversation thread, as in Internet forums. Using a realistic model of growing trees, we show that our approach can yield conversation threads with better structural properties than the ones observed without control.
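Probabilistic-inference approximations of control problems of this kind typically reduce to importance sampling: expectations under the optimally controlled dynamics are estimated from samples of a proposal process. The generic self-normalized estimator below sketches that basic device (it is not the paper's adaptive scheme, and all names are ours):

```python
import math
import random

def snis_estimate(f, sample_q, log_p, log_q, n=5000, seed=0):
    """Self-normalized importance sampling: estimate E_p[f(x)] from n
    draws of a proposal q. sample_q(rng) draws one sample; log_p and
    log_q return unnormalized log-densities."""
    rng = random.Random(seed)
    xs = [sample_q(rng) for _ in range(n)]
    logw = [log_p(x) - log_q(x) for x in xs]
    m = max(logw)  # subtract the max to stabilize the exponentials
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return sum(wi * f(x) for wi, x in zip(w, xs)) / total
```

Adaptive variants, as in the paper, re-fit the proposal `q` to the weighted samples and iterate, which keeps the weight variance under control as the problem dimensionality grows.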
Additional material:
- arXiv
- Code to be available soon
Pons, J., Serra X. Designing Efficient Architectures for Modeling Temporal Features with Convolutional Neural Networks. 42nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
Many researchers use convolutional neural networks with small rectangular filters for music (spectrogram) classification. First, we discuss why there is no reason to use this filter setup by default and, second, we point out that more efficient architectures could be implemented if the characteristics of the music features are considered during the design process. Specifically, we propose a novel design strategy that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer – using different filter shapes adapted to fit musical concepts within the first layer. The proposed architectures are assessed by measuring their accuracy in predicting the classes of the Ballroom dataset. We also make the code (together with the audio data) available so that this research is fully reproducible.
Additional material:
Wang S., Zhang C., González Ballester M.A., Yarkony J. Efficient pose and cell segmentation using column generation.ArXiv preprint1612.00437, 2016.
We study the problems of multi-person pose segmentation in natural images and instance segmentation in biological images with crowded cells. We formulate these distinct tasks as integer programs where variables correspond to poses/cells. To optimize, we propose a generic relaxation scheme for solving these combinatorial problems using a column generation formulation where the program for generating a column is solved via exact optimization of very small scale integer programs. This results in efficient exploration of the spaces of poses and cells.
Michos K, Hernández-Leo D. Towards understanding the potential of teaching analytics within educational communities. International Workshop on Teaching Analytics, EC-TEL 2016, Lyon, France, September 2016. CEUR Proceedings Vol-1738, pp 1-8.
The use of learning analytics in ICT-rich learning environments assists teachers to (re)design their learning scenarios. Teacher inquiry is a process of intentional and systematic research of teachers into their students' learning. When teachers work in small groups or communities and present results of their practice, more interpretations are generated around the use and meaning of these data. In this workshop paper we present preliminary research about four dimensions of learning analytics (engagement, assessment, progression, satisfaction), and their visualization as teaching analytics, that are hypothesized to be relevant to help teachers in the (re)design of their learning scenarios. Moreover, we evaluate teachers' acceptance of exchanging these types of analytics within their teaching community. A workshop for blended MOOCs design (N=20 participants) showed that although all the analytics dimensions were valuable, assessment data was the most useful dimension for (re)designing, while data about student engagement was the least useful. Educational practitioners also showed interest in combinations of specific data (e.g., achievements related to student satisfaction). Lastly, most participants expressed their willingness to share visual learning analytics related to their designs with their colleagues. The role of contextual information in interpreting the learning analytics was recognized as important.
Keywords: Teacher inquiry, professional learning communities, learning design
Rashid Z, Melià-Seguí J, Pous R, Peig E. Using Augmented Reality and Internet of Things to improve accessibility of people with motor disabilities in the context of Smart Cities. Future Generation Computer Systems
Smart Cities need to be designed to allow the inclusion of all kinds of citizens. For instance, motor disabled people like wheelchair users may have problems to interact with the city. Internet of Things (IoT) technologies provide the tools to include all citizens in the Smart City context. For example, wheelchair users may not be able to reach items placed beyond their arm’s length, limiting their independence in everyday activities like shopping, or visiting libraries. We have developed a system that enables wheelchair users to interact with items placed beyond their arm’s length, with the help of Augmented Reality (AR) and Radio Frequency Identification (RFID) technologies. Our proposed system is an interactive AR application that runs on different interfaces, allowing the user to digitally interact with the physical items on the shelf, thanks to an updated inventory provided by an RFID system. The resulting experience is close to being able to browse a shelf, clicking on it and obtaining information about the items it contains, allowing wheelchair users to shop independently, and providing autonomy in their everyday activities. Fourteen wheelchair users with different degrees of impairment have participated in the study and development of the system. The evaluation results show promising results towards more independence of wheelchair users, providing an opportunity for equality improvement.
Keywords: RFID; Augmented reality; Smart spaces; Motor disabled people; Inclusion; Retail
Oramas, S, Ostuni V C, Di Noia T, Serra X, Di Sciascio E. Sound and Music Recommendation with Knowledge Graphs. ACM Transactions on Intelligent Systems and Technology (TIST)
The Web has moved, slowly but steadily, from a collection of documents towards a collection of structured data. Knowledge graphs have emerged as a way of representing the knowledge encoded in such data, as well as a tool to reason on it in order to extract new and implicit information. Knowledge graphs are currently used, for example, to explain search results, to explore knowledge spaces, to semantically enrich textual documents, or to feed knowledge-intensive applications such as recommender systems. In this work, we describe how to create and exploit a knowledge graph to supply a hybrid recommendation engine with information that builds on top of a collection of documents describing musical and sound items. Tags and textual descriptions are exploited to extract and link entities to external graphs such as WordNet and DBpedia, which are in turn used to semantically enrich the initial data. By means of the knowledge graph we build, recommendations are computed using a feature-combination hybrid approach. Two explicit graph feature mappings are formulated to obtain meaningful item feature representations able to capture the knowledge embedded in the graph. These content features are further combined with additional collaborative information deriving from implicit user feedback. An extensive evaluation on historical data is performed over two different datasets: a dataset of sounds composed of tags, textual descriptions, and users' download information gathered from Freesound.org, and a dataset of songs that mixes song textual descriptions with tags and users' listening habits extracted from Songfacts.com and Last.fm, respectively. Results show significant improvements with respect to state-of-the-art collaborative algorithms on both datasets. In addition, we show how the semantic expansion of the initial descriptions helps in achieving much better recommendation quality in terms of aggregated diversity and novelty.
Additional material
- Datasets
- Music Recommendation Dataset (KGRec-music)
- Sound Recommendation Dataset (KGRec-sound)
Martins Dias G, Nurchis M, Bellalta B. Adapting Sampling Interval of Sensor Networks Using On-Line Reinforcement Learning. IEEE World Forum on Internet of Things 2016
Monitoring Wireless Sensor Networks (WSNs) are composed of sensor nodes that report temperature, relative humidity, and other environmental parameters. The time between two successive measurements is a critical parameter to set during WSN configuration because it can impact the WSN's lifetime, wireless medium contention, and the quality of the reported data. As trends in monitored parameters can vary significantly between scenarios and over time, identifying a sampling interval suitable for several cases is also challenging. In this work, we propose a dynamic sampling rate adaptation scheme based on reinforcement learning, able to tune sensors' sampling intervals on the fly according to environmental conditions and application requirements. The primary goal is to set the sampling interval to the best possible value so as to avoid oversampling and save energy, while not missing environmental changes that may be relevant to the application. In simulations, our mechanism reduced the total number of transmissions by up to 73% compared to a fixed strategy while keeping the average quality of information provided by the WSN. The inherent flexibility of the reinforcement learning algorithm facilitates its use in several scenarios, so as to exploit the broad scope of the Internet of Things.
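A stripped-down version of such a scheme can be written as an epsilon-greedy bandit over a discrete set of candidate intervals, with a reward that trades energy savings (longer intervals) against the sensing error they cause. This is our own simplification for illustration, not the algorithm evaluated in the paper:

```python
import random

def adapt_interval(observe_error, intervals=(1, 5, 15, 60), episodes=2000,
                   alpha=0.1, eps=0.1, seed=0):
    """Epsilon-greedy bandit over candidate sampling intervals (seconds).
    observe_error(interval) returns the sensing error incurred at that
    interval, in [0, 1]; the reward rewards long intervals (energy
    savings) and penalizes error. Returns the best interval found."""
    rng = random.Random(seed)
    q = {i: 0.0 for i in intervals}
    for _ in range(episodes):
        if rng.random() < eps:
            arm = rng.choice(intervals)          # explore
        else:
            arm = max(q, key=q.get)              # exploit
        reward = arm / max(intervals) - observe_error(arm)
        q[arm] += alpha * (reward - q[arm])      # incremental mean update
    return max(q, key=q.get)
```

With a stationary error model this converges to the longest interval that does not miss relevant changes; the paper's scheme additionally adapts as environmental conditions drift.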
Additional material:
- Data: Intel Lab Data with after-processing to fill missing values as described in the article (if interested, contact the authors)
Barbieri F, Kruszewski G, Ronzano F, Saggion H. How Cosmopolitan Are Emojis?: Exploring Emojis Usage and Meaning over Different Languages with Distributional Semantics. ACM Multimedia 2016 Conference
Choosing the right emoji to visually complement or condense the meaning of a message has become part of our daily life. Emojis are pictures, which are naturally combined with plain text, thus creating a new form of language. These pictures are the same independently of where we live, but they can be interpreted and used in different ways. In this paper we compare the meaning and the usage of emojis across different languages. Our results suggest that the overall semantics of the subset of the emojis we studied is preserved across all the languages we analysed. However, some emojis are interpreted in a different way from language to language, and this could be related to socio-geographical differences.
Additional material:
- Page of the publication with general details of the work and access to raw data
Domínguez M, Latorre I, Farrús M, Codina-Filbà J, Wanner L. Praat on the Web: An Upgrade of Praat for Semi-Automatic Speech Annotation. 26th International Conference on Computational Linguistics (COLING 2016)
This paper presents an implementation of the widely used speech analysis tool Praat as a web application with an extended functionality for feature annotation. In particular, Praat on the Web addresses some of the central limitations of the original Praat tool and provides (i) enhanced visualization of annotations in a dedicated window for feature annotation at interval and point segments, (ii) a dynamic scripting composition exemplified with a modular prosody tagger, and (iii) portability and an operational web interface. Speech annotation tools with such a functionality are key for exploring large corpora and designing modular pipelines.
Additional material:
- Code: Github repository
- Open online demo
- Slides in ppt and pdf
Domínguez M, Farrús M, Wanner L. An Automatic Prosody Tagger for Spontaneous Speech. 26th International Conference on Computational Linguistics (COLING 2016)
Speech prosody is known to be central in advanced communication technologies. However, despite the advances of theoretical studies in speech prosody, so far no large-scale prosody-annotated resources that would facilitate empirical research and the development of empirical computational approaches are available. This is to a large extent due to the fact that current prosody annotation conventions offer a label-based descriptive framework of intonation contours and phrasing. This makes it difficult to reach satisfactory inter-annotator agreement when creating gold-standard annotations and, subsequently, to produce consistent large-scale annotations. To address this problem, we present an annotation schema for prominence and boundary labeling of prosodic phrases based upon acoustic parameters, together with a tagger for prosody annotation at the prosodic-phrase level. Evaluation shows that inter-annotator agreement reaches satisfactory values, from 0.60 to 0.80 Cohen's kappa, while the prosody tagger achieves acceptable recall and F-measure figures on five spontaneous samples, in monologue and dialogue formats, in English and Spanish. The work presented in this paper is a first step towards semi-automatic acquisition of large corpora for empirical prosodic analysis.
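Cohen's kappa, the agreement measure quoted above, is the observed agreement between two annotators corrected for the agreement expected by chance from their label distributions. A minimal implementation (the function name is ours):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items:
    (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e
    is chance agreement from each annotator's label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Values of 0.60 to 0.80, as reported, are conventionally read as substantial agreement; kappa is 1 for perfect agreement and 0 when agreement is no better than chance.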
Additional material:
- Github repository that includes the Praat scripts for a modular prosody tagger deployed under the extended Praat for feature annotation
- Open online demo
Espinosa-Anke, L, Camacho-Collados, J, Rodríguez-Fernández, S, Saggion, H, Wanner, L. Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning. Coling 2016.
WordNet is probably the best-known lexical resource in Natural Language Processing. While it is widely regarded as a high-quality repository of concepts and semantic relations, updating and extending it manually is costly. One important type of relation which could potentially add enormous value to WordNet is collocational information, which is paramount in tasks such as Machine Translation, Natural Language Generation and Second Language Learning. In this paper, we present ColWordNet (CWN), an extended WordNet version with fine-grained collocational information, automatically introduced thanks to a method exploiting linear relations between analogous sense-level embedding spaces. We perform both intrinsic and extrinsic evaluations, and release CWN for the use and scrutiny of the community.
Additional material:
Espinosa-Anke L, Camacho-Collados J, Delli Bovi C, Saggion H. Supervised Distributional Hypernym Discovery via Domain Adaptation. 2016 Conference on Empirical Methods on Natural Language Processing (EMNLP 2016)
Lexical taxonomies are graph-like hierarchical structures that provide a formal representation of knowledge. Most knowledge graphs to date rely on is-a (hypernymic) relations as the backbone of their semantic structure. In this paper, we propose a supervised distributional framework for hypernym discovery which operates at the sense level, enabling large-scale automatic acquisition of disambiguated taxonomies. By exploiting semantic regularities between hyponyms and hypernyms in embedding spaces, and integrating a domain clustering algorithm, our model becomes sensitive to the target data. We evaluate several configurations of our approach, training with information derived from a manually created knowledge base, along with hypernymic relations obtained from Open Information Extraction systems. The integration of both sources of knowledge yields the best overall results according to both automatic and manual evaluation on ten different domains.
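The semantic regularities between hyponym and hypernym embeddings can be illustrated with a plain linear least-squares projection from one space onto the other. This sketch (all names and the toy setup are ours, and it is far simpler than the paper's sense-level, domain-clustered model) shows only the core idea:

```python
import numpy as np

def fit_hypernym_projection(hypo_vecs, hyper_vecs):
    """Learn a linear map M such that M @ hyponym ~ hypernym, by solving
    the least-squares problem X M = Y over training pairs (row vectors).
    Returns M transposed so it can be applied as M @ x to one vector."""
    X = np.asarray(hypo_vecs, dtype=float)   # (n_pairs, dim)
    Y = np.asarray(hyper_vecs, dtype=float)  # (n_pairs, dim)
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M.T

def predict_hypernym(M, hypo_vec, candidates):
    """Index of the candidate vector closest (cosine) to M @ hypo_vec."""
    target = M @ np.asarray(hypo_vec, dtype=float)
    sims = [target @ c / (np.linalg.norm(target) * np.linalg.norm(c))
            for c in candidates]
    return int(np.argmax(sims))
```

At discovery time, a hyponym embedding is projected and the nearest vocabulary embeddings are proposed as hypernym candidates.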
Additional material
In this link the following information is available:
- Training Data: Wikidata, KB-Unify
- Nasari Domain Labels
- SensEmbed Sense Vectors
- Python API
Michos, K., Hernández-Leo, D. Understanding Collective Behavior of Learning Design Communities. Proceedings of the 11th European Conference on Technology Enhanced Learning (EC-TEL 2016)
Social computing enables collective actions and social interaction with rich exchange of information. In the context of educators' networks where they create and share learning design artifacts, little is known about their collective behavior. Learning design tooling focuses on supporting educators (learning designers) in making their design ideas explicit and encourages the development of “learning design communities”. Building on social elements, this paper aims to identify the level of engagement and interactions in three communities using the Integrated Learning Design Environment (ILDE). The results show a relationship between the exploration of different artifacts and the creation of content in all three communities, confirming that browsing influences the community's outcomes. Different patterns of interaction suggest specific effects of language and length of support for users.
Additional material:
- ILDE: Integrated Learning Design Environment
Keywords: Learning design, Communities of educators, Collective behavior, Social network analysis
Manatunga, K., Hernández-Leo, D. PyramidApp: Scalable Method Enabling Collaboration in the Classroom. Proceedings of the 11th European Conference on Technology Enhanced Learning (EC-TEL 2016)
Computer Supported Collaborative Learning (CSCL) methods support fruitful social interactions using technological mediation and orchestration. However, studies indicate that most existing CSCL methods have not been applied to large classes, which means that they may not scale well or that it is unclear to what extent, or with which technological mechanisms, scalability could be feasible. This paper introduces and evaluates PyramidApp, which implements a scalable pedagogical method refining the Pyramid (aka Snowball) collaborative learning flow pattern. Refinements include rating and discussion to reach a global consensus. Three different face-to-face classroom situations were used to evaluate different tasks of pyramid interactions. The experiments led us to conclude that pyramids can be meaningful with around 20 participants per pyramid of 3–4 levels, with several pyramids running in parallel depending on classroom size. An underpinning algorithm enabling elastic creation of multiple pyramids, using control timers and triggering flow awareness, facilitated scalability, dynamism, and overall user satisfaction.
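The elastic creation of multiple parallel pyramids can be sketched as: split the class into pyramids of roughly the target size, then merge groups pairwise level by level, Snowball-style, so group size doubles at each level. The function below is our own illustration of that grouping step, not PyramidApp's actual algorithm (which also handles timers and flow awareness):

```python
import math

def build_pyramids(participants, target_size=20, levels=3):
    """Split participants into parallel pyramids of roughly target_size,
    then build each pyramid's level plan: level 0 is individuals, and
    each subsequent level merges adjacent groups pairwise."""
    n_pyramids = max(1, round(len(participants) / target_size))
    chunk = math.ceil(len(participants) / n_pyramids)
    pyramids = [participants[i:i + chunk]
                for i in range(0, len(participants), chunk)]
    plans = []
    for members in pyramids:
        level_groups = [[m] for m in members]
        plan = [level_groups]
        for _ in range(levels - 1):
            level_groups = [sum(level_groups[i:i + 2], [])
                            for i in range(0, len(level_groups), 2)]
            plan.append(level_groups)
        plans.append(plan)
    return plans
```

A class of 40 with a target size of 20 thus yields two parallel 3-level pyramids, consistent with the ~20-participants-per-pyramid finding above.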
Keywords: Computer-Supported Collaborative Learning, Pyramid / Snowball, Collaborative Learning Flow Pattern, Large Groups, Classroom
Additional material
- pdf file at UPF open access e-repository
- PyramidApp webwith additional details and introductory videos
- Dataset
Dalmau V, Krokhin A, Manokaran R. Towards a Characterization of Constant-Factor Approximable Finite-Valued CSPs
In this paper we study the approximability of (Finite-)Valued Constraint Satisfaction Problems (VCSPs) with a fixed finite constraint language consisting of finitary functions on a fixed finite domain. An instance of VCSP is given by a finite set of variables and a sum of functions from the language, each depending on a subset of the variables. Each function takes values in [0, 1] specifying costs of assignments of labels to its variables, and the goal is to find an assignment of labels to the variables that minimizes the sum. A recent result of Ene et al. says that, under the mild technical condition that the language contains the equality relation, the basic LP relaxation is optimal for constant-factor approximation of VCSP unless the Unique Games Conjecture fails. Using the algebraic approach to the CSP, we give new natural algebraic conditions for the finiteness of the integrality gap of the basic LP relaxation of VCSP. We also show how these algebraic conditions can in principle be used to round solutions of the basic LP relaxation, and how, for several examples that cover all previously known cases, this leads to efficient constant-factor approximation algorithms. Finally, we show that the absence of another algebraic condition leads to NP-hardness of constant-factor approximation.
arXiv e-print: https://arxiv.org/abs/1610.01019
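For reference, the VCSP objective described in the abstract can be written out as follows (a standard formulation reconstructed from the abstract, not quoted from the paper):

```latex
% A VCSP instance asks for a labelling \sigma of the variables V over the
% domain D that minimizes a sum of cost functions from the language \Gamma:
\min_{\sigma : V \to D} \;
  \sum_{i=1}^{m} f_i\bigl(\sigma(x_{i,1}), \ldots, \sigma(x_{i,k_i})\bigr),
\qquad f_i \in \Gamma, \quad f_i : D^{k_i} \to [0,1]
```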
Dzhambazov G, Goranchev K. Sing Master: an Intelligent Mobile Game for Teaching Singing. 8th International Conference on Games and Virtual Worlds for Serious Applications, (VS-GAMES 2016)
In this paper we present Sing Master, a new mobile application that allows singing enthusiasts to fulfil their dream: to learn to sing. Sing Master is a game-like singing tutor which provides continuous feedback on a singer's pitch and timing. To this end, it presents a series of singing exercises built around musical intervals (e.g. the major third). First, we motivate the use of software to evaluate and teach singing, compared to a conventional singing coach. Next, we survey similar software applications and outline the benefits of Sing Master compared to them. We finish by describing Sing Master's most useful feature, the ability to 'hear' the melody being sung, and the algorithm which 'hears' the musical pitch.
Bosch, J. Gómez E. Melody extraction based on a source-filter model using pitch contour selection. 13th Sound and Music Computing Conference (SMC 2016)
This work proposes a melody extraction method which combines a pitch salience function based on source-filter modelling with melody tracking based on pitch contour selection. We model the spectrogram of a musical audio signal as the sum of the leading voice and accompaniment. The leading voice is modelled with a Smoothed Instantaneous Mixture Model (SIMM), and the accompaniment is modelled with a Non-negative Matrix Factorization (NMF). The main benefit of this representation is that it incorporates timbre information, and that the leading voice is enhanced, even without an explicit separation from the rest of the signal. Two different salience functions based on SIMM are proposed, in order to adapt the output of such a model to the pitch contour based tracking. Candidate melody pitch contours are then created by grouping pitch sequences, using auditory streaming cues. Finally, melody pitch contours are selected using a set of heuristic rules based on contour characteristics and smoothness constraints. The evaluation on a large set of challenging polyphonic music material shows that the proposed salience functions help increase the salience of melody pitches in comparison to similar methods. The complete melody extraction methods also achieve a higher overall accuracy than state-of-the-art approaches when evaluated on both vocal and instrumental music.
Additional material:
- Postprint (pdf)
- Source code
- Datasets
- ORCHSET dataset: a dataset for melody extraction in symphonic music recordings
- MedleyDB
Rodríguez-Fernández S, Espinosa-Anke L, Carlini R, Wanner L. Semantics-Driven Recognition of Collocations Using Word Embeddings. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016)
L2 learners often produce "ungrammatical" word combinations such as *give a suggestion or *make a walk. This is because the "collocationality" of one of the items (the base) limits the acceptance of collocates that express a specific meaning ('perform' above). We propose an algorithm that delivers, for a given base and the intended meaning of a collocate, the actual collocate lexeme(s) (make / take above). The algorithm exploits the linear mapping between bases and collocates learned from examples and generates a collocation transformation matrix which is then applied to novel, unseen cases. The evaluation shows a promising line of research in collocation discovery.
Additional material:
Rodríguez-Fernández S, Espinosa Anke L, Carlini R, Wanner L: Semantics-Driven Collocation Discovery. Procesamiento del Lenguaje Natural 57: 57-64 (2016)
Collocations are combinations of two lexically dependent elements, of which one (the base) is freely chosen because of its meaning, and the choice of the other (the collocate) depends on the base. Collocations are difficult to master by language learners. This difficulty becomes evident in that even when learners know the meaning they want to express, they often struggle to choose the right collocate. Collocation dictionaries, in which collocates are grouped into semantic categories, are useful tools. However, they are scarce since they are the result of cost-intensive manual elaboration. In this paper, we present for Spanish an algorithm that automatically retrieves for a given base and a given semantic category the corresponding collocates.
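The linear-mapping idea behind these two papers can be sketched in a few lines: fit a transformation matrix from base embeddings to collocate embeddings by least squares, then apply it to unseen bases. This is a toy sketch with random stand-in vectors, not the authors' implementation.

```python
import numpy as np

# Toy sketch of the linear-mapping idea: learn a matrix W that maps the
# embedding of a base word to the embedding of its collocate for a given
# meaning (e.g. 'perform'). Vectors here are random stand-ins, not real
# word embeddings.
rng = np.random.default_rng(0)
dim, n_pairs = 50, 200
bases = rng.normal(size=(n_pairs, dim))      # embeddings of base words
W_true = rng.normal(size=(dim, dim))
collocates = bases @ W_true.T                # embeddings of their collocates

# Least-squares fit of the transformation matrix from training pairs.
W, *_ = np.linalg.lstsq(bases, collocates, rcond=None)

# For a new base, map it into the collocate region of the space and pick
# the nearest known collocate embedding as the predicted collocate.
query = rng.normal(size=dim)
predicted = query @ W
nearest = int(np.argmin(np.linalg.norm(collocates - predicted, axis=1)))
```

In practice the training pairs would come from annotated (base, collocate) examples for one semantic category, and the nearest-neighbour search would run over the full vocabulary.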
Espinosa-Anke L, Tello J, Pardo A, Medrano I, Ureña A, Salcedo I, Saggion H. Savana: A Global Information Extraction and Terminology Expansion Framework in the Medical Domain. Procesamiento del Lenguaje Natural 57: 23-30 (2016)
Terminological databases constitute a fundamental source of information in the medical domain. They are used daily both by practitioners in the area and in academia. Several resources of this kind are available, e.g. CIE, SnomedCT or UMLS (Unified Medical Language System). These terminological databases are of high quality, as they are the result of collaborative expert knowledge. However, they may show certain drawbacks in terms of faithfully representing the ever-changing medical domain. Therefore, systems aimed at capturing novel terminological knowledge in heterogeneous text sources, and able to include it in standard terminologies, have the potential to add great value to such repositories. This paper presents, first, Savana, a Biomedical Information Extraction system which, combined with a validation phase carried out by medical practitioners, is used to populate the Spanish branch of SnomedCT with novel knowledge. Second, we describe and evaluate a system which, given a novel medical term, finds its most likely hypernym, thus becoming an enabler in the task of terminological database enrichment and expansion.
Aragón P, Gómez V, Kaltenbrunner A. Measuring Platform Effects in Digital Democracy. The Internet, Policy & Politics Conference
Online discussions are the essence of many social platforms on the Internet. Conversations in online forums are traditionally presented in a hierarchical structure. In contrast, online social networking services usually show discussions linearly by sorting messages in chronological order. How discussion networks are affected by choosing a specific view has never been investigated in the literature. In this article we present an analysis of the discussion threads in Meneame, the most popular Spanish social news platform. In January 2015, this site turned the original linear conversation view into a hierarchical one. Our findings prove that the new interface promoted new discussion network structures. In particular, the hierarchical view increased deliberation and reciprocity based on the rhizomatic structure of discussions.
Keywords: Digital democracy, Online deliberation, Online discussion, Human-Computer Interfaces, Conversation view
Best Paper Award at The Internet, Policy & Politics Conference, University of Oxford
Mc Fee B, Humphrey EJ, Urbano J. A plan for sustainable MIR evaluation. 17th International Society for Music Information Retrieval Conference (ISMIR2016)
The Music Information Retrieval Evaluation eXchange (MIREX) is a valuable community service, having established standard datasets, metrics, baselines, methodologies, and infrastructure for comparing MIR methods. While MIREX has managed to successfully maintain operations for over a decade, its long-term sustainability is at risk without considerable ongoing financial support. The imposed constraint that input data cannot be made freely available to participants necessitates that all algorithms run on centralized computational resources, which are administered by a limited number of people. This incurs an approximately linear cost with the number of submissions, exacting significant tolls on both human and financial resources, such that the current paradigm becomes less tenable as participation increases. To alleviate the recurring costs of future evaluation campaigns, we propose a distributed, community-centric paradigm for system evaluation, built upon the principles of openness, transparency, reproducibility, and incremental evaluation. We argue that this proposal has the potential to reduce operating costs to sustainable levels. Moreover, the proposed paradigm would improve scalability, and eventually result in the release of large, open datasets for improving both MIR techniques and evaluation methods.
Paun B, Bijnens B, Iles T, Iaizzo P, Butakoff C. Patient Independent Representation of the Detailed Cardiac Ventricular Anatomy. Medical Image Analysis
Reparameterization of surfaces is a widely used tool in computer graphics, known mostly from remeshing algorithms. Recently, surface reparameterization techniques have started to gain popularity in the field of medical imaging, but mostly for convenient 2D visualization of information initially represented on 3D surfaces (e.g. the continuous bulls-eye plot). However, by consistently mapping the 3D information to the same 2D domain, surface reparameterization techniques allow us to put into correspondence anatomical shapes of inherently different geometry. In this paper, we propose a method for anatomical parameterization of cardiac ventricular anatomies that include myocardium, trabeculations, tendons and papillary muscles. The proposed method utilizes a quasi-conformal flattening of the myocardial surfaces of the left and right cardiac ventricles and extends it to cover the interior of the cavities using the local coordinates given by the solution of Laplace's equation. Subsequently, we define a geometry-independent representation for the detailed cardiac left and right ventricular anatomies that can be used for convenient visualization and statistical analysis of the trabeculations in a population. Lastly, we show how it can be used for mapping the detailed cardiac anatomy between different hearts, which is of considerable interest for detailed cardiac computational models or shape atlases.
Additional material:
- Supplementary Raw Research Data. This is open data under the CC BY license http://creativecommons.org/licenses/by/4.0/
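The use of Laplace's equation to extend a boundary parameterization into an interior, as described above, can be illustrated on a toy 2D grid. The paper works on 3D ventricular geometry; the grid size and boundary values below are arbitrary choices for illustration.

```python
import numpy as np

# Toy 2D analogue: solve Laplace's equation between two boundaries held at
# 0 and 1 to obtain a smooth, monotone interior coordinate (the paper does
# this in 3D between the ventricular surfaces).
n = 41
phi = np.zeros((n, n))
phi[-1, :] = 1.0                                    # "inner" boundary at 1
phi[:, 0] = phi[:, -1] = np.linspace(0.0, 1.0, n)   # side walls (toy setup)

# Jacobi iterations: interior values converge to the harmonic function.
for _ in range(5000):
    phi[1:-1, 1:-1] = 0.25 * (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                              phi[1:-1, 2:] + phi[1:-1, :-2])
```

The converged field varies smoothly from 0 to 1 across the domain, so its level sets give a geometry-independent coordinate between the two boundaries.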
Slizovskaia O, Gómez E, Haro G. Automatic musical instrument recognition in audiovisual recordings by combining image and audio classification strategies. 13th Sound and Music Computing Conference (SMC 2016)
The goal of this work is to incorporate the visual modality into a musical instrument recognition system. For that, we first evaluate state-of-the-art image recognition techniques in the context of music instrument recognition, using a database of about 20000 images and 12 instrument classes. We then reproduce the results of state-of-the-art methods for audio-based musical instrument recognition, considering standard datasets including more than 9000 sound excerpts and 45 instrument classes. We finally compare the accuracy and confusions in both modalities and we showcase how they can be integrated for audio-visual instrument recognition in music videos. We obtain around 0.75 F1-measure for audio and 0.77 for images and similar confusions between instruments. This study confirms that visual (shape) and acoustic (timbre) properties of music instruments are related to each other and reveals the potential of audiovisual music description systems.
Additional material:
- Code in Github
- IRMAS: A dataset for instrument recognition in musical audio signals
- RWC Music Database
- ImageNet ILSVRC collection
Espinosa-Anke L., Oramas S, Camacho-Collados J, Saggion H. Finding and Expanding Hypernymic Relations in the Music Domain. 19th International Conference of the Catalan Association for Artificial Intelligence (CCIA 2016)
Lexical taxonomies are tree or directed acyclic graph-like structures where each node represents a concept and each edge encodes a binary hypernymic (is-a) relation. These lexical resources are useful for AI tasks like Information Retrieval or Machine Translation. Two main trends exist in the construction and exploitation of these resources: on one hand, general purpose taxonomies like WordNet, and on the other, domain-specific databases such as the ChEBI chemical ontology, or MusicBrainz in the music domain. In both cases these are based on finding correct hypernymic relations between pairs of concepts. In this paper, we propose a generic framework for hypernym discovery, based on exploiting linear relations between (term, hypernym) pairs in Wikidata, and apply it to the domain of music. Our promising results, based on several metrics used in Information Retrieval, show that in several cases we are able to discover the correct hypernym for a given novel term.
Keywords: Semantics, taxonomy learning, word sense disambiguation.
Lotinac D, Jonsson A. Constructing Hierarchical Task Models Using Invariance Analysis. Proceedings of the 22nd European Conference on Artificial Intelligence (ECAI'16), 2016
Hierarchical Task Networks (HTNs) are a common model for encoding knowledge about planning domains in the form of task decompositions. We present a novel algorithm that uses invariant analysis to construct an HTN from the PDDL description of a planning domain and a single representative instance. The algorithm defines two types of composite tasks that interact to achieve the goal of a planning instance. One type of task achieves fluents by traversing invariants in which only one fluent can be true at a time. The other type of task applies a single action, which first involves ensuring that the precondition of the action holds. The resulting HTN can be applied to any instance of the planning domain, and is provably sound. We show that the performance of our algorithm is comparable to algorithms that learn HTNs from examples and use added knowledge.
Oramas S., Espinosa-Anke L., Sordo M., Saggion H., Serra X. Information extraction for knowledge base construction in the music domain. Data and Knowledge Engineering.
The rate at which information about music is being created and shared on the web is growing exponentially. However, the challenge of making sense of all this data remains an open problem. In this paper, we present and evaluate an Information Extraction pipeline aimed at the construction of a Music Knowledge Base. Our approach starts off by collecting thousands of stories about songs from the songfacts.com website. Then, we combine a state-of-the-art Entity Linking tool and a linguistically motivated rule-based algorithm to extract semantic relations between entity pairs. Next, relations with similar semantics are grouped into clusters by exploiting syntactic dependencies. These relations are ranked thanks to a novel confidence measure based on statistical and linguistic evidence. Evaluation is carried out intrinsically, by assessing each component of the pipeline, as well as in an extrinsic task, in which we evaluate the contribution of natural language explanations in music recommendation. We demonstrate that our method is able to discover novel facts with high precision, which are missing in current generic as well as music-specific knowledge repositories.
Additional material:
- Dataset KBSF
- ELVIS software at Github
Saggion H, AbuRa'ed A, Ronzano F. Trainable citation-enhanced summarization of scientific articles. Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016). CEUR Workshop Proceedings
In order to cope with the growing number of relevant scientific publications to consider at a given time, automatic text summarization is a useful technique. However, summarizing scientific papers poses important challenges for the natural language processing community. In recent years a number of evaluation challenges have been proposed to address the problem of summarizing a scientific paper taking advantage of its citation network (i.e., the papers that cite the given paper). Here, we present our trainable technology to address a number of challenges in the context of the 2nd Computational Linguistics Scientific Document Summarization Shared Task.
Ronzano F, Freire A, Saez-Trumper D, Saggion H. Making sense of massive amounts of scientific publications: the scientific knowledge miner project. Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016). CEUR workshop Proceedings.
The World Wide Web has become the largest repository ever for scientific publications, and it continues to grow at an unprecedented rate. Nevertheless, this information overload makes the exploration of this content a very time-consuming task. In this landscape, the availability of text mining tools to characterize and explore distinctive features of the scientific literature is essential. We present the Scientific Knowledge Miner (SKM) project, which aims to investigate new approaches and frameworks to facilitate the extraction of knowledge from scientific publications across different disciplines. More specifically, we will focus on citation characterization, recommendation and scientific document summarization.
Ferrés D., Marimon M., Saggion H., AbuRa'ed A. YATS: Yet Another Text Simplifier. Natural Language Processing and Information Systems. 21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016
We present a text simplifier for English that has been built with open source software and has both lexical and syntactic simplification capabilities. The lexical simplifier uses a vector space model approach to obtain the most appropriate sense of a given word in a given context, and word frequency simplicity measures to rank synonyms. The syntactic simplifier uses linguistically-motivated rule-based syntactic analysis and generation techniques that rely on part-of-speech tags and syntactic dependency information. Experimental results show good performance of the lexical simplification component when compared to a hard-to-beat baseline, good syntactic simplification accuracy, and, according to human assessment, improvements over the best reported results in the literature for a system with the same architecture as YATS.
Additional material:
- Page of the publication with additional details
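The two-step lexical simplification described above (pick synonyms that fit the context via a vector space model, then rank the survivors by frequency) can be sketched as follows. The embeddings, frequencies, threshold and word lists below are made up for illustration; they are not YATS's actual resources.

```python
import numpy as np

# Toy sketch: (1) keep candidate synonyms whose embedding matches the
# context vector, (2) among those, prefer the most frequent (~simplest).
embeddings = {
    "perilous":  np.array([0.9, 0.1, 0.0]),
    "dangerous": np.array([0.8, 0.2, 0.1]),
    "unsafe":    np.array([0.7, 0.3, 0.1]),
    "grave":     np.array([0.1, 0.9, 0.2]),   # wrong sense in this context
}
frequency = {"dangerous": 90_000, "unsafe": 40_000, "grave": 60_000}
context_vec = np.array([0.85, 0.15, 0.05])    # e.g. averaged context words

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def simplify(target, candidates, threshold=0.9):
    # Keep only synonyms whose sense fits the context, then rank by frequency.
    fitting = [w for w in candidates
               if cosine(embeddings[w], context_vec) >= threshold]
    return max(fitting, key=frequency.get) if fitting else target

print(simplify("perilous", ["dangerous", "unsafe", "grave"]))  # prints: dangerous
```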
Bosch J.J., Bittner R.M., Salamon J., Gómez E. A Comparison of Melody Extraction Methods Based on Source-Filter Modelling. Proc. 17th International Society for Music Information Retrieval Conference (ISMIR 2016)
This work explores the use of source-filter models for pitch salience estimation and their combination with different pitch tracking and voicing estimation methods for automatic melody extraction. Source-filter models are used to create a mid-level representation of pitch that implicitly incorporates timbre information. The spectrogram of a musical audio signal is modelled as the sum of the leading voice (produced by human voice or pitched musical instruments) and accompaniment. The leading voice is then modelled with a Smoothed Instantaneous Mixture Model (SIMM) based on a source-filter model. The main advantage of such a pitch salience function is that it enhances the leading voice even without explicitly separating it from the rest of the signal. We show that this is beneficial for melody extraction, increasing pitch estimation accuracy and reducing octave errors in comparison with simpler pitch salience functions. The adequate combination with voicing detection techniques based on pitch contour characterisation leads to significant improvements over state-of-the-art methods, for both vocal and instrumental music.
Additional material:
- Postprint version in UPF repository
- ORCHSET dataset: a dataset for melody extraction in symphonic music recordings
- MedleyDB dataset
- Software in Github
Ruiz-Garcia A., Freire A., Subirats L. Lessons learned in promoting new technologies and engineering in girls through a girls hackathon and mentoring. 8th International Conference on Education and New Learning Technologies (EduLearn16)
The under-representation of women in engineering is becoming a matter of concern as it has implications both for the women themselves and for the development of the digital economy sector. To address this issue, a girls hackathon and a mentorship program have been held at Universitat Pompeu Fabra (Barcelona, Spain) with the aim of bringing more women into computing. Questionnaires deployed after the event indicate that it was a powerful initiative to encourage girls to study engineering degrees and to increase the visibility of women in ICT.
Ronzano F, Saggion H. An Empirical Assessment of Citation Information in Scientific Summarization. Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016), 22-24 June 2016, Manchester, UK
Considering the recent substantial growth of the publication rate of scientific results, the availability of effective and automated techniques to summarize scientific articles is nowadays of utmost importance. In this paper we investigate if and how we can exploit the citations of an article in order to better identify its relevant excerpts. Relying on the BioSumm2014 dataset, we evaluate the variation in performance of extractive summarization approaches when we consider the citations to extend or select the contents of an article to summarize. We compute the maximum ROUGE-2 scores that can be obtained when we summarize a paper by considering its contents together with its citations. We show that the inclusion of citation-related information leads to the generation of better summaries.
Keywords: citation-based summarization, scientific text mining, summary evaluation
Additional material:
- Dataset TAC2014 Biomedical Summarization Data, from the Biomedical Summarization Track, Text Analysis Conference 2014
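ROUGE-2, the metric used in the evaluation above, measures bigram overlap between a candidate summary and a reference. A minimal recall-only sketch is shown below; the official ROUGE toolkit additionally supports stemming, stopword removal and multi-reference averaging.

```python
from collections import Counter

def bigrams(tokens):
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def rouge2_recall(candidate, reference):
    """ROUGE-2 recall: fraction of the reference's bigrams that also
    appear in the candidate summary, with clipped counts as in ROUGE-N."""
    cand = Counter(bigrams(candidate.split()))
    ref = Counter(bigrams(reference.split()))
    overlap = sum(min(c, ref[g]) for g, c in cand.items() if g in ref)
    return overlap / max(1, sum(ref.values()))
```

For example, the candidate "the cat sat" shares 2 of the 5 bigrams of the reference "the cat sat on the mat", giving a recall of 0.4.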
Hernández-Leo, D., Pardo, A. Towards Integrated Learning Design with Across-spaces Learning Analytics: A Flipped Classroom Example. Proceedings of the First International Workshop on Learning Analytics Across Physical and Digital Spaces co-located with 6th International Conference on Learning Analytics & Knowledge (LAK 2016). Edinburgh, Scotland, UK, April 25-29, 2016, (pp. 74-78)
In this paper we discuss work in progress regarding the integration of learning analytics and learning design in the frame of the Integrated Learning Design Environment (ILDE). ILDE is a community platform where teachers can design learning activities using multiple authoring tools. Authoring tools can be generic, meaning that designs authored can be deployed in multiple learning systems, or specific, when designs authored can be deployed in particular systems (e.g., mobile learning applications). These particular systems may be devoted to supporting activities in specific virtual or physical spaces. For across-spaces learning designs involving multiple systems to support activities in diverse spaces, ILDE enables the selection and articulation of multiple authoring tools in what we call “design workflows”. This paper argues that this integrated approach to learning design can also benefit an articulated, meaningful interpretation of learning analytics across-spaces. This calls for an extension of ILDE incorporating learning analytics. The proposed extension is illustrated with activities across-spaces in a flipped classroom scenario.
Additional material:
The ILDE supports cooperation within "learning design" communities in which members share and co-create multiple types of learning design solutions covering the complete lifecycle. This has been achieved by the integration of a number of existing free and open-source tools. ILDE uses the LdShake platform to provide social networking features and to manage integrated access to designs and tooling, including conceptualization tools (OULDI templates), editors (WebCollage, OpenGLM), and deployment into VLEs (e.g., Moodle) via GLUE!-PS.
See: Hernández-Leo, D., Asensio-Pérez, J.I., Derntl, M., Prieto, L.P., Chacón, J., (2014) ILDE: Community Environment for Conceptualizing, Authoring and Deploying Learning Activities. In: Proceedings of 9th European Conference on Technology Enhanced Learning, pp. 490-493.
Sukno FM, Domínguez M, Ruiz A, Schiller D, Lingenfelser F, Pragst L, Kamateri E, Vrochidis S. A Multimodal Annotation Schema for Non-Verbal Affective Analysis in the Health-Care Domain. MARMI'16: 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction Proceedings.
The development of conversational agents with human interaction capabilities requires advanced affective state recognition integrating non-verbal cues from the different modalities constituting what in human communication we perceive as an overall affective state. Each of the modalities is often handled by a different subsystem that conveys only a partial interpretation of the whole and, as such, is evaluated only in terms of its partial view. To tackle this shortcoming, we investigate the generation of a unified multimodal annotation schema of non-verbal cues from the perspective of an inter-disciplinary group of experts. We aim at obtaining a common ground-truth with a unique representation using the Valence and Arousal space and a discrete non-linear scale of values. The proposed annotation schema is demonstrated on a corpus in the health-care domain but is scalable to other purposes. Preliminary results on inter-rater variability show a positive correlation of consensus level with high (absolute) values of Valence and Arousal as well as with the number of annotators labeling a given video sequence.
Bosch, J., Marxer, R., Gomez, E., “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”, Journal of New Music Research
The extraction of pitch information is arguably one of the most important tasks in automatic music description systems. However, previous research and evaluation datasets dealing with pitch estimation focused on relatively limited kinds of musical data. This work aims to broaden this scope by addressing symphonic western classical music recordings, focusing on pitch estimation for melody extraction. This material is characterized by a high number of overlapping sources, and by the fact that the melody may be played by different instrumental sections, often alternating within an excerpt. We evaluate the performance of eleven state-of-the-art pitch salience functions, multipitch estimation and melody extraction algorithms when determining the sequence of pitches corresponding to the main melody in a varied set of pieces. An important contribution of the present study is the proposed evaluation framework, including the annotation methodology, generated dataset and evaluation metrics. The results show that the assumptions made by certain methods hold better than others when dealing with this type of music signal, leading to a better performance. Additionally, we propose a simple method for combining the output of several algorithms, with promising results.
Additional material:
- Dataset ORCHSET - A dataset for melody extraction in symphonic music recordings
- Open Access version at UPF e-repository
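The abstract above mentions a simple method for combining the output of several algorithms. One plausible per-frame combination is sketched below: pick, at each frame, the estimate closest to the median across algorithms. This is a hypothetical illustration of the idea, not the paper's actual combination strategy.

```python
import numpy as np

def combine_pitch_tracks(tracks_hz):
    """Combine per-frame pitch estimates (Hz) from several algorithms by
    keeping, at each frame, the estimate closest to the per-frame median."""
    tracks = np.asarray(tracks_hz, dtype=float)       # (n_algorithms, n_frames)
    median = np.median(tracks, axis=0)
    idx = np.argmin(np.abs(tracks - median), axis=0)  # closest algorithm per frame
    return tracks[idx, np.arange(tracks.shape[1])]

tracks = [[440.0, 442.0, 0.0],    # algorithm A (0.0 marks an unvoiced frame)
          [441.0, 880.0, 0.0],    # algorithm B with an octave error at frame 1
          [439.0, 441.0, 220.0]]  # algorithm C
```

On this toy input, the median vote discards algorithm B's octave error at frame 1, which is the kind of error such a combination is meant to suppress.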
Pons J, Lidy T, Serra X. Experimenting with Musically Motivated Convolutional Neural Networks. 14th International Workshop on Content-based Multimedia Indexing (CBMI2016)
A common criticism of deep learning relates to the difficulty of understanding the underlying relationships that the neural networks are learning, thus behaving like a black box. In this article we explore various architectural choices of relevance for music signal classification tasks in order to start understanding what the chosen networks are learning. We first discuss how convolutional filters with different shapes can fit specific musical concepts, and based on that we propose several musically motivated architectures. These architectures are then assessed by measuring the accuracy of the deep learning model in the prediction of various music classes using a known dataset of audio recordings of ballroom music. The classes in this dataset have a strong correlation with tempo, which allows assessing whether the proposed architectures are learning frequency and/or time dependencies. Additionally, a black-box model is proposed as a baseline for comparison. With these experiments we have been able to understand what some deep learning based algorithms can learn from a particular set of data.
Additional material:
- Best Paper Award at #CBMI2016
- Version at UPF repository
- Code in Github (Python; it requires having Lasagne-Theano and Essentia installed)
- Ballroom dataset
- Slides presented at the conference
- Updated record of the publication at the web of the Music Technology Group
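The filter-shape idea discussed above can be made concrete with a small numpy sketch: on a (frequency x time) spectrogram, a 1-by-n filter spans time (rhythm and tempo cues) while an m-by-1 filter spans frequency (timbral cues). The filter sizes and the random spectrogram below are illustrative, not the architectures from the paper.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2D convolution/correlation, for illustration only."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
spectrogram = rng.random((96, 128))        # (frequency bins, time frames)

temporal_filter = np.ones((1, 16)) / 16    # wide in time: tempo/rhythm cues
timbral_filter  = np.ones((32, 1)) / 32    # tall in frequency: timbre cues

t_map = conv2d_valid(spectrogram, temporal_filter)  # shape (96, 113)
f_map = conv2d_valid(spectrogram, timbral_filter)   # shape (65, 128)
```

The two feature maps summarize the input along different axes, which is why filter shape biases what the network can learn about tempo versus timbre.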
Oramas, S., Espinosa-Anke L., Lawlor A., Serra X., & Saggion H. Exploring Customer Reviews for Music Genre Classification and Evolutionary Studies. 17th International Society for Music Information Retrieval Conference (ISMIR'16)
In this paper, we explore a large multimodal dataset of about 65k albums constructed from a combination of Amazon customer reviews, MusicBrainz metadata and AcousticBrainz audio descriptors. Review texts are further enriched with named entity disambiguation along with polarity information derived from an aspect-based sentiment analysis framework. This dataset constitutes the cornerstone of two main contributions: First, we perform experiments on music genre classification, exploring a variety of feature types, including semantic, sentimental and acoustic features. These experiments show that modeling semantic information contributes to outperforming strong bag-of-words baselines. Second, we provide a diachronic study of the criticism of music genres via a quantitative analysis of the polarity associated to musical aspects over time. Our analysis hints at a potential correlation between key cultural and geopolitical events and the language and evolving sentiments found in music reviews.
Additional material:
- MARD: Multimodal Album Reviews Dataset (details and download)
- Subset of MARD used for genre classification, released together with the evaluation code in the following GitHub repository
- The MARD dataset will be introduced in the next ISMIR tutorial "Natural Language Processing for MIR" https://wp.nyu.edu/ismir2016/event/tutorials/
Espinosa-Anke L, Carlini R, Ronzano F, Saggion H. DEFEXT: A Semi Supervised Definition Extraction Tool. Globalex: Lexicographic Resources for Human Language Technology
We present DEFEXT, an easy-to-use semi-supervised Definition Extraction Tool. DEFEXT is designed to extract from a target corpus those textual fragments where a term is explicitly mentioned together with its core features, i.e. its definition. It works on the back of a Conditional Random Fields based sequential labeling algorithm and a bootstrapping approach. Bootstrapping enables the model to gradually become more aware of the idiosyncrasies of the target corpus. In this paper we describe the main components of the toolkit as well as experimental results stemming from both automatic and manual evaluation. We release DEFEXT as open source along with the necessary files to run it on any Unix machine. We also provide access to training and test data for immediate use.
Keywords: bootstrapping, crf, definition extraction, lexicography, Machine learning
Additional material:
- Download pdf
- arXiv version
- DefExt Software at BitBucket
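The bootstrapping loop mentioned above can be sketched as a generic self-training cycle: train on a small labeled seed, score the unlabeled corpus, and fold the most confident predictions back into the training set each round. The trivial keyword scorer below is a stand-in for DEFEXT's CRF; all names and data are made up for illustration.

```python
def train(examples):
    """Stand-in 'model': count cue words seen in positive (definitional)
    training sentences. DEFEXT uses a CRF sequence labeler instead."""
    cues = {}
    for text, label in examples:
        if label:
            for tok in text.lower().split():
                cues[tok] = cues.get(tok, 0) + 1
    return cues

def score(model, text):
    return sum(model.get(tok, 0) for tok in text.lower().split())

def bootstrap(seed, unlabeled, rounds=3, per_round=1):
    """Self-training: each round, promote the highest-scoring unlabeled
    sentences to positive training examples and retrain."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        pool.sort(key=lambda s: score(model, s), reverse=True)
        for _ in range(min(per_round, len(pool))):
            labeled.append((pool.pop(0), 1))
    return labeled
```

Each round the model sees more corpus-specific examples, which is how bootstrapping adapts it to the idiosyncrasies of the target corpus.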
Fisas B, Ronzano F, Saggion H. A Multi-Layered Annotated Corpus of Scientific Papers. Language Resources and Evaluation Conference (LREC 2016)
Scientific literature records the research process with a standardized structure and provides the clues to track the progress in a scientific field. Understanding its internal structure and content is of paramount importance for natural language processing (NLP) technologies. To meet this requirement, we have developed a multi-layered annotated corpus of scientific papers in the domain of Computer Graphics. Sentences are annotated with respect to their role in the argumentative structure of the discourse. The purpose of each citation is specified. Special features of the scientific discourse such as advantages and disadvantages are identified. In addition, a grade is allocated to each sentence according to its relevance for being included in a summary. To the best of our knowledge, this complex, multi-layered collection of annotations and metadata characterizing a set of research papers had never been grouped together before in one corpus and therefore constitutes a newer, richer resource with respect to those currently available in the field.
Additional material:
- Download pdf
- Dr. Inventor Multi-layer Scientific Corpus Download
Francès G, Geffner H. E-STRIPS: Existential Quantification in Planning and Constraint Satisfaction. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016)
Existentially quantified variables in goals and action preconditions are part of the standard PDDL planning language, yet few planners support them, while those that do compile them away at an exponential cost. In this work, we argue that existential variables are an essential feature for representing and reasoning with constraints in planning, and that it is harmful to compile them away or avoid them altogether, since this hides part of the problem structure that can be exploited computationally. We show how to do this by formulating an extension of the standard delete-relaxation heuristics that handles existential variables. While this extension is simple, the consequences for both modeling and computation are important. Furthermore, by allowing existential variables in STRIPS and treating them properly, CSPs can be represented and solved in a direct manner as action-less, fluent-less STRIPS planning problems, something important for problems involving restrictions. In addition, functional fluents in Functional STRIPS can be compiled away with no effect on the structure and informativeness of the resulting heuristic. Experiments are reported comparing our native E-STRIPS planner with state-of-the-art STRIPS planners over compiled and propositional encodings, and with a Functional STRIPS planner.
Additional material:
- The FS Planner: The exact version of the FS planner used to run the tests for the experimental section will soon be published in the main planner repository, but is available until then upon request.
- Benchmarks: The exact benchmarks that were used to run the tests for the experimental section of the paper can be found here. Each subdirectory name (e.g. block-grouping-strips-ex/) is made up of the name of the particular planning domain (block-grouping) plus a number of tags indicating the type of encoding (STRIPS with existential quantification, in this case, but could also be e.g. fn for Functional STRIPS, or merely strips for standard STRIPS without existential quantification). Random instance generators are available upon request.
Lotinac D, Segovia-Aguas J, Jiménez S, Jonsson A. Automatic Generation of High-Level State Features for Generalized Planning. Proceedings of the 25th International Joint Conference on Artificial Intelligence; 2016 July 9-15; New York, United States. Palo Alto: AAAI Press; 2016. p. 3199-3205
In many domains generalized plans can only be computed if certain high-level state features, i.e. features that capture key concepts to accurately distinguish between states and make good decisions, are available. In most applications of generalized planning such features are hand-coded by an expert. This paper presents a novel method to automatically generate high-level state features for solving a generalized planning problem. Our method extends a compilation of generalized planning into classical planning and integrates the computation of generalized plans with the computation of features, in the form of conjunctive queries. Experiments show that we generate features for diverse generalized planning problems and hence compute generalized plans without providing a prior high-level representation of the states. We also bring a new landscape of challenging benchmarks to classical planning, since our compilation naturally models classification tasks as classical planning problems.
Segovia-Aguas J, Jiménez S, Jonsson A. Hierarchical Finite State Controllers for Generalized Planning. Proceedings of the 25th International Joint Conference on Artificial Intelligence; 2016 July 9-15; New York, United States. Palo Alto: AAAI Press; 2016. p. 2325-41.
Finite State Controllers (FSCs) are an effective way to represent sequential plans compactly. By imposing appropriate conditions on transitions, FSCs can also represent generalized plans that solve a range of planning problems from a given domain. In this paper we introduce the concept of hierarchical FSCs for planning by allowing controllers to call other controllers. We show that hierarchical FSCs can represent generalized plans more compactly than individual FSCs. Moreover, our call mechanism makes it possible to generate hierarchical FSCs in a modular fashion, or even to apply recursion. We also introduce a compilation that enables a classical planner to generate hierarchical FSCs that solve challenging generalized planning problems. The compilation takes as input a set of planning problems from a given domain and outputs a single classical planning problem, whose solution corresponds to a hierarchical FSC.
Additional material:
- Postprint version in UPF e-repositori
- IJCAI-16 Distinguished Paper Award
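The call mechanism described in the abstract can be illustrated with a small interpreter. This is a sketch under assumed semantics (a called controller takes over the remaining observations), not the paper's actual compilation; the toy controllers, observations and action names are invented for illustration.

```python
def run_fsc(controllers, name, observations):
    """Execute controller `name` over a list of observations. A controller
    maps (controller state, observation) -> (action, next state); an action
    of the form 'call:X' invokes controller X on the remaining input, which
    is the hierarchical call mechanism sketched here."""
    fsc = controllers[name]
    q, trace, i = 0, [], 0
    while i < len(observations):
        action, q = fsc[(q, observations[i])]
        if action.startswith("call:"):
            # Sub-controller handles the rest of the input (assumed semantics).
            trace.extend(run_fsc(controllers, action[5:], observations[i:]))
            i = len(observations)
        else:
            trace.append(action)
            i += 1
    return trace

# Toy hierarchy: "main" moves on 'a' and delegates to "sub" on 'b'.
controllers = {
    "main": {(0, "a"): ("move", 0), (0, "b"): ("call:sub", 0)},
    "sub": {(0, "b"): ("pick", 0), (0, "a"): ("drop", 0)},
}
trace = run_fsc(controllers, "main", ["a", "a", "b", "a"])
```

The resulting action trace interleaves the caller's actions with those produced by the called controller, which is what lets hierarchical FSCs stay compact.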
Manathunga K, Hernández-Leo D. A Multiple Constraints Framework for Collaborative Learning Flow Orchestration. Advances in Web-Based Learning – ICWL 2016. 15th International Conference
Collaborative Learning Flow Patterns (e.g., Jigsaw) offer sound pedagogical strategies to foster fruitful social interactions among learners. The pedagogy behind the patterns involves a set of intrinsic constraints that need to be considered when orchestrating the learning flow. These constraints relate to the organization of the flow (e.g., in the Jigsaw pattern a global problem is divided into sub-problems, and a constraint is that there needs to be at least one expert group working on each sub-problem) and group formation policies (e.g., groups solving the global problem need to have at least one member coming from a different previous expert group). Besides, characteristics of specific learning situations, such as learners' profiles and the technological tools used, provide additional parameters that can be considered as context-related extrinsic constraints relevant to the orchestration (e.g., heterogeneous groups depending on experience or interests). This paper proposes a constraint framework that considers different constraints for orchestration services, enabling adaptive computation of orchestration aspects. Substantiation of the framework with a case study demonstrated the feasibility, usefulness and expressiveness of the framework.
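The intrinsic Jigsaw constraints mentioned in the abstract can be expressed as simple validation checks. The sketch below uses hypothetical data structures (expert groups as topic/member records) and only two constraints; a real orchestration service would handle many more, including the extrinsic, context-related ones.

```python
def check_jigsaw(expert_groups, jigsaw_groups, subproblems):
    """Validate two intrinsic Jigsaw constraints (illustrative encoding):
    (1) every sub-problem has at least one expert group working on it;
    (2) every jigsaw group mixes members from different expert groups."""
    covered = {g["topic"] for g in expert_groups}
    if not set(subproblems) <= covered:
        return False  # some sub-problem has no expert group
    member_topic = {m: g["topic"] for g in expert_groups for m in g["members"]}
    for group in jigsaw_groups:
        if len({member_topic[m] for m in group}) < len(group):
            return False  # two members come from the same expert group
    return True

experts = [
    {"topic": "t1", "members": ["ana", "bea"]},
    {"topic": "t2", "members": ["carl", "dani"]},
]
ok = check_jigsaw(experts, [["ana", "carl"], ["bea", "dani"]], ["t1", "t2"])
bad = check_jigsaw(experts, [["ana", "bea"]], ["t1", "t2"])
```

The first grouping passes because each jigsaw group draws on both expert topics; the second fails because both members were in the same expert group.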
Ronzano F, Saggion H. Knowledge Extraction and Modeling from Scientific Publications. Enhancing Scholarly Data Workshop –SAVE-SD2016
During the last decade the amount of scientific articles available online has substantially grown in parallel with the adoption of the Open Access publishing model. Nowadays researchers, as well as any other interested actor, are often overwhelmed by the enormous and continuously growing amount of publications to consider in order to perform any complete and careful assessment of scientific literature. As a consequence, new methodologies and automated tools to ease the extraction, semantic representation and browsing of information from papers are necessary. We propose a platform to automatically extract, enrich and characterize several structural and semantic aspects of scientific publications, representing them as RDF datasets. We analyze papers by relying on the scientific Text Mining Framework developed in the context of the European Project Dr. Inventor. We evaluate how the Framework supports two core scientific text analysis tasks: rhetorical sentence classification and extractive text summarization. To ease the exploration of the distinct facets of scientific knowledge extracted by our platform, we present a set of tailored Web visualizations. We provide on-line access to both the RDF datasets and the Web visualizations generated by mining the papers of the 2015 ACL-IJCNLP Conference.
Keywords: scientific knowledge extraction, knowledge modeling, RDF, software framework
Additional material:
- Dr. Inventor Text Mining Framework (Version 1.4, released on: 28/4/2016)
Ronzano, F., & Saggion, H.: Dr. Inventor Framework: Extracting Structured Information from Scientific Publications. Discovery Science (pp. 209-220). Springer International Publishing. (2015)
Fisas, B., Ronzano, F., & Saggion, H. (2015). A Multi-Layered Annotated Corpus of Scientific Papers. To appear in the LREC Conference 2016.
This Corpus includes 40 Computer Graphics papers containing 8,877 sentences that have been manually annotated with respect to their scientific discourse rhetorical category. Moreover, the corpus includes for each paper three handwritten summaries of maximum 250 words.
Saggion, H.: SUMMA: A robust and adaptable summarization tool. Traitement Automatique des Langues, 49(2) (2008)
Urbano J, Marrero M. Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance. International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016
The Kendall τ and AP rank correlation coefficients have become mainstream in Information Retrieval research for comparing the rankings of systems produced by two different evaluation conditions, such as different effectiveness measures or pool depths. However, in this paper we focus on the expected rank correlation between the mean scores observed with a test collection and the true, unobservable means under the same conditions. In particular, we propose statistical estimators of τ and AP correlations following both parametric and non-parametric approaches, and with special emphasis on small topic sets. Through large-scale simulation with TREC data, we study the error and bias of the estimators. In general, such estimates of expected correlation with the true ranking may accompany the results reported from an evaluation experiment, as an easy-to-understand figure of reliability. All the results in this paper are fully reproducible with data and code available online.
Keywords: Evaluation; Test Collection; Correlation; Kendall; Average Precision; Estimation
Downloads:
- Full text: PDF
- Citation: BibTeX
- Code and data: GitHub
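For reference, the two correlation coefficients the paper builds its estimators on can be computed directly. The sketch below implements plain Kendall τ and the top-weighted AP correlation for tie-free rankings over the same items; it shows the measures themselves, not the paper's estimators of their expected value, and the system names are made up.

```python
from itertools import combinations

def kendall_tau(ref, obs):
    """Kendall tau between two rankings given as item lists (best first).
    Assumes the same items in both and no ties (a simplification)."""
    pos_r = {x: i for i, x in enumerate(ref)}
    pos_o = {x: i for i, x in enumerate(obs)}
    conc = disc = 0
    for a, b in combinations(ref, 2):
        same = (pos_r[a] - pos_r[b]) * (pos_o[a] - pos_o[b])
        conc += same > 0
        disc += same < 0
    n = len(ref)
    return (conc - disc) / (n * (n - 1) / 2)

def ap_correlation(ref, obs):
    """Top-weighted AP rank correlation: with `ref` as the true ranking,
    errors near the top of the observed ranking `obs` count more."""
    pos_r = {x: i for i, x in enumerate(ref)}
    n = len(obs)
    total = 0.0
    for i in range(1, n):
        # c = items ranked above position i in obs that are also above in ref
        c = sum(pos_r[x] < pos_r[obs[i]] for x in obs[:i])
        total += c / i
    return 2 * total / (n - 1) - 1

ref = ["s1", "s2", "s3", "s4"]
swapped_at_bottom = ["s1", "s2", "s4", "s3"]
```

A swap at the bottom of the ranking hurts τ more than AP correlation, which is exactly the top-weighting the paper exploits.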
Aragon P, Gómez V, Kaltenbrunner A. Visualization Tool for Collective Awareness in a Platform of Citizen Proposals. 10th International AAAI Conference on Web and Social Media - ICWSM-16
Online debate tools for participatory democracy and crowdsourcing legislation are limited by different factors. One of them arises when the discussion of proposals reaches a large number of contributions, and citizens therefore encounter difficulties in mapping the arguments that constitute the dialectical debate. To address this issue, we present a visualization tool that shows the discussion of any proposal as an interactive radial tree. The tool builds on Decide Madrid, a recently created platform for direct democracy launched by the City Council of Madrid. Decide Madrid is one of the most relevant platforms allowing citizens to propose, debate and prioritise city policies.
Additional material:
- Technical details of the implementation (e.g. layout parametrization) in DecideViz Github repository
- Pablo Aragon's blog with updates on his work and related materials for download
- Decide Madrid
Oramas S, Espinosa-Anke L, Sordo M, Saggion H, Serra X. ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain. Proceedings of the Language Resources and Evaluation Conference 2016
In this paper we present a gold standard dataset for Entity Linking (EL) in the music domain. It contains thousands of musical named entities, such as Artist, Song or Record Label, which have been automatically annotated on a set of artist biographies coming from the music website and social network Last.fm. The annotation process relies on the analysis of the hyperlinks present in the source texts and on a voting-based algorithm for EL, which considers, for each entity mention in the text, the degree of agreement across three state-of-the-art EL systems. Manual evaluation shows that EL Precision is at least 94%, and due to its tunable nature, it is possible to derive annotations favouring higher Precision or Recall, at will. We make available the annotated dataset along with evaluation data and the code.
Keywords: entity linking, language resources, music information retrieval
Additional material:
- Code in GitHub: ELVIS (Entity Linking Voting and Integration System): framework to homogenize and combine the output of different entity linking tools, using the level of agreement as a confidence score
- ELMD dataset and evaluation data
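The agreement-based voting idea behind ELVIS can be sketched as follows. This is an illustrative reimplementation, not the released code: the system names, entity URIs and the agreement threshold are hypothetical, and confidence is simply the fraction of systems that agree.

```python
from collections import Counter

def vote_links(annotations, min_agreement=2):
    """For each mention, keep the entity chosen by the most EL systems and
    attach the level of agreement as a confidence score. `annotations`
    maps system name -> {mention: entity URI} (illustrative format)."""
    mentions = set().union(*(set(a) for a in annotations.values()))
    result = {}
    for m in mentions:
        votes = Counter(a[m] for a in annotations.values() if m in a)
        entity, count = votes.most_common(1)[0]
        if count >= min_agreement:  # drop low-agreement annotations
            result[m] = (entity, count / len(annotations))
    return result

# Three hypothetical EL systems annotating the same biography text.
systems = {
    "sysA": {"Queen": "dbr:Queen_(band)", "Bowie": "dbr:David_Bowie"},
    "sysB": {"Queen": "dbr:Queen_(band)"},
    "sysC": {"Queen": "dbr:Queen_(chess)", "Bowie": "dbr:David_Bowie"},
}
linked = vote_links(systems)
```

Raising `min_agreement` trades Recall for Precision, which is the tunable behaviour the abstract describes.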
Barbieri F, Ronzano F, Saggion H. What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis. Proceedings of the Language Resources and Evaluation Conference 2016 (in press)
Emojis allow us to describe objects, situations and even feelings with small images, providing a visual and quick way to communicate. In this paper, we analyse emojis used on Twitter with distributional semantic models. We retrieve 10 million tweets posted by US users, and we build several skip-gram word embedding models by mapping both words and emojis into the same vector space. We test our models with semantic similarity experiments, comparing the output of our models with human assessment. We also carry out an exhaustive qualitative evaluation, showing interesting results.
Keywords: Emojis, Social Networks, Embeddings
Additional material:
- Web of the publication with downloads
News:
- An automatic method extracts the meaning of popular emojis (UPF news, 20/04/2016)
- An automatic method is created that extracts the meaning of phone emoticons (La Vanguardia, 20/04/2016)
- An automatic method extracts the meaning of the popular emojis (EixDiari, 20/04/2016)
- UPF automatically analyses the meaning of emojis (Informáticaesmas - CODDII, 20/04/2016)
- An automatic method extracts the meaning of phone emoticons (Diari de Girona, 21/04/2016)
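Once words and emojis live in the same vector space, the semantic similarity experiments above reduce to cosine comparisons between their vectors. A minimal sketch with hand-made toy vectors; the real models are skip-gram embeddings with hundreds of dimensions trained on millions of tweets, so these numbers are purely illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity, the standard measure for comparing skip-gram
    vectors of words and emojis in a shared space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d vectors invented for illustration.
vectors = {
    "love": [0.9, 0.1, 0.0],
    "❤": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.9, 0.3],
}
```

With these toy vectors, "love" sits closer to the heart emoji than to "pizza", the kind of word-emoji association the paper evaluates against human judgments.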
Rankothge W, Ma J, Le F, Russo A, Lobo J. Towards making network function virtualization a cloud computing service. Proceedings of the 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM)
By allowing network functions to be virtualized and run on commodity hardware, NFV enables new properties (e.g., elastic scaling) and new service models for Service Providers, Enterprises, and Telecommunication Service Providers. However, for NFV to be offered as a service, several research problems still need to be addressed. In this paper, we focus on and propose a new service chaining algorithm. Existing solutions suffer from two main limitations. First, existing proposals often rely on Mixed Integer Linear Programming to optimize VM allocation and network management, but our experiments show that such an approach is too slow, taking hours to find a solution. Second, although existing proposals have considered the VM placement and network configuration jointly, they frequently assume the network configuration cannot be changed. Instead, we believe that both computing and network resources should be able to be updated concurrently for increased flexibility and to satisfy SLA and QoS requirements. As such, we formulate and propose a Genetic Algorithm-based approach to solve the VM allocation and network management problem. We built an experimental NFV platform and ran a set of experiments. The results show that our proposed GA approach can compute configurations up to three orders of magnitude faster than traditional solutions.
Additional material:
- Datasets for the Evaluation of Virtualized Network Functions Resource Allocation Algorithms. This repository contains all the details about how we modelled general data into the specific data we wanted, along with the software we used and the assumptions we made during the data modelling process. Using this data and these programs, the evaluation results presented in our publications can be easily reproduced.
Rankothge W, Le F, Russo A, Lobo J. Experimental results on the use of genetic algorithms for scaling virtualized network functions. 2015 IEEE Conference on Virtualization and Software Defined Network (NFV-SDN)
Network Function Virtualization (NFV) is bringing closer the possibility to truly migrate enterprise data centers into the cloud. However, for a Cloud Service Provider to offer such services, important questions include how and when to scale out/in resources to satisfy dynamic traffic/application demands. In previous work [1], we proposed a platform called Network Function Center (NFC) to study research issues related to NFV and Network Functions (NFs). In an NFC, we assume NFs to be implemented on virtual machines that can be deployed on any server in the network. In this paper we present further experiments on the use of Genetic Algorithms (GAs) for scaling NFs out/in when the traffic changes dynamically. We combined data from previous empirical analyses [2], [3] to generate NF chains and to obtain the traffic patterns of a day, and ran simulations of resource allocation decision making. We implemented different fitness functions with the GA and compared their performance when scaling out/in over time.
Additional material:
- Datasets for the Evaluation of Virtualized Network Functions Resource Allocation Algorithms. This repository contains all the details about how we modelled general data into the specific data we wanted, along with the software we used and the assumptions we made during the data modelling process. Using this data and these programs, the evaluation results presented in our publications can be easily reproduced.
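The GA-based allocation idea in the two NFV papers above can be sketched at a high level as follows. This is a generic genetic algorithm for assigning NFs to servers with a simple load-balancing fitness, invented for illustration; the actual NFC fitness functions, encodings and operators differ.

```python
import random

def fitness(assign, demand, n_servers):
    """Illustrative fitness: negative of the maximum server load, so
    balanced placements of network functions score higher."""
    load = [0] * n_servers
    for nf, srv in enumerate(assign):
        load[srv] += demand[nf]
    return -max(load)

def ga_place(demand, n_servers, pop=30, gens=60, seed=0):
    """Tiny GA for NF-to-server allocation: chromosome = server index per NF,
    elitist selection, one-point crossover, random-reset mutation.
    Assumes at least two NFs; a sketch, not the NFC platform's algorithm."""
    rng = random.Random(seed)
    n = len(demand)
    popu = [[rng.randrange(n_servers) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=lambda a: fitness(a, demand, n_servers), reverse=True)
        elite = popu[: pop // 2]              # keep the best half
        children = []
        while len(elite) + len(children) < pop:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]       # one-point crossover
            if rng.random() < 0.2:            # mutation: move one NF
                child[rng.randrange(n)] = rng.randrange(n_servers)
            children.append(child)
        popu = elite + children
    return max(popu, key=lambda a: fitness(a, demand, n_servers))

demand = [5, 5, 5, 5]                          # toy traffic demands per NF
best = ga_place(demand, 2)
```

On this toy instance the GA finds a balanced placement (two NFs per server, maximum load 10); the papers' point is that such search scales to far larger instances much faster than MILP solvers.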