Transversal actions 2016-17: Innovation

The María de Maeztu program aims at establishing transversal projects that contribute to an increased impact of the research activities conducted at the Department, both at international and at local level.

When creating a new program, a special focus has been put on 1) launching novel initiatives that cover an existing gap and/or supporting existing external pioneering projects; 2) mobilizing novel sources of funds that are currently underexploited (third sector, public and private sources targeting social challenges) to promote the sustainability of the actions over time, once the funds from the María de Maeztu project end.

In a series of blog posts we will cover the progress made across three different segments:

  • Scientific community (post)
  • Industrial communities, open science and innovation (this post)
  • Non-scientific communities, general public and students at all pre-university levels, with a focus on those sectors less covered by general outreach actions

 

INDUSTRY AND TECH TRANSFER COMMUNITY

This section covers the work done to improve the exploitation of research results and to explore and support novel ways to interact with the industry.

In terms of classical tech transfer (see some indicators at this blog post), the program is not implementing direct actions by itself, as all these services at the University are run by its Innovation Unit, which in the period 2016-17 identified and registered 18 technologies, including three patents (two on image processing for cinema applications and one on electrical mapping for biomedical applications). However, as detailed in the post mentioned above:

“While spin-offs provide a high social and economical general benefit, their impact in the department in the mid-term is very low. When an external company is created, it “resets” the internal knowledge (technology and people) portfolio. In practical and internal terms, those processes may result in a weaker department. And the current reality is that the incomes generated by traditional technology transfer (such as royalties, licences, etc) are marginal. This line of work, by itself, results insufficient for a long-term strategy as a department.”

An interesting result from this process is that the alternative funding models explored, while still limited in scale, are already bringing greater benefits to the department than the traditional direct licensing of technology.

 

The Open Science and Innovation program

The rationale for the creation of the program is described in detail in the post “Opening Science to Open Innovation”. During 2017 we finalised and started implementing a specific action to promote the most promising mid-term exploitation plans for the ongoing María de Maeztu projects (web), building on initiatives already under way and focusing on their sustainability beyond the funds provided by the María de Maeztu program and on their alignment with the wider Open Science strategy (see the post “Developing a department of open scholars”). The selected projects address several of the intended objectives:

  • Funding Open Science as a complement (and not in opposition) to the traditional tech transfer strategies
  • Designing business models for research groups and their initiatives across varied potential funding sources
  • Supporting alternative professional profiles, such as software developers, acknowledging their increasingly important role in ICT departments despite the lack of mechanisms to develop a professional career in that role in the academic world

The selected cases are not the only ones available at the Department. Essentia, an open-source library and set of tools for audio and music analysis, description and synthesis, has already been working to create exploitation models and industrial collaborations around an open library. Essentia is available under open licensing for non-commercial uses and is licensed to several companies for industrial projects, promoting a sustainable model for its further development.

It is important to highlight that the program also favours exploitation models run directly from the department, which do not necessarily require the creation of a separate spin-off, in order to make the department stronger as a result of this exploitation. Below we present the four projects initially selected. The starting point for each project is significantly different, and each targets a very different industrial and research field. But they also share common approaches, such as the potential of integrating research results and needs in platforms that are sustainable over time (addressing some of the problems of academic research, such as the loss of continuity of individual efforts and results, or the fact that many of the platforms specifically funded in research projects are discontinued after the projects end), or the important work needed to define proper data management and licensing terms for the provision of services. Therefore, in addition to the individual results of the projects, we expect the program to generate valuable knowledge about these processes in order to feed wider policies in our context. So far, the supported projects have managed to advance in their development, to initiate cooperation with external institutions relevant to their targets (such as Google, the Bill & Melinda Gates Foundation, the Zenodo platform at CERN or the Barcelona Consortium of Education) and to increase their user base in promising directions.

 

Sustainable Freesound:

Starting point: Freesound is consolidated as the leading international site for sound sharing under CC licences (see the blog post on impact in 2016, with over 150 million page views). The innovation program aims to reinforce its sustainability and to promote Freesound (see blog post) as a site to support the creative industries that use sounds in their productions.

Main objective and sustainability model: The main original goal of the project is to make Freesound.org sustainable by promoting donations from its users. Freesound.org is already a success story in the open movement (the largest platform in the world for sharing sounds under Creative Commons licenses, with over 8 million registered users in the world) and represents an excellent showcase of the challenges that successful open initiatives also need to face in a university environment.

In addition, Freesound has always been used by professionals (such as in the audiovisual industry), with an impact which is unfortunately difficult to measure. The current needs of machine learning models (which require datasets that are not only large but also of high quality) have opened new scenarios for the industrial use and benefits of Freesound (see Freesound Datasets: A Platform for the Creation of Open Audio Datasets). Building and maintaining datasets using open tools and collaborative approaches is a promising direction which, in addition, can reinforce the crowdfunding target described below.

Results so far:

  • Crowdfunding: the project has designed improvements to the platform and is starting to launch campaigns aimed at potential donors, focusing first on the most active users of the platform. The main target is to make Freesound sustainable by covering, in the mid-term, the expenses for the two system administrators and developers running the website. It is still early to draw conclusions but, according to the current evolution, a coverage of 50% could be reached during the first year.
  • Freesound Datasets: In parallel, the work has also contributed to the launch of Freesound Datasets (see the announcement of its launch), a new platform developed during 2017 to foster the re-use of Freesound content in research contexts, which will eventually help us make Freesound better and better. The award of a Google Faculty Research Award to support the development of Freesound Datasets and the first dataset (FSD), and the start of a collaboration with Google’s Machine Perception Team to do research on machine listening, represent a major outcome of this process of increasing the industrial value of the platform while remaining open.

 

Sustainable Open Science Integrated Learning Design Environment - ILDE:

Starting point: Developed iteratively over the past 8 years and tested by 8 educational communities (educational centers, training units, transversal initiatives) involving over 1000 participants, the technological infrastructure ILDE (Integrated Learning Design Environment) enables the modelling and sharing of learning designs and is already used by a number of institutions and groups working in the education industry worldwide. ILDE is a Community Environment that integrates a number of learning design conceptualization, authoring and implementation tools.

Main objective and sustainability model: In the context of the Educational Data Science project, the further development of ILDE and analytics tools is expected to reach a level of development suitable for valorisation and exploitation. Specific objectives include:

  • Integration of educational data science techniques
  • Actions to promote the use of ILDE and the creation and growth of communities of users
  • Creation of high-quality datasets (including the adoption of more responsible data gathering and management) and their promotion in the research community
  • Design of service models with the associated business plans, targeting all the institutions involved in the educational industry

Results so far: ILDE continues to progress in making its code and services openly available on GitHub, with an open API (https://ilde.upf.edu/api/html/). The overall conceptual framework for the integration of data analytics has been published (Hernández-Leo, D., Martinez-Maldonado, R., Pardo, A., Muñoz-Cristóbal, J. A., & Rodríguez-Triana, M. J. (accepted) Analytics for learning design: A layered framework and tools, British Journal of Educational Technology, pre-print) and its integration is continuing. An important piece of work during this period has been the update of the legal documents associated with the use of the platform (such as the licence terms, cookie policy, terms of use, etc.), in cooperation with idLaw partners; this work has also been shared with, and proved very useful for, other platforms under development at the department. In terms of wider visibility, the project is combining actions addressing the scientific community (including the visits of Daniel Spikol (Malmö University) and Roberto Martínez-Maldonado (University of Technology, Sydney) in the context of the Educational Data Science project, the promotion of the open release of datasets, and agreements with international academic institutions for their use of ILDE) with actions targeting users, such as workshops for school teachers and the ILDE / edCrumble competition. Following the implementation of the María de Maeztu outreach program, we are also currently closing agreements to carry out direct actions in 10 schools in Barcelona, together with the Barcelona employment agency Barcelona Activa and the Consortium of Education, the entity managing all public schools in town, to promote a wider adoption of ILDE by teachers.

 

Development of the Rocket platform for collaborative clinical studies: Rocket is a unified (cloud-based) platform aimed at assisting clinicians, allowing them to store, visualize and process different types of clinical data: measurements, images, and reports. Current status here

Starting point: The Rocket platform is a web-based platform that was originally designed for the visualization and processing of multi-modal imaging and meta-data of cardiac patients, to allow the exploration of common data by clinicians and engineers, as part of a European project (VP2HF, http://vp2hf.eu/). In the last two years, mainly thanks to the support of the María de Maeztu UPF grant, the Rocket platform has evolved towards a more general cloud-based architecture that builds customized and user-friendly platforms for different purposes, such as the benchmarking of algorithms to promote reproducible research. For instance, the Rocket platform has been successfully adapted to fulfill the requirements of the Neubias community (COST-EU project, network of European bioimage analysts, http://eubias.org/NEUBIAS/), which wanted to create a platform / social network of bioimaging applications, sample data and people, together with a benchmarking system to test algorithms in the Cloud.

Main objective and sustainability model: The main barrier to a more extensive use of Rocket beyond the initial pilot case, and to its realisation as a tool for collaborative clinical studies, is the lack of some components (linked to security, anonymization, distribution of computing, robust protocols for data sharing and storage, user-friendly interfaces, etc.).

Results so far: The development of the Rocket platform has continued, with the recent presentation of the Rocket app, a web platform that provides a user-friendly interface for the sharing of data and benchmarking of algorithms, with a repository of biomedical data, a computational infrastructure to execute algorithms in the Cloud, and statistical tools to interpret results. Its presentation has generated great interest and shown its strong potential as, for example, a collaborative data manager or a crowdsourced bioimage analysis tool builder. Some of its components have also been released individually, such as the Rocket viewer (details). The Rocket viewer allows visualizing different kinds of data, such as medical and biological images, 3D surfaces, electric signals (ECGs) and documents, either from the web or by loading information from the local file system. Its code is available as open source at the GitHub repository of BCN MedTech, where further description, tutorials and demo videos can be found. In addition, the Rocket platform is already being used in a project funded by the Bill & Melinda Gates Foundation, in collaboration with The Aga Khan University (Karachi, Pakistan), and a pilot with Hospital del Mar is under preparation.

 

Scientific Text Mining and Summarization Services: NLP tools for scientific analysis and easy reading:

Starting point: Over the past years, a number of technologies for summarisation and text mining have been developed and integrated in tools such as SUMMA and the Dr. Inventor Text Mining Framework, with a strong potential, both in “standard” exploitation models (such as spin-off creation) and in open science scenarios.

Main objective and sustainability model: The project proposes the creation of a Scientific Text Mining Platform (ScienTMin), a Web framework that enables a wide range of services for content analysis and aggregation of scientific publications thanks to Text Mining, Summarization, Retrieval and Data Visualization techniques. ScienTMin relies on the technology implemented in the Dr Inventor Text Mining Library (EU Project Dr Inventor and the María de Maeztu project “Mining the Knowledge of Scientific Publications”) and the SUMMA summarization software (implemented by Dr Saggion and currently licensed by UPF). One of the main barriers we face in this project is the current prototype status of the underlying software libraries. In order to develop a robust, stable platform able to deliver timely annotations, a core set of services has to be improved, made able to handle heavy batch processing (e.g. through parallelization) and thoroughly tested. A limited number of robust services and a front-end interface need to be put in place to support the users and processes that consume our services.

Results so far: The development of the platform is underway, currently focusing on the document enrichment service. A first tool, a PDF parser, has been released. The number of clients willing to pay for services at this stage is very limited, and three main strategies are being pursued in parallel:

  • Customized services and domain / language adaptation: first contacts are being established with Zenodo, the open access repository managed at CERN, to analyse and test the use of the tools in a real production system, learn more about the needs of a specific large repository, and support the deployment of such a flagship open repository in Europe. In terms of domain, health was chosen as it is one of the fields with the widest base of potential users. So far, the project has succeeded in the open call within the Advancement of the Natural Language Processing Plan of the Digital Agenda for Spain for the semi-automatic extension of the Unified Medical Language System (UMLS) to Spanish.
  • Single-document summarization services: a pilot (SciSumServices) has been adapted and integrated into OpenMinTed following the tender specification. OpenMinTed is an H2020 project working on the creation of a platform that offers services and functionalities that are useful for text and data mining (such as being able to find or build corpora from OA scientific and scholarly literature data sources), and that allows miners to share their tools and build their own workflows.
  • Hackathons and challenges: the tools, datasets and services are being presented in hackathons (the most recent result is the award for best summarization service at the CL-SciSumm 2018 challenge during the BIRNDL 2018 workshop at SIGIR 2018).