The María de Maeztu Strategic Research Program on data-driven knowledge extraction at DTIC-UPF contributes to the overall strategic plan of boosting synergistic research initiatives across our different research areas. The objective is to deploy a strategic program that allows us to tackle even more ambitious research and development problems, promote collaborations both within the department and internationally, and boost the visibility of UPF in ICT.
The program is organised around four actions aimed at strengthening the excellence and impact of our research: support to research projects, data management infrastructure, exploitation and outreach, and management. Their potential transformative effect on the department goes beyond the specific funds provided by the MdM program.
SUPPORT TO RESEARCH PROJECTS
During this first year we have selected, via two internal calls, a total of 18 research projects across the different research areas. They address core problems in knowledge extraction related to data gathering and structuring and to data modeling and interpretation, promoting the connection of the different areas and the consolidation of the existing research teams. The funds available from the MdM program are mostly used to cofund researchers at PhD or postdoc level and the expenses associated with research activities and the dissemination of results (over 25 positions cofunded; a future post will provide details of all of them and their profiles), but the activities of the strategic research program go beyond this direct financial support.
The list of projects is presented below, together with the presentations given at the María de Maeztu data-driven knowledge extraction workshop in June 2016 (for the projects selected at that time). Each project's web page provides additional information, including a video presentation for most of them and the results obtained so far.
Knowledge Extraction for Retail
Large-Scale Multimedia Music Data
Educational Data Science
Understanding and Improving Social Interactions in Online Participation Platforms
HDR models and methods for cinema postproduction
Machine learning approaches for structuring large sound and music collections
Multimodal annotation for expressive communication
Automatic Topology Analysis for Distributed Anomalies Prevention Systems in the IoT
Technology Enhanced Learning for Instrument learning & analytics tools for assessment
Bio Image and Signal Analysis
Mining the Knowledge of Scientific Publications
Music meets Natural Language Processing
Enhancing Usability and Dissemination of Planning Tools
Data-driven distributed computation
Wireless networks with cognitive topology
Autism Spectrum Condition Multimodal Embodiment Open Repository
Wireless Networking through Learning: Searching for Optimality in Highly-dynamic and Decentralized scenarios
STOP.es: Suicide prevenTion in sOcial Platforms in Spain
The current level of execution of the projects is rather heterogeneous. While some projects continued existing activities, others create new connections between groups and only started towards the end of the year. A total of 46 publications in international conferences and journals directly linked to the execution of the projects have been made available. Among these publications, it is worth highlighting some of the awards received:
The MdM strategy encompasses actions to foster collaborations across different groups. This inter-group collaboration will in turn help the department undertake educational activities and research projects in new areas. The strategic program makes it possible to cofund shared postdocs and graduate students who build bridges across areas, and it further stimulates the interest generated by other actions such as the Integrative Seminar Series started in 2015.
Fourteen of the selected projects involve active cooperation between different research groups, both by providing additional support to collaborations already under way (some in the context of the DTIC interdisciplinary PhD fellowships, a program that is being expanded with MdM funds) and by establishing new ones. Two research groups (Natural Language Processing, and Artificial Intelligence and Machine Learning) have been especially active in establishing synergies with other research teams. The Bio Image and Signal Analysis project is also an excellent example, as it aligns the work of all the different groups working on biomedical systems and involves experts from other research areas in DTIC, such as the Interactive Technologies and the Artificial Intelligence and Machine Learning groups.
Examples of results based on inter-group collaborations:
In addition, seven research assistantships were offered to current MSc students to support these collaborations. Each position had to be aligned with one of the projects and involve supervision by two different staff members. Beyond the benefit to the students, some of these assistantships are expected to become the seed for longer-term collaborations.
DATA MANAGEMENT INFRASTRUCTURE
The computing infrastructure is critical for data-driven research. During this year we have been advancing towards a state-of-the-art data hosting and sharing infrastructure that builds on the existing equipment and technical staff, with the long-term objective of going beyond shared access to computation and storage and being able to offer advanced technological and research services both internally and to external collaborators. This evolution of the computing services towards the model of general research infrastructures includes the definition of a mid-term sustainability model and of the associated mechanisms (technical and financial) to implement it.
During this year we made progress in launching the tender to upgrade the system, cofunded through the call for Scientific Infrastructures of the Ministry of Economy and Competitiveness and supported by FEDER funds. The new system is expected to be fully operational before the summer, overcoming the shortcomings of the current HPC system. The UPF Informatics Unit will guarantee that the deployment of this new system includes the capacity to monitor usage by any user (internal or external) and to assign an estimate of the costs, as part of the system's own future sustainability.
In parallel, the current system is being equipped with its first GPU cards, thanks to donations kindly provided by Nvidia to members of the department.
The main challenges with respect to data management infrastructure are, however, more social and organisational than technical. The process is proving useful for reconsidering the governance of this critical infrastructure and for evolving towards a capacity that parallels that of international competitors. During this year, the department has appointed an academic manager of the infrastructure (Jérôme Noailly), responsible for laying the foundations to ensure that the future management of the infrastructure responds to the scientific development of the department and to its organisational and operational needs. Ongoing work includes a detailed survey of the anticipated needs across the different research groups. The result of this survey will guide the implementation of the new HPC system, the definition of a usage policy for internal and external users, and the consideration of external cloud service providers. In fact, full alignment with reproducible and open research (see below) creates additional challenges for the management of the infrastructure (such as the funding model, which supports research creation and publication but not the costs linked to later re-use) and calls for new arrangements between infrastructure managers and scientific experts. UPF needs to find solutions to expand and consolidate the hardware/software infrastructure that supports and exploits a number of data collections, following the data policies and repository requirements specified by the scientific community and taking initiatives to have it internationally recognised by peers. This work started during this year together with the Library and Informatics Units at UPF, but a simple strategy that prioritises the goals of reusability, scalability, sustainability and openness has not yet been fully defined.
TRAINING, EXPLOITATION AND OUTREACH
We are organising a number of activities related to the research objectives and priorities identified, both to train researchers and to present research results. We aim to communicate our advances through our website, including press releases (see the News section, our blog and the department's Twitter account) and results (publications, datasets and software), and to reinforce the outreach activities carried out by our researchers (such as Emilia Gómez's ongoing regular collaboration with Radio Clásica).
The term Open Science describes the research trend that promotes open collaboration together with open and reproducible publishing models. In addition to the infrastructure model described above, during the year the Strategic Program has put significant effort into the reproducibility of research, focusing on raising awareness within the department and starting training and support actions.
In terms of awareness and training, the program has launched two awards at postgraduate and undergraduate level (María de Maeztu Reproducibility Award - PhD workshop 2016 and Award for Reproducibility in Software - Best ICT Bachelor's Thesis in Spain 2016), is preparing guidelines to facilitate the open access publication of results and the use of tools for software development and distribution (such as group GitHub accounts), and has organised several seminars with external speakers:
and internal speakers:
These actions aim to encourage our staff to publish data and software together with their publications. A first Open Access policy has been defined for MdM positions, and it has been recommended that it be extended to all positions funded by DTIC (such as PhD fellowships). During the year we have worked jointly with the UPF Library and Informatics Units both to improve how results will be made accessible in the UPF open access repository and to streamline the data deposit process. Until a better repository for our scientific output is available, we are making an effort to provide, in our publication list, links to the relevant additional material, including press material and presentations, and to guarantee that these resources are accessible and available under an adequate license. A community has been created in Zenodo, the repository we recommend when no domain-specific repository is internationally recognised for a given type of result. Finally, for large repositories such as Freesound, we maintain Freesound Labs as a directory of projects, hacks, apps, research and other initiatives that use content from Freesound or the Freesound API. However, further work is needed to finalise an optimal internal data policy (we have started work with id law partners and UPF services) and to produce detailed recommendations and policy.
Several researchers are going further along this line by creating dedicated pages for each publication that include lay summaries or demos, such as:
Finally, a growing number of researchers explain their results in personal blogs, such as those by Pablo Aragón, Emilia Gómez, Davinia Hernández-Leo or Jordi Pons, providing additional access to research results. The María de Maeztu Program has also started its own blog, where advances in the program are shared and which is expected to be used increasingly for the dissemination of results. According to the blog's statistics, its entries received over 200,000 visits in 2016.
Interest in deep learning continues to grow worldwide and spans several of our research groups. In response, a number of actions have been launched and supported during the year.
Deep Learning Study Group: a self-organised study group (led by Ramón Nogueira) in which participants learn about and present topics in deep learning, following the book available at https://www.deeplearningbook.org/. The slides from these sessions are available on the web. Speakers: Miquel Junyent (AI and ML group), Oualid Benkarim (Simbiosys), Gabriel Bernardino (Physense), Olga Slizovskaia (MTG), Constantine Butakoff (Physense), Andrea Insabato (CNS), Ramón Nogueira (TCN), Gerard Sanromà (Simbiosys), Jordi Pons (MTG), Cecilia Nunes (Physense), Jeremy Barnes (UPF Dept. Language Sciences), Marius Miron (MTG), Rutgero Bettinardi (CNS)
Deep Learning and Data Science Seminars: in collaboration with other researchers in Barcelona interested in the topic (especially the Barcelona Machine Learning Study Group), we are running and hosting several talks on the topic, widely disseminated in Barcelona:
As part of the exploitation and dissemination efforts, the researchers in the MdM strategic program have organised a number of events that complement the annual outreach event. The collaboration and participation of external partners is sought, with a special focus on involving other Severo Ochoa and María de Maeztu centers and units. A non-exhaustive list of events organised by or linked to the strategic program in 2016 can be found below:
The María de Maeztu Strategic Research Program seeks to strengthen the links between the research and educational activities of the Department. A major milestone for the Department is the launch, in 2017-18, of the new degree in Mathematical Engineering in Data Science.
The projects seek to involve students in their execution, offering positions and bachelor's and master's theses associated with them. As indicated previously, an additional program of research assistantships for current MSc students, to support the projects funded by María de Maeztu, has been launched. This program will pilot the potential impact of such an action in promoting cross-collaboration between DTIC members, seeding future PhD projects and reinforcing the links between research and the master's programs.
The events described above are also an excellent opportunity for students to take part in international research and networking activities and, as such, they are disseminated among students and their participation is encouraged. Efforts are also being made to replicate in Barcelona, in the context of the master's programs, activities carried out at international conferences, such as the tutorial "Natural Language Processing for Music Information Retrieval" to be held on January 30th 2017, which replicates the one offered at the ISMIR 2016 conference in NYC. This open tutorial is offered in the context of the Master in Sound and Music Computing and the Master in Intelligent Interactive Systems, and is open to the broader community in Barcelona (researchers at DTIC, as well as researchers in Computational Linguistics at the UPF Dept. of Language Sciences and Translation, have already confirmed their interest in taking part). This type of effort is expected to bring additional impact through the reuse of material at a very low cost.
It is worth mentioning that this alignment has also included the gender program described below, with the launch of a new mentoring program supported by the UPF program for Innovation in Teaching. Finally, this direct connection with research is bringing results in the form of research awards for students at undergraduate and postgraduate level (see the blog post about the most recent one).
In addition to supporting traditional technology and knowledge transfer mechanisms, the María de Maeztu Strategic Research Program aims to contribute to bidirectional transfer models aligned with the Open Science and Open Innovation movements. To facilitate the conversion of research results into economic value, innovation ecosystems should be created by integrating all relevant stakeholders. In order to properly achieve our research and education goals, it is important to be active partners in real innovation ecosystems. Only if we are adequately integrated into real and fruitful social and economic contexts will we succeed in defining and carrying out relevant research, and in adequately training researchers capable of being active in the current socio-economic context. It could be sufficient for the Department to focus on doing good research and to rely on external companies to convert our results into socio-economic value. This is what most research groups do. But we want to do more than that: apart from doing good research, we want to develop and maintain technologies and services that have socio-economic value in themselves, and we want to be close to our users. During 2017 we will finalise and implement a specific strategy to promote transfer along these lines (progress will be shared on this web page), building on initiatives already started, such as:
An additional strategy increasingly promoted by the Department, and aligned with the main policies at Catalan, Spanish and European level, is the execution of Industrial PhD projects, seeking synergies among research, education and transfer. So far, 14 positions have started with the support of public agencies (see the list of Industrial PhD positions at the Department).
Even with the best of intentions and deliberate efforts to achieve balance in the recruitment of staff, the reality TODAY is that there are simply not enough qualified female candidates to achieve this goal. The only sustainable, long-term solution is to RETAIN TALENTED YOUNG FEMALES in Science, Technology, Engineering and Mathematics (STEM) fields AT EARLIER STAGES. Some of them will then continue to higher stages of education and change the reality of TOMORROW for the recruitment of staff. For this purpose, we are running a series of activities specifically targeted at young females, but also at current professionals, to ensure that they are retained in the field and that their professional success is maximised. The interest in these actions is also reflected in the large number of entities taking part in them as co-organisers, collaborators and sponsors, and the actions are endorsed by the UPF Equality Program (UPF Igualtat). The program works on four lines:
We have been organising several events for girls from 7 to 17 years old (#GirlsHacks) to get in touch with a range of technologies: Scratch, App Inventor, virtual reality, 3D printing, Yoway, personalities in robots, building your own home automation, Lego NXT and Arduino. Over 1,000 people have taken part in these activities during the year. The evaluation of one of the events is described in "Lessons learned in promoting new technologies and engineering in girls through a girls hackathon and mentoring", presented at Edulearn16.
In addition, we have been adding a gender parity criterion to other outreach activities (such as the ones at the American Space described below). These actions are being carried out in collaboration with external organisations such as GirlsinLab and MujeresTech, and with the financial support and collaboration of organisations such as Google, Eurecat, Accenture or Abacus.
This year we have launched MENTOS, a mentoring program for female students of our engineering programs which, during this academic year, puts them in contact with professionals to support their professional and personal development. The program is largely inspired by the Women in Music Information Retrieval (WiMIR) mentoring program, in which Emilia Gómez participates (Verónica Moreno, Maria Rauschenberger and Aurelio Ruiz complete the leading team). The program was submitted to the UPF CLIK Innovation program for evaluation and support (proposal here).
In parallel, the Polytechnic School is conducting a study on the gender perspective in our engineering degrees.
Finally, we are supporting and driving a number of talks and networks that support women in ICT, promoting the participation of our female researchers in similar external actions, and joining international celebrations (such as Ada Lovelace Day, or UPF activities on March 8th).
We support the Gender and Technology series, an effort by Civic Lab Barcelona, OuiShare and Autentika to run Civic Tech initiatives aimed at reducing the digital gender gap and breaking the glass ceiling faced by women in technology. Specifically, we are co-organising the Gender & Wikipedia day on 17/01/2017 and the celebration of the Girls and Technology International Day.
We have also been promoting the inclusion of the gender perspective in internal training actions, such as the DTIC seminars, and supporting the Gender & Science Journal Club launched by researchers at the department. During this year the gender-related seminars and talks were:
In line with the vision of generating impact that goes beyond the academic and industrial fields, the Strategic Program has started a program that aims to foster academia's potential to generate social impact, especially through actions promoting the active participation and leadership of citizens and civic organisations. The program aims to reinforce the role of the "citizen scientist", with academia at their service. A specific target for 2017 is to make these actions sustainable in the mid-term, both in terms of community engagement and of the resources needed (an encouraging result is the recent announcement by the consulting company NAE of its sponsorship of the 2017 activities on gender & ICT driven by the Department; we expect more private sponsors to follow its example).
The Gender and ICT plan was the first of these actions, and it has now gained enough critical mass to become a program in its own right, with actions such as the Viquidones group, which aims to improve the impact of collaborative endeavours such as Wikipedia. Several of the research projects, such as Understanding and Improving Social Interactions in Online Participation Platforms, Educational Data Science or Autism Spectrum Condition Multimodal Embodiment Open Repository, are by their very definition directly aligned with the ambitions this strategy seeks to reinforce. Some other highlights:
During this year the program has set up the management mechanisms to support its efficient execution. Led by the Scientific Director, with a project manager in charge of implementing the strategy, it is organised around a scientific board (including the MdM guarantors), an executive direction (including the DTIC director and the DTIC deputy director for research), the participation of all PIs of MdM projects in regular meetings, the support of the department's external advisory board during its evaluation visit in June 2016 (see blog post), and presentations of the program open to the whole community (the kick-off presentation after the award, and the MdM workshop organised in June), guaranteeing that the strategy, its policies and its progress are shared with the whole department. This governing structure has overseen the definition of calls (in this link the first one, launched at the beginning of the program, and in this link the second one, launched in September 2016), the selection of projects, the definition of procedures to distribute resources, etc. The execution of the program relies strongly on the existing procedures available at DTIC for its regular work (such as the announcement of open positions).
The program has also been actively participating in activities aimed at strengthening the network of Centers and Units of Excellence, such as the award ceremony in Madrid (news), the meeting with the King and Queen of Spain (news) and the networking event held at UPF in conjunction with the official presentation of the Spanish Research Agency in Barcelona (news). DTIC maintains close collaborations with several of these centers and units, which have been involved in this year's activities (such as CEXS-UPF, BSC, IBEC or CRG), and this cooperation is expected to grow through the networking activities planned for the future.
As a strategic program of the department, its execution requires aligning existing funds with those provided by the Spanish Ministry. As indicated above, one criterion for the selection of projects has been the existence of other sources that cofund their execution. Several of the projects are aligned with funded EU projects such as TELMI or KRISTINA, with the COST action NEUBIAS, and with new projects directly aligned with MdM actions obtained in 2016, such as ChangeMakers, led by Davinia Hernández-Leo. Additional funds available for the MdM projects have also included four PhD grants within the Ministry's program for predoctoral research, and participation in the "la Caixa" - INPhINIT fellowships for doctoral studies at Spanish Research Centers of Excellence (news and positions offered).
In addition, the MdM strategy has been fully aligned with the activities already funded by the Department (such as the PhD fellowships) in order to add critical mass to both actions and increase the impact.
With respect to matching funds secured during the year, it is especially relevant to highlight the four ERC grants obtained by DTIC members actively involved in the MdM program (three of them guarantors): Angel Lozano (Advanced Grant, guarantor and PI of the MdM project Wireless networks with cognitive topology), Albert Guillén (Consolidator Grant, guarantor), Toni Ivorra (Consolidator Grant, participant in the MdM project Bio Image and Signal Analysis) and Marcelo Bertalmío (Proof of Concept Grant, guarantor and PI of the project HDR models and methods for cinema postproduction). In addition to the funds, these grants provide high-quality external endorsement of the research activities conducted by DTIC faculty. The following news items outline the projects (Consolidator Grants, Advanced Grant and Proof of Concept Grants).
At the level of the department, during this year the Strategic Research Program has also succeeded in obtaining funds to secure the infrastructure needed to carry out the planned data-driven research via Ministry / FEDER funds (see the data management infrastructure section above), as well as the participation of private and public institutions in its outreach actions (such as Google, Accenture, etc.). Future work will include the search for funds to complement all the other action lines in the program (such as exploitation and outreach).