THE EVOLUTION OF THE OPEN SCIENCE APPROACH IN THE DTIC-UPF MARÍA DE MAEZTU PROGRAM
The concepts of open science, open innovation and open scholarship are in continuous evolution, and their promotion requires context-specific and adaptable solutions to advance successfully. This report shows this evolution in the context of the DTIC-UPF María de Maeztu program. The evolution can also be traced through earlier entries: the blog post from the start of the program (Day Zero), the previous report on the first year of development, and the post Opening Science to Open Innovation, published in February 2017.
The principles of open science have been at the core of the María de Maeztu program since its inception. Since 2017 in particular, a special effort has been made to formalise their implementation in order to support and evaluate its progress. Two main documents are being used in this process: “Open innovation, open science and open to the world - a vision for Europe” by the European Commission's Directorate-General for Research & Innovation (RTD), and the recent recommendations issued by LERU in “Open science and its role in universities: a roadmap for cultural change”, which take into account the specificities of universities (such as their multi-layered nature, or the link between research and education).
[Figures: Open Science, from the FOSTER project; Open Science in "Open Innovation, Open Science, Open to the World"]
In short, the program encourages a scientific approach that is “as open as possible”, for two main reasons:
- Philosophical: open science supports:
- the imperative of reproducible research
- the obligation to return to the community the results of its investment, especially in a department like ours, which is “over-dependent” on public funding
- Pragmatic: The ICT field is one of the fastest-moving areas in research. It is also by nature global, networked and increasingly open. New scientific results and services are deployed and scaled quickly, and by a growing number of stakeholders; for instance, the research orientation of the ICT industrial sector is much higher than in other fields. Open source software is well embedded in the ICT field. The open source movement has a long tradition and has provided numerous success stories in which thousands of researchers and developers have contributed to high-quality and transformative results, supporting the wider cultural transformation towards open science and giving examples of business models that sustain it. One exception is ICT for health, which is highly regulated because it deals with highly sensitive data such as patient data, because its conclusions affect human lives, and because its adoption requires acceptance not only by users but also by health administrators, professionals and regulators. The pragmatic reasons therefore include:
- The María de Maeztu program specifically targets the strategic area of data-driven computational research at the department. The growth in access to data, tools and instruments, together with the computational nature of this activity, produces a natural expansion of academic research that is aligned with open science concepts.
- The María de Maeztu program provides a unique opportunity to explore, experiment with and develop actions that promote a cultural change that can later be adopted widely in the department. This comes both from its capacity to allocate resources to these actions and from its capacity to drive novel pilot actions and adapt them for later, wider adoption by the department.
THE CONCEPTUAL FRAMEWORK SO FAR, AND STEPS TAKEN
As previously mentioned, the María de Maeztu program allows us to define and test actions that promote open science and that are suited to our specific context and the resources available. We present below a summary of the progress so far, following the recommendations issued by LERU in each of the eight priority areas identified by the European Commission (detailed explanations for them will be available in separate blog posts):
- 1 - CULTURAL CHANGE: The María de Maeztu program is one strategic research program within the broader activities of the department, and as such has taken the lead in developing actions that support approaches across the eight pillars (scholarly publishing, data, infrastructure, education, recognition, metrics, integrity, citizen science) which are
i) specific to each individual context, to the department as a whole, and to the different areas represented by the research groups;
ii) realistic about their capacity to be realised;
iii) transferable and scalable to the wider activity of the department;
iv) continuously communicated internally, within the university and the wider research communities for continuous discussion and improvement
- 2 - SCHOLARLY PUBLISHING:
- Publications: the report on scholarly production in 2016-17 details the current status of the promotion of communicating and sharing all the inputs used for the results described in a specific publication (such as software or data). It also advocates the use of mechanisms that help the discoverability and reuse of open outputs, such as ORCID for authors or DOIs for research outputs and their associated material. Future steps will include the analysis of other elements such as FundRef, DataCite or Open Citations. This process has been supported by training and communication actions organised around invited talks by relevant actors in the different topics covered by open science (reproducibility, open repositories, open publication, etc.; see all the seminars in the reproducibility and transfer sections of the web, and in the training section below). The main challenges for a complete adoption are linked to the availability and / or perception of rewards, especially with respect to recruitment and promotion, as well as to the capacity to find adequate solutions for the specificities of the broad range of fields covered by DTIC-UPF, also discussed below.
- Software: In addition, for an ICT department, software (not included as an explicit pillar) is of great relevance. The software code represents in many cases the specific implementation of the research process and / or the research result itself. The open source software movement has a long tradition and its values are embedded in a large part of our community, although not necessarily its best practices. It is also relevant that a significant part of the staff in an interdisciplinary department such as DTIC-UPF does not have a computer science background. The objective by the end of the program is the definition and consolidation across the department of good practices around open source software. The initial effort has focused on training actions and on promoting the use of software repositories (GitHub, see a compilation of DTIC-UPF repositories in GitHub here). An initial effort to establish a community of software developers in the department has not succeeded so far, but new efforts will be made so that several of the existing good practices on open source software are shared and adopted by the wider DTIC-UPF community.
- 3 - FAIR DATA: the program has worked in cooperation with the different university services dealing with data management since 2016, to support the deployment of a wider data management strategy at the institutional level, as well as to understand its capabilities and limitations in order to establish external collaborations that are able to respond to the evolving needs of the researchers of the department (in general with more demanding needs in terms of data management than other departments at the university).
- For the management of research data prior to publication, the work on the infrastructure is detailed in the next item. Given the delay in the availability of the new infrastructure, it was necessary to work on alternative solutions so that the deployment of the open science strategy would not be blocked until the infrastructure was available. The technical infrastructure is especially relevant when dealing with live collections of datasets, and with very large datasets (which in addition to storage may require solutions for data exploration). Another relevant aspect is to ensure that the whole lifecycle of research data is considered as much as possible, to avoid having many different storage solutions / procedures at each of its steps.
- For data storage and publication, the repository at the university is adequate for a limited set of researchers, types of datasets and exploitation models, and it has limitations when integrated into the regular workflow (such as allowing blind reviews during publication, or the communication of datasets which for different reasons cannot be made public). A cooperation with Zenodo has been initiated, which includes not only storage and publication of data (see DTIC community in Zenodo), but also cooperation with the scientific mining tools developed at the department (see the blog entry on the first Bachelor’s thesis proposed). The solution, however, does not yet meet all the requirements of the department, especially for very large data such as those produced in biomedical research (for instance, synchrotron data and simulations), or with respect to the use of the new HPC as a data repository aligned with the international standards for repositories.
- For the management of data, a great challenge is the difficulty of defining single policies that facilitate the researchers' work across the wide variety of data types used (images in different contexts, including sensitive images; text; sound; clinical data; etc., each with its specific associated technical and legal aspects) and of research strategies at the department. A cooperation has been established with idLaw partners to support the legal aspects linked to the definition of specific research data management plans and documentation for the publication of sensitive data in specific projects, as well as the transition to the new General Data Protection Regulation for platforms such as Freesound and the Integrated Learning Design Environment (ILDE), as this specific support is not available at the institution. A final next step will be the translation of a full basic protocol for data processing and its promotion within the department.
- For the exploitation and reuse of data, several actions are currently being explored, such as the creation of field-specific communities in Zenodo where none exists for the field (such as Music Information Retrieval or Educational Data Analytics) to promote accessibility, or the organisation of and participation in research challenges and the collaborative creation and annotation of datasets (details below).
- 4 - INFRASTRUCTURE: the new infrastructure
i) now provides the computational and storage capacity to develop generic and specific tools to exploit data collections
ii) is able to provide analytics about its use for present and future management. On the one hand, the explosion of data and computational requirements generates new demands and, on the other, the availability of external infrastructures and the prospect of new ones, such as the European Open Science Cloud, will require better decision-making processes to find optimal solutions over time that guarantee the capacity to conduct excellent research.
The infrastructure is now providing service to DTIC-UPF, with the basic analytics system in place. Given its critical relevance to many of the research groups at DTIC-UPF, a researcher (Jérôme Noailly) has been appointed as academic contact for the management of the infrastructure. Future steps will address its capacity to optimally support the open science mandates.
- 5 - EDUCATION: a specific training program on different aspects of open science has been embedded in the regular training activities of PhD students during their first year, to guarantee their training as future open scientists, broad coverage across the department and the future sustainability of the program. The program includes different seminars, a written personal reflection by each student on how the contents may apply to their research plan, and the identification of areas where they may require additional support. The seminars are open to the whole community (in fact, anybody interested may attend, also from outside the university, and the videos of most talks are made openly available), and have been used in parallel as milestones to promote discussion with all staff about the benefits and challenges of open science. In addition to internal speakers from UPF, the program has included:
- General reproducibility principles in computational research by Victoria Stodden from the University of Illinois at Urbana-Champaign (video here). The seminar and parallel discussions made it possible to define a first set of guidelines for their application in the department.
- Open publishing models by Christian Barillot, from CNRS and Editor-in-Chief of Frontiers in ICT (video here). The seminar and parallel discussions made it possible to define the first open access policy and recommendations of the program.
- Open repositories with special attention to datasets by Lars Holm Nielsen, Zenodo project leader at CERN (slides here). In addition to providing recommendations for the use of Zenodo (DTIC-UPF community), it opened the discussion about potential collaborations in scientific text mining.
- Open source software compliance for engineers by idLaw partners and the Linux Foundation (blog entry here), to consider the use of FOSSology, an open source license compliance software system and toolkit, and to help put DTIC-UPF on the map of institutions promoting open source in order to foster collaborations with other actors.
- Big data management in large research infrastructures by Anne Bonnin, from the Paul Scherrer Institute.
- Open knowledge sharing platforms by Diego Sáez-Trumper from the Wikimedia Foundation. Wikipedia represents a special case for the sustainability of open actions, and the discussions provided input to the open science and innovation program and helped start collaborative research actions involving undergraduate students (details).
- The development of innovative exploitation and educational models around open initiatives by Eric Rosenbaum from the Scratch team at MIT. The discussions provided input to several of the transversal actions of the program.
In order to improve the acquisition and use of the skills linked to open science, the program has incentivised the further use of these skills through awards, specific funding in its open science and innovation program and similar measures, detailed in the “Rewards and incentives” section below.
DTIC staff is also promoting and participating in training and advocacy actions in external fora such as:
- Case studies:
- The European Research Council Executive Agency (ERCEA) has procured a study on ‘Open access to publications and research data management and sharing within ERC projects’. The goal of this study is to better understand the current practices and attitudes of ERC-funded researchers towards the provision of open access publications as well as research data management, sharing and reuse (see more details here). The CompMusic ERC project, led by Xavier Serra, was identified as one of the use cases for its promotion of open access to its research results (study).
- Publications about the experience of designing reproducible research, such as “Seeking reproducibility: Assessing a multimodal study of the testing effect” by M. Yoshimi, D. Hernández-Leo and R. Ramírez.
- Contribution to open science policy and awareness in Spain: in the context of the participation of DTIC-UPF in the Severo Ochoa and María de Maeztu Alliance (SOMMA), and specifically in the activities of the work package “Exchange of knowledge”, DTIC-UPF will initially lead a working group on Open Science that fosters the exchange of good practices based on case studies across the 41 centers that form the network.
- Actions in specific scientific communities:
- Training: Open Science in Technology enhanced Learning webinar. Davinia Hernández-Leo, head of the TIDE research group, has been elected Vice-President of the European Association of Technology-Enhanced Learning (EATEL) since September 2017. As part of EATEL's mission to enable and facilitate TEL Research and Education, the association is sparking community discussion that could help to maintain high standards of research quality and professionalism in the domain of TEL. In this line, Davinia started and co-chairs, with Stian Haklev (from EPFL), a Webinar Series on “The Profession”, which offers a space to discuss overarching aspects related to the TEL research profession. Open Science was selected as the first topic.
- Collaborative events: Organisation of events such as the first NeuBIAS taggathon, which brought together researchers from the life science and image processing communities to create a searchable online repository for bioimage analysis; Freesound collaborative annotation (both in physical events and online); support, through the outreach program, for the collective annotation of OpenStreetMap for humanitarian action with Doctors without Borders; the collective editing of female profiles in Wikipedia and the organisation of Wikipedia days, etc.
- Challenges: Although they are not specifically training actions, the program supports the organization of and participation in challenges in order to promote the reuse of results and show the benefits of the availability of high-quality datasets. Examples include the organization with Google of the Freesound General-Purpose Audio Tagging Challenge in Kaggle (a well-known platform that hosts machine learning competitions), under the framework of the DCASE Challenge (an academic competition featuring tasks related to the computational analysis of sound events), or the MediaEval 2018 AcousticBrainz Genre Task. Researchers have also successfully taken part in different international challenges and hackathons (F. Ronzano won the first 4YFN Language Technologies Hackathon; M. Won won the WWW 2018 Challenge - Learning to recognise musical genre; D. Derkach, A. Ruiz and F. Sukno won the Head Pose Estimation Challenge at FG2017, etc.).
- 6 - REWARDS AND INCENTIVES: One of the main barriers to open science is the general lack of recognition in terms of professional development or research funding. The María de Maeztu program has promoted several actions with the objectives of helping define higher-impact goals that promote change at the institutional level and beyond, contributing to the success of external international actions, and internally preparing the research staff for assessments based on alternative indicators, which are an increasing international trend. They include:
- Sustainable Freesound: The main goal of the project is to make Freesound.org sustainable by promoting donations from its users. Freesound.org is already a success story in the open movement (the largest platform in the world for sharing sounds under Creative Commons licenses, with over 8 million registered users in the world) and is an excellent showcase of the challenges that successful open initiatives also need to face in a university environment.
- Sustainable Open Science Integrated Learning Design Environment - ILDE: Developed iteratively over the past 8 years and tested by 8 educational communities (educational centers, training units, transversal initiatives), involving over 1000 participants, the technological infrastructure ILDE enables the modelling and sharing of learning designs with the aim of fostering communities of educators collaborating through learning designs to improve the quality of education.
- Development of the Rocket platform for collaborative clinical studies: Rocket is a unified (cloud-based) platform aimed at assisting clinicians, allowing them to store, visualize and process different types of clinical data: measurements, images, and reports. Current status here
- Scientific Text Mining and Summarization Services: ScienTMin, a Web framework that enables a wide range of services for content analyses and aggregations of scientific publications thanks to Text Mining, Summarization, Retrieval and Data Visualization techniques. ScienTMin relies on the technology implemented in the Dr Inventor Text Mining Library (EU Project Dr Inventor and the Maria de Maeztu project “Mining the Knowledge of Scientific Publications”) and the SUMMA summarization software (implemented by Dr Saggion and currently licensed by UPF)
- Provide external recognition to open source pioneers, such as the award of an Honoris Causa Doctorate to Dorcas Muthoni (summary of the ceremony and videos with the talks), for her contribution to the growth of Free Open Source Software (FOSS) in Africa and the successful commercial exploitation of open source solutions.
- Finally, an objective for the last part of the program is to build broader internal knowledge around alternative metrics and their capacity to inform recruitment and promotion processes (if possible, linked to producing contributions related to the different activities of the groups). Internal actions promote good practices in the publication of datasets and software that encourage an impact on traditional indicators such as citations, and aim to quantify the impact that contributions other than research publications have on the advancement of research. In addition, an increasing effort is being put into sharing the advances with the local and international community.
- 7 - METRICS: the department lacks consolidated knowledge and a common policy around traditional bibliometrics. The current effort is to guarantee that basic good practices are understood and adopted (linked to the use of ORCIDs, DOIs, etc.). An objective for the remainder of the program is to consolidate this policy and, in addition, to generate further knowledge around the use of alternative metrics (altmetrics).
- 8 - INTEGRITY: open science fosters transparency, and the capacity to reproduce findings is at the core of research integrity. All the previously described actions are aligned with the objective of showing that open science guarantees adherence to the highest standards of research.
- 9 - CITIZEN SCIENCE: The program organised the session Promoting the role of academia in empowering participatory and collaborative action at the Social Impact Conference 2016. The session aimed to discuss and explore the existing opportunities, and the challenges to overcome, in order to realise the potential that collaborative and participatory processes have already demonstrated in bringing about transformative effects in several societal challenges, drawing on the large numbers of citizens willing to contribute their time and knowledge to issues of interest to themselves, to their communities or to society in general. The scope of the outreach program of the María de Maeztu program, defined in line with the content of this session, includes but exceeds open science concepts. It will be detailed in a separate post, but in summary it has already involved over 3000 people in different actions, with a strong focus on young people in vulnerable situations and the promotion of diversity in science and technology. Furthermore, the session was a starting point for reflection on the very definition of research impact, on the need to understand the limitations of current indicators and to take novel ones into account, and on the definition of several of the actions described above.
SUMMARY AND CONCLUSIONS
The María de Maeztu program is an example of an effort, at the scale of a university department, to design and implement a strategy that promotes a cultural change around both the process and the objectives of research. It is important to highlight that this is not a separate line of work: it is directly embedded in the regular functioning of the department, with the conviction that it is imperative in order to keep conducting high-impact research that follows international standards and trends, and in order to generate transformative effects across all levels (individual, department, institution, research community, open science stakeholders and the general public) using different strategies (communication, allocation of resources, training).
In addition to the further advancement of the work described around the different pillars, the main objectives for the coming period are:
i) Consolidate the cooperation of DTIC-UPF with active external actors in the open science movement, in order not only to promote the internal change of culture, but also to contribute to the advancement of the broader open science movement
ii) Advance in the understanding and uptake of metrics that reward individual efforts
iii) Find adequate technical solutions for all the existing research activities conducted at the department, as well as clear and basic guidelines and / or support services that facilitate the work of the researchers