10. Kaleidoscope

Big data: an opportunity for biomedical research

Ferran Sanz

Ferran Sanz, director of the Research Programme on Biomedical Informatics (GRIB, UPF-IMIM)

Health is a field with great potential for the beneficial application of big data. In biomedicine, big data come from a wide range of sources: the clinical data currently recorded in all healthcare processes; the data generated by biomedical research, especially by omic technologies, together with the scientific publications reporting the findings of that research; and a significant share of social media posts and other related information, such as environmental and social data.

“The reuse of clinical data is particularly complicated because they are subject to strict regulations regarding personal data protection”

Although it is possible to perform an integrated analysis of these data to generate new understandings and knowledge about diseases, their causes and treatments, such analyses are not easy for several reasons. First, there is still the problem of the compatibility of heterogeneous data sources. Second, the reuse of certain types of data is particularly complicated. This is the case with clinical data, which are subject to strict regulations regarding personal data protection, or data resulting from pharmaceutical research, the use of which is hindered by the intellectual property protection policies followed by the pharmaceutical industry. An additional challenge is the fact that most of the useful information is recorded as free text, including scientific publications, many electronic medical records and social media posts, amongst other sources. Using information in free-text format requires the development and application of effective text-mining techniques. 

Examples of European projects incorporating big data strategies include Open PHACTS, which developed an integrated information infrastructure, or infostructure, to facilitate R&D on medicines; EMIF, which is developing a European platform for the reuse of clinical data for biomedical research; and eTOX, a project aimed at improving preclinical drug safety evaluation by means of the integrated use of the pharmaceutical industry’s toxicology reports. The eTOX project’s goals will soon be expanded through a new project, eTRANSAFE, which will study drug safety evaluation from a translational point of view.

“We will soon be able to harness the full potential of the big data phenomenon in the field of health”

Another key international initiative that will certainly facilitate the use of biomedical big data is the FAIR Guiding Principles for scientific data management and stewardship. These principles, which propose a series of measures to make scientific data findable, accessible, interoperable and reusable, are being promoted and adopted by important bodies and organizations. 

The effective use of big data in biomedicine is still in its early stages. Nevertheless, the large number of projects under way and the emergence of increasingly powerful analytical techniques suggest that we will soon be able to harness the full potential of the big data phenomenon in the field of health and deliver substantial benefits to society.