4. Kaleidoscope

AI is here to stay: the main goal should be to develop AI with positive outcomes for people and the planet

Coloma Ballester

Coloma Ballester, coordinator of the Intelligent Multimodal Vision Analysis (IMVA) group, Department of Information and Communication Technologies (DTIC), UPF

Everyone agrees: artificial intelligence (AI) is here to stay. The main goal should be to develop AI with positive outcomes for people and the planet. This means both expanding human knowledge, which includes deepening our understanding of reality by combining appropriate theoretical models that emulate natural intelligence with considerable amounts of real data, and improving human and planetary wellbeing.

AI for sustainable entertainment and media content is the rationale behind EMERALD, a 30-month European Horizon project that Josep Blat and I coordinate from UPF. EMERALD’s seven-partner interdisciplinary consortium is made up of leading companies and institutions: the British Broadcasting Corporation (BBC) and Disguise Systems Limited from the United Kingdom, FilmLight GmbH from Germany, MOG Technologies from Portugal, Trinity College Dublin from Ireland, and Brainstorm Multimedia and UPF (including the Interactive Technologies Group (GTI) and the Intelligent Multimodal Vision Analysis (IMVA) research groups of the Department of Information and Communication Technologies) from Spain. EMERALD aims to develop and demonstrate exemplary tools for the digital entertainment and media industries using AI, machine learning and deep learning (ML/DL), and Big Data technologies to automate and speed up processing, increase production efficiency, reduce energy use, and enhance content quality.

Indeed, the digital media industries are a significant component of the global economy, with video-related content for both entertainment and work accounting for more than 80% of Internet traffic. The growing demand for content has prompted the industries underpinning film, broadcasting, streaming media, games and live entertainment to innovate in technologies that streamline production and reduce costs in order to stay competitive. AI and, in particular, DL can achieve very high accuracy in several digital media production and postproduction tasks, but their deployment in this context poses several challenges. On the one hand, media production workflows require 99-100% accuracy. On the other, they involve massive volumes of data (on the order of many tens of petabytes for large companies) and computation (tens of thousands of processing units, CPUs/GPUs), whether in-house or in the Cloud. Sustainability and energy use are therefore matters of global concern. The next generation of hybrid physical and virtual media experiences enabled by virtual production (VP), with content creators working remotely and distributed experiences that replicate live events for multiple audiences at a distance, can help shrink carbon footprints by reducing the energy consumed for transport. However, this should not come at the expense of an increased burden on computing resources.

To resolve these issues, we are: (a) designing new ML/DL and process-automation tools based on more efficient data use to increase the speed and quality of digital content creation, enable user-guided control, and reduce the energy and resource demands of large-scale media data processing; (b) developing metrics that quantify, at a granular level, the energy consumption and impact of these algorithms and hardware pipelines; and (c) fostering acceptance of and demand for AI and sustainable entertainment and media production, specifically by exploring how these techniques could make the media industry more sustainable.
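To give a flavour of what a granular energy metric can look like in practice, the sketch below estimates the energy one GPU consumes while a single processing task runs, by sampling its power draw through NVIDIA's NVML interface (via the pynvml package). It is an illustrative approximation only, not one of the EMERALD tools, and the task_fn argument is a placeholder for any workload, such as one batch of DL inference:

    import threading
    import time

    import pynvml  # NVIDIA Management Library bindings


    def estimate_energy_joules(task_fn, device_index=0, interval_s=0.05):
        """Roughly estimate the energy (joules) one GPU consumes while
        task_fn() runs, by integrating sampled power draw over time.
        Sampling-based, so very short tasks need a finer approach."""
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        watts = []
        stop = threading.Event()

        def sampler():
            while not stop.is_set():
                # nvmlDeviceGetPowerUsage reports milliwatts
                watts.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
                time.sleep(interval_s)

        thread = threading.Thread(target=sampler, daemon=True)
        start = time.time()
        thread.start()
        task_fn()  # placeholder: the workload being measured
        stop.set()
        thread.join()
        elapsed = time.time() - start
        pynvml.nvmlShutdown()
        # mean power (W) x elapsed time (s) ~= energy (J)
        return (sum(watts) / max(len(watts), 1)) * elapsed

Per-task figures of this kind can then be aggregated across a whole production pipeline to compare the energy cost of alternative algorithms or hardware configurations.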

In particular, in the IMVA group, we are developing DL-based AI tools for sports video analysis, video matting, and real-time pose estimation in broadcasting, streaming media, and live entertainment scenarios. For instance, video matting, which involves separating foreground objects from their background so the foreground can be composited with a new background, is a core process in virtual production and media postproduction. We will develop DL-based matting tools that produce high-quality results for the automated real-time integration of remote presenters or performers into virtual scenes and sets for broadcast/streaming media, even under non-optimal capture conditions and settings; these will be incorporated into the tools of companies such as Brainstorm. Likewise, accurate real-time estimation of a presenter's body and head pose in virtual studio broadcasting will enable the automatic correction of camera and scene parameters, the selection of the most suitable camera in multi-camera scenarios, and the re-orientation of the shot in single-camera settings.

At the same time, the GTI group will create an open-source, web-based, end-to-end tool to generate, edit and render character animations from a single video/webcam. GTI is also creating tools to generate targeted datasets for training neural networks tailored to individual animators' goals.
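To make the matting step described above concrete: once a matting model has predicted a per-pixel alpha matte, compositing the foreground onto a new background reduces to a standard per-pixel blend. The following minimal sketch illustrates that blend only; it is not one of the EMERALD tools, and the array names are assumptions:

    import numpy as np


    def composite(foreground, background, alpha):
        """Standard alpha compositing of a matted foreground over a new
        background. foreground/background: HxWx3 float arrays in [0, 1];
        alpha: HxW matte in [0, 1], where 1 means fully foreground."""
        alpha = alpha[..., None]  # add a channel axis to broadcast over RGB
        return alpha * foreground + (1.0 - alpha) * background

The compositing itself is trivial once the matte is available; the hard part, and the focus of the DL tools, is predicting an accurate alpha matte in real time under uncontrolled lighting, backgrounds, and capture conditions.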