Data Governance and Its Importance for Artificial Intelligence, by Manuel Portela

The increasing use of data for automated systems and artificial intelligence is well known. In particular sectors, however, data availability is limited, which makes it difficult to develop and deploy prediction models that could improve quality of life. In urban planning and smart cities, this problem has been identified and stems from three difficulties: the complexity of the data, the difficulty of measuring certain human and non-human behaviour, and the privacy or confidentiality of the data.

In Europe, novel regulations have been put in place to facilitate the re-use of data. For example, the Open Data Directive (Directive (EU) 2019/1024, which recast Directive 2003/98/EC) promotes governmental data sharing, and the Data Act (Regulation 2023/2854) makes it mandatory to share non-personal data on various topics that can be used for the common good. The General Data Protection Regulation (Regulation 2016/679) promotes the use of data for research and policy in certain cases.

In urban settings, digital twins (virtual representations of real cities that users can interact with) and citiverses (a cross-sectoral adaptation of metaverses for immersive urban experiences) are artefacts built using data, data analytics, and machine learning. They are leveraged to develop simulation models that can be adjusted in real time. Various actors, both institutional (e.g., government agencies) and non-institutional (e.g., businesses and urban residents), generate data from several sources (e.g., geospatial data, sensor data, social and economic data, administrative and open data). However, establishing this data ecosystem poses several challenges, including adequate infrastructure and secure data access. At the same time, more datasets are needed to cover the full spectrum of possible solutions and to deliver on the promise of informed policies through integral analytics.

I have faced these challenges as a professional and researcher in urban science and urban policy since 2012, working closely on smart cities throughout. In 2022, when discussing open data policies and citizen involvement with several cities, I found that few steps had been taken to improve this situation. Municipal data offices were struggling with the same limitations. Data is not available to key stakeholders in the urban science field, such as universities, research centres and municipalities. Moreover, data holders, such as utility companies or other service providers, lack mechanisms to share data in a secure and compliant manner.

At that time, Prof. Vladimir Estivill-Castro and I came up with a potential solution: a neutral third party that acts as an intermediary, provides reliable services and answers to the owners of the data. DATALOG was born with the conceptual structure of a data trust, an entity to which data owners can grant the power to control and process data on their behalf, and which can anonymize and aggregate data before sharing it with key stakeholders. Initial funding was obtained from Fundación BITHABITAT to establish the data trust during 2023.

In the meantime, the Data Governance Act (DGA, Regulation 2022/868) was approved, offering a solution by providing a legal framework for neutral data sharing and introducing new data intermediaries: Data Intermediation Service Providers (DISPs) and Recognised Data Altruism Organisations (RDAOs). Their formal recognition demonstrates the increasing diversity and complexity of the emerging European data ecosystem. However, managing such an ecosystem fairly remains an open issue, especially given the diverse interpretations of the regulations by the member states. DISPs can offer a trusted and secure data-sharing environment between companies and the public sector. RDAOs, for their part, can unlock the potential of citizen-generated data by acting as clearinghouses where citizens contribute valuable data for urban policy. These new actors are crucial for expanding data access and facilitating data reuse, ultimately contributing to more effective and representative digital twins and citiverses.

In 2023, DATALOG became the first RDAO in Europe, which put it ahead of its time. Despite the good news, several barriers make it extremely difficult for DATALOG to achieve its goals. I classify these barriers into four dimensions:

  • Legal: Although several regulations exist, such as the GDPR and the DGA, their overlapping nature and the limitations they impose in some cases (e.g. the DGA prevents RDAOs and DISPs from offering additional services, limiting their potential income) make it difficult to understand the implications for data intermediation and to foresee sustainable models for institutions like DATALOG. The business opportunities are also unclear, because the data market is still potential rather than existing. European authorities are making a great effort to clarify the rules and empower RDAOs and DISPs, but regulatory reforms are not immediate and require long negotiations.

  • Technological: Mechanisms for ensuring fair digital authentication and identification (e.g. eIDAS), general consent, data sharing and data governance are still under development. This is not only a technical challenge but also requires standardization and compatibility with other initiatives, such as the European Data Spaces. The absence of these common tools in an accessible form makes it difficult for small organizations like DATALOG to fully develop their potential. The cost of highly skilled personnel and the need for highly reliable infrastructure make in-house development more difficult, putting the long-term operation and maintenance of such organizations at risk.

  • Social: Creating trust in society is one key challenge in data-sharing policies, but it is not the only one. Raising awareness about the value of data and its potential for society has been one of DATALOG's first goals. We carried out activities with citizens, companies and NGOs to understand and discuss their needs and their views about data. Engaging different publics remains one of the most important activities for increasing participation in the data economy.

  • Financial: Adding to the previous dimensions, sources of financial support for entities like DATALOG are few. For example, funding oriented to data space projects is not suitable for small entities, as it is focused on large companies and the public sector. Moreover, the limitations on generating value from data make it difficult to foresee financial autonomy. The general lack of awareness in society makes it hard to secure sustainable income from donations or private investors. It is therefore probable that attracting sustainable funding will become easier once the other barriers are solved.

The development of new RDAOs and DISPs could bring benefits to many other parts of the European data ecosystem, such as Urban Digital Twins (UDTs) or citiverses. One example of a UDT project is vCity, carried out by the Barcelona Supercomputing Center. The project aims to offer a platform for experimentation in several areas of city management, supported by different public administrations. These virtual environments can also become spaces for experimenting with regulatory learning, with good data governance as their foundation. This aspect is particularly important because emerging technologies used in digital twins and citiverses, such as immersive technologies and AI systems, generate a wealth of data that must be handled in compliance with data sharing and storage laws (ITU, 2024). Data intermediaries (DISPs and RDAOs) can provide a trusted and secure environment where organisations, companies, or individuals can share data. They could also help actors who are not technologically prepared (from NGOs to SMEs and individuals) to participate in data ecosystems. Additionally, civil society inclusion may contribute to the participatory digitalisation of the urban environment, facilitating citizen engagement and decision-making.

Similarly, data intermediaries may facilitate the development of data spaces and the participation of other actors in them. Through these decentralised architectures, data exchange can be turned into an economic activity between actors. Data spaces therefore become drivers of data economies in particular industries, including those that are key to urban development, while promoting fair and just values and principles. Data intermediaries should be ready to use data space connectors and standards to participate seamlessly in these ecosystem services.

These aspects were analysed in depth in the JRC Report “Unlocking Green Deal Data: Innovative Approaches for Data Governance and Sharing in Europe”, to which I contributed an analysis of the data intermediaries landscape (link).

This is an exciting opportunity, but rather than being a simple technological solution, it involves a big change at societal and organisational levels. During the last two years, I have been in conversation with multiple agents, discussing our lessons from DATALOG as an innovative case in the context of the European Data Ecosystem. I attended key conferences such as the Data Justice Conference and the Data Power Conference to debate the challenges we are facing. Data availability will be key to many sectors beyond urban planning and smart cities, and these lessons could help others develop new business models in medicine, culture or industrial sectors. These opportunities will be discussed in June during the Data For Policy Conference 2025 under the special track “Bridging the Gap: The Role of Data Intermediaries in the Creation of Urban Digital Twins”, which I chair together with Giovanni Maccani and Marisa Ponti. Greater data availability makes it possible to train more advanced AI models, which can help AI development jump levels (not without new challenges).