New industrial PhD project between the AI and Machine Learning Group and NTENT

The project “Representation learning on graphs for web data” is a new industrial PhD project, supported by the Catalan Government, between the Machine Learning and Artificial Intelligence Research Group at DTIC-UPF and the company NTENT. See other industrial PhD projects at the Department here.

NTENT HISPANIA SL is a Spanish technological research center led by Dr. Ricardo Baeza-Yates. It specializes in mobile software solutions for semantic search, which put users' searches in context and interpret them. It also develops Natural Language Processing (NLP) techniques; NLP is the field that studies the interactions between computers and human language. This level of automated intelligence makes it possible to predict and deliver relevant information to users according to their needs.

Project abstract

An important challenge that arises in many data science problems is how to learn low-dimensional representations of network data (data defined as a set of nodes and edges between them, possibly with some attributes on the nodes/edges). Recently, deep generative models have been proposed as a method to learn such representations [1,2]. These models can be useful not only to compactly encode massive graphs, but also in tasks such as prediction. Moreover, some of these models can also be used as network formation processes, that is, as a mechanism to synthesize artificial networks with the same (or better) properties as the real ones.

However, the applicability of these methods to real-world problems is currently limited for various reasons. In particular, current methods do not deal satisfactorily with issues such as node relabeling of the network data, or other types of symmetries. Also, many of these methods do not scale up to very large networks with millions of nodes.
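To make the node-relabeling issue concrete, here is a minimal sketch in plain Python. The toy path graph, the helper names `relabel` and `degree_sequence`, and the choice of the degree sequence as an invariant summary are all illustrative assumptions, not part of the project. It shows that renaming the nodes of a graph changes its adjacency matrix even though the graph itself is the same, so a method that reads the raw matrix may treat identical networks as different.

```python
def relabel(adj, perm):
    """Return the adjacency matrix after renaming node i to perm[i]."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = adj[i][j]
    return out


def degree_sequence(adj):
    """Sorted node degrees: a simple summary that ignores node labels."""
    return sorted(sum(row) for row in adj)


# Hypothetical toy example: the path graph 0 - 1 - 2.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# Swap the labels of nodes 0 and 1; node 2 keeps its label.
perm = [1, 0, 2]
adj_relabeled = relabel(adj, perm)

# The raw matrices differ, but the label-independent summary agrees.
same_matrix = adj == adj_relabeled
same_degrees = degree_sequence(adj) == degree_sequence(adj_relabeled)
print(same_matrix, same_degrees)
```

The degree sequence is only a crude invariant (different graphs can share it); it stands in here for the kind of label-independent behavior one would want from a learned representation.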

This PhD will focus on the development and analysis of methods for learning these network embeddings and their application to user browsing behavior and web graph data. In particular, it will address the following:

- Characterization of typical real-world problems where network embeddings are needed. For example, web domain networks, either at a global scale or at the scale of an individual's browsing network.

- Development and analysis of a method that improves on the state of the art in successfully learning embeddings for the particular social media or web domain under consideration.

- Exploring novel applications of the method in problems such as influencing the natural formation of a network by means of the reinforcement learning / optimal control framework.