
Catherine Pelachaud: “Virtual agents can be used to collect information on patients or users, which can be useful to the professionals”

Catherine Pelachaud is a scientist specializing in intelligent virtual agents that can interact with people through verbal language as well as gestures, facial expressions and body movements. Recently, she was one of the guest speakers at the 1st International Symposium on Multimodal Communication, held at the end of April on the UPF Poblenou campus.

12.05.2023


Catherine Pelachaud is a director of research at the French National Centre for Scientific Research (CNRS) and a member of the Institute of Intelligent Systems and Robotics (ISIR), linked to the CNRS and Sorbonne University. Specializing in human-machine interaction, she has focused much of her research on intelligent virtual agents. She received her PhD in Computer Graphics from the University of Pennsylvania (1994), designed her first embodied conversational agent over two decades ago and has since led several research projects in this field.

Her work has earned her several distinctions, such as the research award in the field of artificial intelligence granted to her in 2015 by the ACM, the world’s largest association in the field of computing. The following year, the University of Geneva awarded her an honorary doctorate.

Pelachaud participated in the 1st International Symposium on Multimodal Communication, held on the Poblenou campus from 26 to 28 April and organized by GrEP (the Prosodic Studies Group of the UPF Department of Translation and Language Sciences) and the GEHM (Gestures and Head Movements in Language) network of the University of Copenhagen. During the symposium, we had the opportunity to talk to her.

Over the past few months, we have heard a lot about a particular virtual assistant for automatic text generation, ChatGPT. But you specialize in research into virtual assistants that also mimic and reproduce human gestures and body expressions. Why do virtual assistants need to take other dimensions of human communication into account, beyond verbal and textual language?

Because these virtual agents are designed to interact with people. When we speak, we use verbal language, but also gestures, prosody (intonation and stress), facial expressions, head movements… These signals are very important: they help interlocutors understand the speaker properly and they generate empathy. If you talked without moving at all, it would be very boring and a lot of information would be lost. A gesture can provide additional information; for example, when you describe something, you can make an iconic gesture. These expressions have many functions in an interaction, so it is important for virtual agents to take them into account.

But what does it mean that a virtual assistant can reproduce and interpret our emotions? How is it trained to do so?

Your question raises two issues. The first is learning how human emotions are expressed, for example when a person, or two interlocutors, express themselves with a smile, a movement of the forehead or of the body… We now have tools to detect and analyse these signals. But adapting them, so that the virtual assistant can use them, requires interpretation. That is the second issue. How can a smile be interpreted as a sign of happiness? It might not be one; sometimes we fake an emotion. It is a matter of interpretation, which can be quite complex, because interpretation depends on the context of the interaction. For detection and analysis, I think we have pretty good tools that work in real time, but for interpretation, which depends far more on context, we still need to keep working.

Models can be built to reproduce human expressions from mock-up data (analysis models for studying and testing structures or movements, for example by means of moving images or animated GIFs) or from video analysis. The most important thing is understanding and modelling expressions at a specific moment. An expression cannot be interpreted out of nothing; it has to be interpreted at a given time, depending on what the user is saying and how they are saying it.
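
To illustrate why interpretation is context-dependent, here is a minimal sketch (the context features and the rules are invented placeholders, not part of any real system): the same detected signal, a smile, is read differently depending on what is happening in the interaction.

```python
def interpret_smile(context: dict) -> str:
    """Toy context-dependent interpretation of a detected smile."""
    if context.get("user_just_received_bad_news"):
        return "polite or masking smile"  # possibly a faked emotion
    if context.get("user_is_telling_a_joke"):
        return "amusement"
    return "ambiguous: more context needed"

print(interpret_smile({"user_is_telling_a_joke": True}))       # amusement
print(interpret_smile({"user_just_received_bad_news": True}))  # polite or masking smile
```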

If the interpretation of expressions depends on the context, it will also be conditioned by cultural factors. Do virtual assistants currently take cultural differences into account?

Many models are based on North American English. But models have also been built in Japan, France, Germany, Spain… This means that models with cultural differences are being generated, because they have been trained on data collected in each of these countries. But they have not been conceived as cultural models (models that could be transferred to other countries with shared cultural features). For example, if a model is built with data collected in France, it may not be transferable to Spain. We still don’t know how to do that. Undoubtedly, we must go further to better adapt virtual assistants to each cultural context.

“Now, research is far more focused on interaction (…). The agent will have to manage its turn better, I mean, start talking when appropriate; it will have to be an active listener when you are talking; show with its attitude whether or not it agrees with what you are saying…”

You designed your first conversational agent, Greta, in 1999. How do you rate the evolution of virtual assistants since then?

At first, research focused on the gestures we use when we speak and on prosody (stress), for example when we emphasize certain expressions or phrases; these signals can also be used for clinical evaluation and for the diagnosis of diseases. The aim was to make the virtual agent capable of using verbal and non-verbal language and of associating gestures with a certain intention.

Now, research is far more focused on interaction. During the interaction, the agent will have to manage its turn better, I mean, start talking when appropriate; it will have to be an active listener when you are talking; show with its attitude whether or not it agrees with what you are saying…

Work on adaptation also needs to improve, for example with regard to matching the language styles of the virtual agent and of its human interlocutor. That is, if the human uses a more formal register, the assistant should also use a more formal register (or the other way round), as in the sketch below. Different types of agent adaptation are required in an interaction.
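
A minimal sketch of this register-matching idea (the lexical cues and canned responses are invented for illustration; a real system would model style much more richly):

```python
import re

# Invented lexical cues for the user's register (illustrative only).
FORMAL_CUES = {"sir", "madam", "kindly", "regards"}
INFORMAL_CUES = {"hey", "yeah", "gonna", "cool"}

def detect_register(utterance: str) -> str:
    """Crude lexical heuristic for the interlocutor's register."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    if words & FORMAL_CUES:
        return "formal"
    if words & INFORMAL_CUES:
        return "informal"
    return "neutral"

# The agent mirrors the register it detects in the user.
RESPONSES = {
    "formal": "Good afternoon. How may I assist you?",
    "informal": "Hey! What can I do for you?",
    "neutral": "Hello, how can I help?",
}

print(RESPONSES[detect_register("Hey, gonna need some help here")])
```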

How have artificial intelligence machine learning techniques contributed to the development of virtual assistants so far and how can they do so in the future?

We have worked, and are still working, a lot with these new tools and models, for example for the computational analysis of gestures. All of this is done with a machine learning approach. When you want to generate a new sentence, you compute the gestures that are associated with it. From a model that is built frame by frame, a very fluid animation can be produced, in which the assistant seems to express itself in a very natural and expressive way.
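
As a rough illustration of that frame-by-frame idea (a minimal sketch, not the actual models Pelachaud’s group uses; `predict_keyframes` is a hypothetical stand-in for a learned text-to-gesture model):

```python
import numpy as np

FPS = 25  # assumed animation frame rate

def predict_keyframes(words: list[str]) -> list[np.ndarray]:
    """Hypothetical stand-in for a learned text-to-gesture model:
    one coarse pose vector (toy joint angles) per word."""
    rng = np.random.default_rng(0)
    return [rng.normal(size=12) for _ in words]

def animate(keyframes: list[np.ndarray], frames_per_key: int = FPS // 2) -> np.ndarray:
    """Interpolating between successive keyframes yields the fluid,
    frame-by-frame animation described in the interview."""
    frames = []
    for a, b in zip(keyframes, keyframes[1:]):
        for t in np.linspace(0.0, 1.0, frames_per_key, endpoint=False):
            frames.append((1 - t) * a + t * b)
    frames.append(keyframes[-1])
    return np.stack(frames)

clip = animate(predict_keyframes("these gestures convey meaning".split()))
print(clip.shape)  # (rendered frames, joint angles)
```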

However, these models still do not capture the semantics of gestures well enough (the intentions each gesture is meant to express). In this sense, the new models lose information about gesture semantics compared with earlier models, which focused on the more informative side of gestures (the relationship between a gesture and its intention). The challenge now is to generate virtual assistant models that properly capture the semantics of a gesture and look natural at the same time.

Virtual assistants have also used facial recognition techniques. What are these techniques and how are they able to capture our facial expressiveness?

Some facial recognition techniques work by detecting elements of non-verbal language. For example, the FACS (Facial Action Coding System), defined by the psychologist Paul Ekman, identifies more than 40 action units to describe the facial expressions that convey emotions, for example a smile, or the movement made when frowning. Computational models are trained to recognize each of these action units and can track them, as in the sketch below. There are also image-based techniques, which can likewise be used to detect these action units or facial movements.
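
A toy sketch of how FACS-style output can be used (the action-unit combinations below follow commonly cited prototypes, such as AU6 plus AU12 for an enjoyment smile, but the detected input is a hypothetical stand-in for a real detector):

```python
# Hypothetical detector output: FACS action units found in one video frame,
# e.g. AU6 (cheek raiser) and AU12 (lip corner puller).
detected_aus = {6, 12}

# Simplified prototypical AU combinations often cited for basic emotions.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "surprise": {1, 2, 5, 26},
    "sadness": {1, 4, 15},
    "anger": {4, 5, 7, 23},
}

def match_emotions(aus: set[int]) -> list[str]:
    """Return emotion labels whose prototypical action units are all present.
    As Pelachaud stresses, this is detection, not interpretation: a smile
    (AU6 + AU12) may still be faked, so context must decide what it means."""
    return [label for label, proto in EMOTION_PROTOTYPES.items() if proto <= aus]

print(match_emotions(detected_aus))  # ['happiness']
```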

"Some studies have shown that interviews with virtual medical assistants provide similar, and in some cases even better responses than those achieved by human professionals (…).  But that doesn’t mean the agent is a therapist. Virtual agents can be used to collect information on patients or users, which can be useful to the professionals”

Current virtual assistants are not only capable of imitating human emotions, but also of modifying them, which has both positive potential and possible risks. If we look at the positive side, could you offer an example of the social benefits that these virtual agents can bring?

There are applications that capture information from people when they interact with them and that, in some way, can show empathy and understanding towards humans. Interacting with these virtual agents can make it easier for people to express their emotions, which may be beneficial for people suffering from depression, for example. Some studies have shown that interviews with virtual medical assistants provide similar, and in some cases even better, responses than those achieved by human professionals. One hypothesis to explain this is that human beings do not feel judged by the virtual agent, in contrast with the shame they might feel when they are attended to by a person. But that doesn’t mean the agent is a therapist. Virtual agents can be used to collect information on patients or users, which can be useful to the professionals.

“It is a matter that should make us reflect on many ethical aspects, since people could be manipulated through virtual agents (…) You can manipulate through the media, fake news, altered images… But, with virtual agents, the risk increases because the relationship with people will be interactive”

What do you think about the risks associated with virtual assistants that might affect our emotional state?

Yes, of course, there are potential risks. It is a matter that should make us reflect on many ethical aspects, since people could be manipulated through virtual agents. Indeed, this can already be done with other current tools. You can manipulate through the media, fake news, altered images... But, with virtual agents, the risk increases because the relationship with people will be interactive. For example, the person interacting with the virtual agent may have the impression that the assistant is their friend, and the machine can build up a great deal of knowledge about the person. This poses extreme risks in terms of manipulation.

How can these risks be prevented?

There aren’t any solutions yet. Some people feel that agents should introduce themselves: “I’m not a human, I’m an agent”. Laws should also be promoted on how and where these agents may be used. But I also think it’s very important for human users to be aware of the risks of using these agents.

However, they also offer quite a few advantages, such as the medical applications we have just mentioned. That is very important. When patients converse with a virtual agent, a mutual relationship is built up during the interaction, because humans provide more information about themselves, about their mental state, their emotional state... It all depends on how the agent is used, because agents can also be misused, as we can see today.

“In these interactions, virtual agents collect personal data and this personal data could also be transferred to an insurance company, to an employer… What I mean is that data protection is really an important issue (...) We also need to regulate this issue”

Regarding personal data protection, what risks does the development of these virtual agents pose? 

Yes, there is a big risk in this field too. For example, let’s imagine that you have a virtual agent that goes with you (virtually). That can be positive, for example if you are suffering from depression, to talk about your problems, for therapy... But, in these interactions, virtual agents collect personal data, and this personal data could also be transferred to an insurance company, to an employer… What I mean is that data protection is really an important issue. It’s absolutely crucial. We also need to regulate this issue.

“There are quite a few applications in which virtual assistants operate as video tutorials for learning. There are also question/answer-type applications, which can be applied in the commercial field”

Many of us have heard of Alexa, Siri…, virtual voice assistants that offer help and personalized guidance. But for what purposes can virtual assistants be used today?

There are many applications: the health applications we have already mentioned, but also educational ones; for example, there are quite a few applications in which virtual assistants operate as video tutorials for learning. There are also question/answer-type applications, which can be applied in the commercial field (to answer consumers’ queries).

There are also educational games, applications that use video game technology for training purposes. For example, there are games that train people to speak in front of a large audience, which can be useful for preparing conferences or interviews, such as the one we are holding right now, or selection processes, for example to improve the presentation of your work or your role in a debate with other people. Some people are very shy... and the technology provided by these virtual agents can be very useful for them. You can configure the virtual agent to be kind, dominant, aggressive... or whatever you want, and you can train the person to interact with each type of agent.

On this subject, ten years ago we developed an EU project aimed at making video games to train young people for job interviews. It targeted young people who found interviews difficult, to help them convey a better image to different types of recruiters.

There are also virtual agents that companies use in selection processes. What is your opinion of the use of virtual agents in interviews and selection processes and the risks involved?

The risks depend on whether or not the decisions are made by the virtual agent. If the virtual agent is used to collect information from candidates, but the decision (on who is hired, or who goes on to the next phase of the selection process) is taken by a human, it can be a good tool. But if the decision is made by the virtual agent, that is risky. Some companies have also developed automatic tools to filter the videos submitted by candidates for a job: candidates are asked to record themselves answering some questions, and the tools classify the candidates from these videos. But no virtual agents are involved there.

In general terms, how would you summarize what has been achieved so far in the field of artificial intelligence, in relation to virtual assistants, and what remains to be done?

One of the main advances in recent times concerns the representation of virtual agents: the ability to display them in three dimensions. This has meant a huge improvement in terms of image; it’s amazing how real they can look. However, they cannot yet reproduce the non-verbal aspects of language as realistically. So, the problem now is that, if you see a super-realistic-looking virtual agent, you also expect its behaviour to be equally realistic.

This has to do with the concept of the ‘uncanny valley’ (a term that refers to the unease people can feel in front of humanoid robots that look very much like humans but are not realistic enough in many other respects). The more similar a virtual agent’s appearance is to a person’s, the greater our expectations about its ability to act like a human. And this is where we are right now. Therefore, either progress is made by improving the non-verbal skills and behaviour of highly realistic-looking virtual agents, or it might be better to use earlier, less realistic-looking models, so that appearance and behaviour are more balanced.

We still have to climb out of this valley. We need to work on the aspects most related to the movements of virtual agents, so that they acquire the ability to react to a smile, to a look…, in order to improve their interactions with humans. This is a crucial aspect we have not yet achieved. But we must do more than that: we must also adapt agents to the cultural differences of non-verbal language.

“Machines can simulate emotions (...). From this point of view, they have some emotional intelligence, but these are not feelings”

The transhumanist movement claims that by the end of this decade machines will have emotional intelligence and that the line separating humans and machines will be increasingly thin. Will emotions always be genuinely human?

Humans, like all living beings, can feel emotions. Machines can simulate emotions. They can somehow interpret the emotional state of users, based on certain facial or verbal expressions or vocal qualities, and know how to respond to them. From this point of view, they have some emotional intelligence, but these are not feelings. Machines do not have the capacity to feel, but they can simulate it.
