Deciphering the irony and semantics of emoticons
This is the line of research pursued by Francesco Barbieri in his doctoral thesis, supervised by Horacio Saggion, which was named best doctoral thesis at the 35th Congress of the Spanish Society for Natural Language Processing, held from 24 to 27 September 2019 at the University of the Basque Country.
The thesis, entitled Machine Learning Methods for Understanding Social Media Communication: Modeling Irony and emojis and defended by Francesco Barbieri on 25 January 2018 at Pompeu Fabra University, studies two important phenomena from a computational perspective in order to understand communication on Twitter: the detection of irony and the understanding of emoticons, or emojis. The thesis was supervised by Horacio Saggion, coordinator of the Large Scale Text Understanding Systems Lab and a researcher in the Research Group on Natural Language Processing (TALN) of the Department of Information and Communication Technologies (DTIC) at UPF.
Based on methods of machine learning and artificial intelligence, the thesis proposes classification systems for identifying figurative language such as irony. This research won the award for best doctoral thesis at the 35th congress of the Spanish Society for Natural Language Processing, held from 24 to 27 September 2019 at the University of the Basque Country.
As Saggion and Barbieri explain: “We proposed new automated systems based on machine learning algorithms capable of recognizing and interpreting these two phenomena. We approached irony detection as a binary classification problem in which, given a tweet, the task was to recognize whether it was ironic or not. To solve this task, we proposed a machine learning approach in which a tweet is represented by several features calculated according to its length and number of words.
We also tested our approach to irony detection by recognizing whether a news item published on Twitter was satirical or real, in several languages (English, Spanish and Italian), and we obtained significant results. We were able to automatically recognize whether a tweet belonged to a satirical Twitter account or not”, Barbieri adds.
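The binary classification setup described above can be sketched in a few lines. This is a minimal illustration and not the system built in the thesis: the four surface features, the toy tweets and their labels, and the plain logistic regression trained by gradient descent are all assumptions made for this example.

```python
import math


def tweet_features(text):
    """Map a tweet to a small vector of surface features (illustrative choice)."""
    words = text.split()
    n_chars = len(text)
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    n_marks = sum(text.count(c) for c in "!?")  # exclamation and question marks
    # Rough scaling so that no length-based feature dominates the others.
    return [n_chars / 100.0, n_words / 10.0, avg_word_len / 10.0, float(n_marks)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit a logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(text, weights, bias):
    """Return 1 (ironic) or 0 (not ironic) for a tweet."""
    x = tweet_features(text)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical toy data, made up for illustration: 1 = ironic, 0 = not ironic.
TOY_TWEETS = [
    ("Oh great, another Monday!!! Just what I needed", 1),
    ("Wow, stuck in traffic again, lucky me!!", 1),
    ("The meeting starts at ten in room four", 0),
    ("Train to Barcelona leaves at noon today", 0),
]

X = [tweet_features(t) for t, _ in TOY_TWEETS]
y = [label for _, label in TOY_TWEETS]
weights, bias = train_logistic(X, y)
print(predict("Fantastic, my phone died again!!", weights, bias))
```

A real system would need far more training data and richer features than these surface counts; the sketch only shows the shape of the task, in which a tweet goes in and an ironic or non-ironic label comes out.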
In addition, using distributional semantics methods, the thesis proposes a model to study the semantics of emojis. Thus, Barbieri explored whether the meaning and use of emojis varied from one language to another and how, as well as whether this variation is affected by the time of year (spring, summer, autumn or winter). “For this, we use distributional semantics models to represent the meaning of emojis in each language, location and season, respectively”, says the author.
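The idea behind a distributional semantics model is that the meaning of a symbol can be approximated by the words that appear around it. The sketch below illustrates this with simple co-occurrence counts and cosine similarity, which stand in here for whatever embedding model the thesis actually uses. The two tiny "seasonal" corpora, the window size and the function names are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def context_vector(target, corpus, window=2):
    """Count the tokens that occur within `window` positions of `target`."""
    counts = Counter()
    for text in corpus:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpora, made up for illustration (one per season).
WINTER = [
    "merry christmas 🎁 santa",
    "christmas tree 🌲 lights",
    "santa brings 🎁 christmas",
]
SUMMER = [
    "happy birthday 🎁 party",
    "hiking forest 🌲 trail",
    "birthday cake 🎁 party",
]

gift_winter = context_vector("🎁", WINTER)
gift_summer = context_vector("🎁", SUMMER)

# A low similarity suggests the emoji is used in different contexts per season.
print(cosine(gift_winter, gift_summer))
print(gift_winter.most_common(2), gift_summer.most_common(2))
```

In this toy data the winter and summer vectors for 🎁 share no context words, so their similarity is zero. With real tweets the vectors overlap, and it is the degree of overlap that indicates how stable an emoji's meaning is across languages, locations or seasons.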
The meaning of emojis can vary according to the language and the time of year
The results highlight that some emojis have different meanings in different countries or in different seasons. As Barbieri indicates, this is in line with many previous findings that suggest that emojis are used highly subjectively and that we interpret them differently.
“Our results suggest that even though the general semantics of emojis is similar in different languages, we have identified some that are not used in the same way from one language to another, which may be related to the cultural differences between countries”, they state.
For example, in Spain, the clover emoji 🍀 is used in a context of friendship and love, while in other countries it is used mainly in relation to luck and as the symbol of Ireland. Regarding changes in meaning according to the seasons of the year, “we found that even though most emojis retain their semantics, specific differences are identified. Two examples are the gift 🎁 and pine 🌲 emojis, which are used in winter as emojis related to Christmas but in spring and summer are used to mark a birthday present and a tree, respectively”.
This research could be useful in future studies to understand the language of social networks. “In the future, we are planning more extensive analyses to automatically detect and interpret finer differences in the semantics of emojis”, Barbieri adds.
Detecting real-time changes in semantics in response to social events and trends, thoroughly investigating the compositional meaning of emojis, and predicting the most likely emojis for a given text are some of the topics to be developed in this line of research. “This is not an easy task, since we have seen that emojis are used highly subjectively, and we all use them differently”, Barbieri concludes.
Francesco Barbieri (2018), Machine Learning Methods for Understanding Social Media Communication: Modeling Irony and emojis, doctoral thesis supervised by Horacio Saggion, defended on 25 January at Pompeu Fabra University. Best doctoral thesis according to the Spanish Society for Natural Language Processing 2019. Published in the TDX repository.