"Feature Engineering for Author Profiling and Identification: On the Relevance of Syntax and Discourse"

Dr. Juan Soler Company

Thesis supervisor: Leo Wanner

6/7/2017


Brief Description of the Thesis
 
Author profiling and identification are two areas of data-driven computational linguistics that have gained considerable relevance due to their potential applications in, e.g., forensic linguistics, marketing analysis, and historical and literary authorship verification. Author profiling aims to identify demographic traits of authors, while author identification aims to identify the authors themselves by searching for the linguistic patterns that distinguish them. Most approaches in the related work focus on the content of the texts. We argue that focusing on structure rather than content can be more effective. The main focus of the thesis is thus on feature engineering: the development, evaluation, and application of a feature set in the context of machine learning techniques for author profiling and identification. We demonstrate the profiling potential of syntactic and discourse features, which achieve state-of-the-art performance in many different scenarios, especially when combined with other features.
 
Experience as a PhD student
 
My experience as a PhD student in the DTIC department has been very positive. I have met great people, learnt a lot, and had a lot of fun. I have grown immensely, both academically and professionally and as a person. Being able to teach during these four years has also been great; the interactions with the students have been very fulfilling, and teaching helped me acquire a skill that I didn't think I could develop. UPF has provided me with everything I needed to work comfortably. The secretaries always helped me whenever I needed anything, and overall I would say that this was a great experience.