
First results by BSc students in cooperation with Wikimedia Research

(Entry by Ever Alfonso García, a BSc student taking part in the internship program run in cooperation with Wikimedia Research, supervised by Diego Sáez)

Your Language, My Language, Our Language: Studying International Auxiliary Languages in Wikipedia with Data Science

Internship Accomplishments

The goal of this internship was to study International Auxiliary Languages (IALs) in Wikipedia. More specifically, we focused on:

• understanding the behavior of users who edit in such languages on Wikipedia,

• analyzing their co-editing behavior.

Towards this goal, we used Quarry¹ to extract the data we needed and then analyzed it with statistical techniques implemented in Python; the code is public and can be found on GitHub. A comprehensive description of all the steps (datasets, methods and results) has been published on Meta-Wiki². We also condensed our work into a research paper that was submitted to SocInfo 2019³; the acceptance decision is due to be announced on June 20th, 2019. The paper isn't online anywhere yet, so I'm attaching a copy with this document.

Our main findings were:

• Simple English is by far the most popular IAL on Wikipedia;

• there is strong co-editing behavior between editors of different IALs, suggesting that they do form a community, a hypothesis that still needs further analysis;

• in line with the previous finding, users who edit in (and therefore speak) one IAL are often able to do so in other IALs as well.
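To make the notion of co-editing more concrete, here is a minimal sketch of one way to quantify it. It assumes a list of (user, wiki) pairs, such as a Quarry query might export, and uses the Jaccard index as an illustrative overlap measure; this is not necessarily the measure used in the paper. The user names are made-up toy data and the function names are hypothetical.

```python
from itertools import combinations


def editors_by_wiki(edits):
    """Group the set of distinct editors under each wiki they edited."""
    by_wiki = {}
    for user, wiki in edits:
        by_wiki.setdefault(wiki, set()).add(user)
    return by_wiki


def jaccard_overlap(edits):
    """Return the Jaccard index of the editor sets for every pair of wikis.

    The index is |shared editors| / |all editors of either wiki|,
    so 0.0 means no common editors and 1.0 means identical editor sets.
    """
    by_wiki = editors_by_wiki(edits)
    overlap = {}
    for a, b in combinations(sorted(by_wiki), 2):
        shared = by_wiki[a] & by_wiki[b]
        union = by_wiki[a] | by_wiki[b]
        overlap[(a, b)] = len(shared) / len(union)
    return overlap


# Toy data (invented users): Esperanto, Ido and Simple English editions.
edits = [
    ("alice", "eowiki"), ("alice", "iowiki"),
    ("bob", "eowiki"), ("bob", "simplewiki"),
    ("carol", "iowiki"), ("carol", "eowiki"),
    ("dave", "simplewiki"),
]

overlap = jaccard_overlap(edits)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

A higher score for a pair of language editions means a larger share of their editors is active in both, which is the kind of signal the co-editing finding refers to.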

The details of the work can be found in this article.

 

1 A web-based SQL environment for querying Wikipedia’s public database. See https://quarry.wmflabs.org.

2 The Wikimedia coordination wiki, where research projects related to the site are documented and discussed. See https://meta.wikimedia.org/wiki/Main_Page.

3 An international computer science conference focused mainly on social informatics. See https://socinfo2019.qcri.org.