The Multilingual Corpus of Survey Questionnaires (MCSQ) is an entity-relationship (ER) database of survey questionnaires in English and their translations into Catalan, Czech, French, German, Norwegian, Portuguese, Spanish and Russian. The latest version, Rosalind Franklin, includes 306 questionnaires comprising 766,000 sentences (more than 4 million tokens) and adds new annotations and datasets to the corpus.

Version 3 of the corpus was compiled from the European Social Survey (ESS), the European Values Study (EVS), the Survey of Health, Ageing and Retirement in Europe (SHARE), and the Wage Indicator Survey (WIS). In total, it contains 160 ESS, 76 EVS, 52 SHARE and 18 WIS questionnaires. The number of rounds included from each survey project is listed in the MCSQ manual and at a link available after registering.
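As a quick consistency check, the per-survey counts above add up to the 306 questionnaires reported for the latest version. The short Python sketch below tallies them and shows each survey's share of the corpus. The dictionary is only an illustration built from the figures in this paragraph; it is not a data structure taken from the MCSQ itself.

```python
# Questionnaire counts per source survey, as stated above for MCSQ version 3.
# This dict is an illustrative assumption, not part of the MCSQ codebase.
questionnaires = {
    "ESS": 160,   # European Social Survey
    "EVS": 76,    # European Values Study
    "SHARE": 52,  # Survey of Health, Ageing and Retirement in Europe
    "WIS": 18,    # Wage Indicator Survey
}

total = sum(questionnaires.values())
print(total)  # 306

# Share of the corpus contributed by each survey, in percent (1 decimal place).
shares = {name: round(100 * n / total, 1) for name, n in questionnaires.items()}
print(shares)  # {'ESS': 52.3, 'EVS': 24.8, 'SHARE': 17.0, 'WIS': 5.9}
```

The ESS thus accounts for slightly more than half of the questionnaires in this version.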

Multilingual social surveys have become the main data source for comparative research across countries. Translation procedures in social surveys are increasingly adopting Computer-Assisted Translation (CAT) tools, and using such tools requires text data. In line with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, this corpus is openly accessible in a format compatible with CAT tools, and is intended as a resource for corpus linguists, computational linguists, statisticians, social scientists, translation scholars and localizers.

Survey questionnaires are made up of survey items, which constitute the basic unit of analysis in the MCSQ. Each item was divided into sentences, and these sentences form the segments in the database. The questionnaires included in the corpus were administered as in-person interviews, with answers recorded in a standardized way either on paper or on a Computer-Assisted Personal Interviewing (CAPI) device. The MCSQ is an open-source and open-access artifact: the modules used to compile it are stored in the MCSQ_compiling repository on GitHub, and their documentation is hosted on Read the Docs.
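To illustrate the relationship between survey items and sentence-level segments described above, here is a minimal Python sketch. It assumes a hypothetical `Segment` record and a naive rule that splits on `.`, `?` or `!` followed by whitespace; this is not the segmentation logic used in MCSQ_compiling, and the item text is made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One sentence-level segment of a survey item (hypothetical schema)."""
    item_id: str
    position: int
    text: str


# Naive boundary rule (an assumption): split after ".", "?" or "!"
# when followed by whitespace. Not the MCSQ's actual splitter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def segment_item(item_id: str, item_text: str) -> list[Segment]:
    """Split a survey item into ordered sentence segments."""
    sentences = _SENTENCE_BOUNDARY.split(item_text.strip())
    return [
        Segment(item_id, position, sentence)
        for position, sentence in enumerate(sentences)
        if sentence  # drop empty strings, e.g. from an empty item
    ]


if __name__ == "__main__":
    # Hypothetical item text, invented for this example.
    item = ("Please use this card to answer. "
            "How often do you read a newspaper? Do not count online news.")
    for seg in segment_item("Q1", item):
        print(seg.position, seg.text)
```

Keeping the item identifier and position on every segment lets each translated sentence be traced back to its source item. A production splitter would need language-aware rules, since abbreviations and numbering easily break a punctuation-only heuristic.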

The MCSQ is part of the Social Sciences and Humanities Open Cloud (SSHOC) project.

To learn more about the MCSQ, please read the dedicated user manual. An academic article describing the corpus is currently under peer review; to request a preprint, please contact us.