Release of MCSQ version 3.0 (Rosalind Franklin)

We are pleased to announce the release of Version 3.0 of the Multilingual Corpus of Survey Questionnaires (MCSQ).

10.08.2021

 

Version 3.0 of the Multilingual Corpus of Survey Questionnaires (MCSQ), named after the scientist Rosalind Franklin, is composed of 306 distinct questionnaires comprising approximately 766,000 sentences (more than 4 million tokens). It adds new annotations and datasets to the corpus.

The following datasets were included in Version 3.0 (in addition to the existing ones):

  • Wage Indicator: round 1 and COVID-19 questionnaires in English (source and Great Britain), Czech, French (France), German (Germany), Norwegian, Portuguese (Portugal), Russian (Russian Federation), and Spanish (Spain)¹
  • European Values Study: wave 2 questionnaires in English (Great Britain and Ireland), French (France), German (Germany), Portuguese (Portugal), and Spanish (Spain)
  • European Social Survey: rounds 8 and 9 questionnaires in Catalan, Czech, English (Ireland and Great Britain), Portuguese (Portugal), Russian (Estonia in both rounds, Israel in round 8, and Latvia in round 9), and Spanish (Spain)

Additionally, the following European Social Survey questionnaires from rounds 8 and 9, which were released with missing data in version 2.0, have been completed in this release: French (Belgium, Switzerland, France), German (Austria, Switzerland), Norwegian, and Russian (Lithuania and Russian Federation in round 8 and Latvia in round 9).

Lastly, we added Named Entity Recognition (NER) annotations to the corpus. The annotations were produced with pre-trained models from different sources, namely FlairNLP (English, German, French, and Spanish), spaCy (Catalan, Norwegian, and Portuguese), and Slavic BERT from DeepPavlov (Czech and Russian).
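For readers who want to see at a glance which toolkit covered which language, the assignment above can be written as a small lookup table. This is only an illustrative sketch: the names `NER_TOOLKIT_BY_LANGUAGE` and `ner_toolkit_for` are hypothetical and are not part of the MCSQ code base, and the snippet does not load or run any of the models.

```python
# Hypothetical lookup table mirroring the language/toolkit assignment
# stated in the announcement. It does not load any NER model.
NER_TOOLKIT_BY_LANGUAGE = {
    "English": "FlairNLP",
    "German": "FlairNLP",
    "French": "FlairNLP",
    "Spanish": "FlairNLP",
    "Catalan": "spaCy",
    "Norwegian": "spaCy",
    "Portuguese": "spaCy",
    "Czech": "Slavic BERT (DeepPavlov)",
    "Russian": "Slavic BERT (DeepPavlov)",
}


def ner_toolkit_for(language: str) -> str:
    """Return the toolkit named in the announcement for a corpus language."""
    try:
        return NER_TOOLKIT_BY_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"No NER toolkit listed for language: {language!r}") from None


if __name__ == "__main__":
    for language in sorted(NER_TOOLKIT_BY_LANGUAGE):
        print(f"{language}: {ner_toolkit_for(language)}")
```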

 

1: We attributed the questionnaire languages to the aforementioned countries for metadata consistency. In reality, for a given language, the same questionnaire is administered in several other countries (e.g., the French questionnaire is administered in Belgium, Switzerland, Canada, etc.), the only difference being the salary range answer options. We opted to include only one questionnaire for each of the aforementioned languages to avoid text repetition in the database.
