Corpora with limited access

The resources listed here have limited access. If you would like to use any of them in your research, please contact the researchers responsible to request access.

Corpora on CQPweb

These corpora are available through CQPweb but cannot be accessed with the guest account.

Corpus del Español: A purchased copy of the original Corpus del Español, containing nearly two billion words of text from web pages in 21 Spanish-speaking countries. For copyright reasons, 10 consecutive words out of every 200 are replaced by @ symbols. Purchased under the project CONNECT2 (MINECO FFI2016-76045-P).
COCA (Corpus of Contemporary American English): A purchased copy of the original Corpus of Contemporary American English. It contains more than one billion words of American English text from eight genres (more than 25 million words per year, 1990-2019). For copyright reasons, 10 consecutive words out of every 200 are replaced by @ symbols. Purchased under the project CONNECT2 (MINECO FFI2016-76045-P).

Corpora available through TEITOK

A corpus of medieval Spanish texts from the Hispanic Seminary of Medieval Studies with paleographic information. This is a subset of the larger OLDES corpus, developed under the Spanish Plan Nacional project FFI2010-15006.
A version of the Computerized Corpus of Old Catalan (CICA). CICA was originally developed in a project led by Joan Torruella (ICREA-UAB), Manuel Pérez Saldanya (UV-IEC) and Josep Martines (UA-IEC). It is a corpus of texts written in Catalan, with works from the 11th to the 18th century. The linguistic annotation of this corpus and the adaptation of its paleographic information to the TEI text encoding format were partially funded by Spanish Plan Nacional projects FFI2013-41301-P and FFI2016-76045-P (AEI/MINECO/FEDER), as well as by funds from an ICREA Academia award granted to L. McNally. These tasks were coordinated by Maarten Janssen and Josep M. Fontana with the collaboration of Toni Bassaganyas and Daniela Corbetta.

External corpora hosted on the GLiF server

CoDiAJe (Corpus diacrónico anotado del judeoespañol): A structured multi-genre diachronic corpus of text samples from the 16th to the 21st century, classified by type, period, and geographical origin and enriched with different kinds of linguistic annotations. It is part of two research projects supported by the Israel Science Foundation (ISF) (grant No. 473/11 and grant No. 486/19).
CoOrAJe (Corpus oral anotado del judeoespañol): A multimodal corpus, currently in the initial phase of its development, that includes oral text samples in Judeo-Spanish enriched with different types of linguistic annotations. Each audio recording is annotated with orthographic attributes and levels of linguistic analysis.