Informatización y estudio del corpus PAAU (junio 1992)

PI: Paz Battaner. PB93-0392.

 

Corpus PAAU 1992

The PAAU Corpus, June 1992 (Corpus 92) is a small corpus of about 350,000 words that is representative of non-expert written discourse. It collects the June 1992 answers to questions from the Spanish university entrance exams (also known as 'Selectividad'). The texts were gathered at six Spanish universities located across the country (University of Barcelona, Complutense de Madrid, University of Murcia, University of Oviedo, University of Salamanca, and University of Sevilla) and cover all the subjects assessed in these examinations. Corpus 92 was collected to study the academic texts, generally handwritten, that Spanish students produce immediately before entering university, and especially to document the level of academic writing of secondary school students in their native language, so that effective teaching can be proposed. The rationale derives from a previous project (DGICYT PS88-0026), "The written text for specific purposes" (academic and commercial discourse), developed at the University of Barcelona. The initial objectives were mainly to obtain representative and reliable data on the level of written academic language at the point of university entry.

However, the conversion of the data to a computer format has also made Corpus 92 of interest for other linguistic studies.

Research results:

  1. Availability of Corpus 92 in three subcorpora (Science, General, and Humanities), each in four digital versions.
  2. Analysis of pre-university students' actual command of written language for academic purposes, at the following levels:
    • Orthographic: Study of graphematic and punctuation errors (particularly the use of commas, the area in which students make the most mistakes).
    • Lexical: Characterization of the vocabulary used by students in the different subcorpora, both in general terms (comparing the vocabulary used in different subjects) and in particular ones (for instance, the use of adjectives and of the most frequently used verbs).
    • Syntactic: Description of some syntactic structures that tend to cause students particular difficulty (prepositional verbs, for instance).
    • Discourse: General discourse organization, with special attention to opening and closing paragraphs, informational progression, and the microstructures specific to each subject (for instance, raising hypotheses, or relative clauses in relation to informational development).
  3. Proposals for the didactic application of the descriptive results, since these have proved to be more complex than the impressions a teacher may form in the classroom.
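
As an illustration of the kind of lexical comparison across subcorpora mentioned above, the following is a minimal sketch in Python. It is not the project's actual method or tooling: the three sample texts are invented stand-ins rather than Corpus 92 data, and the function names are hypothetical.

```python
from collections import Counter
import re

# Hypothetical toy texts standing in for the three subcorpora.
# They are invented for illustration and are NOT Corpus 92 data.
subcorpora = {
    "Science": "la célula tiene una membrana y la célula se divide",
    "General": "el autor expone una idea y el texto tiene una tesis",
    "Humanities": "el rey tiene el poder y la guerra termina",
}


def relative_frequencies(text):
    """Return each word's share of all tokens in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def shared_vocabulary(profiles):
    """Return the words that occur in every subcorpus profile."""
    vocabularies = [set(profile) for profile in profiles.values()]
    return set.intersection(*vocabularies)


# One frequency profile per subcorpus, then the vocabulary common to all.
profiles = {name: relative_frequencies(text) for name, text in subcorpora.items()}
common = shared_vocabulary(profiles)
```

Relative rather than absolute frequencies are used here so that subcorpora of different sizes remain comparable; a real study would also need lemmatization and part-of-speech tagging to isolate adjectives or verbs.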

It should also be noted that Corpus 92 has been donated to the RAE's CREA, to Microsoft (USA), and to private researchers. It has also been the subject of three doctoral theses, by C. López, M. Pujol, and E. Atienza. In addition, the 'VI Jornadas sobre Corpus Lingüísticos: Corpus per a l'ensenyament' was organized in 1998, with the participation of Professors G. Aston, P. Battaner, T. McEnery, and F. Roussel.

The papers presented at this conference have been published in Edicions de l'IULA, Sèrie Activitats; 6: P. Battaner i C. López (ed.). VI Jornada de Corpus Lingüístics: corpus lingüístics i ensenyament de llengües. Barcelona: Institut Universitari de Lingüística Aplicada, 1998.

Even though the project has not received any public funding since 1997, five researchers have continued working on it. Briefly, the results of this subsequent research are the following:

  • Creation of a public-use textual database that allows for complex searches.
  • Study of other linguistic aspects of the data already included in the database. Thanks to this work, a global characterization of the type of text under analysis is now available.
  • Publication of the book Enseñar y aprender: la redacción de exámenes. Madrid: Antonio Machado Libros, 2002.
  • Publication of the book S. Torner, P. Battaner (ed.). El corpus PAAU 1992. Estudios descriptivos, textos y vocabulario. Barcelona: Institut Universitari de Lingüística Aplicada, DOCUMENTA UNIVERSITARIA, 2005 (Edicions de l'IULA, Sèrie Monografies; 9), which presents descriptions of the tagged corpus and is intended to allow for other applications, especially those that teachers and professors may wish to make concerning the type of academic discourse students use when they finish their Bachillerato or other secondary studies.

Principal researcher

Dr. Paz Battaner
Funded by the Spanish Department of Education (PB93-0392)