Boosting the Creation of a Treebank

  • Authors
  • Arias, B.; Bel, N.; Lorente, M.; Marimón, M.; Milà, A.; Vivaldi, J.; Padró, M.; Fomicheva, M.; Larrea, I.
  • UPF authors
  • BEL RAFECAS, NÚRIA; ARIAS BADIA, BLANCA; MILA GARCIA, ALBA; LORENTE CASAFONT, MERCÈ; MARIMON FELIPE, MONTSERRAT
  • Authors of the book
  • Calzolari, N.; Choukri, K.; Declerck, T.; Loftsson, H.; Maegaard, B.; Mariani, J.; Moreno, A.; Odijk, J.; Piperidis, S. (eds.)
  • Book title
  • Proceedings of the Ninth International Conference on Language Resources and Evaluation
  • Publisher
  • European Language Resources Association
  • Publication year
  • 2014
  • Pages
  • 775-781
  • ISBN
  • 978-2-9517408-8-4
  • Abstract
  • We present the results of an experiment in bootstrapping a treebank for Catalan using a dependency parser trained on Spanish sentences. To save time and cost, our approach exploited the typological similarities between Catalan and Spanish to create a first Catalan data set quickly by (i) automatically annotating Catalan sentences with a delexicalized Spanish parser, (ii) manually correcting the parses, and (iii) using the corrected Catalan sentences to train a Catalan parser. The results showed that about 1000 parsed sentences are required to train a Catalan parser, a number that was reached in 4 months with 2 annotators.
  • Complete citation
  • Arias, B.; Bel, N.; Lorente, M.; Marimón, M.; Milà, A.; Vivaldi, J.; Padró, M.; Fomicheva, M.; Larrea, I. Boosting the Creation of a Treebank. In: Calzolari, N.; Choukri, K.; Declerck, T.; Loftsson, H.; Maegaard, B.; Mariani, J.; Moreno, A.; Odijk, J.; Piperidis, S. (eds.). Proceedings of the Ninth International Conference on Language Resources and Evaluation. 1 ed. Reykjavik: European Language Resources Association; 2014. p. 775-781.