
The Index Thomisticus Treebank Project: Annotation, Parsing and Valency Lexicon

Barbara McGillivray*, Marco Passarotti**, Paolo Ruffolo**

* University of Pisa, Italy
b.mcgillivray@ling.unipi.it

** Catholic University of the Sacred Heart, Milan, Italy
marco.passarotti@unicatt.it, paolo.ruffolo@poste.it


We present an overview of the Index Thomisticus Treebank project (IT-TB). The IT-TB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11-million-token Latin corpus of the texts of Thomas Aquinas. We briefly describe the annotation guidelines, which are shared with the Latin Dependency Treebank (LDT). We then report on the application of data-driven dependency parsers to IT-TB and LDT data: we present training and parsing results on several datasets and evaluate the learning algorithms and techniques used. Finally, we introduce the IT-TB valency lexicon extracted from the treebank, report quantitative data on the lexicon, and provide some statistical measures of its subcategorisation structures.



TAL, Volume 50, 2009, no. 2: Traitement automatique des langues et langues anciennes (Natural Language Processing and Ancient Languages)

Last updated: 8 January 2010, by the editors-in-chief.