The Index Thomisticus Treebank Project: Annotation, Parsing and Valency Lexicon
Barbara McGillivray*, Marco Passarotti**, Paolo Ruffolo**
* University of Pisa, Italy
** Catholic University of the Sacred Heart, Milan, Italy
We present an overview of the Index Thomisticus Treebank project (IT-TB). The IT-TB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11-million-token Latin corpus of the texts by Thomas Aquinas. We briefly describe the annotation guidelines, which are shared with the Latin Dependency Treebank (LDT). We then report on the application of data-driven dependency parsers to IT-TB and LDT data, presenting training and parsing results on several datasets and evaluating different learning algorithms and techniques. Finally, we introduce the IT-TB valency lexicon extracted from the treebank, report quantitative data on the lexicon, and provide some statistical measures of its subcategorisation structures.
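To make the idea of a valency lexicon "extracted from the treebank" concrete, the sketch below shows one simple way such a lexicon could be built from dependency trees: for each verb, collect the labels of its direct argument dependents as a subcategorisation frame, count frames per verb lemma, and turn the counts into relative frequencies as a basic statistical measure. This is a minimal illustration under stated assumptions, not the authors' extraction procedure. The `Token` record, the helper names (`extract_frames`, `build_lexicon`, `frame_distribution`), the part-of-speech code `"v"`, and the set `ARGUMENT_LABELS` are all hypothetical. The label names imitate Prague-style analytical tags (`Sb`, `Obj`, `OComp`, `Pnom`), but the exact inventory is assumed. The sketch also deliberately ignores prepositional and coordinated arguments, which a real extractor over this kind of annotation would have to resolve.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

class Token(NamedTuple):
    id: int      # 1-based position in the sentence
    lemma: str
    pos: str     # coarse part of speech; "v" marks verbs (assumed tagset)
    head: int    # id of the governing token, 0 for the sentence root
    deprel: str  # dependency label, e.g. "Sb" (subject), "Obj" (object)

# Hypothetical choice of labels treated as valency slots (assumed inventory).
ARGUMENT_LABELS = {"Sb", "Obj", "OComp", "Pnom"}

def extract_frames(sentence):
    """Yield (verb lemma, frame) pairs. A frame is the sorted tuple of argument
    labels on the verb's direct dependents; prepositions and coordination
    are NOT resolved in this sketch."""
    for verb in sentence:
        if verb.pos != "v":
            continue
        args = sorted(dep.deprel for dep in sentence
                      if dep.head == verb.id and dep.deprel in ARGUMENT_LABELS)
        yield verb.lemma, tuple(args)

def build_lexicon(treebank):
    """Map each verb lemma to a Counter of its observed frames."""
    lexicon = defaultdict(Counter)
    for sentence in treebank:
        for lemma, frame in extract_frames(sentence):
            lexicon[lemma][frame] += 1
    return lexicon

def frame_distribution(lexicon, lemma):
    """Relative frequency of each frame observed for one lemma."""
    frames = lexicon[lemma]
    total = sum(frames.values())
    return {frame: count / total for frame, count in frames.items()}

# Toy, hand-built sentences (not IT-TB data):
# "Deus creat mundum", "Deus creat", "Deus est bonus".
treebank = [
    [Token(1, "deus", "n", 2, "Sb"), Token(2, "creo", "v", 0, "Pred"),
     Token(3, "mundus", "n", 2, "Obj")],
    [Token(1, "deus", "n", 2, "Sb"), Token(2, "creo", "v", 0, "Pred")],
    [Token(1, "deus", "n", 2, "Sb"), Token(2, "sum", "v", 0, "Pred"),
     Token(3, "bonus", "a", 2, "Pnom")],
]

lexicon = build_lexicon(treebank)
print(dict(lexicon["creo"]))                # {('Obj', 'Sb'): 1, ('Sb',): 1}
print(frame_distribution(lexicon, "creo"))  # {('Obj', 'Sb'): 0.5, ('Sb',): 0.5}
```

Representing a frame as a sorted tuple of labels makes frames order-insensitive (Latin word order is free) and hashable, so they can be counted directly. A fuller treatment would also record, for example, the case of each argument or the preposition introducing it.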