
Computational and Linguistic Issues in Designing a Syntactically Annotated Parallel Corpus of Indo-European Languages

Dag T. T. Haug, Marius L. Jøhndal, Hanne M. Eckhoff, Eirik Welo, Mari J. B. Hertzenberg, Angelika Müth

Department of Philosophy, Classics, History of Arts and Ideas
P.O. Box 1020 Blindern
N-0315 Oslo
Norway

This paper reports on the development of the PROIEL parallel corpus of New Testament texts, which contains the Greek original of the New Testament and its earliest Indo-European translations: Latin, Gothic, Old Church Slavic and Classical Armenian. A web application has been built specifically for annotating the texts at multiple levels: morphology, syntax, alignment (at the sentence, dependency-graph and token levels), information structure and semantics. We describe this web application and our annotation schemes. Although designed for investigating pragmatic resources, the richly annotated corpus is an important resource for contrastive and historical Indo-European syntax and pragmatics, and it can easily be expanded to include other old Indo-European languages.



TAL, Volume 50 (2009), No. 2: Traitement automatique des langues et langues anciennes

Last updated: 8 January 2010, by the editors-in-chief.