
Cocytus: parallel NLP over disparate data

Noah Evans, Masayuki Asahara, Yuji Matsumoto

Nara Institute of Science and Technology
8916-5, Takayama-cho, Ikoma-shi
Nara 630-0192 JAPAN

As NLP deals with larger datasets and more computationally expensive algorithms, cutting-edge NLP research is increasingly becoming the province of companies like Google, which can devote enormous resources to NLP tasks. Smaller institutions are being left behind. Beyond this lack of resources, the resources a typical researcher does have access to are represented in a variety of differing, incompatible data formats and operating system semantics. NLP researchers spend a large amount of research time developing tools to support these different data formats, time that could be spent on productive research. To address these problems of data representation and large-scale data processing, this paper presents Cocytus, a platform for creating NLP tools, loosely based on Unix, that handles different data formats and parallel computation transparently, allowing institutions to make maximum use of their resources.


Download:
PDF file, 4.5 MB

TAL, Volume 49 (2008), No. 2: Plate-formes pour le traitement automatique des langues (Platforms for Natural Language Processing)

Last updated: 17 June 2009, author: Editors-in-chief.