Free Language Resources

2011 volume 52 number 3.

Guest editors: Núria Bel and Benoît Sagot.

The latest developments and applications in the field of Natural Language Processing have significantly changed the paradigms for the design, construction, operation and evaluation of so-called Language Resources. In particular, the growing importance of empirical approaches has raised the demand for corpora and lexica as major sources of data on languages. A growing number of languages, including French and other Romance languages, are gradually becoming equipped with new and richer resources.

Language Resources can be textual, oral or multimodal. They include:

-  written or oral corpora, raw or, more interestingly, enriched with annotations of several kinds, which can range from morphosyntactic labels to prosodic information, and from dependency graphs to discourse structure markup.

-  lexical resources with information about one or more levels of linguistic analysis, and also grammars or other ways of modeling different types of language structures; the boundary between lexicon and grammar often depends on the approach (cf. TAG, lexicon-grammar, etc.).
-  complex resources, combining an annotated corpus and an associated lexical database or combining information from different levels of linguistic analysis.

A language resource may involve one or more variants of the language or languages covered: standard language, the language of journalism, specialized language (often with great importance given to issues of terminology), oral language (if transcribed), blogs and forums, SMS, etc.

However, the development of language resources is a long, expensive and highly error-prone task, regardless of the approach used. This is one of the reasons for the increasing importance of freely available resources, whose authors and owners, while retaining authorship, allow them to be exploited, transformed and redistributed.

The goal of this special issue is to focus on the full range of scientific issues related to language resources, going beyond the description of the resources themselves. Contributions can therefore cover all topics related to language resources, including:

-  Modeling of the language data that constitute the resource (linguistically motivated formal frameworks, representation of linguistic information in the form of structured and/or quantitative data, standards for Language Resources, etc.).

-  Methodologies for the development of Language Resources, whether manual, fully automatic or hybrid (semi-supervised techniques, interlingual transfer methods, use of pre-annotation tools, validation / correction interfaces, etc.). Approaches that promote linguistic relevance while minimizing the "human cost" are most welcome.

-  Methods for validation and evaluation of language resources, including extrinsic evaluation within NLP systems and for experimental linguistic purposes. Contributions must show how the use of language resources has improved system performance or led to a better understanding of linguistic phenomena, and/or how it can serve as a means to assess or improve these language resources.

Language Resources for French or, better still, multilingual resources or language-independent methods in which French is one of the languages addressed, are the main focus of this call. Work on resources for other Romance languages is also relevant. In addition, contributions related to free resources will be particularly appreciated.

Scientific Committee:
-  Nicolas Asher
-  Delphine Bernhard
-  Paul Buitelaar
-  Maria Teresa Cabré
-  Emmanuel Cartier
-  Anne Condamines
-  Benoît Crabbé
-  Cécile Fabre
-  Mikel Forcada
-  Gregory Grefenstette
-  Eva Hajicová
-  Nancy Ide
-  Christine Jacquin
-  Anne Lacheret
-  Marie-Claude L’Homme
-  Joseph Mariani
-  Piet Mertens
-  Asunción Moreno
-  Sophie Rosset
-  Agata Savary
-  Jesse Tseng
-  Agnès Tutin
-  Gertjan van Noord
-  Piek Vossen
-  Geoffrey Williams

Important Dates:
-  Deadline for submission: 30 September 2011
-  List of papers selected: 15 January 2012
-  Deadline for camera-ready papers: 3 March 2012
-  Online publication: mid-2012


TAL (Traitement Automatique des Langues / Natural Language Processing) is a forty-year-old international journal published by ATALA (French Association for Natural Language Processing) with the support of CNRS (National Centre for Scientific Research). It has moved to an electronic mode of publication, with printing on demand. This in no way affects its reviewing and selection process.


Authors intending to submit a paper are encouraged to contact the guest editors of the issue: Benoît Sagot (INRIA Paris Rocquencourt), benoit.sagot at, and Núria Bel (IULA, Universitat Pompeu Fabra, Barcelona), nuria.bel at

Contributions (25 pages maximum, PDF format) must be sent by e-mail to the following addresses: benoit.sagot at and nuria.bel at

Style sheets are available for download on the Web site of the journal:

The journal only publishes original contributions in French or in English.

Check the instructions to authors:

