
Learning Domain-Specific, L1-Specific Measures of Word Readability

Shane Bergsma, David Yarowsky

Dept. of Computer Science and Human Language Technology Center of Excellence
Johns Hopkins University
Baltimore, Maryland
USA
shane.a.bergsma@gmail.com, yarowsky@cs.jhu.edu

Improved readability ratings for second-language readers could have a huge impact in areas such as education, advertising, and information retrieval. We propose ways to adapt readability measures for users who (a) are proficient in a particular domain, and (b) have a particular native language (L1). Specifically, we predict the readability of individual words. Our learned models use a range of creative features based on diverse statistical, etymological, lexical, and morphological information. We evaluate on a corpus of computational linguistics articles divided according to seven L1s; we show that we can accurately predict the target readability scores in this domain. Our technique improves over several reasonable baselines. We provide an in-depth analysis showing which kinds of information are most predictive of word difficulty in different L1s, and show how this differs for style and content words.





Last updated: 22 May 2014, by the Editors-in-Chief.