Learning Domain-Specific, L1-Specific Measures of Word Readability

Shane Bergsma, David Yarowsky

Dept. of Computer Science and Human Language Technology Center of Excellence
Johns Hopkins University
Baltimore, Maryland

Improved readability ratings for second-language readers could have a huge impact in areas such as education, advertising, and information retrieval. We propose ways to adapt readability measures for users who (a) are proficient in a particular domain, and (b) have a particular native language (L1). Specifically, we predict the readability of individual words. Our learned models use a range of creative features based on diverse statistical, etymological, lexical, and morphological information. We evaluate on a corpus of computational linguistics articles divided according to seven L1s; we show that we can accurately predict the target readability scores in this domain. Our technique improves over several reasonable baselines. We provide an in-depth analysis showing which kinds of information are most predictive of word difficulty in different L1s, and show how this differs for style and content words.

Last updated: 22 May 2014, by the Editors-in-chief.