
Four types of context for automatic spelling correction

Michael Flor

Educational Testing Service
Rosedale Road
Princeton, NJ, 08541
USA

This paper investigates four types of contextual information for improving the accuracy of automatic correction of single-token non-word misspellings. The task is framed as contextually informed re-ranking of correction candidates. The first approach captures immediate local context with word n-gram statistics from a Web-scale language model. The second measures how well a candidate correction fits the semantic fabric of the local lexical neighborhood, using a very large Distributional Semantic Model. The third recognizes a misspelling as an instance of a word that recurs elsewhere in the text and uses this for re-ranking. The fourth looks at context beyond the text itself: if the approximate topic is known in advance, spelling correction can be biased towards that topic. The effectiveness of the proposed methods is demonstrated on an annotated corpus of 3,000 student essays from international high-stakes English language assessments. The paper also describes an implemented system that achieves high accuracy on this task.
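The re-ranking idea can be illustrated with a minimal Python sketch. It is not the system described in the paper: the abstract does not give a scoring formula, so the unweighted sum below, the function names (`edit_distance`, `rerank`), and all toy counts and word lists are assumptions made purely for illustration. Small dictionaries stand in for the Web-scale language model and the Distributional Semantic Model.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rerank(misspelling, candidates, left, right, context_words,
           bigram_counts, assoc, essay_tokens, topic_words):
    """Order candidates by orthographic similarity plus four context signals.

    The equal-weight sum is an assumption, not the paper's formula.
    """
    essay_counts = Counter(essay_tokens)
    scored = []
    for cand in candidates:
        # Baseline: closeness of the candidate to the misspelled string.
        similarity = 1.0 / (1.0 + edit_distance(misspelling, cand))
        # (1) Local context: log-scaled counts of the bigrams the candidate forms.
        ngram = (math.log1p(bigram_counts.get((left, cand), 0))
                 + math.log1p(bigram_counts.get((cand, right), 0)))
        # (2) Semantic fit: mean association with nearby content words.
        semantic = (sum(assoc.get(frozenset((cand, w)), 0.0) for w in context_words)
                    / max(len(context_words), 1))
        # (3) Recurrence: the candidate is spelled correctly elsewhere in the text.
        recurrence = 1.0 if essay_counts[cand] > 0 else 0.0
        # (4) Topic: the candidate belongs to a topic vocabulary known in advance.
        topic = 1.0 if cand in topic_words else 0.0
        scored.append((similarity + ngram + semantic + recurrence + topic, cand))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return [cand for _, cand in sorted(scored, key=lambda t: -t[0])]


# Toy inputs, invented for illustration only.
ranked = rerank(
    misspelling="polution",
    candidates=["solution", "pollution", "population"],
    left="air", right="is",
    context_words=["smog", "city"],
    bigram_counts={("air", "pollution"): 50, ("pollution", "is"): 20,
                   ("solution", "is"): 30},
    assoc={frozenset(("pollution", "smog")): 0.8},
    essay_tokens="the city suffers from pollution and smog".split(),
    topic_words={"pollution", "environment"},
)
print(ranked)
```

With these invented inputs, "solution" and "pollution" are both one edit away from "polution", so spelling similarity alone cannot separate them; the context signals move "pollution" to the top of the list.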



TAL, Volume 53 (2012), No. 3: Du bruit dans le signal : gestion des erreurs en traitement automatique des langues (Noise in the signal: error handling in natural language processing)

Last updated: 15 July 2013, by the Editors-in-Chief.