Four types of context for automatic spelling correction

Michael Flor*

*Educational Testing Service, Rosedale Road, Princeton, NJ 08541, USA

Abstract (in English)

This paper investigates the use of four types of contextual information to improve the accuracy of automatic correction of single-token non-word misspellings. The task is framed as contextually informed re-ranking of correction candidates. The first approach captures immediate local context with word n-gram statistics from a Web-scale language model. The second approach measures how well a candidate correction fits into the semantic fabric of the local lexical neighborhood, using a very large Distributional Semantic Model. In the third approach, recognizing a misspelling as an instance of a recurring word can be useful for re-ranking. The fourth approach looks at context beyond the text itself: if the approximate topic is known in advance, spelling correction can be biased towards that topic. The effectiveness of the proposed methods is demonstrated on an annotated corpus of 3,000 student essays from international high-stakes English language assessments. The paper also describes an implemented system that achieves high accuracy on this task.
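To make the idea of contextually informed re-ranking concrete, here is a minimal Python sketch. It assumes that each candidate already has a base (e.g. orthographic similarity) score and that the four contextual signals are combined linearly with hand-set weights. The toy bigram counts, the 2-dimensional word vectors, the topic word list, the "candidate occurs elsewhere in the text" reading of the recurrence signal, and all function names (`ngram_score`, `semantic_fit`, `rerank`) are hypothetical illustrations. This is not the author's system, Web-scale language model, or Distributional Semantic Model.

```python
import math

# Hypothetical toy resources only; they stand in for a Web-scale n-gram
# model and a large Distributional Semantic Model.

def ngram_score(left, cand, right, bigram_counts):
    """Local-context signal: add-one smoothed log counts of the bigrams
    (left, cand) and (cand, right)."""
    return (math.log(bigram_counts.get((left, cand), 0) + 1)
            + math.log(bigram_counts.get((cand, right), 0) + 1))

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_fit(cand, neighbors, vectors):
    """Semantic signal: mean cosine similarity between the candidate and
    the neighboring words that have vectors."""
    if cand not in vectors:
        return 0.0
    sims = [cosine(vectors[cand], vectors[w]) for w in neighbors if w in vectors]
    return sum(sims) / len(sims) if sims else 0.0

def rerank(candidates, left, right, neighbors, text_words, topic_words,
           bigram_counts, vectors, weights):
    """Re-rank (candidate, base_score) pairs by a weighted sum of the base
    score and four contextual signals (linear combination is an assumption)."""
    scored = []
    for cand, base in candidates:
        score = (weights["base"] * base
                 + weights["ngram"] * ngram_score(left, cand, right, bigram_counts)
                 + weights["semantic"] * semantic_fit(cand, neighbors, vectors)
                 # Recurrence (simplified): candidate occurs elsewhere in the text.
                 + weights["recurrence"] * (1.0 if cand in text_words else 0.0)
                 # Topic bias: candidate belongs to a known topic vocabulary.
                 + weights["topic"] * (1.0 if cand in topic_words else 0.0))
        scored.append((score, cand))
    scored.sort(reverse=True)  # highest combined score first
    return [cand for _, cand in scored]

# Toy example: "... protect the enviroment from pollution ..."
candidates = [("environment", 0.90), ("enrichment", 0.92)]
bigram_counts = {("the", "environment"): 50, ("environment", "from"): 20,
                 ("the", "enrichment"): 5}
vectors = {"environment": [0.9, 0.1], "enrichment": [0.1, 0.9],
           "protect": [0.7, 0.3], "pollution": [0.8, 0.2]}
context = dict(left="the", right="from", neighbors=["protect", "pollution"],
               text_words={"environment", "students", "pollution"},
               topic_words={"environment", "pollution", "climate"},
               bigram_counts=bigram_counts, vectors=vectors)
base_only = {"base": 1.0, "ngram": 0.0, "semantic": 0.0,
             "recurrence": 0.0, "topic": 0.0}
all_on = {"base": 1.0, "ngram": 1.0, "semantic": 1.0,
          "recurrence": 1.0, "topic": 1.0}

print(rerank(candidates, **context, weights=base_only))  # ['enrichment', 'environment']
print(rerank(candidates, **context, weights=all_on))     # ['environment', 'enrichment']
```

With only the base score, the slightly more similar but contextually wrong "enrichment" wins; once the contextual signals are switched on, "environment" moves to the top. A real system would learn the weights rather than fix them by hand.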

Published in

Du bruit dans le signal : gestion des erreurs en traitement automatique des langues

