ESL filters and highlighting
Filter query results to sentences with a specific grammatical error, a specific native language, or both.
Corpus (UD v2.3)
The Treebank of Learner English (TLE) is a collection of 5,124 English as a Second Language (ESL) sentences (97,681 words), manually annotated with POS tags and dependency trees in the Universal Dependencies formalism. The dataset represents upper-intermediate level adult English learners from 10 native language backgrounds, with over 500 sentences for each native language. The sentences were drawn from the Cambridge Learner Corpus First Certificate in English (FCE) corpus.
The dataset and the annotation guidelines were developed in a collaborative project between MIT CSAIL, MIT Linguistics and CBMM. Detailed information about the annotations is available in this paper. A draft of the manual used by the annotators is available here. Please contact Yevgeni Berzak (berzak at mit dot edu) if you spot any errors or inconsistencies in the treebank, or if you have questions about the ESL annotation guidelines.
11/15/2018: The treebank now conforms to the UD v2 guidelines!
You are encouraged to cite the following papers when using the dataset:
The TLE is available publicly through the UD repository (English-ESL).
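Because the treebank is distributed in the Universal Dependencies formalism, its files can be read with a few lines of code. The following is a minimal sketch, not an official loader: it assumes the standard CoNLL-U layout (ten tab-separated columns, `# key = value` comment lines, blank lines between sentences). The function name `read_conllu` and the toy sentence are made up for illustration and are not taken from the TLE.

```python
import io

def read_conllu(lines):
    """Yield (metadata, tokens) for each sentence in an iterable of CoNLL-U lines."""
    meta, tokens = {}, []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield meta, tokens
            meta, tokens = {}, []
        elif line.startswith("#"):
            # Sentence-level comments of the form "# key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            # Keep only syntactic words: skip multiword ranges ("1-2")
            # and empty nodes ("1.1"), whose IDs are not plain integers.
            if cols[0].isdigit():
                tokens.append({
                    "id": int(cols[0]),
                    "form": cols[1],
                    "upos": cols[3],
                    "head": int(cols[6]) if cols[6].isdigit() else None,
                    "deprel": cols[7],
                })
    if tokens:
        # Handle a final sentence that is not followed by a blank line.
        yield meta, tokens

# Toy input (hypothetical, not a TLE sentence).
SAMPLE = (
    "# sent_id = toy-1\n"
    "# text = I like cats\n"
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tlike\tlike\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tcats\tcat\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(io.StringIO(SAMPLE)))
n_words = sum(len(tokens) for _, tokens in sentences)
```

To run this on a downloaded treebank file, pass an open file handle instead of the `io.StringIO` object. The sentence and word totals can then be compared against the figures reported above.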
Annotation and guidelines: Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz, Margarita Misirpashayeva.
Web search: Emily Kellison-Lynn, Yevgeni Berzak, Andrei Barbu, Jessica Kenney.
Team Annotations: Yevgeni, Lucia, Sophie, Jessica, Jing, Sebastian.