What Is a Corpus?
A corpus is a large, structured collection of authentic language (written texts, transcribed speech, or both) stored electronically so that it can be searched and counted.
3 Ways to Use Corpus Data in ESL Teaching
A core of roughly 3,000 high-frequency English word families is commonly estimated to cover around 95% of everyday language. Corpus frequency data tells you which words are worth teaching at each level. Don't spend 20 minutes on 'serendipity' when your B1 student doesn't know 'although.'
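If you have a frequency list as a plain set of words, you can estimate how much of a classroom text it covers. The sketch below is a minimal illustration, not a feature of COCA or any other corpus tool: the `coverage` function, the tiny `high_frequency` set, and the sample sentence are all hypothetical stand-ins, and it matches exact word forms rather than whole word families.

```python
import re

def coverage(text, known_words):
    """Return the share of running words in `text` that appear in `known_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical mini word list standing in for a real frequency list.
high_frequency = {"the", "cat", "sat", "on", "a", "although", "it", "was"}
sample = "Although it was serendipity, the cat sat on a mat."

print(f"{coverage(sample, high_frequency):.0%}")  # prints 80%
```

With a real list of a few thousand words in place of the toy set, a low score suggests the text is probably too hard for the level you are teaching.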
Before teaching 'strong,' check COCA for its most common collocations: strong coffee, strong smell, strong feeling, strong argument. A large corpus works as a collocation dictionary built from real usage, and it is often more current than a published reference.
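To see what a collocation search does under the hood, here is a rough sketch that counts the words appearing immediately after a target word. It is only an approximation of the idea: the `right_collocates` helper and the sample text are made up for illustration, it ignores sentence boundaries, and real corpus tools use far larger data and statistical measures rather than raw counts.

```python
import re
from collections import Counter

def right_collocates(text, node, top=5):
    """Count the words that directly follow `node` and return the most frequent."""
    tokens = re.findall(r"[a-z']+", text.lower())
    followers = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node
    )
    return followers.most_common(top)

# Invented sample text, not corpus data.
sample = (
    "She made strong coffee. He had a strong feeling about it. "
    "Strong coffee keeps me awake. That is a strong argument."
)

# 'coffee' comes first here, with a count of 2.
print(right_collocates(sample, "strong"))
```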
Corpus data reveals which words and phrases are formal, informal, spoken, written, academic, or journalistic. 'Get' is extremely high-frequency in spoken English but relatively rare in academic writing. Knowing this helps you teach students which words suit which register.
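Raw hit counts can mislead when the sections of a corpus differ in size, so register comparisons are usually made per million words. The sketch below shows that arithmetic. The `per_million` helper is hypothetical, and the counts are round placeholder numbers chosen for easy arithmetic, not real figures from COCA or any other corpus.

```python
def per_million(hits, section_size):
    """Normalize a raw hit count to occurrences per million words."""
    return hits * 1_000_000 / section_size

# Placeholder values, NOT real corpus figures: (hits for one word, words in section)
sections = {
    "spoken": (5_000, 1_000_000),
    "academic": (1_000, 2_000_000),
}

for name, (hits, size) in sections.items():
    print(f"{name}: {per_million(hits, size):.0f} per million words")
```

In this made-up case the word is ten times as frequent in the spoken section once size is accounted for, even though the raw counts differ only by a factor of five.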
Corpus Tools for Teachers
COCA
Corpus of Contemporary American English — free, enormous, searchable by genre
BNC
British National Corpus — best for British English frequency and collocations
Sketch Engine
A paid tool, but more teacher-friendly: visualizations, collocation graphs, frequency lists
Teacher Tip
“When a student asks 'Is this natural?', don't guess. Search COCA. If a phrase returns thousands of hits, it's natural. If it returns zero, it almost certainly isn't. This transforms you from opinion-giver to evidence-provider, which is far more powerful and much more honest.”
Frequently Asked Questions
Do I need a linguistics background to use corpus tools?
No. The basic functions — searching for a word, finding its collocations, seeing frequency by genre — are intuitive. COCA's interface is accessible to any teacher willing to spend 30 minutes exploring it.
Can I use corpus data with students directly?
Yes — data-driven learning (DDL) has students explore corpus data themselves and induce rules from examples. This works particularly well with B2+ learners who can handle inductive reasoning with authentic language data.
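For a DDL activity, students usually work from concordance lines: every occurrence of a word shown with a little context on each side. Corpus tools produce these for you, but the idea can be sketched in a few lines. The `kwic` function (short for "keyword in context") and the sample sentences here are hypothetical, and a real handout would draw its lines from a corpus rather than an invented text.

```python
import re

def kwic(text, node, width=3):
    """Return one line per occurrence of `node`, with `width` words of context each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>25} [{token}] {right}")
    return lines

# Invented sample text, not corpus data.
sample = "Although it rained, we went out. We went out although it rained."

for line in kwic(sample, "although"):
    print(line)
```

Lining the keyword up in a column lets students compare what comes before and after it, which is where they start to notice patterns for themselves.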
How do corpora handle spoken English vs. written English?
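Good corpora are subdivided by register, separating spoken from written language and often splitting written language further. COCA, for example, is divided into genres that include spoken, fiction, magazines, newspapers, and academic texts, allowing you to search specifically for spoken or written usage patterns.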