Corpus Linguistics for ESL Teachers: Using Real Language Data in the Classroom

A corpus gathers millions of real text examples, and it shows you what native speakers actually say.

Matthew James Soldato

ESL Teacher & Founder of DrillKit

Feb 8, 2026

What Is a Corpus?

A corpus (plural: corpora) is a large, systematic collection of real text or speech samples compiled for linguistic analysis. The British National Corpus (BNC) contains 100 million words of written and spoken British English. The Corpus of Contemporary American English (COCA) contains over 1 billion words.
These databases let teachers (and researchers) answer questions that no textbook can: How often is 'however' actually used vs. 'but'? What words most commonly follow 'make'? Is 'I am agree' ever acceptable? The corpus answers with real evidence.

3 Ways to Use Corpus Data in ESL Teaching

1. Frequency teaching
The 3,000 or so most frequent English word families are commonly estimated to cover around 95% of everyday text. Corpus frequency data tells you which words are worth teaching at each level. Don't spend 20 minutes on 'serendipity' when your B1 student doesn't know 'although.'
2. Collocation research
Before teaching 'strong,' check COCA for its most common collocations: strong coffee, strong smell, strong feeling, strong argument. Used this way, the corpus works as a collocation dictionary built directly from real usage, so it can show you patterns that a published reference leaves out.
3. Register analysis
Corpus data reveals which words and phrases are formal, informal, spoken, written, academic, or journalistic. 'Get' is extremely high-frequency in spoken English but relatively rare in academic writing. Knowing this helps teach register appropriately.
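
For teachers who want to try the first two ideas on their own texts (a set of graded readers, say, or transcripts they have collected), here is a minimal Python sketch. It is not COCA's interface or API. The sample sentences are invented for illustration, and the helper names `tokenize` and `right_collocates` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def right_collocates(tokens, node):
    """Count the words that appear immediately after `node`.

    Note: this simple version ignores sentence boundaries, so the last
    word of one sentence is paired with the first word of the next.
    """
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)

# Invented sample text; replace it with your own classroom material.
sample = (
    "She made strong coffee. He has a strong feeling about it. "
    "That is a strong argument. I like strong coffee in the morning."
)

tokens = tokenize(sample)

# Frequency list: how often each word occurs in the sample.
freq = Counter(tokens)
print(freq.most_common(3))

# Collocations: which words follow 'strong', most frequent first.
print(right_collocates(tokens, "strong").most_common())
# -> [('coffee', 2), ('feeling', 1), ('argument', 1)]
```

A few sentences prove nothing on their own, of course. The point of a real corpus is that the same counts are taken over millions of words.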

Corpus Tools for Teachers

COCA

Corpus of Contemporary American English: free, enormous, searchable by genre

BNC

British National Corpus: best for British English frequency and collocations

Sketch Engine

A paid tool, but more teacher-friendly: visualizations, collocation graphs, frequency lists

Teacher Tip

When a student asks 'Is this natural?', don't guess. Search COCA. If a phrase returns thousands of hits, it's natural. If it returns none in a billion-word corpus, it's almost certainly not. For anything in between, look at which genres the hits come from before you decide. This transforms you from opinion-giver to evidence-provider, which is far more powerful and much more honest.
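
One caveat when reading hit counts: a raw number only means something relative to the size of the corpus. A common way to compare results across corpora is frequency per million words. The sketch below assumes the corpus sizes given earlier (about 1 billion words for COCA, 100 million for the BNC). The hit counts are made up for illustration, and `per_million` is a hypothetical helper name.

```python
def per_million(hits, corpus_size):
    """Convert a raw hit count into occurrences per million words."""
    return hits * 1_000_000 / corpus_size

COCA_SIZE = 1_000_000_000  # roughly 1 billion words
BNC_SIZE = 100_000_000     # 100 million words

# Hypothetical hit counts for the same phrase in each corpus.
coca_hits = 2_500
bnc_hits = 250

print(per_million(coca_hits, COCA_SIZE))  # 2.5
print(per_million(bnc_hits, BNC_SIZE))    # 2.5
```

In this invented case the phrase is equally common in both corpora, even though one raw count is ten times the other.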

Frequently Asked Questions

Do I need a linguistics background to use corpus tools?

No. The basic functions (searching for a word, finding its collocations, seeing frequency by genre) are intuitive. COCA's interface is accessible to any teacher willing to spend 30 minutes exploring it.

Can I use corpus data with students directly?

Yes. In data-driven learning (DDL), students explore corpus data themselves and induce rules from examples. This works particularly well with B2+ learners who can handle inductive reasoning with authentic language data.
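
DDL handouts are often built from concordance lines: every occurrence of a target word shown with a few words of context on each side, sometimes called keyword-in-context (KWIC). Corpus tools produce these for you, but if you want to build a handout from your own texts, a rough sketch could look like this. The sample text is invented, and `kwic` is a hypothetical helper name.

```python
import re

def kwic(text, node, width=3):
    """Return one line per occurrence of `node`, with `width` words of context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the target word lines up.
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines

# Invented sample; students would look for what tends to follow 'make'.
sample = (
    "I need to make a decision today. Please make an effort to come. "
    "Did you make a mistake on the test?"
)

for line in kwic(sample, "make"):
    print(line)
```

Students can then read down the column to the right of the target word and work out the pattern for themselves.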

How do corpora handle spoken English vs. written English?

Good corpora are subdivided by register, for example written news, written academic, and spoken language. COCA is organized this way, with sections that include spoken, fiction, magazine, newspaper, and academic texts, so you can search specifically for spoken or written usage patterns.
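
To see the idea behind a register comparison, here is a small sketch that measures how often one word occurs in two samples, as a share of all words in each. Both samples are invented and far too short to prove anything. They only show the calculation that a corpus tool performs at scale. `relative_frequency` is a hypothetical helper name.

```python
import re

def relative_frequency(text, word):
    """Share of tokens in `text` that are exactly `word` (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens)

# Invented samples standing in for two registers.
spoken = "I'll get the car and then we can get some food, okay? Did you get my text?"
academic = (
    "The participants obtained consent forms and received "
    "instructions before the study began."
)

spoken_rate = relative_frequency(spoken, "get")
academic_rate = relative_frequency(academic, "get")

print(f"spoken:   {spoken_rate:.3f}")
print(f"academic: {academic_rate:.3f}")
```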
