LexMC's directors have been involved in English corpus lexicography since its beginnings in the early 1980s. They have played a leading role in the design and development of several major corpora, including the British National Corpus (BNC). They are now at the forefront of new initiatives to use the Web as a source of corpus data.
What we do
Drawing on our extensive experience in the development and exploitation of corpora, we can offer a complete corpus-gathering service, including any or all of the following:
- advice on design principles and corpus building
- data collection and copyright clearance
- document header design and other documentation
- encoding and annotation: standardization, tokenization, lemmatization, POS-tagging, shallow parsing
Corpus initiatives we have been involved in:
- Longman-Lancaster Corpus (30 million words, 1989)
- Oxford Pilot Corpus (17 million words, 1989-1993)
- British National Corpus (100 million words, 1993)
- American National Corpus (ANC) (over 100 million words, 2002 onwards)
- Oxford English Corpus (2 billion words, 2003 onwards)
- New Corpus for Ireland (255 million words, 2004)

