Oxford English Corpus

The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University Press' language research programme. It is the largest corpus of its kind, containing nearly 2.1 billion words.[1] It includes language from the UK, the United States, Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa.[2] The text is mainly collected from web pages; some printed texts, such as academic journals, have been collected to supplement particular subject areas.[2] The sources are writings of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of blogs, emails, and social media".[2] This may be contrasted with similar databases that sample only a specific kind of writing. The corpus is generally available only to researchers at Oxford University Press, but other researchers who can demonstrate a strong need may apply for access.[2][3]

The digital version of the Oxford English Corpus is formatted in XML and is usually analysed with Sketch Engine software.[4] By April 27, 2006, the dictionary database had reached 1 billion words.[5]

Each document in the Oxford English Corpus is accompanied by metadata including:

  • title
  • author (if known; many websites make this difficult to determine reliably)
  • author gender (if known)
  • language type (e.g. British English, American English)
  • source website
  • year (+ date, if known)
  • date of collection
  • domain + subdomain
  • document statistics (number of tokens, sentences, etc.)[4]
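
The metadata fields listed above can be pictured as a simple record attached to each document. The following is a minimal sketch only: the source states that the corpus is stored as XML but does not give its schema, so the element names (`title`, `language_type`, `tokens`, and so on), the `DocumentMetadata` class, the `parse_metadata` function, and the sample document are all hypothetical and invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentMetadata:
    """One record per corpus document; field names are illustrative only."""
    title: str
    author: Optional[str]          # often unknown for web pages
    author_gender: Optional[str]   # only if known
    language_type: str             # e.g. "British English"
    source_website: str
    year: int
    date: Optional[str]            # full date, if known
    date_of_collection: str
    domain: str
    subdomain: Optional[str]
    token_count: int
    sentence_count: int


def parse_metadata(xml_text: str) -> DocumentMetadata:
    """Read a metadata record from a hypothetical <document> element."""
    root = ET.fromstring(xml_text)

    def optional(tag: str) -> Optional[str]:
        # Missing or empty elements are treated as "not known".
        element = root.find(tag)
        if element is None or element.text is None:
            return None
        return element.text.strip() or None

    def required(tag: str) -> str:
        value = optional(tag)
        if value is None:
            raise ValueError(f"missing required field: {tag}")
        return value

    return DocumentMetadata(
        title=required("title"),
        author=optional("author"),
        author_gender=optional("author_gender"),
        language_type=required("language_type"),
        source_website=required("source_website"),
        year=int(required("year")),
        date=optional("date"),
        date_of_collection=required("date_of_collection"),
        domain=required("domain"),
        subdomain=optional("subdomain"),
        token_count=int(required("tokens")),
        sentence_count=int(required("sentences")),
    )


# Invented sample record, not taken from the corpus.
SAMPLE = """
<document>
  <title>Example blog post</title>
  <language_type>British English</language_type>
  <source_website>example.org</source_website>
  <year>2005</year>
  <date_of_collection>2006-01-15</date_of_collection>
  <domain>weblog</domain>
  <tokens>1200</tokens>
  <sentences>58</sentences>
</document>
"""

if __name__ == "__main__":
    print(parse_metadata(SAMPLE))
```

Fields marked "if known" in the list are modelled as optional, so a web page with no identifiable author simply yields `None` for `author` and `author_gender` rather than an error.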

References

  1. ^ "The Oxford English Corpus". Sketch Engine. Lexical Computing CZ s.r.o. Retrieved 27 October 2016.
  2. ^ a b c d "The Oxford English Corpus". Oxford Dictionaries Online. Oxford University Press. Archived from the original on 1 January 2012. Retrieved 8 November 2014.
  3. ^ "Compare COCA". Corpus of Contemporary American English. Archived from the original on 7 November 2014. Retrieved 8 November 2014.
  4. ^ a b "The Oxford English Corpus". Retrieved 4 February 2014.
  5. ^ "Dictionary database has billion words". Northwest Herald. 27 April 2006. p. 2. Retrieved 15 March 2020 – via Newspapers.com.

